Research Data Management: Data Files

Organising Files

Research data can easily become disorganised when files and documentation are poorly managed. A consistent file structure and naming scheme saves time, minimises the risk of loss, and makes the data easier to reuse and to cite.

Creating file names
If your institution already has a file-naming convention, continue to follow it. If it does not, write your own set of naming rules and apply them consistently to every file to avoid confusion. File names can include elements such as researcher names, project numbers, experiment numbers, version numbers and dates. Some tips include:

  • Avoid special characters and spaces in file names
  • Use capital letters and underscores instead of full stops, commas etc.
  • Use the ISO 8601 date format YYYYMMDD
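
The tips above can be sketched as a small Python helper. This is only an illustration under assumptions: the function name build_file_name, the order of the elements and the two-digit "v02" version style are hypothetical choices, not an institutional standard.

```python
import re
from datetime import date


def build_file_name(project, description, version, day, extension):
    """Build a name such as PROJ01_SoilSamples_20240305_v02.csv."""

    def clean(part):
        # Drop spaces and special characters: keep letters, digits, underscores
        return re.sub(r"[^A-Za-z0-9_]", "", part)

    stamp = day.strftime("%Y%m%d")  # ISO 8601 basic date format, YYYYMMDD
    parts = [clean(project), clean(description), stamp, f"v{version:02d}"]
    # Underscores separate the elements; the only full stop precedes the extension
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_file_name("PROJ01", "Soil Samples", 2, date(2024, 3, 5), "csv"))
# prints PROJ01_SoilSamples_20240305_v02.csv
```

Because the date is written as YYYYMMDD, files with the same project and description sort chronologically when listed by name.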

Version control
It should be clear what stage of its lifecycle the research has reached, so maintain version control throughout the project. As files are updated or amended, save them as new versions so that changes can be identified. The ‘Read-Only’ attribute can be applied to final versions of files to prevent further changes. For more detailed information on this please see:
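
Separately from any further reading, the read-only step for a final version can be sketched in Python. This is a minimal illustration under assumptions: the helper name make_read_only and the example file name are hypothetical, and on Windows os.chmod only toggles the basic read-only flag.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Remove write permission for owner, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Demonstration on a throwaway file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    final = os.path.join(folder, "PROJ01_Results_20240305_v03.csv")
    with open(final, "w") as handle:
        handle.write("id,value\n1,42\n")
    make_read_only(final)
    # Show the resulting permission string for the file
    print(stat.filemode(os.stat(final).st_mode))
```

Read-only status guards against accidental edits; it does not replace keeping earlier versions or backups.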

Persistent identifiers
A digital object identifier (DOI) is used to identify digital content in an online environment. Much like an ISBN, a DOI is a unique, unambiguous identifier that allows specific content to be identified and accessed. For the benefits of persistent identifiers please see:

File Formats

Choosing file formats

Consider the file format carefully when working with data files, because not every format can be read by every software application.

To keep your research accessible now and in the long term, choose formats with the following features:

  • Common/popular usage by the relevant research community

  • Standard representation (ASCII, Unicode)

  • Unencrypted

  • Uncompressed

  • Open documented standard/publicly available technical specification
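
As a rough illustration of the "standard representation" feature, the Python sketch below checks whether a file decodes cleanly as UTF-8 (ASCII is a subset of UTF-8). The function name is_utf8_text and the sample file names are hypothetical, and passing the check only shows that the bytes are valid UTF-8, not that the format is suitable in other respects.

```python
import os
import tempfile


def is_utf8_text(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demonstration with one text file and one binary file
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "readings.csv")
    with open(good, "wb") as handle:
        handle.write("site,temp_°C\nA,21.5\n".encode("utf-8"))
    bad = os.path.join(folder, "readings.bin")
    with open(bad, "wb") as handle:
        handle.write(b"\xff\xfe\x00\x01")
    print(is_utf8_text(good), is_utf8_text(bad))  # prints True False
```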

Selection checklist

  • Is there a risk that the file format will become obsolete in the short/medium term?

  • Is the format open?

  • Is the format specification publicly available?

  • Is the format suitable for extracting and discovering data or simply for viewing data?

  • Does the format compress the data? (Lossy compression can permanently discard data)

  • Is the chosen format an accepted standard?

  • What formats will be easiest to share?

  • Are there any discipline-specific requirements?

  • What formats will be easiest to annotate with metadata?

    For more information please see: UK Data Service - Formatting your research data