
Research Data Management: Creation and collection

Creation and collection

Setting up consistent systems for organising and documenting your data will allow you, your research collaborators, and any future potential users to find, access and properly interpret your data.

Creating your data

Choosing file formats

For data to have long-term value, it needs to be usable. Either during or upon completion of your research project, consider whether the file formats you have used are suited to storage or reuse. You can ensure your research remains accessible by taking steps to make your data compatible with different pieces of software and different systems.

For sharing or long-term storage, it is best practice to select file formats that are:

  • open documented standards or have publicly available technical specifications, rather than proprietary
  • used commonly by your research community
  • structured so that data can be extracted and discovered, rather than simply displayed
  • able to preserve the data without compression
  • shareable

The UK Data Service offers guidance on formats for long-term storage as well as specific recommended formats for different data types.
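As an illustration of moving tabular data into an open, widely supported format, the sketch below writes a small table to CSV using only the Python standard library. The function name `rows_to_csv`, the column names and the sample values are invented for this example; the right target format for your own data depends on your discipline and on the guidance linked above.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dictionaries as CSV text with a header row.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()    # first line names the columns
    writer.writerows(rows)  # one line per record
    return buffer.getvalue()


# Invented sample data
readings = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 19.0},
]
csv_text = rows_to_csv(readings, ["sample_id", "temperature_c"])
print(csv_text)
```

Plain-text output like this can be opened by spreadsheet software, statistical packages and text editors alike, which is the kind of cross-software compatibility described above.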

Organising your data

Creating an organising system

Careful organisation of your folders and files will make it easier to locate and track your research, and will minimise the lost time and frustration that come with disorganisation. Consider where you will keep your work on the networked drive, whether you would prefer a deep or shallow folder hierarchy, and how you will manage ongoing and completed work. Build in time to review and manage your folders on a regular basis.

The University of Cambridge offers useful tips on developing an organising system that works for you.

Creating file names

A consistent naming convention supports good file management and documentation. If there is not already a file naming convention in place, you are encouraged to agree a set of rules with everyone collaborating on your research project. File names may include researchers' names, project numbers, experiment numbers, version numbers, or whatever best suits your shared research practice. Consistency, clarity and documentation should be the key values, as these qualities will ensure that the data can be used, reused and cited appropriately.

The University of Edinburgh offers full practical guidance on naming conventions for records management.
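One way to keep names consistent is to generate them from the agreed elements rather than typing them by hand. The following is a minimal sketch, assuming a convention of project number, researcher name, experiment number, ISO date and version number joined by underscores. That ordering, the function name `build_file_name` and the sample values are assumptions made for this example, not a recommended standard.

```python
import re
from datetime import date


def build_file_name(project, researcher, experiment, version, created, extension):
    """Assemble a file name from agreed elements in a fixed order.

    Hypothetical convention: project_researcher_expNN_YYYY-MM-DD_vNN.ext
    """
    parts = [
        project,
        researcher,
        f"exp{experiment:02d}",
        created.isoformat(),  # ISO dates (YYYY-MM-DD) sort chronologically
        f"v{version:02d}",
    ]
    # Replace runs of anything other than letters, digits and hyphens with a hyphen
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


name = build_file_name("P1234", "J Smith", 3, 2, date(2024, 5, 17), "csv")
print(name)  # P1234_J-Smith_exp03_2024-05-17_v02.csv
```

Zero-padded numbers and ISO-format dates are used here so that files sort in a sensible order in a folder listing. Whatever elements your team chooses, the same function can be shared so that everyone produces names in the same way.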

Version control

It should be clear to each member of the research team what stage the research is at in the lifecycle, so it is recommended that you agree a version control strategy. You may want to use a systematic naming convention to identify different versions of a file, record changes made, regularly synchronise files or agree a single location for the storage of master versions.

The UK Data Service offers further guidance on developing a version control strategy and some clear examples of version control in practice.
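A systematic naming convention for versions can also be automated. The sketch below assumes versions are marked with a `_vNN` suffix before the file extension (for example `survey_v02.csv`) and works out the name the next version should take. The suffix pattern, the function name `next_version_name` and the sample file names are assumptions for illustration; dedicated version control software may suit some projects better.

```python
import re
from pathlib import Path

# Assumed convention: a file stem ends with "_v" followed by digits
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version_name(existing_names, stem, extension):
    """Return the next versioned file name for `stem`, given names already in use."""
    versions = []
    for name in existing_names:
        path = Path(name)
        match = VERSION_PATTERN.search(path.stem)
        # Only count files with the same base name and the same extension
        if match and path.stem[: match.start()] == stem and path.suffix == extension:
            versions.append(int(match.group(1)))
    next_version = max(versions, default=0) + 1
    return f"{stem}_v{next_version:02d}{extension}"


existing = ["survey_v01.csv", "survey_v02.csv", "notes_v07.csv"]
print(next_version_name(existing, "survey", ".csv"))  # survey_v03.csv
```

In practice the list of existing names could come from the agreed master storage location, so that each new version is numbered against the files already held there.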

Documenting your data

What is metadata?

Metadata is data about data, and is crucial to making sure that your research is discoverable, reusable and citable. Metadata can include basic information, such as author(s), date created, or file size, or more specific information, such as the funder, the methodologies used, or linked publications. Without this descriptive information, it could be difficult, or even impossible, for others to find, access and properly interpret your research.

What metadata do I need to create?

Metadata standards are highly discipline-specific. The Research Data Alliance provides a community-maintained directory of metadata standards, extensions, tools, and use cases, sorted by subject area.

Creating "readme" metadata

A "readme" is a plain text file that provides information about a data file or dataset, to help ensure that the data can be properly understood and interpreted. It is good practice to include readme files even if you do not intend to share or publish your data publicly, as they will help you or other members of your research team if you wish to reuse the data at a later time.

Cornell University offers a guide to best practice and a "readme" template.
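If a project produces many datasets, a readme can be generated from a few recorded details so that every dataset is documented in the same way. The sketch below is illustrative only: the fields, the function name `render_readme` and the sample values are assumptions drawn from the basic metadata mentioned earlier, and they do not reproduce the template linked above.

```python
def render_readme(title, authors, date_created, description, files):
    """Build plain-text readme content from basic descriptive metadata.

    `files` maps each file name to a one-line summary of its contents.
    Hypothetical field layout for illustration only.
    """
    lines = [
        f"Title: {title}",
        f"Author(s): {', '.join(authors)}",
        f"Date created: {date_created}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for name, summary in files.items():
        lines.append(f"  {name}: {summary}")
    return "\n".join(lines) + "\n"


# Invented sample values
readme_text = render_readme(
    title="Example survey dataset",
    authors=["A. Researcher", "B. Collaborator"],
    date_created="2024-05-17",
    description="Responses to an example questionnaire.",
    files={"survey_v02.csv": "Cleaned responses, one row per participant"},
)
print(readme_text)
```

The resulting text can be saved as a plain text file alongside the data it describes.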