Data organisation

Think carefully about how you structure, name and version your files and folders. By doing so, you can easily locate and use the latest version of your files and limit the chance of accidentally removing or overwriting information. When you collaborate with others on files, good naming and version management become even more important.

Everyone has their own preferred way of organising data, but we have provided some useful tips below.

Other examples can be found on the web pages of the UK Data Archive and Wageningen University & Research.

Folder structure

The way the folders of your collection are structured depends greatly on your own preference. You could, for example, base the structure on time, the step in the research cycle, data type or subjects. Importantly, the structure should be consistent and clear. Whatever structure you choose, it is best practice to provide clarification by adding proper documentation, as described in our best practices for data documentation and metadata.

Regardless of how you set up your structure, try not to add too many levels. Usually, around four folder levels is enough. You should also not create too many parallel folders; for example, use a single Papers folder and not Paper_Nature, Paper_Science and Paper_Cell. You can avoid both by keeping in mind that folders should have general descriptions, that folders contain multiple files and that details can be contained in file names (see below).
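As an illustration of the four-level rule of thumb, the following minimal Python sketch lists folders that sit deeper than a chosen number of levels. The function names and the MAX_LEVELS constant are hypothetical and only serve this example; adjust the limit to whatever suits your collection.

```python
from pathlib import Path

MAX_LEVELS = 4  # rule of thumb from the text above, not a hard limit


def folder_depth(root: Path, folder: Path) -> int:
    """Number of folder levels between root and folder (root itself is 0)."""
    return len(folder.relative_to(root).parts)


def too_deep(root: Path, max_levels: int = MAX_LEVELS) -> list[Path]:
    """Return the folders under root that sit deeper than max_levels."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_dir() and folder_depth(root, p) > max_levels
    )
```

Calling too_deep(Path("my_collection")) on a local copy of a collection (the folder name is a placeholder) returns the folders you might consider flattening.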

Specific standards apply in some research fields or Radboud University research institutes. Adhering to such standards increases the reusability of your data collection. Making use of these standards is therefore encouraged. Some examples are listed below:

Astrophysics experiments: Standard for Documentation of Astronomical Catalogues
Climate and forecast: CF metadata conventions
Crystallography experiments: CIF (Crystallographic Information Framework)
Ecological data: EML (Ecological Metadata Language)
Geographical information: ISO 19115-1:2014
Microarray experiments: MIAME (Minimum Information About a Microarray Experiment)
Neuroimaging experiments: BIDS (Brain Imaging Data Structure)
Neutron, X-ray and muon science: NeXus
Proteomics experiments: MIAPE (Minimum Information About a Proteomics Experiment)
Social, behavioural and economic sciences: DDI (Data Documentation Initiative)
Statistical data: SDMX (Statistical Data and Metadata Exchange)

Naming

When naming your folders and files, it is most important to use a systematic naming convention that uniquely identifies folders and files. The contents of a folder or file should be clear without further elaboration.

Note that the metadata of published DSCs and, optionally, of archived DACs and RDCs are made public, including the list of your collection's files. Therefore, do not include any personal data or other sensitive information in your file and folder names.

Try to avoid long and detailed names like data_questionnaire_long_version_without Q5. Instead, use short but meaningful names and add detail through versioning, as described below. For example, use data_collection for a folder and questionnaire_20210714 for a file.

When naming files and folders, try to avoid special characters, such as &, % and !.

We recommend picking a naming strategy and sticking to it. For example, you can put underscores between words (paper_Nature_20210714), use capitals to separate words (PaperNature20210714) or use any other method, as long as you apply it consistently.
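A simple check can help to catch special characters before files are added to a collection. The sketch below is one possible approach, not a prescribed rule: it assumes that only letters, digits, underscores, hyphens and dots are allowed, which also excludes spaces. The function name is_safe_name is hypothetical.

```python
import re

# Assumed convention: letters, digits, underscores, hyphens and dots only.
# Special characters such as &, % and ! (and spaces) are rejected.
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_name(name: str) -> bool:
    """Return True if the whole name consists of allowed characters."""
    return ALLOWED.fullmatch(name) is not None
```

With this convention, both paper_Nature_20210714 and PaperNature20210714 pass, while a name containing & or a space does not.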

Versioning

It is important to version your data correctly and consistently, for example with a prefix or suffix. Try to avoid vague terms like _final or _final_2; instead, use version numbers (_v1.1) or dates (_20210714). When you write dates in year-month-day order, folders and files sorted by name appear in chronological order.
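The effect of the year-month-day order can be shown with a short Python sketch. The helper dated_name is hypothetical and simply appends a date suffix in the _20210714 style used above.

```python
from datetime import date


def dated_name(stem: str, day: date) -> str:
    """Append a year-month-day suffix to a file or folder name."""
    return f"{stem}_{day.strftime('%Y%m%d')}"


names = [
    dated_name("questionnaire", date(2021, 7, 14)),
    dated_name("questionnaire", date(2020, 12, 1)),
    dated_name("questionnaire", date(2021, 1, 30)),
]

# Plain alphabetical sorting now matches chronological order.
for name in sorted(names):
    print(name)
```

With a day-month-year suffix the same three names would sort as 01122020, 14072021, 30012021, which is no longer chronological.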

Adding a version log can also be useful. In this kind of log, you can record every edit, who made it and how the new version is named. This can be especially helpful when collaborating with others.
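A version log can be as simple as a CSV file that gains one row per edit. The sketch below is one way to keep such a log in Python; the column names, the function name log_version and the file name in the usage note are assumptions for illustration, not a required format.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical columns: when, who, which new version, and what changed.
FIELDS = ["date", "editor", "file", "change"]


def log_version(log_path: Path, editor: str, file_name: str,
                change: str, day: Optional[date] = None) -> None:
    """Append one entry to the version log, writing a header if the log is new."""
    day = day or date.today()
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": day.isoformat(),
            "editor": editor,
            "file": file_name,
            "change": change,
        })
```

For example, log_version(Path("version_log.csv"), "editor_a", "questionnaire_v1.1", "Reworded question 5") adds one line to the log. Since file lists may become public, use initials or a role instead of full names where appropriate.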