Data documentation and metadata
Good data documentation makes data easier to find and access, and helps ensure that research data can be understood and used by current and future users (including your future self). Add metadata and store the data in a structured and consistent way with appropriate documentation. The documentation can be a description explaining what the data are and how they can be used.
Best practice tips:
Fill out the metadata fields offered by the Radboud Data Repository (RDR) to improve findability of your dataset. These metadata fields include keywords, audience, associated data, publications, analysis tools and pre-registrations. In the case of keywords, you can use ‘free text’ keywords or make use of a standard thesaurus (such as MeSH or SFN keywords). Adhering to a standard is generally preferable to using ‘free text’ keywords and makes your dataset more interoperable. Should you want to make use of a standard that is not yet offered by the RDR, you can contact your data steward. You can also improve the re-usability of your dataset by filling out the description field properly. You can find more information about these metadata fields on the help page 'How to adjust metadata and invite colleagues to a collection'.
Include files in your collection that explain the dataset’s context and contain information on how the research was done. This can be done with version logs, notebooks or documents describing methodologies. Describe the who, what, why, where and how of the data. Always include:
1. The project history, objectives and hypotheses
2. The data collection methods: the study design, sampling, the data collection process, measuring instruments and tools, data management and file formats
3. The data processing and analysis methods: the cleaning, transforming, aggregating and calculating procedures you performed on the data. Always state which software (including version and developer’s web address) you used to analyse the data and/or compress files
Points 1 and 2 are often similar to the abstract of a scientific publication, whereas point 3 contains similar information to the methods section of a scientific publication. You can draw inspiration from your abstract and methods sections to add as contextual documentation.
In the RDR, you can add context information in the description field of your collection’s metadata, or you can provide the information in separate files.
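As an illustration of recording which software and versions were used, the following is a minimal sketch in Python. It assumes your analysis was done in Python; the function name `describe_environment` and the package list are hypothetical and not part of the RDR. The resulting lines could be pasted into a methods document or README.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return plain-text lines naming the Python version and package versions."""
    lines = [f"Python {platform.python_version()} ({sys.platform})"]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing, so the list stays complete.
            version = "not installed"
        lines.append(f"{name} {version}")
    return lines


if __name__ == "__main__":
    # Hypothetical package list: replace with the packages your analysis uses.
    print("\n".join(describe_environment(["numpy", "pandas"])))
```

This only captures version numbers. The developer’s web address for each tool still has to be added by hand.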
Include files in your collection that describe the structure of the dataset. These are often readme.txt files or other documents that contain an overview of the various folders and files that make up the dataset. Describe which folder contains what, which files must be opened first, etc. In some scientific fields, standards exist that prescribe how your dataset should be structured. An example of this is the Brain Imaging Data Structure (BIDS: https://bids.neuroimaging.io/) for neuroimaging experiments (MRI, EEG, MEG, etc.). If there is a set of standards in your field, it is good practice to use it, as this promotes the re-usability and interoperability of datasets.
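A folder and file overview for such a readme file can be drafted automatically and then annotated by hand. The sketch below is one possible approach in Python, not an RDR tool; the function name `folder_overview` and the example folder name are hypothetical.

```python
from pathlib import Path


def folder_overview(root):
    """Return an indented listing of every folder and file below `root`."""
    root = Path(root)
    lines = [f"{root.name}/"]
    # Sorting the paths keeps each folder directly above its own contents.
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("  " * depth + path.name + suffix)
    return lines


if __name__ == "__main__":
    # Hypothetical dataset folder: replace with the path to your own collection.
    print("\n".join(folder_overview("my_dataset")))
```

The listing only shows names. The description of what each folder contains and which files to open first still needs to be written next to it.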
Include files in your collection that describe the content of the dataset. These documents describe the dataset at the data level. They are often codebooks that explain the concepts and/or variables in the dataset, their meaning and the values they represent.
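For tabular data, the skeleton of a codebook can be generated from the data file so that no variable is overlooked. The following is a minimal sketch in Python, assuming the data are in CSV format with a header row; the function name `codebook_skeleton` and the column names in the example are hypothetical. The description and unit fields are deliberately left empty, because only the researcher can fill them in.

```python
import csv
import io


def codebook_skeleton(csv_text):
    """Build one codebook entry per column of a CSV table with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Collect the distinct non-empty values observed for this variable.
        values = sorted({row[name] for row in rows if row[name]})
        entries.append(
            {
                "variable": name,
                "description": "",  # to be written by the researcher
                "unit": "",  # to be written by the researcher
                "n_distinct": len(values),
                "example_values": "; ".join(values[:3]),
            }
        )
    return entries


if __name__ == "__main__":
    # Hypothetical example data: replace with the contents of your own file.
    example = "participant,condition,rt_ms\np01,A,512\np02,B,498\np03,A,\n"
    for entry in codebook_skeleton(example):
        print(entry)
```

Values are compared as text, so the example values are sorted alphabetically rather than numerically.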
If you provide documentation files in Data Sharing Collections (DSCs), make sure that you label them as documentation files so that they can be downloaded by anyone. This is especially important for collections that are not fully open access (i.e. Open access for Registered Users and Restricted access DSCs) where re-users must request access to your data. Documentation files help potential re-users to evaluate whether your dataset is suitable for their purpose before they request access.
For an example of good documentation practice, see this collection’s $README.txt file in the files tab. For more inspiration, see Cornell University’s guide on how to write a README, which includes a README template. If you use the RDR, you do not need to include the general information and sharing/access information sections of this template in your README, because these are already obligatory metadata fields in the RDR.