Introduction to the RDR

Background

The Radboud Data Repository (RDR) is Radboud University’s solution for sharing, archiving and publishing research data acquired by researchers of the University and the Radboudumc. It has been designed to accommodate research data management workflows throughout the research life cycle. The repository ensures the long-term preservation of large datasets (there is no size limit) and helps researchers adhere to the FAIR principles and Radboud University’s research data management policy.

The RDR was initially developed as a pilot project at the Donders Institute of Brain, Cognition and Behaviour. Several researchers, ICT developers and data stewards at the Donders Institute provided the requirements that resulted in the design and implementation of the Donders Repository. After several years of extensive use, the Donders Repository was made available to all Radboud University research institutes under the name Radboud Data Repository (RDR). The RDR continues to be developed and maintained by the University's Information & Library Services department.

Organisational Structure

Each research institute of Radboud University can choose to use the RDR. Research institutes that are currently affiliated with the RDR are listed here. Within the RDR, research institutes are accommodated as ‘Organisational Units’ (OUs). RDR users can be added to one or more of these OUs.

Data collections

A dataset in the repository is called a ‘data collection’ and consists of metadata and regular data. The metadata are based on the Dublin Core and DataCite schemas, while the data are represented in free form and consist of directories and files. A data collection is accommodated in one of the RDR's OUs.

To allow research data management throughout the research life cycle, the RDR offers three collection types that serve three goals. Be aware that not every OU in the RDR uses all three collection types. The implementation of collection types and what kind of research data should be stored where depends on the policy of the OU.

The goals of the three collection types are the following:

  1. Data Acquisition Collections (DACs) serve to promote internal reuse of research data and reproducibility of results. These collections archive data in their original form, meaning without manipulations that limit future analyses of the data. Data in these collections are kept internal to meet ethical and privacy requirements. They are not accessible to the general public, only to researchers who have been invited by the researcher responsible for the collection. The metadata of these collections are also internal by default: access is restricted to researchers who have been added to the OU. However, researchers can decide to make the metadata public.

  2. Research Documentation Collections (RDCs) serve to promote reproducibility and research integrity. Here, files documenting the research process, from acquisition to publication, are archived. As with the DACs, the data of these collections (and, by default, the metadata too) are kept internal.

  3. Data Sharing Collections (DSCs) serve to promote open access data sharing, allowing for sustainable data access and reusability of research data. They contain the data on which published results are based. DSCs' data and metadata are publicly available. Depending on the access level of the DSC, users can download the DSCs' data without signing in, upon signing a Data Use Agreement (DUA), or upon approval of an access request.
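
The visibility rules of the three collection types can be summarised in a small Python sketch. It is an illustration only, not part of the RDR: the class, the constants and the function `metadata_is_public` are hypothetical names, and the sketch covers only the defaults described above (the DSC access levels are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionType:
    """Hypothetical summary of one RDR collection type."""
    name: str
    data_public: bool                 # are the data reachable outside the invited group?
    metadata_public_by_default: bool  # are the metadata public without opting in?


# Defaults as described in the list above.
DAC = CollectionType("Data Acquisition Collection", False, False)
RDC = CollectionType("Research Documentation Collection", False, False)
DSC = CollectionType("Data Sharing Collection", True, True)


def metadata_is_public(ctype: CollectionType, researcher_opted_in: bool = False) -> bool:
    """Metadata are public for DSCs, and for DACs/RDCs only if the researcher opts in."""
    return ctype.metadata_public_by_default or researcher_opted_in


if __name__ == "__main__":
    for ctype in (DAC, RDC, DSC):
        print(ctype.name, "| metadata public by default:", metadata_is_public(ctype))
```

Whether the data of a DSC can actually be downloaded still depends on the access level chosen for that DSC, which this sketch does not model.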

Collectively, these three collection types support a researcher throughout an entire research project, as shown in the image below.

../_images/workflow.png

Raw data can be deposited into a DAC directly after data collection. The DAC can be updated every time new data are collected. The DAC serves as an environment to document and organise the data, a way to access the data from anywhere, and a backup. As soon as data collection for the project is finished, the DAC can be archived.

Soon after the first data have been collected, data analysis and processing begins. Lab notes, analysis scripts, processing pipelines, intermediate results, and all other files that document the research process can be deposited in an RDC. The RDC can be updated as necessary at milestones in the research project. The RDC offers an environment to document and organise the data, a way to access the data from anywhere, a backup, and a safe way to share (sensitive and/or personal) data with colleagues and collaborators inside and outside Radboud University. When the project is finished, the RDC can be archived.

Towards the end of the research project, the results of the project will be published in a scientific journal. The data underlying that publication can be deposited in a DSC while the researcher is preparing the manuscript. The DSC offers a way to provide double-anonymous access to the reviewers of the submitted manuscript prior to publication of the data. When the manuscript is accepted for publication, the corresponding DSC can be published so that the scientific community can reuse the data or validate the results in the paper.

Role-based access management

Before publication or archiving (see below), access to collections is based on various roles with specific rights in the RDR.

  1. The support administrator is a member of the University's Research Data Management support team and is responsible for providing user support to all users (RU and non-RU) of the RDR. To do that, support administrators have access to the data and metadata of all collections in the RDR and to all user profiles in the RDR. They will only access collections if there is a valid reason to do so, mindful of GDPR compliance and the sensitivity of the data.

  2. The research administrator has an administrative function in the research institute (OU). The research administrator initiates data collections and assigns a collection manager: the researcher responsible for the collection. The research administrator ensures that each collection has an up-to-date manager who can be reached. Research administrators have access to all of their OU’s collections for reasons relating to scientific integrity. They will only access collections if there is a valid reason to do so, mindful of GDPR compliance and the sensitivity of the data.

  3. The collection manager is the researcher responsible for the collection. The manager can invite colleagues to view or work on the collection, edit metadata and add, modify or delete the collection's data. Once the collection is ready to be published or archived, the manager can change the status of the collection to archived or published (according to the review steps described below).

  4. A collection contributor is a researcher who has been invited to work on the collection by the manager. The contributor can edit metadata as well as add, modify and delete the collection's data.

  5. A collection viewer is a person who has been invited to view the collection by the manager. A viewer can view and download the collection’s data.
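
The rights of the three collection-level roles (manager, contributor and viewer) can be sketched as a permission table. This is a minimal illustration in Python, not the RDR's implementation: the names `Permission`, `Role`, `ROLE_PERMISSIONS` and `is_allowed` are hypothetical, and the sketch assumes that managers and contributors can also view and download the data, which the list above implies but does not state.

```python
from enum import Enum, Flag, auto


class Permission(Flag):
    """Hypothetical collection-level rights, derived from the role descriptions."""
    VIEW_DATA = auto()       # view and download the collection's data
    EDIT_METADATA = auto()
    EDIT_DATA = auto()       # add, modify or delete data
    INVITE_USERS = auto()    # invite colleagues to view or work on the collection
    CHANGE_STATUS = auto()   # move the collection towards archived/published


class Role(Enum):
    MANAGER = "manager"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Assumption: managers and contributors may also view the data.
_EDITING = Permission.VIEW_DATA | Permission.EDIT_METADATA | Permission.EDIT_DATA

ROLE_PERMISSIONS = {
    Role.VIEWER: Permission.VIEW_DATA,
    Role.CONTRIBUTOR: _EDITING,
    Role.MANAGER: _EDITING | Permission.INVITE_USERS | Permission.CHANGE_STATUS,
}


def is_allowed(role: Role, permission: Permission) -> bool:
    """Return True if the role includes the requested permission."""
    return permission in ROLE_PERMISSIONS[role]


if __name__ == "__main__":
    print(is_allowed(Role.CONTRIBUTOR, Permission.EDIT_DATA))
    print(is_allowed(Role.VIEWER, Permission.EDIT_METADATA))
```

The support administrator and research administrator are left out on purpose: their access spans all collections in the RDR or in an OU, rather than a single collection.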

Data archiving and publication

Data collections in the repository can have various states.

  1. Data collections are initiated in an editable state. The collection manager can invite colleagues from in and outside of Radboud University to work on the collection. Researchers can edit a collection’s metadata and add, modify and delete data files.

  2. When researchers have finished working on the collection, it should be reviewed. There are three states for review purposes: internal review, FAIR review and external review. The collection becomes read-only (i.e. 'frozen') in the review states so that it can no longer be edited, but it can easily be changed to editable again if the review process identifies changes that should be applied. The internal review state serves for internal reviews, namely reviews by all researchers working on the collection. During the FAIR review state a FAIR reviewer (an RDM expert of the university library's RDM support team) reviews the collection. External review serves for reviews by the journal to which the manuscript has been submitted. The last two review states are only available for DSCs (not for the internal DACs and RDCs).

  3. After the review process, the collection can be closed. DACs and RDCs are archived, while DSCs are published. In both cases, the collection is made read-only permanently, and a persistent identifier pointing to the collection becomes active. Changes can now only be applied by the creation of a new version of the collection. Data and metadata of archived DACs and RDCs remain internal. However, if the researcher chose to make the metadata public, the collection’s metadata become publicly available at the moment the collection is archived, and they are automatically registered in the University’s current Research Information Services (RIS). DSCs' metadata are always made publicly available upon publication and are also automatically registered in RIS. The data of published DSCs are accessible depending on the access level of the DSC selected by the collection manager.

The publication workflow for DSCs is displayed in the image below:

../_images/publish_flowstate.png
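
The collection states described above can also be expressed as a small Python sketch. It is an illustration under stated assumptions, not the RDR's implementation: the names `State`, `available_states`, `is_read_only` and `can_return_to_editable` are hypothetical, and the sketch deliberately does not encode the order in which review states follow each other, because the text above does not fix that order.

```python
from enum import Enum


class State(Enum):
    EDITABLE = "editable"
    INTERNAL_REVIEW = "internal review"
    FAIR_REVIEW = "FAIR review"
    EXTERNAL_REVIEW = "external review"
    ARCHIVED = "archived"    # closed state for DACs and RDCs
    PUBLISHED = "published"  # closed state for DSCs


REVIEW_STATES = {State.INTERNAL_REVIEW, State.FAIR_REVIEW, State.EXTERNAL_REVIEW}


def available_states(collection_type: str) -> set:
    """States a collection of the given type ("DAC", "RDC" or "DSC") can be in."""
    if collection_type == "DSC":
        return {State.EDITABLE, State.PUBLISHED} | REVIEW_STATES
    if collection_type in ("DAC", "RDC"):
        # FAIR review and external review exist only for DSCs.
        return {State.EDITABLE, State.INTERNAL_REVIEW, State.ARCHIVED}
    raise ValueError(f"unknown collection type: {collection_type!r}")


def is_read_only(state: State) -> bool:
    """Collections are frozen in every state except the editable one."""
    return state is not State.EDITABLE


def can_return_to_editable(state: State) -> bool:
    """Only review states can be reopened; archived/published is permanent."""
    return state in REVIEW_STATES


if __name__ == "__main__":
    print(sorted(s.value for s in available_states("DAC")))
    print(can_return_to_editable(State.FAIR_REVIEW))
```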

Who can access the RDR

Users are represented in the RDR by a user profile with attributes that are transferred from an external identity provider.

  1. Researchers at Dutch research institutes can log in via their SURFconext account. For researchers and students of Radboud University and Radboudumc this means they can use their U, S or Z number. Only researchers employed by Radboud University are eligible to become collection managers. Other researchers with SURFconext accounts can be made contributors or viewers upon invitation by the collection manager, and view the metadata of published DSCs. They can view and download the data of published DSCs depending on the access level of the DSC.

  2. Researchers without a SURFconext account can log in using ORCID. These researchers are eligible to be collection contributors or viewers, can view the metadata of published DSCs and can view and download the data of published DSCs depending on the access level of the DSC.

  3. Researchers can also log in via SURFconext by linking their social media accounts (e.g. LinkedIn, Facebook, Gmail) through eduID. These researchers are eligible to become collection contributors or viewers, can view the metadata of published DSCs and can view and download the data of published DSCs depending on the access level of the DSC.

  4. Anonymous users (those who do not log in to the RDR) can only view the metadata of published DSCs and view and download the data of published “Open Access” DSCs (see Selecting an appropriate access level and license for your Data Sharing Collection).
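
The eligibility rules in the list above can be condensed into a short Python sketch. It is illustrative only: the names `Login`, `eligible_roles` and `anonymous_can_download` are hypothetical, and the sketch assumes that a collection manager always signs in through SURFconext. Only the “Open Access” level is named in this section, so other access levels are treated as opaque strings.

```python
from enum import Enum


class Login(Enum):
    SURFCONEXT = "SURFconext"
    ORCID = "ORCID"
    EDUID = "eduID"
    ANONYMOUS = "anonymous"


def eligible_roles(login: Login, employed_by_radboud_university: bool = False) -> set:
    """Collection roles a user can hold, given how they sign in."""
    if login is Login.ANONYMOUS:
        return set()  # anonymous users cannot be invited to a collection
    roles = {"contributor", "viewer"}
    # Assumption: managers are Radboud University employees using SURFconext.
    if login is Login.SURFCONEXT and employed_by_radboud_university:
        roles.add("manager")
    return roles


def anonymous_can_download(access_level: str) -> bool:
    """Anonymous users can only download data of published "Open Access" DSCs."""
    return access_level == "Open Access"


if __name__ == "__main__":
    print(sorted(eligible_roles(Login.SURFCONEXT, employed_by_radboud_university=True)))
    print(sorted(eligible_roles(Login.ORCID)))
    print(anonymous_can_download("Open Access"))
```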

User interfaces

The RDR has two user interfaces for accessing and managing datasets: a web-based portal to access and manage data and metadata and a WebDAV interface to manage data. Details can be found in our user manual.

Collection versioning

Published and archived collections are read-only and can no longer be edited. Sometimes, a researcher may wish to add data to a collection, correct an error in the collection or update the collection in another way. In that case, the repository allows a new version of the collection to be created. This is an editable copy of the published or archived collection. The collection managers and contributors can apply changes to this copy and publish or archive it as they did for the first version of the collection. All versions of the collection are available under the same persistent identifier, and they refer to each other in their descriptive metadata. The researcher is responsible for providing documentation on why the new version was necessary and what changes have been made with respect to the previous version.
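
The versioning behaviour can be illustrated with a minimal Python sketch. It is not the RDR's implementation: the class `CollectionVersion`, the function `create_new_version`, the integer version counter and the identifier string are all hypothetical. The sketch only shows the idea that a new version is an editable copy of a closed collection, kept under the same persistent identifier, with the reason for the change documented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionVersion:
    """Hypothetical record of one version of a data collection."""
    persistent_identifier: str
    version: int
    files: tuple
    read_only: bool          # True once the version is published or archived
    change_notes: str = ""   # why this version was needed and what changed


def create_new_version(previous: CollectionVersion, change_notes: str) -> CollectionVersion:
    """Return an editable copy of a published or archived collection version."""
    if not previous.read_only:
        raise ValueError("only published or archived collections get a new version")
    if not change_notes.strip():
        raise ValueError("document why the new version is necessary")
    # Same persistent identifier, next version number, editable again.
    return replace(
        previous,
        version=previous.version + 1,
        read_only=False,
        change_notes=change_notes,
    )


if __name__ == "__main__":
    v1 = CollectionVersion("hypothetical-identifier", 1, ("data.csv",), read_only=True)
    v2 = create_new_version(v1, "Corrected an error in data.csv")
    print(v2.version, v2.read_only, v2.persistent_identifier == v1.persistent_identifier)
```

The previous version is left untouched in the sketch, matching the statement that all versions remain available.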