Vision, Policies and Technical Aspects¶
On this page, we describe the vision and mission of the Radboud Data Repository (RDR) and highlight all of its relevant technical, legal and policy aspects. It is important that the RDR is as transparent as possible for its users, which is why information about the Repository’s IT architecture, strategic mission, legal framework and policies has been made available here. This page may help researchers, funding agencies, scientific journals, auditors and the directors of Radboud University’s institutes determine whether and how the RDR meets their requirements for archiving and publishing research data.
Mission & Scope¶
The RDR is an institutional digital repository to archive and share research data acquired, processed and/or analysed by researchers of Radboud University.
Our mission is threefold:
To preserve data in the long term for re-use within Radboud University
To document the research process to support research integrity
To ensure the accessibility of digital research data to the scientific community, committing to the FAIR data principles and enhancing the impact of research executed at Radboud University
By research data, we mean all information that is or will be (1) generated as a part of the research process, and (2) the foundation of a scientific report. This definition of research data includes observational, experimental, simulated, derived or compiled data. Research data can take the form of measurements, documents and spreadsheets, but also laboratory or field notebooks, questionnaires, transcripts, codebooks, audiotapes and videotapes, photographs and films, models, algorithms, scripts, etc. Data generated by students as part of an internship do not belong in the RDR, unless they are of scientific value and serve as the foundation of a scientific report (other than the student's thesis).
The RDR supports researchers of Radboud University in adhering to the FAIR principles and Radboud University’s research data management (RDM) policy.
Radboud University is a multi-disciplinary university located in Nijmegen, the Netherlands. The University is committed to fostering FAIR management of its research data. As of January 2021, Radboud University’s institutes and researchers are required to ensure at least the findability (F) and proper access management (A) of research data associated with scientific publications. The RDR offers the following features in line with this policy:
It makes data findable by assigning a persistent identifier (specifically a Digital Object Identifier; DOI) to each collection. Both externally shared collections (Data Sharing Collections; DSCs) and internally archived collections (Data Acquisition Collections; DACs, and Research Documentation Collections; RDCs) receive a DOI.
It makes data findable by providing rich metadata for each collection. The metadata follow the widely accepted Dublin Core and DataCite standards and include the DOI, title, author, date, descriptive notes, applicable access restrictions and descriptions of the context, content and properties of the data (e.g. the audience, keywords, linked datasets, publications, analysis tools and pre-registrations).
It offers various options for managing access to data and metadata. Within a collection, researchers can hold different roles, each with different rights (collection manager, contributor or viewer). Researchers can manage access to their published collections (DSCs) by selecting an appropriate access level and licence or Data Use Agreement (DUA).
Responsibilities & Preservation Plan¶
Responsibility for the data in RDR collections lies with the collection managers of those collections. They are responsible for the validity and authenticity of the content of the data collection and for ensuring that data are archived and published in compliance with national or international legal regulations (such as the GDPR), ethical standards, and publisher embargoes. The RDR does not carry any responsibility for the content, validity, and legal and ethical aspects of data collections. Note that within Radboud University the chain of responsibility is such that the directors of the university’s research institutes are ultimately responsible for the storage, publication and archiving of data from research within their institute. That responsibility is often delegated to the principal investigator, who is usually the collection manager in the RDR. Researchers can receive training and support in managing their research data responsibly from institute-specific data stewards and the Research Data Management Support team. For more information on data responsibility within Radboud University, see Radboud University’s RDM policy.
The RDR takes the following responsibilities:
The repository ensures that the deposited dataset is archived according to the FAIR principles to the best of its ability and resources.
The repository preserves data collections for at least ten years after publication of the dataset, consistent with Radboud University’s policy for storage and management of research data.
The repository shall, as far as possible, preserve the dataset unchanged in its original format, taking into account current technology and the costs of implementation. The repository has the right to modify the format and/or functionality of the dataset if this facilitates the digital sustainability, distribution, interoperability or re-use of the dataset.
Continued preservation depends on University funding. If, for some reason, this funding is no longer available, the University aims to store every digital object with a persistent identifier that currently resides in the RDR under equivalent technical and legal conditions.
To ensure long-term availability of the repository, the technical infrastructure is integrated as part of the regular IT infrastructure of the University.
The repository reflects several role-based responsibilities.
A support administrator is a member of the University's Research Data Management support team and is responsible for providing user support for all users (RU and non-RU) of the RDR. To do that, they have access to data and metadata of all collections in the RDR and to all user profiles in the RDR. They will only access collections if there is a valid reason to do so, mindful of GDPR compliance and the sensitivity of the data. All support administrators are expected to maintain confidentiality when processing data, in particular identifiable data.
A research administrator is a person who has administrative responsibility for one of the repository’s Organisational Units (OUs). Research administrators must be employed by Radboud University. Data collections can only be initiated by a research administrator. The research administrator checks the financial and legal aspects before initiating a collection and ensures that the data collection is properly set up with a collection manager, the researcher responsible for managing the content. A researcher is only eligible to be a collection manager if they are employed by Radboud University and in possession of the corresponding credentials (i.e. a U or Z number). The research administrator ensures that a data collection always has an up-to-date manager. This means that if a collection manager leaves the University or loses eligibility to be a manager in another way, the research administrator has to assign a new manager. Finally, the research administrator is responsible for access to all data collections in the corresponding OU. This is required for scientific integrity, so security checks regarding fraud, plagiarism and data construction can take place if needed. They will only access collections if there is a valid reason to do so, observing GDPR compliance and the sensitivity of the data. All research administrators are expected to maintain confidentiality when processing data, in particular identifiable data. To find out more about the responsibilities of research administrators, see our protocol for research administrators.
The collection manager is responsible for the validity and authenticity of the content of the data collection and for ensuring that data are archived and published in compliance with national or international legal regulations (such as the GDPR), ethical standards, and publisher embargoes. The collection manager must own all necessary rights required to deposit data in the RDR. The manager is responsible for managing access to the collection in a responsible manner by assigning contributors and viewers to the collection or by sharing the management responsibility by adding other managers. The manager is furthermore responsible for archiving and publishing the collection after a final review of the completeness and correctness of the data and of the compliance of the chosen data sharing conditions with legal and ethical constraints.
Once the data collection has been published or archived, the system administrator – one or more developers employed by Radboud University’s ICT Service Centre (ISC) – is responsible for long-term data availability. To accomplish this, the system administrator requires access to all collections in the RDR. Collections will only be accessed if necessary to fulfil the system administrator’s task, and the system administrator is accountable for maintaining confidentiality when accessing data, personal data in particular.
The RDR offers various functionalities to improve findability of data collections.
Each collection (independent of the collection type) is identified by a unique, persistent identifier (DOI), which is findable using search engines like Google. The DOI also makes datasets citable. To improve exposure of data collections in the RDR, metadata are automatically exported to the RIS, the Current Research Information System (CRIS) of Radboud University. This makes collection metadata accessible in the Radboud Repository.
The RDR allows for various versions of each data collection. These versions can be found under the same DOI.
Findability is further enhanced by the use of extensive metadata fields. Some of the metadata fields are obligatory (title, authors, preservation time, keywords, target audience and DUA). The publication date, data size, version number, manifest, and DOI are automatically added as metadata. Researchers can also add references to related scientific publications, datasets, pre-registrations and analysis tools. In the case of published data collections (DSCs), metadata are visible to everyone, irrespective of restricted access conditions. Making metadata publicly available is optional for the archived internal collections (DACs and RDCs).
Metadata in the RDR follow the DataCite and Dublin Core metadata schemas. These metadata standards are broadly accepted and help to make data findable for both humans and machines. Metadata of published DSCs and of archived DACs and RDCs with public metadata are shared under a CC0 1.0 Universal licence. This includes the automatically generated ABOUT.txt, LICENSE.txt, and MANIFEST.txt files and files labelled as documentation files.
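The obligatory metadata fields listed above could be checked before a collection is submitted, roughly as in the sketch below. This is only an illustration: the dictionary keys, the function names and the example values are assumptions made for this example and do not reflect the RDR's actual metadata schema or interface.

```python
# Hypothetical keys standing in for the obligatory fields named in the text:
# title, authors, preservation time, keywords, target audience and DUA.
REQUIRED_FIELDS = (
    "title",
    "authors",
    "preservation_time",
    "keywords",
    "target_audience",
    "dua",
)


def is_blank(value):
    """Treat None, whitespace-only strings and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_required(metadata):
    """Return the obligatory fields that are absent or blank, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if is_blank(metadata.get(name))]


# Illustrative draft record (all values are made up for the example).
draft = {
    "title": "Example collection",
    "authors": ["A. Researcher"],
    "preservation_time": 10,
    "keywords": [],
    "target_audience": "  ",
}
print(missing_required(draft))  # ['keywords', 'target_audience', 'dua']
```

Reporting every missing field at once, rather than stopping at the first, lets a researcher complete the record in a single pass.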
Access Management and Privacy concerns¶
Deposition of data in the RDR complies with national and international standards and legislation regarding scientific integrity and privacy (the General Data Protection Regulation (GDPR) and the Dutch code for scientific integrity). Extensive online help and support from the RDM support team help to train researchers in the policies and guidelines surrounding scientific integrity and data management. Researchers also receive help on how to anonymise and/or pseudonymise data before sharing.
The RDR offers extensive access management for collections through the different roles that researchers can have in a collection (collection manager, contributor or viewer). Access to all collections in an OU is automatically granted to research administrators of that OU, support administrators, and the RDR system administrator(s). This is required for scientific integrity purposes and for offering (technical) support. Researchers can regulate access to publicly shared collections by selecting an appropriate access level and licence and/or placing an embargo on a collection. The RDR offers several broadly acknowledged open access licences for research data and code. For potentially identifiable data, the RDR offers a university-specific DUA based on the principles of open access, the relevant European legislation (GDPR) and the Dutch code for scientific integrity. The RDR also offers a DUA for sharing research data under restricted access conditions. The conditions specified in this DUA have been agreed upon by the Radboud University legal department. Licences and DUAs in the RDR can be interpreted by both humans and machines.
If a user is not compliant with the licence or DUA of a published collection or with the project agreements of an archived collection, that user’s role in the collection will be revoked. In the event of data leakage (unauthorised access to personal data), Radboud University’s policy concerning the Data Breach Notification Duty (meldplicht datalekken) will be followed.
Data Integrity and Removal¶
To ensure data integrity, the RDR performs several technical, automated processes focused on version control and event tracking.
Firstly, the RDR tracks audit trails. The system logs all interactions of users with the data and metadata. The audit trail database is backed up daily. Each event that is logged contains information about 1) the user that initiated the interaction, 2) the timestamp, 3) the context or action (e.g. open, create, delete, update), and 4) the target object (e.g. data file, metadata attribute) of the action. All events are visible to the collection managers and contributors in the history tab on the web-based portal.
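The four pieces of information held in each logged event can be pictured as a small immutable record. The sketch below is purely illustrative: the class, the field names, the helper function and the example values are assumptions and do not describe the RDR's actual audit trail schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an event cannot be altered once recorded
class AuditEvent:
    user: str            # 1) who initiated the interaction
    timestamp: datetime  # 2) when it happened (stored in UTC here)
    action: str          # 3) e.g. "open", "create", "delete", "update"
    target: str          # 4) the data file or metadata attribute acted upon


def record(trail, user, action, target):
    """Append a new event to an in-memory trail and return it."""
    event = AuditEvent(user, datetime.now(timezone.utc), action, target)
    trail.append(event)
    return event


# Hypothetical usage with made-up users and targets.
trail = []
record(trail, "researcher-a", "create", "raw/session01.csv")
record(trail, "researcher-b", "update", "metadata:title")
```

An append-only list of immutable records mirrors the idea that a history is only ever extended, never rewritten.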
For all data files in the RDR, a SHA256 checksum is calculated and stored as an attribute. Each data file is replicated to a second, off-site location and the success of the replication is confirmed by comparing the checksums on both sides. If the file is modified by a collection manager or contributor, the checksum is updated, and the data are replicated again. The list of checksums of all files in a collection can be downloaded from the web portal. This enables users to check data integrity after data transfer to or from an external system. The storage system for the data files performs regular data integrity checks.
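After transferring a collection, a user could check the local files against the downloaded checksum list along the following lines. This is a minimal sketch and not the RDR's own tooling: it assumes the list is a plain-text file with one `<hex digest> <relative path>` entry per line, which may differ from the actual layout, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_collection(root, checksum_list):
    """Compare local files against an assumed '<digest> <relative path>' listing.

    Returns the relative paths that are missing or whose digest differs.
    """
    failures = []
    for line in Path(checksum_list).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, relative_path = line.split(maxsplit=1)
        target = Path(root) / relative_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty result means that every listed file is present and matches its recorded checksum; any path in the result should be transferred again.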
As soon as a collection is published or archived by the collection manager, the collection becomes read-only, and the persistent identifier (DOI) pointing to that collection becomes active. A published or archived collection can be updated after creating a new version; a copy of the dataset is created, and the initial version remains read-only.
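The versioning behaviour described above (a published version is frozen, and updates go into a copy) can be modelled in a few lines. The classes and method names below are hypothetical and exist only to illustrate the idea; they are not part of the RDR.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CollectionVersion:
    number: int
    files: dict = field(default_factory=dict)  # relative path -> content
    read_only: bool = False

    def put(self, path, content):
        """Add or replace a file, unless this version has been frozen."""
        if self.read_only:
            raise PermissionError(f"version {self.number} is read-only")
        self.files[path] = content


class Collection:
    def __init__(self):
        self.versions = [CollectionVersion(number=1)]

    @property
    def latest(self):
        return self.versions[-1]

    def publish(self):
        """Freeze the latest version (models publishing or archiving)."""
        self.latest.read_only = True

    def new_version(self):
        """Start an editable copy; the earlier version stays untouched."""
        if not self.latest.read_only:
            raise ValueError("the latest version is still editable")
        draft = CollectionVersion(self.latest.number + 1,
                                  copy.deepcopy(self.latest.files))
        self.versions.append(draft)
        return draft
```

Because the new version starts from a deep copy, later edits cannot leak back into the frozen version.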
To ensure long-term compliance with the GDPR and save costs, the repository may remove data in published or archived collections after the retention period has expired. Even if data are removed, audit trails of ‘removed’ collections are always retained. Likewise, the metadata of published or archived collections are never removed.
In rare cases, a researcher may request that the research administrator delete an entire data collection (data and metadata). This is only allowed for collections that (1) do not contain any data files and (2) are still in the ‘editable’ state. This feature may be useful if the research administrator has made a mistake while initiating the collection, such as by using a wrong collection identifier or collection type. These collections are not actually deleted from the system, but are no longer shown to the users. The audit trail of the ‘deleted’ collection is always stored for scientific integrity purposes.
To guarantee the long-term accessibility of data files, the repository may perform data migrations, for example by converting data from one format to another. In this case, the original data are always kept in a previous version of the data collection.
Data and Metadata quality¶
The RDR has several features to ensure data and metadata quality.
Firstly, some metadata fields are obligatory. This ensures that important dataset descriptors (e.g. title, authors, preservation time, keywords, target audience and DUA) are always provided.
Researchers receive help in compiling their data collection through online help pages, best practice pages (on data documentation and metadata, data organisation, and preferred formats), and a helpdesk.
Data and metadata quality is also promoted through the implementation of review steps before a collection can be archived or published. These review steps allow for data collections to be thoroughly checked by collaborators and external reviewers (in the case of DSCs) before they are archived or published.
The RDR is classified as a data system of the highest security grade, suitable for preserving large-volume, privacy-sensitive (i.e. human) research data.
A strict security policy at the ICT Service Centre (ISC) is in place to ensure that valuable digital data in the RDR remain available (availability), do not become corrupted (integrity), and do not fall into the wrong hands (confidentiality).
Both data and metadata are stored on enterprise-grade storage systems. The database for metadata is backed up twice a day. Furthermore, two copies of the data are maintained by the system in different geographical locations to protect against data loss in case of a natural disaster.