New Federal Guidelines for Data Repositories: Moving Toward FAIR
In May 2022, the White House Office of Science and Technology Policy (OSTP) released new guidance, “Desirable Characteristics of Data Repositories for Federally Funded Research,” codifying characteristics agreed to by Federal agencies involved in conducting and/or funding scientific and scholarly research. This policy document is the work of the Subcommittee on Science (SOS) of the National Science and Technology Council (NSTC), with input from OSTP and public comment. The document has two main aims. The first is to provide a set of “general capabilities for researchers, agencies, and institutions to prioritize when selecting repositories to share research data” (p. 2). The second is to signal that these guidelines are only a start toward more streamlined federal policy, and “that future steps will be needed to better coordinate data storage and management to make data from Federally funded research more findable, accessible, interoperable, and reusable (FAIR), as well as more equitable, inclusive, secure, and trustworthy” (p. 2).
The characteristics cover aspects of oversight, service and policy, organizational stability and governance, digital object management, and technical infrastructure. Considerations of long-term access and use are an essential element of the framework. The structure is laid out in Table 1 (pp. 3-4) under the following headings:
Organizational Infrastructure
Summary of elements: This section outlines expectations for collection policies, services, and oversight. To the extent possible, data and metadata are to be available “free of charge.” Clear policies on access and use limitations, based on privacy, confidentiality, data sovereignty, and other governing factors, are to be publicly available. Organizations are expected to have a retention policy, along with a business plan that includes risk management processes and a sustainability plan for the long-term management of datasets.
Digital Object Management
Summary of elements: This section underscores the primacy of disciplinary and community norms and standards. Repositories are expected to assign unique, persistent identifiers (PIDs) so that a record of the data persists even if the resource itself is decommissioned. Quality control processes are expected to be managed by dedicated curation experts (whether local or external staff). Metadata schemas and other standards should align with discipline and community needs, and data should be made available in file formats that are “widely used.” A record of provenance is required. “Terms of reuse” are expected to be included with the metadata to facilitate attribution and citation. These terms, along with accessible PIDs, will support services that track usage and citation.
Technology
Summary of elements: This section specifies technological aspects aimed at ensuring the security, stability, and durability of the hardware and software of the “service stack.” Authentication services are to be implemented to check the credentials of depositors, and the repository should be able to connect depositor credentials with the PID for the data resource. “Longevity” plans for the repository are to cover funding as well as technology updates and refresh. Policies and technologies are to be in place for securing access and for meeting evolving cybersecurity requirements.
There is also guidance for repositories that hold “human data” and therefore need to implement and manage tools that support privacy and increased security for this special class of resource. The characteristics for these repositories include procedures to address “fidelity to consent,” increased security protocols, “limited use compliance,” controlled and auditable download, review of data use requests, plans to address security breaches or cyber intrusions, and procedures for “addressing violations of terms-of-use.” Notably, some of these characteristics for human data are likely to be well beyond the scope of non-governmental repositories, but they lay the foundation for a more consistent federal ecosystem.
While not an exhaustive list, the set of Desirable Characteristics represents federal agency consensus around high-level elements that will facilitate comparable services across a broad range of repositories and organizations. It is anticipated that Federal agencies and awardees will use these guidelines to identify data repositories (outside of federally managed or designated repositories) that support public access, and to evaluate data management plans in proposals for federal funding. Importantly, while the guidance aligns with existing repository standards, the policy clearly states that third-party certification is not required.
Subcommittee on Science, NSTC, White House Office of Science and Technology Policy. (2022). Desirable Characteristics of Data Repositories for Federally Funded Research. May 2022. Accessed August 12, 2022. Available: https://repository.si.edu/handle/10088/113528