How Data Licenses Can Help Make Your Research Data More FAIR
If you are a researcher at a non-federal institution, such as a college or university, who has recently received grant funding from a federal agency, you have undoubtedly been required to make your research products available for “public access.” Non-federal funding organizations are also encouraging (or requiring) this. While there are a number of strategies for facilitating public access to research products, one of them greatly increases the ability of both humans and machines to access research data, and also helps to apply the FAIR (Findable, Accessible, Interoperable, Reusable) Data Principles to your data, especially the “Accessible” and “Reusable” principles. Specifically, this strategy consists of assigning a machine-actionable license to your data that clearly states the conditions for access and reuse, and thereby supports data sharing. Choosing an appropriate license for your data can be complicated because there are many factors to consider; however, here is a relatively simple use case that describes steps to take and decisions to make:
A geoscience researcher working for a university in the University of California (UC) system is the Principal Investigator (PI) for a grant received from the National Science Foundation (NSF) to conduct field research. The researcher is ready to publish a paper on the findings from the project and intends to publish a “data paper” as well to make the data underlying the research findings available for public access and reuse. What are the next steps for the PI/researcher?
First, determine what data needs to be licensed. In this use case, per NSF’s Public Access Policy (1), the PI/researcher is expected to publish any findings (i.e., in a peer-reviewed manuscript or published paper) and to share the primary data created or gathered as part of the project. Any data used for reference or background information for the research project (e.g., geospatial coordinates to denote location) are commonly and publicly available and are therefore considered “facts,” which are not subject to copyright restrictions. Unique methodologies, as well as data and graphic visualizations resulting from original processing or analysis done as part of the project, are generally eligible for both copyright and licensing protections. Different funding agencies will have different requirements, which can usually be found on the agency website or in the award documentation.
Second, ascertain who can make the call about licensing choices. The researcher in the use case is an employee of the UC system, where most work done by the PI/researcher is considered “Scholarly & Aesthetic work.” Under current policy (2), UC transfers the copyright it owns to the academic authors who prepared the works. UC policy further states that researchers must assign a usage license that conforms to NSF requirements since that is the source of the extramural funding for the project (3). The PI/researcher can confirm these facts and discuss the licensing options available per their institution’s IP and data policies by contacting their local UC Sponsored Research Department, or the Scholarly Communications expert in their University Library.
Third, select a data repository in which to deposit the data for current and future access and stewardship. In this use case, the authors’ final version of their published paper(s) needs to be deposited into the NSF-PAR (Public Access Repository) per NSF Public Access Policy. Data related to the paper can be deposited into a discipline or domain data repository such as the Biological and Chemical Oceanography Data Management Office (BCO-DMO) or a general data repository such as Dryad, both of which provide long-term access and stewardship of the deposits. Note that several federal agencies are planning changes in public access and data sharing policies that would minimally require the deposit of metadata records for data into federal public access repositories, but such a deposit is voluntary at this time (4).
Journal publishers may also have requirements related to the repository in which the underlying data are to be stored. For example, the publishers represented by the American Geophysical Union (AGU) require that any data (and software) underlying papers in the journals be deposited into a “community accepted, trusted repository, as appropriate, and preferably with a DOI” (digital object identifier), among other requirements such as creating an availability statement that includes descriptions of access conditions and licensing/permissions needed to use the data (5).
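As a rough illustration of what such an availability statement might contain, here is a minimal Python sketch. The function name, the sentence wording, and the DOI shown are all hypothetical placeholders; this is not AGU's required template, and the publisher's current guidance (5) should be consulted for the actual wording expected.

```python
def availability_statement(repository, doi, license_name, license_url):
    """Compose a plain-text data availability statement.

    The wording is illustrative only; it is not an official publisher template.
    """
    return (
        f"The data underlying this paper are available from {repository} "
        f"at https://doi.org/{doi} under {license_name} ({license_url})."
    )


# Placeholder values: the DOI below is made up for illustration.
statement = availability_statement(
    repository="Dryad",
    doi="10.5555/example-doi",
    license_name="CC0 1.0",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(statement)
```

Keeping the repository, the DOI, and the license URL as separate inputs makes it easier to reuse the same values in the metadata submitted to the repository.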
Fourth, find out what the licensing options are for the data repository where you’ve chosen to house your data. Both discipline (or domain-specific) and general repositories may support only one type of license. For example, Dryad, a general repository affiliated with many universities and publishers, will only accept files with licensing terms that are compatible with the Creative Commons Zero waiver (CC0), a public domain dedication that waives all rights associated with copyright and allows most uses of the data (copying, modifying, distributing and performing) without asking permission (6). By contrast, BCO-DMO requires that deposited data (and associated information) be licensed under a Creative Commons Attribution 4.0 International License (CC BY), which allows the data to be shared and adapted as long as appropriate attribution (credit) is given, along with a link to the license (7). These two licensing options are probably the most commonly chosen in situations like the described use case, and both meet the NSF public access requirements when they include machine-actionable links (URLs) in the recommended metadata descriptions for papers at the time of deposit and publication. The licensing options available for a data repository are usually found on the repository’s website.
NOTE: For a helpful blog post from the UC Office of Scholarly Communication discussing the difference between CC0 and CC BY, see https://osc.universityofcalifornia.edu/2016/09/cc-by-and-data-not-always-a-good-fit/.
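To make the idea of a “machine-actionable link” concrete, the following Python sketch builds a small metadata record whose license field is a resolvable URL rather than free text. The use of the schema.org Dataset vocabulary, the function name, and the dataset title are assumptions made for illustration; neither repository necessarily stores metadata in this form. The license URLs and SPDX identifiers are the standard ones for CC0 1.0 and CC BY 4.0.

```python
import json

# Canonical URLs and SPDX identifiers for the two licenses discussed above.
LICENSES = {
    "CC0": {
        "url": "https://creativecommons.org/publicdomain/zero/1.0/",
        "spdx": "CC0-1.0",
    },
    "CC BY": {
        "url": "https://creativecommons.org/licenses/by/4.0/",
        "spdx": "CC-BY-4.0",
    },
}

# Repository-to-license mapping as described in the text above.
REPOSITORY_LICENSE = {"Dryad": "CC0", "BCO-DMO": "CC BY"}


def dataset_metadata(title, repository):
    """Build a schema.org-style Dataset record with a license URL.

    Hypothetical helper: a real deposit would use the repository's own
    submission form or metadata schema.
    """
    license_key = REPOSITORY_LICENSE[repository]
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "license": LICENSES[license_key]["url"],
    }


# The title is a placeholder, not a real dataset.
record = dataset_metadata("Example field survey data", "Dryad")
print(json.dumps(record, indent=2))
```

Because the license is expressed as a URL, software that harvests the record can determine the reuse conditions without a person having to read a free-text rights statement.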
Use Case Summary: After taking these steps, the UC PI/researcher in this use case is ready to publish findings from their NSF-funded research, deposit the underlying data into Dryad using a CC0 license, and receive the DOIs required to submit the manuscripts and underlying data to an AGU journal. While there may be other requirements for public access and FAIR implementation from NSF, the journal publisher, and the data repository, the PI/researcher can be confident that many NSF recommendations for public access and making their research products more FAIR will have been met much more quickly by the assignment of an appropriate data license.
The use case here illustrates the basic steps needed to assign a license to data. For more information about other kinds of Creative Commons licenses and how they can impact “open science,” of which public access is a significant part, see Fact Sheet on Creative Commons & Open Science.
Software that has been created for a research project is out of scope for this use case; nevertheless, software and code are critically important to the effective reuse of research findings. As a result, both are subject to the requirements for public access, and benefit from licensing. For a discussion on how to choose a license for scientific software, see Software Deposit: How to choose a software licence.
(1) https://www.nsf.gov/pubs/policydocs/pappg20_1/pappg_11.jsp, Chapter XI, E.4.
(3) Per the NSF’s FAQs on public access, #37: “The Federal Government has a non-exclusive, irrevocable, worldwide, royalty-free license to exercise or authorize others to exercise all rights under copyright to use a federally-funded work for Federal purposes. The Federal Government license includes the right to have the copyrighted material included in a repository where the public can search, read, download, and analyze the material in digital form.” (https://www.nsf.gov/pubs/2018/nsf18041/nsf18041.jsp#q1).
(4) EarthCube plans to offer a webinar on this topic in the near future; in the meantime, for more information on NSF’s plans, see “Dataset Pilot.”
(5) Complete AGU requirements for authors of data and software are located at https://www.agu.org/Publish-with-AGU/Publish/Author-Resources/Data-and-Software-for-Authors.
(6) See more information about Dryad requirements at https://datadryad.org/stash/faq.
(7) See more information about BCO-DMO’s requirements for submission at https://www.bco-dmo.org/terms-use.