EarthCube Promotes FAIR Data and Data Resources for the Geosciences Community
A position paper on EarthCube's adoption and promotion of the principles embodied in the FAIR acronym for current and future activities. See also the accompanying compilation of FAIR details, resources, citations and publications at [LINK].
Authors: Rubin, K.H., Kelbert, A., Stamps, D.S., Meier, O., Koskela, R. and the EarthCube Leadership Council
3 April 2019, Adopted 10 April 2019
- Position Statement - briefly defines FAIR
- Target Audience
- Current status - EarthCube alignment with FAIR
- Recommendations and Opportunities for EarthCube as a whole, members, and individual geoscientists
The Leadership of EarthCube affirms the importance of the ideas embedded within the FAIR Principles for Data and Data Resources, such that Data and Data Resources (e.g., tools, models, code/software, metadata for physical samples) are Findable, Accessible, Interoperable and Reusable (“FAIR”), and will promote the adoption of FAIR principles by EarthCube and members of the broader geoscience community in their activities, projects, and infrastructure developments.
The aforementioned position statement (1) summarizes the consensus view of EarthCube regarding the importance and aspirational value of FAIR principles; (2) provides information to raise awareness among the EarthCube Community and the broader Geoscience Community about FAIR; (3) encourages and recognizes geoscientists who participate in implementing suitable FAIR practices in their individual and collaborative research workflows; (4) promotes FAIR as community best practices for data and data-related resources (for instance, tools, models, code/software, metadata for physical samples, data repositories); and (5) recommends specific steps EarthCube can take now and in the future to enhance programmatic outcomes by the adoption of FAIR. Related, but not explicitly stated in FAIR practices, are the EarthCube goals (see earthcube.org) to support Open Data (e.g., [LINK]) and Sustainable data and data resources where possible.
EarthCube is committed to making FAIR information available to all people, especially geoscience professionals who collect and analyze data and create data resources, as well as data infrastructure developers and providers.
This EarthCube Position Statement on FAIR principles addresses the activities of EarthCube staff and membership; EarthCube Science Support Office (ESSO) activities, meetings, and special functions; and the role of EarthCube and its members in their larger communities. In addition, this best-practices policy applies to all participants in EarthCube programs, partners, contractors, vendors and other contractual groups that interact with EarthCube, whether they are academic, governmental, non‐profit, or industry.
Sound data management practices by individuals and organizations are key to new knowledge generation and discovery. Data and knowledge integration through community reuse of data can lead to broad scientific advances on complex, multidisciplinary research topics. Scientific advances in many disciplines are greatly facilitated by open access to readily available and well-documented data, including observations, interpretations, metadata, and data resources. Those resources include models, analysis and utility software, catalogues, vocabularies and specifications, which together create infrastructure for access, manipulation, and interpretation of data. An organization adopting FAIR data principles greatly facilitates access to, and reuse of, data and data resources.
EarthCube’s mission is to enable geoscientists to better understand and predict the complex and evolving Earth system by fostering a community-governed, common cyberinfrastructure to collect, access, analyze, share, visualize, and archive all forms of data and resources, and the recording and sharing of research workflows, all using advanced technological and computational capabilities. EarthCube’s long-term vision is a community-driven, dynamic cyberinfrastructure that supports standards for interoperability, infuses advanced technologies to improve and facilitate interdisciplinary research, and helps educate scientists in the emerging practices of digital scholarship, data and software stewardship, and open science.
FAIR principles are aligned with this mission and vision, and provide a framework for the community to work towards attaining these goals. Therefore, FAIR represents best practices that EarthCube wishes to promote to its own community and to the broader geosciences community. Since EarthCube promotes the development of infrastructure and supporting knowledge for finding and using geoscience data, it is functionally aligned with many of the goals of FAIR principles, and should endeavor to develop and promote the use of these principles and related existing resources (see [LINK]) in all of its activities.
Several organizations are actively promoting FAIR for scientific data. One way that EarthCube is particularly well-suited to contribute to this effort is by extending and applying these same principles to data resources, so that all components of a data analysis workflow are also findable, accessible, interoperable and reusable.
EarthCube should enhance its use of FAIR principles in all of its activities, thus maximizing the benefit to its individual members, to the organization, and to the Geosciences Community it supports.
The EarthCube community is dedicated to maintaining an organizational climate where diverse scientific ideas and research workflows are supported and enhanced by scientific infrastructure and data resources that are based on FAIR principles (learn more in the EarthCube FAIR Resources document [LINK]). Other related benefits deriving from the practice of FAIR data and data resources include streamlined knowledge transfer and ready access to hands-on examples/exercises that illustrate and explain scientific practices to the public and policy makers.
Since the FAIR principles first appeared in 2016, a fairly substantial literature on them and their application has developed (see “References” in [LINK]). Furthermore, various organizations have developed lists of definitions and actions in support of FAIR, some of which can be found at the FAIR resources web page on the EarthCube.org website [LINK]. This section provides a high-level summary of the main principles and what they mean for a working geoscientist.
Some of the key directives within FAIR are summarized here:
The FAIR Data Principles are guiding principles for ensuring that openly shared data are Findable, Accessible, Interoperable, and Reusable [LINK]. They were published in the journal Scientific Data in the article “The FAIR Guiding Principles for scientific data management and stewardship” (Wilkinson et al., 2016) and have been adopted by numerous organizations including FORCE11, the European Commission, the American Geophysical Union, Nature Publications, and the National Institutes of Health. There is general concurrence on each of the component objectives, but discussion continues in the broader data science community on some of the details. The four guiding principles are paraphrased below and detailed in the FAIR resources document [LINK].
- Findable: data / data resources are assigned globally unique and persistent identifiers, are described with rich metadata that includes the unique identifier of the data they describe, and are registered or indexed in a searchable catalogue.
- Accessible: data / data resources and metadata are retrievable by their identifiers using a standardised communications protocol that is open, free, universally implementable, and allows for an authentication and authorization procedure when needed. Metadata remain accessible, even when the data are no longer available.
- Interoperable: data / data resources and metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation, use vocabularies that follow FAIR principles, and include qualified references to other (meta)data.
- Reusable: data / data resources and metadata are richly described with a plurality of accurate and relevant attributes, are released with a clear and accessible data usage license, are associated with detailed provenance, and meet domain-relevant community standards.
What do these principles mean for the typical working geoscientist? In short, they are the codification of the major components of a scientific workflow for any research or academic activity that requires multiple data types, individual data sets and data from multiple sources, or the passing of data between multiple tools to analyze, model, or visualize trends within the data set. FAIR allows scientists to more easily discover and use data for a wide variety of applications, including those that repurpose data for applications conceived of after the original data collection.
Current status - EarthCube Alignment with FAIR
EarthCube does not currently have standardized information on how each of its constituent projects and activities aligns with the underlying principles of FAIR, or on the general FAIR landscape across the main science domains of the NSF Geosciences Directorate. There is significant activity within many EarthCube projects in this regard. EarthCube activities have recently focused more on "F" and "A" (making data and resources findable and accessible) than on "I" and "R" (making data interoperable and reusable); a systematic survey of current and past funded projects is therefore needed to identify areas that need more attention in the future.
In its priority document for 2018 and beyond [LINK], the EarthCube Leadership Council identified three priorities, two of which directly address FAIR principles: Registries for data and data resources, and a Workbench environment. The third, Scientist Engagement/Science Advancement, is also indirectly aligned with FAIR insofar as FAIR itself promotes those advancement goals. In response, during 2018-19, ESSO and the LC stood up several pilot project efforts to initiate the delivery of a new platform called “GeoCODES”. This new infrastructure will establish protocols, metadata specifications, and intuitive software tools for these activities and includes several individual component projects: “P418/P418 GUI/P419” (data center holdings registry and discovery tools) and a “Resource Registry” (more info about GeoCODES at [LINK]). These primarily address making data and data resources findable, accessible (by being registered) and interoperable (by conforming to a standard metadata model).
A significant issue now, and in the future, is the ability to align EarthCube Community and Governance goals, with respect to FAIR data and data resources, with those of the National Science Foundation’s data policies; these policies are aspirationally aligned with FAIR, but neither well defined nor uniformly enforced. Further complicating matters is the fact that the Geosciences Directorate (GEO) at NSF has its own data policy [LINK], as do divisions under GEO, with differences between them. It is useful to note that before the implementation of these various policies over the past decade, the data landscape at NSF was much less organized, and efforts thus far at the Foundation to promote public access of data have greatly enhanced availability of data generated by NSF-funded research.
Finally, two related topics that are not part of FAIR but closely align with it are Sustainable and Open-Access practices for data and data resources, as shown in the EarthCube Data Triangle figure. Open Data (OD) is the free online availability of data and data resources, with clearly identified usage policies and proper citation. Sustainability, as applied to data and data resources, means meeting present needs with practices that can continue into the future without compromising the needs of future generations; in practice, this means operating infrastructure for access and use of data in a way that can be maintained and does not jeopardize future data use. Implementation of both requires resources and mindsets that differ from the traditional academic science business model. EarthCube has an opportunity to sit at the nexus of FAIR, OD, and Resource Sustainability to help provide workable solutions to geoscientists and to leverage existing global efforts on each of these fronts.
EarthCube is well-positioned to align and rationalize the practical applications of these various policies to help facilitate the finding, accessing, interoperability, and reuse of data and data-related products arising from current and past NSF-GEO and NSF-CISE/OAC (Directorate for Computer & Information Science & Engineering, Office of Advanced Cyberinfrastructure) funded research. As new initiatives and Big Ideas [LINK] come online at NSF, EarthCube can evaluate how they might form opportunities for EarthCube to further promote both FAIR principles and the overarching goals of the program.
Recommendations and Opportunities
The recommendations described here are intended to generate tangible outcomes for the geosciences community driven by EarthCube and individual scientist efforts.
Implementation by EarthCube
Implementation of the following recommendations is a major element in achieving the goals of this Position Statement. It includes elements that EarthCube, as an organization, and particularly its leadership (the Leadership Council and the ESSO) aspire to, and elements that individual geoscientists (in their EarthCube‐related activities and otherwise) are likewise challenged to adopt as best practices. By setting and following a high benchmark for FAIR, EarthCube can promote to the Geoscience community the importance of these principles and the practicalities they present for enhancing individual and group workflows, and thus knowledge discovery. Some of those elements are:
- Evaluate the Current Status - There is a lack of quantitative and qualitative understanding of the current status of FAIR implementation in EarthCube. To address this lack of detailed knowledge, EarthCube should embark on an activity to catalog all past and present NSF-funded EarthCube activities as to which aspect(s) of FAIR they address, and at what level. The Leadership Council will develop a metric (e.g., [LINK]) for evaluating “FAIRness” and distribute a short survey to current and past project leads, to ascertain the extent to which their project addresses F, A, I and/or R. Key partners in this activity are the CDF (Council of Data Facilities) and individual project principal investigators. The goal is to integrate this activity with the EarthCube Resource Registry initiative. [Other initiatives, if needed, will be stood up in a timely fashion. EarthCube will stand up a FAIR task force with membership from across the various governance groups and community at large to help with this effort.]
- Enhance the Current Status - EarthCube is committed to addressing organizational gaps or deficiencies identified in the status analysis. For instance, the establishment of a workbench environment would further advance the goals of making data and resources Interoperable and Reusable, and should be pursued as soon as possible. One target is that all data and metadata for data sets and data resources are released with a clear and accessible data usage statement, have detailed provenance, and meet domain-relevant community standards, as per R1.1, R1.2, and R1.3 of the FAIR principles (see appendix).
- Assess - EarthCube will define a benchmark that measures its adherence to FAIR principles and promote its application to EarthCube activities.
- Align with and enhance other NSF initiatives - EarthCube will continue to promote FAIR to NSF on behalf of the community to help spur positive change at the Foundation. This alignment will take time, but it starts with a clear statement of geoscience community needs in this regard, to help drive the development, adoption, implementation and enforcement of a more coherent and uniform data policy for the aforementioned funded research, to the benefit of all involved. Furthermore, EarthCube will continue to explore the extent of the alignment of NSF-wide initiatives on behalf of the geosciences community.
- Promote Sustainable and Open-Access practices along with FAIR practices — EarthCube will, where possible, promote these two related concepts to its projects and to its broader community for adoption in all of their scientific activities that generate data and data resources.
- Communicate - EarthCube is committed to establishing a plan for sharing this EarthCube position and related education materials throughout the membership, and to promoting to geoscientists in general the value of following FAIR in their research.
- Work through the CDF (Council of Data Facilities) to promote adoption of FAIR to domain data repositories - A key element of individuals practicing the FAIR principles is having a repository home that supports the researcher in these practices (e.g., landing pages, citation support, rich metadata).
- Watch and Coordinate with National and International FAIR Efforts - EarthCube will maintain contact with developing initiatives such as GO FAIR [LINK] (an international governing body for FAIR implementation), the Research Data Alliance (RDA) [LINK], the Committee on Data for Science and Technology (CODATA) [LINK], and the Enabling FAIR Data consortium (AGU, ESIP, RDA, and various publishing partners) [LINK] to stay aligned with and learn from emerging ideas and best practices as multiple groups work to “Change, Train, and Build” around these principles.
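As an illustration of the status survey proposed under "Evaluate the Current Status" above, the sketch below tallies which FAIR letters each project reports addressing. It is only a sketch under stated assumptions: the FAIRness metric has not yet been defined, and the function name `fair_coverage`, the project names, and the response format are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def fair_coverage(responses: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return the fraction of projects that report addressing each FAIR letter.

    `responses` maps a (hypothetical) project name to the set of letters
    from "FAIR" that the project lead ticked in the survey.
    """
    counts = Counter(
        letter for letters in responses.values() for letter in letters
    )
    total = len(responses)
    return {
        letter: (counts[letter] / total if total else 0.0)
        for letter in ("F", "A", "I", "R")
    }


# Made-up survey responses, for illustration only.
responses = {
    "project_a": {"F", "A"},
    "project_b": {"F", "A", "I"},
    "project_c": {"F"},
    "project_d": {"A", "R"},
}

coverage = fair_coverage(responses)
for letter, fraction in coverage.items():
    print(f"{letter}: {fraction:.0%} of projects")
```

A per-letter fraction of this kind would make it easy to see whether "I" and "R" lag behind "F" and "A"; an actual metric would likely also need to record the level at which each aspect is addressed.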
Opportunities for EarthCube Members
There are many opportunities for EarthCube Members to help implement FAIR practices internally. We recommend the following actions to increase the involvement of geoscientists in the adoption and promotion of FAIR principles.
- EarthCube members should seek opportunities to effectively communicate the value of FAIR principles.
- EarthCube members should seek to implement FAIR principles in their research and cyberinfrastructure development. For instance:
  - Promote awareness of the value of the Data Management Plan (DMP) and how it can be used to encourage FAIR practices.
  - Identify how FAIR principles can be implemented in research proposal and project Data Management Plans.
  - Identify or create searchable resources for data and metadata registration, as per FAIR recommendation F4 (“(Meta)data are registered or indexed in a searchable resource”).
  - Identify or create domain-relevant community standards, as per FAIR recommendation R1.3 (“(Meta)data meet domain-relevant community standards”).
- EarthCube members should participate in professional forums and townhall meetings for open community discussions on the importance of FAIR principles.
- EarthCube, via ESSO, should provide readily accessible print, web, and personnel resources to members that support geoscientists’ efforts to promote FAIR (best practices, contact points, publications, events, FAQs, etc.). Considerable expertise and resources are available to members through EarthCube’s website [LINK] and the FAIR literature resources document [LINK].
- EarthCube should raise awareness of FAIR principles by publishing articles on the links between FAIR practices and geoscience research outcomes.
- EarthCube and its members should draw upon the rich diversity of EarthCube and the geosciences community as a resource for individuals when selecting organizing committees, invited speakers, and nominees for offices and special prizes.
- EarthCube should raise awareness of FAIR among early career professionals via training, outreach, and community engagement.
- EarthCube should recognize geoscientists practicing FAIR principles through activities such as hosting an EarthCube Distinguished Lecturer who promotes FAIR practices.
Opportunities for Individual Geoscientists
There are many opportunities for individual geoscientists, within their research and scholarship activities, to broaden the adoption of FAIR principles in the Earth sciences. EarthCube encourages the following actions:
- Participate in discussions and events relating to FAIR principles and their implementation.
- Develop a FAIR plan for your own research program and publications, including your field or laboratory research, all members of your research group, and all parts of your research workflow.
- Go beyond the NSF data requirements and incorporate FAIR principles in your project data management plans and project reporting.
- Give talks to colleagues and students to promote FAIR practices.
- Partner with educators in your local area to promote FAIR practices.
- Partner with funding agencies and their representatives to develop programs to promote FAIR practices in the research they support.
- Find a trusted repository that practices (or is actively moving towards) the FAIR principles as well as other operational best practices.
Appendix: The FAIR Guiding Principles
The FAIR guiding principles listed here are taken from https://www.go-fair.org/fair-principles/. Although they seem well-defined and prescriptive, research communities are still actively discussing and refining some of the underlying concepts and how they apply to different research disciplines. We present them as a guide to the sorts of practices that can promote access and use of data.
Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers; machine-readable metadata are essential for automatic discovery.
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
Accessible: Once the user finds the required data, they need to know how the data can be accessed.
A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorization procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
Reusable: The ultimate goal of FAIR is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
Table 1. Outline of FAIR Data Principles. After FAIR Principles published at GO FAIR: https://www.go-fair.org/fair-principles/ (some descriptions abbreviated and some spellings adjusted).
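Some of the sub-principles in Table 1 can be checked mechanically on a metadata record. The sketch below is a minimal illustration rather than an established FAIR assessment tool: the field names (`identifier`, `data_identifier`, `license`, `provenance`) are hypothetical, and treating a DOI-shaped string as evidence of a persistent identifier is an assumption made only for this example.

```python
import re
from typing import Dict

# Assumption: a DOI-like string stands in for "globally unique and
# persistent identifier" (F1). Other identifier schemes exist.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_record(record: Dict[str, str]) -> Dict[str, bool]:
    """Flag whether a metadata record satisfies a few FAIR sub-principles.

    The keys read from `record` are hypothetical; real metadata schemas
    will use different names.
    """
    identifier = record.get("identifier", "")
    return {
        # F1: the record carries a persistent identifier (DOI-shaped here).
        "F1": bool(DOI_PATTERN.match(identifier)),
        # F3: the metadata name the identifier of the data they describe.
        "F3": bool(record.get("data_identifier")),
        # R1.1: a data usage license is stated.
        "R1.1": bool(record.get("license")),
        # R1.2: some provenance information is present.
        "R1.2": bool(record.get("provenance")),
    }


# Made-up record, for illustration only.
example = {
    "identifier": "10.1234/example-dataset",
    "data_identifier": "10.1234/example-dataset",
    "license": "CC-BY-4.0",
    "provenance": "Collected 2018; processed with a hypothetical tool v1.0",
}

for principle, satisfied in check_record(example).items():
    print(principle, "ok" if satisfied else "missing")
```

Presence checks like these only cover the machine-testable part of the principles; whether metadata are "rich" (F2, R1) or meet community standards (R1.3) still requires domain judgment.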