CYBERINFRASTRUCTURE JARGON

Containers: Ever tried to run someone else’s software tool, only to find you don’t have the right operating system or code libraries? Containers are lightweight virtual computing environments that package software together with the operating system libraries and dependencies it needs, so a tool executes consistently on any machine. This supports both replication and reproducibility of analytical processes. Container images can be built, stored, shared, and re-used (e.g. with Docker), or launched on demand and managed at scale (e.g. with Kubernetes).
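The recipe for a container image is a short text file; a minimal sketch of one (the base image version, library, and script name below are invented for illustration) looks like this:

```dockerfile
# Pin a specific base operating system and language runtime
FROM python:3.11-slim
# Install the exact library versions the analysis depends on
RUN pip install --no-cache-dir pandas==2.2.2
# Copy the analysis code into the image
COPY analysis.py /app/analysis.py
# Everyone who runs this image gets the identical environment
CMD ["python", "/app/analysis.py"]
```

Building this file once and sharing the resulting image is what lets a collaborator re-run the analysis without recreating the environment by hand.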

FAIR Data: FAIR stands for “Findable - Accessible - Interoperable - Reusable,” and the FAIR Data Principles frame actions to create FAIR data, such as using persistent identifiers, applying community standards for describing data, and making licensing and reuse terms open and transparent (DOI: 10.1038/sdata.2016.18). FAIR data are interoperable when machines can identify relationships to other datasets or tools and carry out functions (such as merging datasets) without human intervention.
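The machine-actionable side of interoperability can be sketched in Python: when two metadata records share a persistent identifier, software can recognize that they describe the same dataset and merge them automatically (the DOI and field names below are invented examples):

```python
# Two metadata records from different repositories; the shared DOI is the
# persistent identifier that lets a machine link them (all values invented).
record_a = {"doi": "10.0000/example.123", "title": "Stream gauge data", "format": "CSV"}
record_b = {"doi": "10.0000/example.123", "license": "CC-BY-4.0", "variables": ["discharge"]}

def merge_by_identifier(*records):
    """Merge metadata records that share the same persistent identifier."""
    doi = records[0]["doi"]
    merged = {}
    for rec in records:
        if rec["doi"] != doi:
            raise ValueError("records describe different datasets")
        merged.update(rec)
    return merged

merged = merge_by_identifier(record_a, record_b)
```

Without the shared, resolvable identifier, a human would have to decide whether “Stream gauge data” in one catalog is the same dataset as the CC-BY-licensed record in another.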

Notebooks (or computational notebooks) are digital documents that combine human-readable prose with computer code designed to interact with data sources and execute processing and analytic tasks. They are used to create better scientific documentation, improve reproducibility, and share methods; widely used examples include Jupyter notebooks and RStudio’s R Markdown notebooks. The Geoscience Paper of the Future (DOI: 10.1002/2015EA000136) recommends the use of notebooks.
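Under the hood, a Jupyter notebook is a plain JSON document containing an ordered list of prose and code cells; a minimal sketch of that structure, built in Python (the cell contents are invented), is:

```python
import json

# A Jupyter notebook (.ipynb) is JSON: a list of cells, each either
# human-readable markdown or executable code (nbformat version 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Methods\n", "We load the gauge data and compute a mean."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["import statistics\n", "statistics.mean([1.2, 1.4, 1.6])"]},
    ],
}

text = json.dumps(notebook, indent=1)  # ready to save as a .ipynb file
```

Because the format is plain JSON, notebooks can be versioned, diffed, and shared like any other text file.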

Schema.org is a lightweight standard for describing data on a web page to support better web searches. Schema.org can be implemented (encoded) using several markup formats, such as JSON-LD or RDFa. Both GeoCODES and Google Dataset Search build on schema.org markup. Science-on-schema.org (https://github.com/ESIPFed/science-on-schema.org) describes best practices for publishing scientific data using schema.org.
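The kind of markup schema.org prescribes can be sketched as JSON-LD; here a minimal Dataset description is built in Python (the dataset name and DOI are invented examples):

```python
import json

# A minimal schema.org Dataset description as JSON-LD. Embedded in a page's
# <script type="application/ld+json"> tag, this is the markup that crawlers
# such as Google Dataset Search read to index the dataset.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example river discharge measurements",       # invented example
    "description": "Daily discharge at an example gauge station.",
    "identifier": "https://doi.org/10.0000/example.123",  # invented DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

jsonld = json.dumps(dataset, indent=2)
```

The science-on-schema.org guidance linked above recommends which Dataset properties to include and how to express identifiers and licenses.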

Science gateways provide online access to shared, integrated cyberinfrastructure resources for specific scientific and educational communities. These resources can include data, tools, collaboration aids, high performance computing, and guidance.

Workflows: Scientific workflow tools help scientists describe and share the steps in a scientific process, usually with respect to data analysis. They are used to improve consistency, reduce manual work, assist in training, and share processes with others. The EarthCube ASSET project (https://www.asset-project.info/) developed a tool that can capture both computational and physical steps in a workflow; notebooks are also increasingly used to capture computational workflows.
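The core idea behind workflow tools — named, ordered, repeatable steps that pass data along a pipeline — can be sketched in plain Python (the step names and data are invented for illustration):

```python
# A workflow as an explicit, ordered list of named steps. Real workflow
# tools layer provenance tracking, caching, and distributed execution
# on top of exactly this kind of structure.
def acquire():
    return [3.0, 4.0, 5.0]                   # stand-in for reading raw data

def clean(values):
    return [v for v in values if v > 3.0]    # stand-in for quality control

def analyze(values):
    return sum(values) / len(values)         # stand-in for the analysis

workflow = [("acquire", acquire), ("clean", clean), ("analyze", analyze)]

def run(steps):
    """Execute each step in order, feeding each output to the next step."""
    data = None
    for name, step in steps:
        data = step() if data is None else step(data)
    return data

result = run(workflow)  # 4.5
```

Writing the steps down explicitly, rather than running them by hand, is what makes the process consistent, shareable, and easy to re-run.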