Cyberinfrastructure Jargon

API: An Application Programming Interface allows machines to talk to machines. Instead of, or in addition to, offering an interface for a person to use a tool or retrieve data from a portal, an API provides a structured, defined way for other software to use that resource. APIs allow for the development of distributed ecosystems of interacting tools and other digital resources.
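As a minimal sketch of what "machines talking to machines" can look like, the Python below builds a request URL for a hypothetical data portal and parses the kind of JSON reply such a service might send. The host portal.example.org, the /api/v1/datasets path, the query parameters, and the reply fields are all assumptions made for illustration; no network call is made.

```python
import json
from urllib.parse import urlencode, urlunsplit


def build_request_url(keyword, limit=10):
    """Build the URL a client program would send to a (hypothetical) dataset API."""
    query = urlencode({"q": keyword, "limit": limit})
    return urlunsplit(("https", "portal.example.org", "/api/v1/datasets", query, ""))


def parse_response(body):
    """Extract dataset titles from a JSON reply with the assumed shape."""
    payload = json.loads(body)
    return [item["title"] for item in payload["results"]]


# A canned reply standing in for what the hypothetical service might return.
SAMPLE_REPLY = (
    '{"results": [{"id": 1, "title": "Sea ice extent"},'
    ' {"id": 2, "title": "Seal counts"}]}'
)

url = build_request_url("sea ice", limit=2)
titles = parse_response(SAMPLE_REPLY)
print(url)
print(titles)
```

The point of the sketch is that both the request and the reply follow a fixed, documented structure, so a program can use the resource without a person clicking through a web page.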

Containers: Ever tried to run someone else’s software tool, and found that you don’t have the right operating system or code libraries? Containers are virtual computing environments that package a tool together with the operating system components and code libraries it needs, so that the tool executes consistently wherever it runs. This supports both replication and reproducibility of analytical processes. Containers can be stored, shared, and re-used (e.g., as Docker images), or spun up to run “on the fly” (e.g., with Kubernetes).

FAIR Data: FAIR stands for “Findable - Accessible - Interoperable - Reusable,” and the FAIR Data Principles frame actions to create FAIR data, such as the use of persistent identifiers, application of standards for describing data, and making licensing or reuse requirements open and transparent (DOI: 10.1038/sdata.2016.18). FAIR data are interoperable when machines can identify relationships to other datasets or tools, and carry out functions (such as merging datasets).

HPC: High Performance Computing aggregates computing power, using supercomputers and computer clusters, to deliver much higher performance than a typical desktop computer can provide, in order to solve large-scale problems. HPC systems process huge volumes of data, run complex models, or perform intensive analyses at very high speeds.

JSON: JavaScript Object Notation is an open format that uses structured text to store and transmit data, readable by humans or machines. It uses labels or “keys” to define content; keys can be defined ad hoc, or specified more formally using a “JSON Schema”. It is widely used because it is simple, flexible, and compact. JSON files have the filename extension .json.
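A short Python sketch of the round trip, using only the standard library json module. The record and its keys (station, depth_m, species, validated) are made up for illustration; the keys here are defined ad hoc rather than by a JSON Schema.

```python
import json

# An ad hoc record: keys label the content, values carry the data.
record = {
    "station": "ST-01",
    "depth_m": 12.5,
    "species": ["seal", "penguin"],
    "validated": True,
}

# Serialize to structured text (what would be written to a .json file).
text = json.dumps(record, indent=2)
print(text)

# Parse the text back into native Python objects.
restored = json.loads(text)
print(restored == record)
```

The printed text is readable by a person, and the parsed result is the same structure the program started with, which is why the format works for both storage and exchange.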

Machine Learning: A type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning (ML) algorithms build a model from sample or training data in order to reveal patterns and make predictions or decisions. ML algorithms are used in a variety of applications; for example, the EarthCube project ICEBERG uses ML to identify and count seals and penguins in satellite photos. Other applications include face recognition, speech-to-text transformation, and text analysis.
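To make "build a model from training data, then predict" concrete, here is a deliberately tiny sketch of a nearest-neighbour classifier in plain Python. It is not the method used by ICEBERG or any other project named above; the two features and all the numbers are invented toy values, chosen only so the example runs.

```python
import math

# Toy training data (invented): each example is
# ((feature_1, feature_2), label).
TRAINING = [
    ((2.4, 0.60), "seal"),
    ((2.1, 0.50), "seal"),
    ((0.7, 0.30), "penguin"),
    ((0.9, 0.35), "penguin"),
]


def predict(features, training=TRAINING):
    """Return the label of the training example closest to `features`."""
    nearest = min(training, key=lambda example: math.dist(example[0], features))
    return nearest[1]


print(predict((2.0, 0.55)))
print(predict((0.8, 0.30)))
```

No rule such as "longer than 1.5 means seal" is written anywhere in the code; the prediction comes entirely from the labelled examples, which is the sense in which the program is not explicitly programmed.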

Notebooks (or computational notebooks) are digital documents that combine human-readable prose with computer code designed to interact with data sources and execute processing and analytic tasks. They are used to create better scientific documentation, improve reproducibility, and share methods; existing notebook tools include Jupyter and RStudio. The Geoscience Paper of the Future (DOI: 10.1002/2015EA000136) recommends the use of notebooks.

Schema.org is a lightweight standard for describing data on a web page to support better web searches. Schema.org can be implemented (encoded) using several markup formats, such as JSON-LD or RDFa. Both GeoCODES and Google Dataset Search are based on schema.org. Science-on-schema.org (https://github.com/ESIPFed/science-on-schema.org) is a description of best practices for publishing scientific data using schema.org.
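A hedged sketch of the JSON-LD encoding, written in Python: it assembles a small schema.org Dataset description and wraps it in the script element that would be embedded in a dataset landing page. The dataset name, description, URL, and keywords are placeholders, and the property selection is a minimal illustration rather than the full set recommended by science-on-schema.org.

```python
import json

# A minimal schema.org "Dataset" description. All values are placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example sea ice observations",
    "description": "Placeholder description of an illustrative dataset.",
    "url": "https://example.org/datasets/sea-ice-demo",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["sea ice", "example"],
}

# JSON-LD is embedded in a web page inside a script element of this type.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset, indent=2)
    + "\n</script>"
)
print(script_tag)
```

A search crawler that understands schema.org can read this block from the page and index the dataset by its name, keywords, and license without parsing the human-facing text.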

Science gateways provide online access to shared, integrated cyberinfrastructure resources for specific scientific and educational communities. These resources can include data, tools, collaboration aids, high performance computing, and guidance.

Workflows: Scientific workflow tools help scientists describe and share steps in a scientific process, usually with respect to data analysis. They are used to improve consistency, reduce manual work, assist in training, and share processes with others. The EarthCube ASSET project (https://www.asset-project.info/) developed a tool which can capture both computational and physical steps in a workflow; notebooks are also increasingly used to capture computational workflows.

XML: Extensible Markup Language is an open format that uses structured text to store and transmit data, readable by humans or machines. It uses labels or “tags” to define content; tags are defined specifically for the need using an “XML schema”. JSON and XML can be used in similar ways; XML is more verbose and more syntactically restrictive than JSON.
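The comparison with JSON can be sketched by encoding the same small record both ways and parsing each with the Python standard library. The record and its tag and key names (observation, station, depth_m) are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up record, once as XML and once as JSON.
XML_TEXT = (
    "<observation>"
    "<station>ST-01</station>"
    "<depth_m>12.5</depth_m>"
    "</observation>"
)
JSON_TEXT = '{"station": "ST-01", "depth_m": 12.5}'

# Parse the XML: every opening tag needs a matching closing tag.
root = ET.fromstring(XML_TEXT)
from_xml = {child.tag: child.text for child in root}

# Parse the JSON: keys and values map directly onto a dictionary.
from_json = json.loads(JSON_TEXT)

print(from_xml)
print(from_json)
print(len(XML_TEXT), len(JSON_TEXT))
```

Two differences show up even at this size: the XML text is longer because each tag is written twice, and the XML parser returns element text as strings, so the depth is a string until the program (or a schema-aware tool) converts it, whereas JSON carries the number type directly.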