Driven by data-rich use cases that span geodesy, geodynamics, seismology, and ecohydrology, the BALTO project will enable brokered access to diverse geoscience data. These include “long-tail” datasets: data that individual scientists have collected or organized in novel or unusual forms.
In BALTO, “brokering” means Web services that match diverse data-usage needs with heterogeneous types of source data. This matching addresses both form and semantics, including protocols, data structures, encodings, units of measure, variable names, and sampling meshes. The BALTO broker will employ an extensible hub-and-spoke architecture. Its hub will combine well-established, open-source, data-as-service software (from OPeNDAP) with the Geoscience Standard Naming (GSN) ontology to establish canonical representations for brokered datasets. Each spoke, called an accessor, comprises source-specific data-access software along with metadata mappings that yield GSN-compliant variable names.
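The hub-and-spoke arrangement can be illustrated with a minimal Python sketch. It is not BALTO code: the `Accessor` and `Hub` classes, the `field_notebook` source, and the `example_*` variable names are hypothetical placeholders (they are not actual GSN names), and the OPeNDAP layer is omitted entirely. The sketch only shows how a spoke might pair a source-specific reader with a metadata mapping so that the hub sees canonical names and units.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One observation: (variable name, value, unit of measure).
Record = Tuple[str, float, str]


@dataclass
class Accessor:
    """Hypothetical spoke: a source-specific reader plus metadata mappings."""
    source: str
    read: Callable[[], List[Record]]  # returns records in the source's own terms
    name_map: Dict[str, str]          # source variable name -> canonical name
    # source unit -> (canonical unit, multiplicative factor); unmapped units pass through
    unit_map: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def fetch(self) -> List[Record]:
        """Read the source and translate names and units to canonical form."""
        canonical = []
        for name, value, unit in self.read():
            new_unit, factor = self.unit_map.get(unit, (unit, 1.0))
            canonical.append((self.name_map[name], value * factor, new_unit))
        return canonical


class Hub:
    """Hypothetical hub: a registry that serves canonical records per source."""

    def __init__(self) -> None:
        self._accessors: Dict[str, Accessor] = {}

    def register(self, accessor: Accessor) -> None:
        self._accessors[accessor.source] = accessor

    def get(self, source: str) -> List[Record]:
        return self._accessors[source].fetch()


def read_field_notebook() -> List[Record]:
    # Stand-in for a long-tail dataset with ad hoc names and units.
    return [("rain_mm", 12.0, "mm"), ("Tair", 21.5, "degC")]


hub = Hub()
hub.register(Accessor(
    source="field_notebook",
    read=read_field_notebook,
    # Placeholder canonical names, chosen for illustration only.
    name_map={"rain_mm": "example_precipitation_depth",
              "Tair": "example_air_temperature"},
    unit_map={"mm": ("m", 0.001)},
))
records = hub.get("field_notebook")
```

In this arrangement, supporting a new dataset means writing one more accessor and registering it; the hub and its consumers are unchanged, which is the sense in which the architecture is extensible.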
Accessors for widely used datasets (including many from NASA, NOAA, USDA, IRIS and UNAVCO) will require modest adaptations of existing software and will be maintained in a core suite of supported software. For long-tail datasets, in contrast, BALTO will foster community engagement, through workshops and courses, in developing and supporting accessors. Key examples of long-tail accessors will be demonstrated in the three BALTO use cases, each of which requires research-study data to be combined and aligned with datasets from operational collections.