We are establishing the GeoDataspace framework, a cyberinfrastructure that will help scientists and communities create and maintain collections of geounits that pertain to a specific research project. For example, a GeoDataspace for solid earth might accumulate a set of geounits representing different simulation runs performed by different people, with different models, and with different inputs and parameters. Once created, a geounit will provide a single handle to the various model-related data items and source code, making models easier to share, reuse, and reproduce during development, testing, and validation.
Contact: Tanu Malik (tanum at ci.uchicago.edu)
Sharing and repeating geoscience applications is crucial for verifying claims, reproducing model and simulation runs, and promoting reuse of complex geoscience model applications. However, geoscientists lack effective mechanisms for easy sharing and efficient repeatability. It is not unusual for geoscientists to spend vast amounts of time and effort capturing, managing, and organizing the various data elements that a typical geoscience application or model requires to operate: the input files, processing and manipulation scripts, manifests, or databases that must be assembled and organized appropriately for the model to function correctly. The goal of the GeoDataspace project is to make it easy for geoscientists to share and repeat their geoscience model applications efficiently and effectively. The GeoDataspace system captures models and data in an integrated way, encapsulates them as a single shareable package, and lets the user share or publish the package for wider community use, or preserve it for their own further analysis.
Sharing, repeating, and reproducing geoscience applications and models are inherent to every geoscience domain. Currently, we have engaged with geoscientists in four domains: (i) Solid Earth, (ii) Hydrology, (iii) Space Science, and (iv) Surface Dynamics, who have a critical need for using the GeoDataspace system to create shareable geounits of their respective modeling software. In Solid Earth, geounits are being created for the GPlates software, which ingests mantle seismic images obtained from the EarthScope project and stored in the IRIS Data Management Center, and merges the CitcomS mantle convection modeling code (from the Computational Infrastructure for Geodynamics) with data assimilation software. In Hydrology, geounits are being established for the data-processing pipeline of the iRODS-enabled VIC model. In Surface Dynamics, we are engaging with CSDMS to create shareable packages of coupled models, and in Space Science, we plan to track data flows in models developed by the CCMC at NASA Goddard Space Flight Center. In addition, we are engaging with geoscientists at the governance level, with the “big science” partners at NCAR, and with single investigators in the “long tail of science” to understand their needs for sharing, repeating, and reproducing scientific modeling software.
Creating and establishing a geounit involves three phases: (i) capturing, (ii) encapsulating, and (iii) cataloging. In the capture phase, all the required data, processes, and environment variables are tracked and recorded comprehensively. In the encapsulate phase, the system creates a shareable package of the recorded entities, along with reference executions that show how the data were consumed by the different processes and which data were generated. In the catalog phase, sufficient metadata from the data files is added to the shareable package so that it can be published and disseminated to the wider community.
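The three phases above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the GeoDataspace implementation: the function names (`capture`, `encapsulate`, `catalog`, `build_demo_geounit`), the tar-based package layout, and the file names `manifest.json` and `metadata.json` are all hypothetical, and the capture step here simply records what it is told about rather than tracking a running process.

```python
import hashlib
import io
import json
import os
import tarfile
import tempfile
from pathlib import Path


def _add_json(tar, name, obj):
    """Write a Python object into an open tar archive as a JSON member."""
    payload = json.dumps(obj, indent=2, sort_keys=True).encode("utf-8")
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def capture(files, env_names, command):
    """Capture phase (hypothetical): record files, environment, and command."""
    return {
        "command": list(command),
        "environment": {name: os.environ.get(name) for name in env_names},
        "files": [
            {
                "path": str(path),
                "sha256": hashlib.sha256(Path(path).read_bytes()).hexdigest(),
            }
            for path in files
        ],
    }


def encapsulate(record, package_path):
    """Encapsulate phase (hypothetical): bundle files and record in one tar."""
    with tarfile.open(package_path, "w") as tar:
        for entry in record["files"]:
            tar.add(entry["path"], arcname="data/" + Path(entry["path"]).name)
        _add_json(tar, "manifest.json", record)
    return package_path


def catalog(package_path, metadata):
    """Catalog phase (hypothetical): append descriptive metadata to the package."""
    with tarfile.open(package_path, "a") as tar:
        _add_json(tar, "metadata.json", metadata)
    return package_path


def build_demo_geounit(workdir):
    """Run all three phases on a toy input file and list the package members."""
    workdir = Path(workdir)
    input_file = workdir / "inputs.cfg"
    input_file.write_text("resolution = 64\n")

    record = capture([input_file], ["HOME"], ["run_model", "inputs.cfg"])
    package = encapsulate(record, workdir / "geounit.tar")
    catalog(package, {"title": "demo run", "domain": "Solid Earth"})

    with tarfile.open(package) as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_demo_geounit(tmp))
```

In this sketch the package is an uncompressed tar file so that the catalog step can append metadata after encapsulation; a real system would also need to record the reference executions described above, which this toy example leaves out.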
GeoDataspace is powered by Globus, which provides a set of cloud-based services for scientific data management. Globus endpoints exist on more than 8,000 nodes, and Globus drives the operation of end-system software from the cloud, avoiding many of the configuration and installation complexities that end users would otherwise face.