Finalized: Wednesday, May 27, 2015
Author(s): Praveen Kumar, Mostafa Elag, Luigi Marini, Rui Liu, Pieshi Jiang, Scott Peckham, Leslie Hsu
Often, scientists and small research groups collect data that target specific scientific issues and have limited geographic or temporal range. However, a large number of such collections together constitute a large database that is of immense value to the Earth Sciences disciplines. The complexity of reusing these data collections stems from heterogeneity in dimensions, coordinate systems, scales, variables, providers, users and scientific contexts. These data have been defined as long-tail data. Similarly, we use “long-tail models” to characterize a heterogeneous collection of models and/or modules developed for targeted problems by individuals and small groups, which together provide a large and valuable collection. The complexity of linking these models in a workflow arises because they use differing variable names and units for the same concept, run at different time steps and spatial resolutions, follow differing naming and reference conventions, etc. The ability to integrate “long-tail” models and “long-tail” data across the geoscience fields will provide a transformative opportunity for the interoperability and reusability of communities’ resources, where not only can models be combined in a workflow, but each model will be able to discover and (re)use data in the application-specific context of space, time and scientific questions. This capability is essential to represent, understand, predict, and manage heterogeneous and interconnected Earth system processes and activities by harnessing the complex, heterogeneous, and extensive set of distributed resources. Because of the staggering production rate of long-tail models and data resulting from the continued advances in computational, sensing, and information technologies, an important challenge arises: how can geoinformatics science bring together all these resources seamlessly, given the inherent complexity among model and data resources that span various geoscience domains?
Here, we will present a semantic-based framework to support the integration of “long-tail” models and data. The framework builds on existing technologies including: (i) SEAD (Sustainable Environmental Actionable Data - http://sead-data.net/), which supports curation and preservation of long-tail data during its life-cycle; and (ii) CSDMS (Community Surface Dynamics Modeling System - http://csdms.colorado.edu/wiki/Main_Page), which “componentizes” models by providing a plug-and-play environment for model integration. In addition, we will describe methods of integration with three ongoing EarthCube initiatives that focus on increasing the interoperability among models and data: GeoSoft, Earth System Bridge, and Sediment Experiment Network (SEN).
Poster presented at the 2015 All Hands Meeting.
Kumar, P., Elag, M., Marini, L., Liu, R., Jiang, P., Peckham, S., Hsu, L. (2015), A Geo-Semantic Framework for Integrating Long-Tail Data and Models. Presented at EarthCube All Hands Meeting, Washington, DC, 27-29 May 2015. http://earthcube.org/document/2015/Geo-Semantic-Framework
This material is based upon work supported by the National Science Foundation under Grant No. ICER 1440315. Opinions, findings, conclusions or recommendations expressed are those of the authors and do not reflect the views of the NSF.