Funded Projects

Project Title | PI and Co-PI | Group Page | NSF Abstract
A Broker Framework for Next Generation Geoscience (BCube)
SiriJodha Khalsa, National Snow and Ice Data Center
Stefano Nativi, Ruth Duerr, Jay Pearlman, Francoise Pearlman

To address complex Earth system issues such as climate change and water resources, geoscientists must work across disciplinary boundaries, which requires them to access data outside of their fields. This award brings together an internationally recognized team of geo- and social-scientists, cyberinfrastructure experts and educators to explore how expert systems can mediate interactions and improve access between scientific fields. The initial focus is on hydrology, oceans, polar and weather, with the intent to make the technology applicable and available to all the geosciences. The team’s social scientists and educators will research how technology can improve knowledge exchange between scientific communities.

BCube

Abstract
A Cognitive Computer Infrastructure for Geoscience
Christopher Re, Stanford University
Miron Livny, Shanan Peters

Today, access to information is often less of a problem than our ability to discover, process and use it.  Geoscience currently lacks a sustainable cyberinfrastructure that can efficiently and with high precision and accuracy find, extract, and organize data that are critical to advancing many areas of science and leveraging current and past investments in data acquisition.  We are developing a geoscience-oriented trained computing system, powered by world-class high throughput computing infrastructure, that will serve as a cross-disciplinary tool for finding, extracting, and organizing “dark data” from the text, tables, and figures of hundreds of thousands of documents.  Early results indicate that our system can perform tasks of data identification and extraction reliably and at a fraction of the time and cost of humans.  We hope to produce a dependable EarthCube building block that can serve many scientific communities.

Cognitive Computer Infrastructure

Abstract
A Geo-Semantic Framework for Integrating Long-Tail Data and Models
Praveen Kumar, University of Illinois at Urbana-Champaign
Scott Peckham, Leslie Hsu

This Building Blocks project offers a unique and transformative approach to integrate existing and emerging long-tail model and data resources. The goal of this research is to develop a framework rooted in semantic techniques and approaches to support long-tail models and data integration. The vision is to develop a decentralized knowledge-based platform that can be easily adopted across geoscience communities comprising individual and small-group researchers. This project will focus on integrating two long-tail resources, the Community Surface Dynamic Modeling System (CSDMS) and Sustainable Environment Actionable Data (SEAD), as an example of closing the loop from model queries back to data sources. In addition, the concepts and tools will be developed to allow automated coupling of models and data coming from different contributors.

Geosemantics

Abstract
A Scalable Community Driven Architecture
Stanislav Djorgovski, California Institute of Technology
Daniel Pilone

The purpose of this project is to create a concept that incorporates findings and requirements from ongoing EarthCube activities as well as other cross-agency Earth Science informatics efforts. The architecture models will be a roadmap for building an extensible and sustainable EarthCube system that facilitates new science and inspires substantive participation of a broad spectrum of geoscientists. The project team is led by a group of computer scientists from the National Aeronautics and Space Administration (NASA) and private industry who have extensive experience and a proven track record leading the architecture, design, and development of complex and data-intensive science data systems. They will specifically focus on the development of a guiding architecture report and specification. The results of this EarthCube architecture study can contribute intellectually to other scientific and agency efforts that are studying new architectural models to address scientific data management and discovery in the big data era.

Scalable Community Driven Architecture

Abstract
Advancing Biogeoscience Community Standards and Cyberinfrastructure via Critical Zone Domain Engagement in Synthesis Science
Emma Aronson, UC Riverside
Emilio Mayorga, Aaron Packman

This project lays the groundwork for the whole-earth analysis and simulation capability envisioned through EarthCube, by bringing critical zone scientists together with hands-on training to test available cyberinfrastructure tools with comprehensive multiparameter datasets spanning a wide range of scales. The project will further improve access to the products of critical zone research by promoting the sharing, standardization, synthesis and analysis of biogeochemical and metagenomic data via EarthCube cyberinfrastructure, enabling a broader array of geosciences communities to shape future EarthCube activities and outcomes. Project findings and products will be used to inform future EarthCube development through the activity of the PIs and collaborators in the EarthCube Science Committee and Technology and Architecture Committee, as well as through the broader engagement of critical zone scientists in EarthCube activities.

Advancing Biogeoscience Community Standards and Cyberinfrastructure via Critical Zone Domain Engagement in Synthesis Science

Abstract
Advancing netCDF-CF for the Geoscience Community
Ethan Davis, UCAR
Charles Zender, Aleksandar Jelenak, David Arctur, Nicholas Bond

The Climate and Forecast (CF) metadata conventions for netCDF (netCDF-CF) are currently used widely by weather forecasters, climate scientists, and remote-sensing researchers to include metadata along with scientific data, and numerous open source and commercial software tools have been developed to explore and analyze data sets that use the conventions. This project will work to extend the existing metadata conventions in ways that will broaden the range of earth science domains whose data can be represented. By broadening the applicability of an established convention, this effort seeks to extend its benefits to related earth science domains and reduce the amount of effort scientists must expend decoding and reformatting datasets created by other research groups. This, in turn, will leave researchers more time for analysis, leading to a better understanding of the physical processes the data describe.

Advancing netCDF-CF

Abstract
An EarthCube Oceanography and Geobiology Environment Omics Research Coordination Network (ECOGEO RCN)
Edward DeLong, University of Hawaii

The network elements of the ECOGEO Research Coordination Network reach across diverse communities of ocean scientists, geoscientists, computer scientists and bioinformaticians, and will forge new cross-disciplinary connections. Although the scientific goals of the proposed ECOGEO community network are diverse, common challenges and requirements for big data analyses are identified across this broad community. The ECOGEO RCN will establish a virtual network with a goal of coordinating communication and collaboration in omics, data sharing and analysis as well as providing cyber-training for upcoming ocean and geoscience students and young professionals. A large part of the work within ECOGEO will focus on the identification of data standards, ontologies and methods for sharing data, and development of scenarios to inspire the next generation of researchers.

Oceanography and Geobiology Environmental 'Omics

Abstract
An Expanded Implementation of Cloud-Hosted Real-Time Data Services for the Geosciences (CHORDS)
Michael Daniels
V. Chandrasekar, D. Sarah Stamps, Branko Kerkez, Sara Graves

CHORDS makes it simple for small research teams to make their real-time data available to the research community in standard formats. By following a few simple steps, a CHORDS portal can be created for a research team and their data can be streamed into it very easily. CHORDS can also be used to access data from large NSF research platforms like radars, aircraft, ships and operational networks of sophisticated instrumentation.

CHORDS

Abstract
Bringing EarthCube to the Science User
Karen Stocks, UC San Diego
Stephen Diggs

The SeaView project is building connections between the rich data resources in five major oceanographic data facilities - R2R, BCO-DMO, CCHDO, OBIS and OOI* - and an ocean data visualization and exploration tool, the Ocean Data View (ODV), which has over 40,000 registered users. To do so, it is leveraging NSF investments in two previous EarthCube projects: CINERGI, as the registry where users will find and access data they wish to use, and GeoLink, which is providing interconnections between existing data resources, allowing users to explore a knowledge base of related information. SeaView leverages existing resources to provide improved tools and data access in support of fundamental research in two areas of ocean sciences: deep water hydrography and marine community-environment interactions. Both feed directly into research identified by the 2015 Decadal Survey of Ocean Sciences as of the highest priority. It will also engage scientists from communities currently underrepresented in EarthCube.

SeaView

Abstract
Building a Sediment Experimentalist Network (SEN)
Wonsuck Kim, University of Texas at Austin
Leslie Hsu, Brandon McElroy

This RCN will form a Sediment Experimentalist Network (SEN) to help integrate the efforts of sediment experimentalists and build a knowledge base for guidance on best practices for data collection and management. The project will also facilitate cross-institutional collaborative experiments and communicate with and educate the research community about data and metadata standards for sediment-based experiments. This effort aims to improve the efficiency and transparency of sedimentary research for field geologists and modelers as well as experimentalists. Major outcomes will be the creation of a knowledge base, coordination of experimental collaboratories, and integration of educational efforts and data standards development with tools for propagating new technology and methods.

SEN

Abstract
Building Interoperable Cyberinfrastructure (CI) at the Interface Between Paleogeoinformatics and Bioinformatics
Mark Uhen, George Mason University
Edward Davis, John Williams, Russell Graham, Jessica Blois, Alison Smith

This project brings together six paleobiological databases so that they share a single set of Internet-based commands by which researchers and the public can easily access fossil records from all of Earth history. By coordinating with other emerging efforts in geological and biological data sharing, best practices, and protocols, we ensure that data will be freely available to all, enabling new scientific syntheses and discovery, more powerful educational opportunities, and general exploration of the history of life on Earth.  This project establishes a Paleobiological Data Consortium, consisting of leaders of cyberinfrastructure resources in the paleobiosciences and allied disciplines, with the goal of sharing best practices and protocols among the geoinformatic and bioinformatic communities.                

Earth-Life Consortium (ELC)

Abstract
C4P: Collaboration and Cyberinfrastructure for Paleogeosciences
Kerstin Lehnert, Columbia University
Christopher Jenkins, Mark Uhen, John Williams

This project establishes and operates the EarthCube Research Coordination Network (RCN) Collaboration and Cyberinfrastructure for Paleogeosciences to advance the role of cyberinfrastructure in unraveling large-scale, long-term evolution of the Earth-life system through the study of the geological record. This RCN intends to foster collaboration among paleogeoscientists, paleobiologists, bioinformaticists, stratigraphers, geochronologists, geographers, data scientists, and computer scientists with an aim to dramatically improve the application of modern data management approaches, data mining technologies, and computational methods to better analyze data within the paleogeosciences and other domains and disciplines.

C4P

Abstract
Cloud-Hosted Real-Time Data Services for the Geosciences (CHORDS)
Michael Daniels, UCAR
V. Chandrasekar, Sara Graves, Branko Kerkez, Frank Vernon

The Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS), a real-time data management infrastructure, will provide a system to archive, navigate and distribute real-time data streams via the Internet, and will employ data and metadata formats that adhere to standards, which simplifies the user experience.

CHORDS

Abstract
Community Inventory of EarthCube Resources for Geosciences Interoperability (CINERGI)
Ilya Zaslavsky, San Diego Supercomputer Center

This Building Blocks project focuses on constructing a community inventory and knowledge base on geoscience information resources to meet the challenge of finding resources across disciplines, assessing their fitness for use in specific research scenarios, and providing tools for integrating and re-using data from multiple domains. The project team envisions a comprehensive system linking geoscience resources, users, publications, usage information, and cyberinfrastructure components. This system would serve geoscientists across all domains to efficiently use existing and emerging resources for productive and transformative research.

CINERGI

Abstract
Coral REef Science and CYberinfrastructure NeTwork (CReSCyNT)
Ruth Gates, University of Hawaii

This project develops the Coral Reef Science & Cyberinfrastructure Network (CReSCyNT). The coral reef community has exceptionally diverse data structures and analysis requirements necessary to forward integrative science. It is therefore an exemplar for cyberinfrastructure-enabled advances to other geosciences communities. CReSCyNT will develop a network of disciplinary and technology nodes focused on coral reef science and extending to the broader ocean and geoscience communities. The objectives of the RCN are to identify the needs, best practices and challenges facing the coral reef domain and how those might be translatable to the broader geoscience community.

CRESCYNT: Coral Reef Science and Cyberinfrastructure Network

Abstract
Cross Domain Observational Metadata Environmental Sensing Network (X-DOMES)
Janet Fredericks, Woods Hole Oceanographic Institution
Michael Botts, Carlos Rueda, Felimon Gayanilo, John Graybeal, Krzysztof Janowicz

Working with environmental sensor manufacturers and researchers, the X-DOMES project will develop tools and social and technical infrastructure to facilitate the creation of data about data (metadata). Metadata must describe not only who made the observations, and when and where, but also how an observation came to be (provenance). By taking this knowledge out of manuals and human-readable documents, the X-DOMES model creates metadata that can be treated like data -- discoverable and searchable, making it ready to be incorporated into automated archival and processing for quality assurance and validation methods. The X-DOMES pilot project will provide a suite of tools, built upon community-adopted standards of the Open Geospatial Consortium (OGC) and World Wide Web Consortium (W3C), to demonstrate and facilitate the generation of documents that are discoverable and accessible on-line and/or directly from onboard sensor descriptions. The project will also demonstrate mechanisms to associate the data with the metadata through standards-based web services.

X-DOMES

Abstract
CyberConnector: Bridging the Earth Observations and Earth Science Modeling for Supporting Model Validation, Verification, and Inter-comparison
Liping Di, George Mason University

This Building Blocks project will 1) significantly increase research productivity in the Earth science modeling community, 2) enable the effective use of the existing Sensor Web data and Earth Observations through open Web interfaces and metadata standards, 3) foster collaborations among Earth system modelers, geospatial information scientists, and information technologists, and 4) enhance infrastructure for Earth science research and education. CyberConnector is intended to facilitate automatic preparation and feeding of customized data and derived products into Earth science models.  This project seeks to make model creation and inter-comparison significantly easier, by starting with available data and expanding the CyberConnector infrastructure to many different Earth models.

CyberConnector: Bridging the Earth Observations and Earth Science Modeling for Supporting Model Validation, Verification, and Inter-comparison

Abstract
Deploying Web Services across Multiple Geoscience Domains
Tim Ahern, Incorporated Research Institutions for Seismology
Michael Gurnis, Mohan Ramamurthy, Suzanne Carbotte, Ilya Zaslavsky

This Building Blocks project intends to extend the promotion of simple web services to simplify the task of discovering, accessing and using data from multiple sources. Investigators will promote the use of this system to manage data from the long tail of science and make it discoverable, removing it from the domain of "dark data". The project will extend its approach of exposing data sets through web services to those managed by non-NSF data centers, both within the United States and internationally, by providing resources to stand up web services that expose the data holdings of other centers. This building block is an effort to engage EarthCube cyberinfrastructure in developing, establishing and adopting international standards to allow geoscientists to focus on science and increase productivity.

GeoWS - Geoscience Web Services

Abstract
Developing a Data-Oriented Human-Centric Enterprise Architecture for EarthCube
Chaowei Yang, George Mason University
Chen Xu

This EarthCube conceptual design project will develop a data-oriented and human-centric EarthCube enterprise architecture for achieving the goal of EarthCube as a community-driven activity to transform the conduct of geoscience research and education. The proposed EarthCube enterprise architecture will have geoscientists and domain experts at its center, help them communicate and collaborate through data sharing, and ultimately bring geosciences forward in a holistic fashion. This project seeks to design a conceptual architecture that can bring geoscientists, computing scientists, and social scientists together to collaborate on networks of data, technology, applications, business models, and stakeholders.

Data-Oriented & Human-Centric

Abstract
Development of an Integrated Data System for the Geological Field Sciences
J. Douglas Walker
Warren Alexander, Diane Kamola, Basil Tikoff, Marjorie Chan, Frank Spear, Allen Glazner

The project will develop a Data System for parts of the Geological Field Sciences that closely follows the existing workflows and vocabulary of the field geologist. The Data System will seamlessly incorporate the data from different sub-disciplines.

Development of an Integrated Data System for the Geological Field Sciences

Abstract
Digital Crust: An Exploratory Environment for Earth Science Research and Learning
Ying Fan Reinfelder, Rutgers University, New Brunswick
Shanan Peters, Ilya Zaslavsky

This Building Blocks project develops the Digital Crust, an online workspace that allows for the linking, access, and extraction of all data related to the Earth’s crust, with the goal of facilitating the creation of 4D data products from those linked data sets. The platform can serve as a resource to bring together geoscientists working on separate aspects of the Earth system, by bringing their data/ideas together and by providing an environment to view the Earth from different perspectives. Hosting multiple data types and sometimes conflicting interpretations and hypotheses of Earth processes will promote community discussion and debate on Earth processes that will foster interaction, collaboration, and data/idea sharing among scientists who might otherwise never have met. Digital Crust also seeks to be a resource for educators and others interested in introducing students to the Earth sciences, and to expose gaps in data-knowledge with regard to the Earth’s crust.

Digital Crust

Abstract
Digital Rocks Portal: A Sustainable Platform for Sharing, Translation, and Analysis of Volumetric Data of Porous Media
Masa Prodanovic, U Texas at Austin
Maria Esteva, Richard Ketcham

This project will continue the development of a sustainable, open and easy-to-use repository called the Digital Rocks Portal (https://pep.tacc.utexas.edu/). The portal will: a) organize the images and related experimental measurements of diverse porous materials; b) improve access to porous media analysis results for a wider community of geoscience and engineering researchers not necessarily trained in computer science or data analysis; and c) enhance productivity, scientific inquiry, and engineering decisions founded on a data-driven basis. An important contribution of the project will be the development of a business model for sustaining long-term preservation of important datasets obtained from research investments.     

Digital Rocks Portal: A Sustainable Platform for Sharing, Translation, and Analysis of Volumetric Data of Porous Media

Abstract
Earth System Bridge: Spanning Scientific Communities with Interoperable Modeling Frameworks
Scott Peckham, University of Colorado at Boulder
Gary Egbert, Cecelia DeLuca, David Gochis, Jennifer Arrigo

This EarthCube Building Blocks project will draw from significant disciplinary and interdisciplinary expertise in the development, implementation and support of geoscientific modeling architectures and in the adoption of community standards in model development and data management. This team will integrate existing model architectures, model coupling standards, and data standards into a set of open-source Earth System Bridge building blocks that will transform the process of Earth system model coupling, and bridge the present technological gap.

Earth System Bridge

Abstract
EarthCube Data Discovery Hub
Ilya Zaslavsky
Karen Stocks, Amarnath Gupta, Jeffrey Grethe, Bernhard Peucker-Ehrenbrink, Ruth Gates

The EarthCube Data Discovery Hub will be developed as a comprehensive data discovery and content enhancement system, which will leverage improved and community-curated metadata descriptions and integrate previously unregistered information sources. The project will further extend, improve and operationalize the inventory catalog developed in an earlier CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project, which currently includes over 2 million metadata documents from multiple sources.

EarthCube Data Discovery Hub

Abstract
EarthCube Integration and Test Environment
Sara Graves, University of Alabama in Huntsville
Chaowei Yang, Stanislav Djorgovski

The ECITE federated Integration and Test (I&T) environment has the potential to scale and extend to an NSF-wide integration and test capability that can serve as a model and example for other agencies. The ECITE team will actively engage EarthCube and the wider geosciences community in definition of requirements, design, and testing of the system. ECITE will consist of a seamless federated system of scalable and location-independent distributed computational resources (nodes) across the US. The nodes will provide compute and storage resources requiring minimal system administration. ECITE is an important step in ensuring that EarthCube components will be able to work together to provide a successful framework that can continue to evolve to meet the needs of the geosciences community. This research addresses timely issues of integration, test and evaluation methodologies and best practices with a strong interoperability theme to advance disciplinary research through the integration of diverse and heterogeneous data, algorithms, systems and sciences. The results and findings will provide guidance for EarthCube evolution and future integration efforts. Access to the resulting platform will enable and encourage the EarthCube community to develop prototypes, try out new technologies, and share ideas, concepts and experiments.

EarthCube Integration and Testing Environment (ECITE)

Abstract
EC3: Earth-Centered Communication for Cyberinfrastructure - Challenges of Field Data Collection, Management, and Integration
Matty Mookerjee, Sonoma State University
Thomas Shipley, Basil Tikoff, Amy Ellwein, Jim Bowring

This RCN will facilitate digitization of geological field data in an effort to develop communication between the cyberinfrastructure community and those involved in field-based, solid earth geoscience. Researchers will take steps to document what exists currently for field data collection; assemble a community for discussing and exploring field data collection issues, specifically targeting young investigators; motivate distinct communities to work together on common issues associated with digitization; and evaluate what is missing in the creation of open and accessible data. To facilitate this, the RCN will conduct a series of workshops and townhalls at national meetings (GSA, AGU, AAPG) to foster community awareness, catalog resources, and investigate data collection and sharing scenarios.

EC3

Abstract
Enabling Scientific Collaboration and Discovery through Semantic Connections
Matthew Mayernik, UCAR
Linda Rowan, Dean Krafft

This Building Blocks project brings together the National Center for Atmospheric Research (NCAR), UNAVCO, and Cornell University to understand how to improve the processes of collaboration and resource sharing in the geosciences by demonstrating and encouraging the adoption of structured information systems rooted in common standards. Using two large geoscience research programs as case studies, this effort will demonstrate how semantic web and linked data technology can play an essential role in the coordination and organization of scientific virtual organizations and their products, thereby accelerating the pace of scientific discovery and innovation. Using the open source VIVO software, this project will seek to interlink information and data across platforms and research projects in an ontology-based standard data format (RDF), utilizing well developed web identifier and vocabulary structures.

EarthCollab

Abstract
Engaging the Greenland Ice Sheet Ocean (GRISO) Science Network
Fiammetta Straneo
David Sutherland, Twila Moon

The goal of this project is to substantially enhance multi-disciplinary activities and collaboration through the establishment of the GRISO (Greenland Ice Sheet Ocean) Science Network - an international, multidisciplinary, open network of scientists and cyberinfrastructure experts.

Engaging the Greenland Ice Sheet Ocean (GRISO) Science Network

Abstract
Enhancing Paleontological and Neontological Data Discovery
Dena Smith, U Colorado, Boulder
Christopher Norris, Mark Uhen, Jocelyn Sessa, Roy Nelson

This project will develop software tools to connect three established, well-supported, and critically important data sources: the Paleobiology Database (PBDB, paleontological, literature based), iDigPaleo (paleontological, specimen based) and iDigBio (neontological, specimen based). This project will allow users of any one of these databases to access and query the others at the same time, returning a much richer, combined set of data to the user. The development of this system will allow scientists to ask and answer new research questions affecting fields as diverse as biogeographic/niche modeling, systematics, functional morphology, evolutionary biology, ecology, climatology, conservation biology, oceanography, and petroleum geology. This project will fundamentally change the nature of the research questions that can be addressed by the scientific community.

Enhancing Paleontological and Neontological Data Discovery

Abstract
Enterprise Architecture for Transformative Research and Collaboration Across the Geosciences
Ilya Zaslavsky, San Diego Supercomputer Center
Amarnath Gupta

This EarthCube conceptual design project will generate an innovative EarthCube enterprise architecture that fosters inter-community research collaboration and data exchange across the geosciences. The main feature of this EarthCube enterprise architecture is that it enhances existing patterns of data and information exchange through the ability to evolve by factoring in the impact of maturing movements of data and collaboration. This project seeks to integrate traditional cyberinfrastructure components with other CI components that support scholarly communication, self-organization, and social networking in order to create an enterprise architecture that enables more comprehensive, data-intensive research designs and knowledge sharing within EarthCube.

Transformative Research & Collaboration

Abstract
GeoDataspace: Simplifying Data Management for Geoscience Models
Tanu Malik, University of Chicago

When developing, testing, validating, and comparing models, particularly coupled models, the number of data elements and the complexity associated with their management soon outgrow human memory capacity. The unfortunate consequence is that researchers often narrow the scope of a model analysis, compromise research quality, or conduct analysis within restricted teams. This Building Blocks project will demonstrate a mechanism to overcome this challenge in the scientific community. GeoDataspace seeks to develop methods of defining, sharing and accessing the collections of metadata needed to define the files used in a given computational model, and details of the model run. The goal is for this approach to simplify model use and enhance sharing, reuse, and the reproducibility of models, data and computation properties.

GeoDataspace

Abstract
GeoLink: Semantics and Linked Data for the Geosciences
Robert Arko, Columbia University
Pascal Hitzler, Thomas Narock, Douglas Fils, Mark Schildhauer, Cynthia Chandler

This Building Blocks project brings together significant geosciences holdings in the ocean, earth and polar sciences to demonstrate how innovative technologies can be robustly applied to these facilities to enhance the capabilities for scientists to discover and interpret relevant geoscience data and knowledge. The end product, GeoLink, will lower barriers to cross-repository data discovery and access, while respecting and preserving repository autonomy and heterogeneity.  GeoLink seeks to develop a cyberinfrastructure that allows searching and browsing of content from multiple data sources through semantic integration of digitally published “Linked Open Data.”

GeoLink

Abstract
GeoSciCloud: Deploying Multi-Facility Cyberinfrastructure in Commercial and Private Cloud-based Systems
Timothy Ahern
Chad Trabant, Charles Meertens, Frances Boler

This project will help NSF/EarthCube identify the most suitable IT environment in which EarthCube should deploy and support shared infrastructure.

GeoSciCloud: Deploying Multi-Facility Cyberinfrastructure in Commercial and Private Cloud-based Systems

Abstract
GeoSoft: Collaborative Open Source Software Sharing for the Geosciences
Yolanda Gil, University of Southern California
Scott Peckham, Christopher Duffy, Erin Robinson

The goal of this Building Blocks project is to create a system for software stewardship in geosciences that will empower scientists to manage their software as valuable scientific assets. It will significantly improve the adoption of open data and open software initiatives, improve reproducibility, and advance scientific scholarship. Scientific software stewardship requires a combination of cyberinfrastructure, social infrastructure, and professional development infrastructure. This project seeks to facilitate the publishing, dissemination and stewardship of scientific software, with the goal of improving the adoption of open data and open software initiatives, as well as furthering scientific scholarship and collaboration.

OntoSoft

Abstract
GeoTrust: Improving Sharing and Reproducibility of Geoscience Applications
Tanu Malik
Ian Foster, Scott Peckham, David Tarboton, Anthony Castronova, Jonathan Goodall, Eunseo Choi, Asti Bhatt

GeoTrust will develop sandboxing-based systems and tools that help scientists effectively isolate computational artifacts associated with an experiment, use languages and semantics to preserve artifacts, and re-execute/reproduce experiments by deploying the artifacts and changing datasets, algorithms, models, environments, etc.

GeoTrust: Improving Sharing and Reproducibility of Geoscience Applications

Abstract
Integrated GeoScience Observatory
Asti Bhatt, SRI International
Tomoko Matsuo, J. Michael Ruohoniemi, Yolanda Gil, Tanu Malik

The Integrated Geoscience Observatory is a pilot project that creates an online platform for integrating data and associated software tools, contributed by separate geoscience research communities, into a unified toolset. The vision is to expand the individual domains of geoscience research toward study of the whole Sun-Earth system, and in so doing to uncover the system-level effects critical to the habitability of planet Earth. The Observatory creates an integrated package of software tools, each contributed by researchers with specific capabilities and designed to enable integration of diverse observational data. Features of the toolkit include: (A) linking diverse data sets from multiple data repositories and automatically mapping them to a common user-specified coordinate grid; (B) implementing the well-known Assimilative Mapping of Ionospheric Electrodynamics (AMIE) procedure for assimilation of these data to yield a global picture; and (C) using the EarthCube building blocks GeoSoft, for communicating ontology, and GeoDataspace, for attributing credit to contributors through publication of processed data. The toolset can be accessed and used either through a web-based computing environment or through download packages for local installation, with a nearly seamless transition between the two.

Integrated GeoScience Observatory

Abstract
Integrating Discrete and Continuous Data
David Maidment, University of Texas at Austin
Ethan Davis, Alva Couch, Daniel Ames

This project builds upon previous work focused on integrated discovery of common information themes including precipitation in discrete data from the CUAHSI hydrologic information system and continuous data from the Unidata THREDDS data server. Investigators will advance that work by exploring the creation of new technologies for publishing and discovery of information through the Global Earth Observation System of Systems (GEOSS) Common Infrastructure, the definition of a Common Information Model for discrete and continuous data, development of shared software tools for using this Common Information Model, and extension of the concepts to similar information in the Polar, Ocean and Solid Earth Sciences.

Discrete & Continuous Data (DisConBB)

Abstract
Interdisciplinary Earth Data Alliance as a Model for Integrating EarthCube Technology Resources and Engaging the Broad Community
Kerstin Lehnert, LDEO, Columbia University
Leslie Hsu, Vicki Ferrini, Suzanne Carbotte

Using the Interdisciplinary Earth Data Alliance (IEDA) as a testbed, the project will adopt elements of three EarthCube technologies that have been or are being developed by EarthCube Building Block projects: CINERGI (Community INventory of EarthCube Resources for Geoscience Interoperability), GeoWS (Geoscience Web Services), and GeoLink (Semantic Web technology), to build the architecture of the alliance, thereby testing, validating, and potentially improving these technologies. The technical work will focus on creating a flexible and scalable framework that will allow a growing number of partner systems to contribute their resources and plug into shared capabilities such as applications for integrated data discovery and data submission. Architectural changes at IEDA will go hand-in-hand with the transition of IEDA's organizational structure toward the envisioned multi-institutional alliance.

Alliance Testbed Project (ATP)

Abstract
IS-GEO: Intelligent Systems Research to Support Geosciences
Suzanne Pierce

The EarthCube Research Coordination Network for Intelligent Systems for Geosciences (IS-GEO RCN) will catalyze collaborations to enable advances in our understanding of Earth systems through innovative applications of intelligent and information systems to fundamental geosciences problems.

IS-GEO: Intelligent Systems Research to Support Geosciences

Abstract
iSamples: The Internet of Samples in the Earth Sciences
Kerstin Lehnert, Columbia University

Across many Earth Science disciplines, research depends on the availability of representative samples collected above, at, and beneath Earth’s surface, on the moon and in space, or generated in experiments. These samples are fundamental references that are studied to generate new knowledge about the Earth and the entire universe, and a deeper understanding of the processes that created and shaped it, the availability of natural resources and the risk of natural hazards. The EarthCube Research Coordination Network iSamplES (Internet of Samples in the Earth Sciences) seeks to define, articulate and address the needs and challenges of maintaining, cataloguing and sharing physical samples as a critical part of the geoscience domain. Existing resources that facilitate these goals will be investigated, and best practices and standards will be recognized and encouraged across the Earth Science community as a cyberinfrastructure foundation.

iSamples

Abstract
Laying the Groundwork for an Ocean Protein Portal
Mak Saito
Danie Kinkade

The ability to connect protein distributions in the oceans, and their implied chemical functionality, to chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The proposed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and enabling connections between biogeochemical data and the enzymes that catalyze the underlying reactions. This foundational infrastructure would be accessible to life scientists from other (non-ocean) domains as well. The portal will also contribute to the interpretation of GEOTRACES full-depth ocean trace metal and isotope data via the incorporation of future protein datasets and linkages to the chemical datasets in BCO-DMO. This project will provide BCO-DMO with an opportunity to research and employ new techniques for efficient data mining combined with semantic technologies that will enable better data discovery and access for its community. In addition, working closely with the proteomics community will allow BCO-DMO data managers to gain working experience with this emerging data type.

Specific goals include:

  1. creating text-based and sequence-based search capability for processed protein datasets,
  2. creating a geospatial global map visualization as well as table output of the queried protein's occurrence in the oceans,
  3. providing an Ocean Data View compatible table export of geospatial and temporal distributions of the queried protein,
  4. providing an analytic capability to determine the taxonomic origin of the protein components, building on our METATRYP Python software and including a lowest common ancestor analysis for each peptide component of the queried protein sequence,
  5. creating linkages between protein datasets and relevant environmental datasets within BCO-DMO, and
  6. creating a repository for processed and raw ocean protein data.

Laying the Groundwork for an Ocean Protein Portal

Abstract
Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery
Thomas Narock, University of Maryland
Timothy Finin

A wide spectrum of maturing methods and tools, collectively characterized as Semantic Web Technologies, is enabling machines to complete tasks automatically that previously required human direction. For the Geosciences, Semantic Web Technologies will vastly improve the integration, analysis and dissemination of research data and results. This EAGER project will conduct exploratory research applying state-of-the-art Semantic Web Technologies to support data representation, discovery, analysis, sharing and integration of datasets from the global oceans, and related resources including meeting abstracts and library holdings.  A key contribution will be semantically enabled cyberinfrastructure components capable of automated data integration across distributed repositories.

Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery

Abstract
LinkedEarth: Crowdsourcing Data Curation and Standards Development in Paleoclimatology
Julien Emile-Geay, USC
Nicholas McKay

There is currently no universal way to share paleoclimate data between users or machines, which hinders integration and synthesis. The LinkedEarth platform will be combined with editorial and community-driven processes, resulting in a system that has the potential to engage a broad user base in geoscientific data curation. The proposed framework will lower barriers to participation in the geosciences, enabling more "dark data" to join the public domain using community-sanctioned protocols. The pilot project will facilitate the work of hundreds of paleoclimate scientists, accelerating scientific discovery and the dissemination of its results to society.

LinkedEarth

Abstract
Magnetosphere-Ionosphere-Atmosphere Coupling
Jesper Gjerloev, Johns Hopkins University
Gary Bust, Robin Barnes, Ethan Miller, Brian Anderson

This project will create a new, unique set of high-latitude electrodynamic, ionospheric-thermospheric-magnetospheric cyber-based tools and products that will be available to the entire geosciences community. In combination, the data products from this project will allow the derivation of a first-principles electromagnetic solution for the auroral ionosphere. The project will develop a new set of data resources for the geoscience community in the form of a complete electromagnetic solution of the auroral ionosphere and will focus on developing the "Data Infrastructure for Communities" component of the EarthCube Integrative Activities. The project will thus not only allow access to the desired derived products, but also support other modeling efforts by allowing access to the database of input data and intermediary products. The system will also be designed to be extensible, allowing additional data products and models to be integrated. The system will fully support existing standards that are used in the broader geosciences community, such as the Data Access Protocol (DAP). The research undertaken in this proposal will enable transformative research in two otherwise separated fields: magnetosphere-ionosphere and neutral atmosphere.

Magnetosphere-Ionosphere-Atmosphere Coupling

Abstract
Optimal Data Layout for Scalable Geophysical Analysis in a Data-intensive Environment
Hongfeng Yu, U Nebraska, Lincoln
Kwo-Sen Kuo

The team will work closely together to develop, evaluate, and deploy the computer infrastructure to improve the performance and scalability of geophysical analysis for scientific discovery and education. Making the system available to other researchers will facilitate the development of new scalable solutions. Interactive geosciences applications will be used as an effective means to promote students' interest in science and engineering studies, and to attract and retain students for geosciences community growth. The innovation and the basis of our technical approach is an optimal data layout algorithm for indexing and placing massive heterogeneous observation data across the distributed devices of a cluster. The new data layout is tailored to the spatial-temporal characteristics of Earth observation data, can directly account for advanced compute techniques, including non-volatile storage resources and GPU- and manycore-based computing nodes, and supports high-throughput and high-resolution exploration of large-scale data.

Optimal Data Layout for Scalable Geophysical Analysis in a Data-intensive Environment

Abstract
Polar Data Insights and Search Analytics for the Deep and Scientific Web
Chris Mattmann
Siri Jodha Khalsa, Ruth Duerr

This project develops an NSF EarthCube Building Block focused on Polar Data Science. The system will build upon work in Information Retrieval and Data Science and upon existing investment from NSF Polar, EarthCube, and from DARPA and NASA in this area. The system will collect, analyze, and make interactive the wealth of textual and scientific Polar data collected to date across the Deep web of scientific information -- scientific journals, multimedia information, scientific data, web pages, etc.

Polar Data Insights and Search Analytics for the Deep and Scientific Web

Abstract
Research Coordination Network for High-Performance Distributed Computing in the Polar Sciences
Shantenu Jha, Rutgers University
Heather Lynch, Jaroslaw Nabrzyski, Lynn Yarmey

This project will support advances in computing tools and techniques that will enable the Polar Sciences community to address significant challenges in both the short and long term. The impact of this project will be an improved ability to utilize advanced cyberinfrastructure and high-performance distributed computing to fundamentally alter the scale, sophistication and scope of the polar science problems that can be addressed. This project will not implement those changes but will identify and lay the groundwork for such impact across the Polar Sciences. The project will identify primary barriers to the uptake of high-performance and distributed computing and will help alleviate them through a combination of community-based solutions and training. The project will also produce a roadmap detailing a credible and effective way to meet the long-term computing challenges faced by the Polar Science community, along with possible plans to address them. This project will establish mechanisms for community engagement, which include gathering technical requirements for polar cyberinfrastructure and supporting and training early career scientists and graduate students.

Research Coordination Network for High-Performance Distributed Computing in the Polar Sciences

Abstract
Software Stewardship for the Geosciences
Yolanda Gil, University of Southern California
Chris Mattmann, Scott Peckham, Erin Robinson, Christopher Duffy

This Building Blocks project will research and develop a system whereby geoscience and environmental software is generated effectively by geoscientists themselves, so that the software can be captured, curated, managed, and made available to all parties upon request. The project will begin this process by building partnerships among computer scientists, software developers, and scientists across all geoscience domains, with the goal of creating a software ecosystem and a culture of software stewardship that will empower geoscientists and others to make their software accessible and manage it as a valuable scientific asset.

Software Stewardship for the Geosciences

Abstract
Specifying and Implementing ODSIP, A Data-Service Invocation Protocol
David Fulker, OPeNDAP
Mohan Ramamurthy, Steven Businger, Brian Blanton, Peter Cornillon

This EarthCube Building Blocks project intends to build ODSIP (Open Data Services Invocation Protocol) in order to provide an array of open specifications implemented in client/server libraries. The project also seeks to provide a system in which EarthCube can be built effectively around clients and servers that employ common and conceptually rich protocols for data acquisition.

ODSIP

Abstract
That dot is a world! Drilling down from a statistics scatterplot to pre-populated case Notebooks
Yuan Ho
Brian Mapes, Mohamed Iskandarani

This project will develop and utilize capabilities for scientists to "drill down" into abstract statistics about the flows of the atmosphere and ocean, and to build a library of Notebooks with clear views of the actual weather systems (in the atmospheric part of the work) or Gulf of Mexico ocean eddies (in the ocean part of the work).

That dot is a world! Drilling down from a statistics scatterplot to pre-populated case Notebooks

Abstract
The Power of Many: Ensemble Toolkit for Earth Sciences
Shantenu Jha
Jeroen Tromp, Matthieu Lefebvre, Guido Cervone, Michael Mann

The study of hazards and renewable energy is paramount for the development and sustainability of society. Similarly, the emergence of new climatic patterns poses new challenges for future societal planning. Geospatial data are being generated at an unprecedented rate, exceeding our analysis capabilities and leading towards a data-rich but knowledge-poor environment. Advanced computing tools and techniques are playing an increasingly important role in contributing to solutions to problems of societal importance. This project will create specialized computational tools that will enhance the ability of scientists to effectively and efficiently study natural hazards and renewable energy. These tools will support novel methods and the use of powerful computing resources in ways that are not currently possible.

The Power of Many: Ensemble Toolkit for Earth Sciences

Abstract