Making data discovery more efficient, comprehensive and user-friendly is a critical challenge articulated by geoscientists as well as researchers in other fields. While information discovery portals and search engines have been developed for many data repositories, and systems that simultaneously search multiple resources have been created, cross-disciplinary data discovery remains a serious issue. It becomes especially acute with rapid increases in the volume and diversity of observations that reflect different components of the Earth System and are collected by multiple research groups, government organizations and commercial companies. The main goal of the EarthCube Data Discovery Hub project is to greatly reduce the time and effort necessary to locate and evaluate geoscience information resources across disciplines, and to increase the value of investment in data generation by promoting data reuse and reducing duplication of effort. Project outcomes will benefit a wide group of scientists by providing them with a user-friendly and powerful gateway to information resources across multiple data facilities and community contributions, and with mechanisms for improving the system to answer their research queries in a consistent manner. The project will also benefit geoscience research in several ecosystems that are being used as examples to test the data tools being developed, including rivers, coral reefs and other marine ecosystems, and the critical zone where rock, soil, water, air and living organisms interact.

The EarthCube Data Discovery Hub will be developed as a comprehensive data discovery and content enhancement system, which will leverage improved and community-curated metadata descriptions and integrate previously unregistered information sources. The project will further extend, improve and operationalize the inventory catalog developed in an earlier CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project, which currently includes over 2 million metadata documents from multiple sources. The key technological innovations include: an automated cross-domain metadata augmentation and curation pipeline enabled by a large integrated geoscience ontology; mechanisms for "deep registration" of geoscience data from different sources based on a novel data type registry; an online use case management system; and a methodology for processing several types of complex geoscience queries that cannot be answered by existing systems. In addition, the project will support scientific progress in several representative cross-disciplinary research scenarios, using the contexts of river geochemistry, coral reef and other marine ecosystem analysis, and critical zone science. The project will implement innovative community engagement mechanisms, including community annotation of automatically curated metadata, iterative improvement of the geoscience ontology based on community feedback, and joint development of cross-disciplinary use cases semantically aligned with data descriptions.