Proteomics, the measurement of the inventory of proteins within an organism, has emerged as an important new biological methodology within the past decade. Applied to environmental communities of microbes, where it is typically called metaproteomics because it covers an entire biological community, this technology has shown potential to significantly improve the understanding of ocean ecology and biogeochemistry by allowing a broad diagnosis of ecosystems across space and time. Measuring proteins, including the enzymes that catalyze biogeochemical reactions, throughout the ocean basins therefore has great potential as a tool for ocean scientists interested in the chemistry and biology of the oceans. Yet as a relatively new data type, based on mass spectra rather than DNA sequence, proteomic datasets have their own specific informatics complexities. Currently these datasets are not easily accessed by the broader biological and chemical oceanographic communities. The researchers will develop an Ocean Protein Portal that will enable non-expert users to interrogate these large and complex ocean protein datasets.

The ability to connect protein distributions in the oceans, and their implied chemical functionality, with chemical and biological ocean datasets has the potential to enable a broad array of microbial and biogeochemical discovery. The proposed cyberinfrastructure will benefit the broader biological and chemical oceanographic communities by making microbial protein data widely accessible and by enabling connections between biogeochemical data and the enzymes that catalyze the underlying reactions. This foundational infrastructure would be accessible to life scientists from other (non-ocean) domains as well. The portal will also contribute to the interpretation of full-depth ocean trace metal and isotope data from GEOTRACES via the incorporation of future protein datasets and linkages to the chemical datasets in BCO-DMO. This project will provide BCO-DMO with an opportunity to research and employ new techniques for efficient data mining combined with semantic technologies that will enable better data discovery and access for its community. In addition, working closely with the proteomics community will allow BCO-DMO data managers to gain working experience with this emerging data type.

Specific goals include: 1) creating text-based and sequence-based search capability for processed protein datasets, 2) creating a geospatial global map visualization as well as table output of the queried protein's occurrence in the oceans, 3) providing an Ocean Data View compatible table export of geospatial and temporal distributions of the queried protein, 4) providing an analytic capability to determine the taxonomic origin of the protein's components, building on our METATRYP Python software and including a lowest common ancestor analysis for each peptide component of the queried protein sequence, 5) creating linkages between protein datasets and relevant environmental datasets within BCO-DMO, and 6) creating a repository for processed and raw ocean protein data.
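
To make the lowest common ancestor analysis in goal 4 concrete, the following is a minimal Python sketch of the general idea. It does not use or reproduce the METATRYP API. The function names, the simplified trypsin rule (cleave after K or R unless the next residue is P), the three toy sequences, and the shortened lineages are all assumptions made for illustration only.

```python
def tryptic_peptides(sequence, min_length=6):
    """In-silico digest: cut after K or R unless the next residue is P.

    This is a simplified trypsin rule; min_length is an arbitrary cutoff.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (each listed root-first)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None


def build_peptide_index(proteomes):
    """Map each tryptic peptide to the set of organisms whose proteins contain it."""
    index = {}
    for organism, proteins in proteomes.items():
        for protein in proteins:
            for peptide in tryptic_peptides(protein):
                index.setdefault(peptide, set()).add(organism)
    return index


def peptide_lca(query, index, lineages):
    """For each peptide of the query protein, report the LCA of matching organisms."""
    result = {}
    for peptide in tryptic_peptides(query):
        organisms = index.get(peptide, set())
        result[peptide] = lowest_common_ancestor([lineages[o] for o in organisms])
    return result


# Toy reference data (invented sequences, simplified lineages; not real proteomes).
PROTEOMES = {
    "org_A": ["MTAYLAGKSPEELVNRGGWDTFK"],
    "org_B": ["MTAYLAGKSPEELVNRAAWDTFK"],
    "org_C": ["MTAYLAGKQPDDLINRAAWETYK"],
}
LINEAGES = {
    "org_A": ["Bacteria", "Cyanobacteria", "Prochlorococcus"],
    "org_B": ["Bacteria", "Cyanobacteria", "Synechococcus"],
    "org_C": ["Bacteria", "Proteobacteria", "Pelagibacter"],
}

index = build_peptide_index(PROTEOMES)
result = peptide_lca("MTAYLAGKSPEELVNRGGWDTFK", index, LINEAGES)
for peptide, taxon in result.items():
    print(peptide, "->", taxon)
```

In this toy case the peptide shared by all three organisms resolves only to Bacteria, the peptide shared by two resolves to Cyanobacteria, and the peptide unique to one organism resolves to that organism's genus. This illustrates why per-peptide analysis matters: different peptides of the same protein can carry very different taxonomic resolution.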