Specifying and Implementing ODSIP, A Data-Services-Invocation Protocol

Science Challenges

The impedance induced by remote data usage is growing as geoscience becomes more data-intensive and multidisciplinary. This derives primarily from three factors.

  1. Distance between data sources and users introduces latency and Internet reliance. Trans-discipline and international studies often entail large separations.
  2. Growing data volumes exacerbate distance factors, even to the extent that data-to-computer latencies can make certain studies impractical.
  3. Variations in form and format have long been obstacles to data use between scientists, and the multidisciplinary character of modern science heightens the impedance. This is compounded by distance, which can create language barriers and/or decrease the likelihood of finding data expertise down the hall.

Technologies to surmount these obstacles are themselves limited by the same distance and volume factors. Transforming a dataset into something useful (i.e., addressing variations in form) may entail so much processing that it is practical only close to the data source, where latencies are minimal; even subset-selection (to reduce data-transfer volumes) may entail similar computation. However, data providers often do not consider such processing to fall within their missions.


Technical Approach

Trends toward data-access as a (Web) service have already yielded progress in overcoming the above challenges. In particular, whole-file data-transfer (e.g., via FTP) is being supplanted by services, such as the Data Access Protocol (DAP), that offer certain filtering (i.e., subset-selection) capabilities as an integral part of data-access.
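As an illustration of subset-selection as part of data-access, the following minimal sketch builds a DAP2-style request URL in which a constraint expression selects a hyperslab of one variable. The server address, dataset name, variable name (sst), and helper functions are hypothetical and chosen only for illustration; the sketch assumes the DAP2 conventions of a [start:stride:stop] index range with an inclusive stop and a .dods suffix for the binary data response.

```python
from math import prod


def dap2_hyperslab(var, slabs):
    """Return a DAP2 projection such as 'sst[0:1:0][100:1:199]'.

    Each slab is (start, stride, stop); DAP2 treats stop as inclusive.
    """
    parts = []
    for start, stride, stop in slabs:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid hyperslab {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return var + "".join(parts)


def subset_url(dataset_url, var, slabs, response="dods"):
    """Append a response suffix and a constraint expression to a dataset URL."""
    return f"{dataset_url}.{response}?{dap2_hyperslab(var, slabs)}"


def selected_count(slabs):
    """Number of array elements the hyperslab selects (inclusive stops)."""
    return prod((stop - start) // stride + 1 for start, stride, stop in slabs)


# Hypothetical dataset and variable, for illustration only.
DATASET = "http://example.org/opendap/sst_daily.nc"

# One time step, 100 rows, and every second column of a 200-column window.
slabs = [(0, 1, 0), (100, 1, 199), (200, 2, 398)]

url = subset_url(DATASET, "sst", slabs)
n_selected = selected_count(slabs)

print(url)
print(n_selected, "values requested")
```

Only the requested values cross the network, so the transfer scales with the size of the selection and not with the size of the file.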

ODSIP extends this concept to fully embrace the invocation of new server functions within the DAP framework. In other words, this project stretches the notion of data-as-service to prototype several pre-retrieval operations that lower the impedance of remote data usage. Naturally, the focus is on operations that reduce the volumes of data transferred to users.

Along with ODSIP capabilities for calculating variables not present in the source data, such as statistical summaries, an important set of server functions will be prototyped to enrich the DAP tools for selecting subsets from the source (or the calculated) data.

The ODSIP outcome will include a nascent algebra of pre-retrieval operations prototyped for use in the scientific contexts discussed below. These prototypes are expected to show major reductions in data-transfer demands and hence significantly increased usability of important remote data.
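One way to picture such an algebra is as a set of operations that each take a data array and return a data array, so they can be chained on the server before anything is sent. The sketch below is a toy model under that assumption, not the ODSIP design: the operation names (subset, time_mean, compose), the nested-list data cube, and the synthetic values are all hypothetical. It runs locally and simply counts how many values would have to be transferred before and after the chained operations.

```python
from functools import reduce
from statistics import fmean


def subset(t_range, y_range, x_range):
    """Operation: keep only the given index ranges (stop exclusive)."""
    def op(cube):
        return [
            [row[x_range[0]:x_range[1]] for row in grid[y_range[0]:y_range[1]]]
            for grid in cube[t_range[0]:t_range[1]]
        ]
    return op


def time_mean():
    """Operation: collapse the time axis to a single grid of means.

    The result is still a cube (with one time step), so operations
    remain composable with one another.
    """
    def op(cube):
        ny, nx = len(cube[0]), len(cube[0][0])
        return [[[fmean(grid[j][i] for grid in cube) for i in range(nx)]
                 for j in range(ny)]]
    return op


def compose(*ops):
    """Chain operations left to right into one pre-retrieval pipeline."""
    return lambda cube: reduce(lambda data, op: op(data), ops, cube)


def count_values(cube):
    """Number of values that would have to be transferred."""
    return sum(len(row) for grid in cube for row in grid)


# Synthetic source data: 30 time steps on a 20 x 40 grid.
source = [[[float(t + y + x) for x in range(40)] for y in range(20)]
          for t in range(30)]

# Select a 10 x 20 region over all time steps, then average over time.
pipeline = compose(subset((0, 30), (5, 15), (10, 30)), time_mean())
result = pipeline(source)

reduction = count_values(source) / count_values(result)
print(count_values(source), "values in source")
print(count_values(result), "values after pre-retrieval operations")
print("reduction factor:", reduction)
```

In this toy case the subset step cuts the volume by a factor of 4 and the time average by a further factor of 30; real reductions would depend entirely on the data and the operations requested.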


Science Drivers

Among the many areas of geoscience study impeded by remote data-usage challenges, three have been selected as drivers for the ODSIP project. The investigator team includes scientists active in these fields, and the project efforts will yield prototype server functions targeted to each of their needs. In turn, the needs of these investigators embody sufficient mathematical breadth to indicate wide applicability of the ODSIP concept, once success has been achieved with the prototypes.

The three prototypes will be developed for:

  1. Climate-Model Downscaling: Joining climate runs with weather-model ensembles, estimate the probabilities of fine-scale events critical to native-Hawaiian well-being.
  2. Storm-Surge Prediction: From huge (triangular-mesh) models, give officials in North Carolina simplified information to help anticipate and prepare for coastal emergencies.
  3. Analysis/Synthesis of Sea-Surface-Temperature (SST) Fronts: Using data from dissimilar satellites, estimate front locations and retrieve proximate imagery.


Benefits to Scientists

ODSIP is meant to help scientists undertake previously impractical studies by reducing, by orders of magnitude, the volumes of data that must be moved from source repositories to end-user premises. Because these new functions are embedded in DAP systems well established as secure and effective, a premise of ODSIP is that data providers will welcome the prospect of offering pre-retrieval computation in exchange for reduced data transmission.