BALTO Kick-off Meeting
2-3 October 2017
Virginia Tech Geosciences (Sarah)
Study lithosphere-asthenosphere boundary dynamics, computational modeling, and kinematic GNSS/GPS in Madagascar
Volcano-tectonic interactions in Tanzania
5 GPS stations around the volcano recording real-time data (GNSS/GPS)
Seismic hazards in the Caribbean, Florida, and China (post-seismic relaxation)
Investigation of the long-term tectonic velocity field
Different components of vectors are assessed.
VT Project Description and Refinement of Project Deliverables
How the deeper mantle plume interacts: geodesy, geodynamic modeling, and seismology
Tectonic setting: Eastern Branch, East African Rift System, and GeoPRISMS primary focus site
Geodetic Observations (‘18, ‘19, ‘20): long-tail (2.5-3 years) GNSS/GPS data from 10 stations in Kenya and Uganda.
Metadata associated with the long-tail data are compatible with BALTO accessors and available through UNAVCO
Implement/promote combining long-tail data with UNAVCO archive data
Seismic Anisotropy: long-tail data (seismic anisotropy)
Metadata associated with the long-tail data are compatible with BALTO accessors and available through IRIS
Implement/promote use of IRIS metadata standards for seismic anisotropy for easy access
Combination of long-tail data and other IRIS archive data
NSF Computational Infrastructure for Geodynamics (CIG): open-source community mantle convection code ASPECT; plug in with accessors, teach users how to use BALTO with ASPECT, and make it publicly accessible
Virginia Tech Biological Systems Engineering (Zach)
Capture complex field processes
Model development: saturation excess and infiltration excess processes
Greenhouse gas modeling: N2O extremely sensitive to pH
Refine soil datasets to support the capture of sub-field scale processes.
Soil genesis is explained by topography
Biofilters: remove excess nitrogen from nitrogen-rich groundwater via bioreactors
University of Colorado Geosciences (Anne)
CU Geoscience Project and Refinement of Project Deliverables
Hikurangi Ocean Bottom Investigation of Tremor and Slow Slips (HOBITSS)
Study area: offshore Northern Hikurangi Subduction Margin
Slow slip: slip events that take up to 2 weeks to unfold
Interseismic coupling and slow slip vary along the Hikurangi subduction margin
Shallow slow slip happens every one to two years and lasts about 2 weeks
Seafloor geodesy using pressure gauges to show vertical deformation; OBS for tremor, seismicity, and passive imaging of the SSE source
A pressure sensor measures the mass of overlying water: if the height of the water column changes, the pressure at the seafloor changes.
Pressure gauge data showed that slow slip extends to the trench (2-3 km beneath the seafloor)
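The pressure/height relation behind the seafloor geodesy point can be sketched as Δp = ρ g Δh. A minimal illustration (the seawater density and the 1 cm uplift figure are assumed typical values for illustration, not numbers from the meeting):

```python
# Sketch of the seafloor-pressure / water-column-height relation:
# delta_p = rho * g * delta_h. Constants are assumed typical values.

RHO_SEAWATER = 1025.0  # kg/m^3, assumed typical seawater density
G = 9.81               # m/s^2, gravitational acceleration

def pressure_change_pa(uplift_m: float) -> float:
    """Pressure change at the seafloor for a given seafloor uplift.

    Uplift shortens the overlying water column, so pressure drops
    (negative delta_p for positive uplift)."""
    return -RHO_SEAWATER * G * uplift_m

# An illustrative 1 cm seafloor uplift during a slow slip event:
print(round(pressure_change_pa(0.01), 1))  # → -100.6 (Pa)
```

This is why a bottom pressure record can resolve centimeter-scale vertical deformation, provided oceanographic pressure noise is removed.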
HOBITSS records abundant seismicity, used to locate earthquakes, look at tremor, and more.
Land (real-time) and offshore (sensors placed and then picked up a year later) data
Tectonic tremor occurs mainly after the slow slip event and updip of the area of highest SSE slip
Extensible architecture for the data server
Modular implementation: Hyrax
minimizes data transfer using 'constraint expressions' to subset remote data
Hub and spoke model for data that enables N to M translation with the minimal number of software modules
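A DAP "constraint expression" is appended to a dataset URL so the server returns only the requested subset of a remote variable. A minimal sketch of how such a URL is assembled (the host, dataset path, and variable name are hypothetical examples, not BALTO endpoints):

```python
# Build a DAP2-style subset URL using a constraint expression.
# Server URL and variable names below are hypothetical illustrations.

def dap_subset_url(base_url: str, var: str, *index_ranges: tuple) -> str:
    """Append a constraint expression like ?sst[0:10][5:20] to a DAP URL.

    Each index range is (start, stop) and renders as [start:stop];
    the server then transfers only that slice of the variable."""
    ce = var + "".join(f"[{a}:{b}]" for a, b in index_ranges)
    return f"{base_url}.dods?{ce}"

url = dap_subset_url("http://example.org/data/sst.nc", "sst", (0, 10), (5, 20))
print(url)  # http://example.org/data/sst.nc.dods?sst[0:10][5:20]
```

The hub-and-spoke point above follows the same economy: translating among N client formats and M server protocols through one common data model needs roughly N + M modules instead of N × M pairwise translators.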
What OPeNDAP does:
coordinates between other DAP server developers
runs developer workshops and promote development, protocol enhancements, and features
NASA systems require login information to access their data
Knowledge of Java and C++
Extending the brokering capabilities of Hyrax
Section 3.4: there are 4 types of brokering:
DAP clients talk to a WCS server: client access to data provided by WCS (1.x or 2.x); WFS moves data around, WMS moves images around
DAP server proxy: functionality deliverable. BALTO Hyrax instance to access DAP servers, leveraging BALTO’s cataloging service
Any client, DAP/WxS server: workshop deliverable. Build on existing workshop material to develop tutorials
Cut the WxS proxy capability on MapServer (high risk)
WCS client and DAP server: support for NASA satellite data. Most WCS clients are limited to WCS 1.x (limited data model support)
New data formats
University of Colorado CIRES (Scott)
Geoscience Standard Names Ontology
Coupling can help with naming
Variables are very important; every dataset assigns values to variables
Level up terms to make them understood more broadly
Built on the Resource Description Framework (RDF) and SPARQL
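RDF expresses knowledge as subject-predicate-object triples, and SPARQL queries match patterns over those triples. A toy, stdlib-only sketch of the idea (the `gsn:` terms below are made-up illustrations, not actual Geoscience Standard Names entries):

```python
# Toy triple store illustrating RDF-style triples and SPARQL-like
# pattern matching. All terms are hypothetical examples.

triples = [
    ("gsn:sea_water__pressure", "rdf:type", "gsn:Variable"),
    ("gsn:sea_water__pressure", "gsn:hasUnits", "Pa"),
    ("gsn:soil__depth", "rdf:type", "gsn:Variable"),
]

def match(pattern, store):
    """Return triples matching an (s, p, o) pattern; None is a wildcard."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# In spirit: SELECT ?s WHERE { ?s rdf:type gsn:Variable }
variables = [t[0] for t in match((None, "rdf:type", "gsn:Variable"), triples)]
print(variables)  # ['gsn:sea_water__pressure', 'gsn:soil__depth']
```

"Leveling up" a local term then amounts to adding a triple that links it to a broadly understood ontology name, which queries like the one above can follow.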
CU INSTARR Deliverables (Scott)
Half-complete data is not helpful and creates holes in the system
Fill in the holes of missing data
Give guidelines to the user to prevent them from leaving holes during data insertion
Broker: possibly include forms where individuals could comment on submitters’ metadata to help them.
People are not being clear about their metadata; how could this be improved?
Data Model: how data is processed/stored
Zach’s use-case 1st presentation
How is water-use management evaluated?
Hydrologic Resource Concept
Often defined as the coincidence of soil type and land use
Incorporate relevant processes to understand/define HRC
Climate and weather data
Data: elevation, land use, soils, historic daily weather, forecasts, climate
Weather data: when is the short time weather forecast better than using the closest weather station? (DAN)
Example in Tesuque Cr., Santa Fe, NM
Climate Data Bias Correction: empirical adjustment of the distribution of variables originating from regional climate model simulations
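One common empirical-adjustment approach is quantile mapping: map each model value to the observed value at the same empirical quantile. A minimal sketch (the temperature numbers are made-up illustrative data, and this is one possible technique, not necessarily the project's chosen method):

```python
# Minimal empirical quantile-mapping sketch for climate bias correction.
# Training samples below are hypothetical illustrative numbers.

def quantile_map(model_train, obs_train, value):
    """Bias-correct `value` by matching its rank in the model training
    sample to the value at the same rank in the observed sample."""
    m = sorted(model_train)
    o = sorted(obs_train)
    # empirical rank of `value` within the model sample
    rank = sum(1 for x in m if x <= value)
    idx = min(max(rank - 1, 0), len(o) - 1)
    return o[idx]

model = [10, 12, 14, 16, 18]   # e.g. simulated daily max temperature (degC)
obs   = [ 8, 11, 13, 17, 20]   # co-located observations
print(quantile_map(model, obs, 14))  # model's median maps to obs median: 13
```

Real applications would use continuous quantile interpolation and per-season fits; this only shows the rank-matching core of the idea.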
Sources of data in several different formats
GRIB, Text, OGC, RMarkdown
Explains soil depth and texture distributions
Take sparse data and interpolate
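Sparse-point interpolation of the kind noted above is often done with inverse-distance weighting (IDW). A sketch under that assumption (the sample points and power parameter are illustrative, not project data):

```python
# Inverse-distance-weighted (IDW) interpolation sketch for sparse point
# data, e.g. soil depth samples. Sample values are hypothetical.
import math

def idw(points, x, y, power=2.0):
    """points: list of (px, py, value). Returns IDW estimate at (x, y)."""
    num = den = 0.0
    for px, py, v in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return v  # query lands exactly on a sample point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

samples = [(0, 0, 1.0), (1, 0, 3.0)]  # soil depth (m) at two locations
print(idw(samples, 0.5, 0.0))  # midpoint of two equal-weight samples → 2.0
```

IDW is only one choice; kriging or regression against topography (per the soil-genesis note above) would use the same sparse inputs.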
Questions address: water quality, nitrogen cycle,
Data requirements needed
Input data requirements; output may be valuable to others beyond his field
Is part of his project to produce output aligned with standards?
No reason that we can’t ingest that data
If time allows:
VT BSE Stream Lab to CUAHSI WDC
Drones to Archive
Generic DataLoggers Archive/Service
CRAN (quality-controlled centralized repository; Perl): a central library for all R packages, in contrast to Python’s package ecosystem, which is not quality controlled
SYNERGIES BETWEEN PROJECTS
Sarah: Are there similarities in data?
James: Different data than his group has typically worked with: point data vs. coverage data. Zach is using coverage data. From a software point of view those are the 2 real differences. The metadata will be common among all of them. Point data is possible.
Sarah: Freeform capability of OPeNDAP?
James: If data are comma-separated in a text file, FreeForm can be used; .csv files also work. No programming is needed to get it served - any structure; the server can be configured to make it available. It would be useful from the software end to see example data files. Reading data from IRIS and UNAVCO: what are the protocols? Unlikely there will be any great surprises.
Did IRIS develop special additional web APIs?
OGC protocols? How important is that part of the proposal text? Zach would use OGC. Does anyone use client software that uses the OGC protocols (WFS, WCS)?
Daniel: There is a mix of different protocols. Some of our use cases have already been demonstrated in the BCube project, available through WCS. As a common construct/format to be fed and read, we found netCDF did most of what we needed. NetCDF vector, time-series, or spatial layers are compatible with WCS. They worked with it because one of their use cases was heavily interfaced with ArcGIS, which can work with netCDF. With netCDF they can inject a good portion of the metadata. Is a data format like that a good one to work with?
James: Available through WCS; if using it with ArcGIS, then WCS 1.0 or XX, but not 2.0. ArcGIS can connect with DAP servers, but it cannot read point data. Also netCDF vector, time-series, or spatial data; raster time series and point time series. I wouldn’t necessarily think that netCDF is the optimal way. To serve a netCDF time series in a meaningful way, they would need to write a new handler.
Dan: For vector data we were pushing for WFS. At the vector level the project ended and the broker never actually supported vector transformation from shape files to W?S. Along with the requirement for a high-level programmer to develop accessors, it was necessary to come up with an open broker; the solution was to build data accessors with much simpler languages. That’s where Dave and he started talking.
James: WFS: was trying to read time-series data …
Dan: They were able to work with and read from WFS, but having the broker access WFS did not happen. Currently it works with netCDF data.
Dan: he suggests the purpose is to be able to use shape files, soils data - all of the data files that they presented. The shape files themselves are not in an OGC standard. (No - James.) They are there because there is a lot of data in that format. What needs to be brokered is … we’ll rely on OPeNDAP’s expertise. We need to point our language to the broker.
James: some of our data is becoming available already, other data is in files. The broker will form a single access point to get data that is already available, but also the files. We need to take a hard look at developing data handlers vs. using existing protocols. The thrust of BALTO is brokering. Developing handlers to access certain types of files adds a certain kind of depth, and for it to be useful the other new software has to be updated as well. It may be a better way forward to use existing servers that already speak these protocols. We should push toward working with web APIs and write less new software to access individual file types. Protocol brokering is generally much trickier, and developing specialized software becomes much harder for sustaining the project. I suspect a lot of the file formats already have good data servers. We are building a broker that can talk to many different things. Unidata has done a great job building an OPeNDAP server (DAP protocol); we would use a DAP client to talk to those DAP servers.
In addition, when he was talking about not building our own software to access ESRI files: rather, use XX server that makes ESRI shape files available using WFS. In that case we would get the ESRI files through WFS and then broker the WFS into DAP.
Dan: We’re not handling … we’re not in the business of storing files; we’re pointing to data, transforming data files, and making them accessible
DAVE: Should we attempt to paint a (single) picture of all data pertinent to BALTO, a picture that tells, for each source and output dataset:
How the dataset may be accessed as a (brokered) Web service.
What the dataset contains (consistent with Scott’s ontology).
Which analysis/viz tools may be employed against the dataset.
How the dataset sheds light on or reflects results of a BALTO use case.
Why/how brokering enhances the value or utility of the dataset.
Brokering is pointing to data and making it available on the site so we can grab it.
DAN: Think of it as downloading it. The difference is the download-and-manipulate process vs. the file-typing process. Splitting hairs.
James: User doesn’t understand the underlying mechanics
Dan: Maybe implement GDAL and implement WCS 2. James: they have a server already; WCS 2 has its own …
*Sarah: Figure out what to produce in the first three months through synergy and what is feasible at this point. VT can code metadata for a couple of velocity solutions in line with Scott’s ontologies
Dan: We need to be able to point to the aggregate of all datasets.
Focus of the BCube project was the ability to broker long-tail data.
Dave: I think it’s wishful thinking that datasets are going to be so thoroughly described that they can be used anywhere. People should be guided to work through accessors and transformers. There may be more one-off situations that can be adapted.
Dan: Did not mean to suggest it’s automated; rather, something non-programmers can use to build accessors to their datasets, while pushing that to people who are making their data sets or other data sets accessible … metadata requirements that Scott can shine light on … how to provide supporting metadata.
*A $15,000/year student should be able to build an accessor more readily than a non-programmer. We should get a better term than “non-programmer.”
*Sarah: Can we start with specific examples drawn from the use cases we have? We can work with Scott on what ontology to use. The goal of IRIS and UNAVCO is to make data available to the community. However, they are not available through web services.
Dave: Sarah’s idea supported.
IRIS has the tomography files and UNAVCO has the velocity files.
*Dan: There are different ways to add metadata to files without modifying them. We could use existing technologies and just transfer the files. This can be done within 3 months; no software needs to be written.
*Sarah: Other tabular datasets we can access from IRIS. We do want to align with existing Earthcube infrastructure.
Dave: Explore the ways data are currently available.
What the broker does: look at the way unavco and iris already made the data
*FIRST QUARTER OBJECTIVES
1) Find out how these data are currently accessed:
shape files for soils
seismic velocity models
New Zealand GeoNet
seismology at USGS
James, Dan, Nathan to discuss how
We will figure out how they are structured in a computer science sense
2) We will work with UNAVCO and IRIS to include the measurement quality metadata (VT)
Great points about integrating Scott Peckham’s Hub and Spoke Model
3) We may be able to add the metadata into the brokering system
4) A dynamically generated catalog of what is being brokered
DAVE: Are we going to have a brokering system running? Dan?
Nathan: What about the metadata that already exists in the netcdf files or tabular files in the IRIS/UNAVCO catalog?
How the 3 types of data are currently available
How they are structured in a computer-science sense
*Tabular velocity data, making sure we use ontologies compatible with brokering.
Dave: Does this project include a brokering software system in the cloud?
*Sarah: Other objectives: A catalog of what has been brokered.
Daniel: David agreed to join the first BALTO discussion with the EarthCube community... heavily centered on compiling the notes, as opposed to an overview of the proposal itself... contrasting that with the outcomes of the previous
Familiarity with the EarthCube structure:
Follow up: an earthcube structure
Sarah: general funded project requirements
*Daniel: a place to do outreach... a failing of EarthCube: the community is closed to those not funded by EarthCube... explore how to get out into the scientific community and get people using our tool and other tools
With that... chime into discussion with ways we can sell ourselves onto the open market
Most science is not repeatable... if we can create an environment where all our use cases can be released so that anyone can rerun them, that would be an ultimate goal
Zach: we have to be creative in ways we can release our cases
Sarah: to get more people involved in EarthCube, the science committee is working through webinars, sci-tech match-ups... these are efforts at bringing non-EarthCube scientists into the community
Daniel: an outreach or science effort?
Sarah: a science community effort... a cookbook so someone can reproduce it
Anne: it would be difficult to have everything online
Daniel: if a new student in your group wants to do this, how do they adopt the prior research and use it to go forward?
Anne: we obtain it from whoever we are collaborating with or wherever we get it from, say GitHub... maybe we have it in house.
Daniel: one community that is mature at distributing its codes is the weather community. How could the Unidata community help a community that doesn’t distribute its code? What is your recommendation toward the end goals?
Dave: Brokering might be a contributor towards reproducibility. Brokering work should shorten the path from concept to realization, easing access to data should be a realistic and achievable goal.
Big problem: the amount of software.
Daniel: as people are doing their use cases we want to capture the workflow... possibility of contributing to resources such as Khan Academy/short courses on accessing the data to run the use cases.
Dave: people who attended the workshops became our sales force because they went back to their universities and became implementers of our systems. They picked up skills on how to be creative.
*Sarah: Come back to this idea at another bi-weekly meeting
*Daniel: EarthCube is light on papers that reference significant use of the tools in the use cases... build a template that integrates the various use cases.
Sarah: discussing EarthCube Funding Project (powerpoint)
Would the information in the catalog be registered?
Nathan: Not sure... if the sources of the information are changed, they would be gone
Daniel: EC registry would be BALTO. Initial registry will be BALTO itself.
One of the uses for the broker...using the broker as a virtual data server for watershed modeling community...they’re maintaining a virtual server
Sarah: Is there an EC approved repository?
Nathan: continue using github
Dave: what repository do we have in mind? To what extent would the repository become a primary source?
Sarah: UNAVCO is an approved repository for geodetic data
Daniel: since we are brokering data we are not ..
Dave: if a configuration includes the translation of a variable name as in Scott’s ontology
Is github a configuration
Daniel: if we separate the configuration from the broker software, then we have to maintain both systems going forward… separating the two decreases the viability of this going forward.
One system should be pointing at the correct version of the other system.
Dave: separation is necessary
Anne & Zach: participation in Science Committee - yes
James & Dan: participation in Technology and Architecture Committee - yes
Daniel: nominated Dave for Liaison committee
Dave: does not want to commit yet
*Dan: will step up for the Liaison committee but requires Dave’s support on a monthly basis.
*Sarah: whoever has information from these committees will report at bi-weekly meetings
Dan: presenter of Synergies
Roughly 142 EC-related awards
*James: knows Tanu...assigned to look over their work
Dan: A lot of good projects we can help out with... we want to find the low-hanging fruit
Sarah: on CHORDS team
Dan: anyone on any other EC funded/related project? Paper coming out specifically addressing long-tail data. Hopefully have enough data points for graph.
ODSIP introduced the use of brokering for Sarah’s research
We are reusing technology methods or ideas that came out of other projects
Asked everyone to fill in past EC projects in Synergies document
*We want to discuss with other members of the community that might be submitting grants to integrate into future proposals.
Are there any groups we can work with?
Did anyone else have any thoughts? We want to stick ourselves into other solicitations
*Sarah: we should figure out update on new proposals with interactions with committees
*Letters of collaboration with IRIS and UNAVCO
Dan: reporting back would be good so they keep us in their thoughts
What out of the 142/136 other projects external to BALTO can we find synergies with?
Does not fund external work but does fund meetings. It would be an ideal alignment to work with them because they are trying to find loads of random data without trying to manipulate the data. 80% of effort goes to data wrangling and 20% to doing the research.
Provide tutorials in this meeting as this group goes forward on data access
Anyone we should talk to about low-hanging fruit?
We want to work with Scott Peckham as much as possible
Simon - very energetic about it going forward
Yolanda - fun to work with
Sarah: …. Integrating continuous and discrete data
James: wonders how data-discovery projects might help us centralize a catalog
Dan: Ruth is helpful with that
James: wonders about the geoTrust, it's related to simplifying data management
Data management does mean making data available for discovery
Dan: Given the history, find someone to reach out to Tanu
James: knows Ian Foster
*Dan: updates on notes going forward in meetings...come up with interactive description on these synergies with the EC committee.
Sarah: suggested to wait on elevator speeches of synergies until we have something more concrete
There are clearly projects that we should work toward collaborating with
Other programs people are familiar with or know exist or like/dislike?
Dave: it seems an interesting thing might be to pose the following question: if one finds datasets that seem to be of interest but whose metadata are not well matched to the needs of our community, could deep-learning systems be helpful in suggesting translations of the datasets?
Dan: are you familiar with IS-GEO?
Dave: does not know much
Sarah: end of kick-off
*First quarter objectives: three types of data: geodetic velocities (UNAVCO), tomography data (IRIS), anisotropy data (SKS splitting), crustal and lithospheric structure in netCDF.
Nathan: shape files?... something to look into
James: many virtual machines in Amazon cloud
Dan: does that fall within our funding?
*Sarah: set up a separate call to talk about technical plans with Dan, Dave, and James
*Anne: get New Zealand GeoNet GPS time series and seismicity catalog; USGS seismicity catalog
*Sarah: explore how that data is available...add to the list
VT will work with UNAVCO and IRIS. We may be able to add the metadata into our use case.
Take time to digest...First bi-weekly meeting on October 18th
*Daniel: Kyera to send out a reminder on who is assigned which task
*Sarah: everyone should go in the notes and correct what needs to be corrected
Put “action” beside each action item found.
*Daniel: setup a spreadsheet of assigned tasks
Sarah: bi-weekly meetings at 3:30 EST.
Anne: asked if first bi-weekly meeting could be next week.
Sarah: not possible.
*Set up Broker in development environment.
General discussion of previous brokering project: