It is commonly claimed that the computations a corporation needs for its operations are best performed in the "cloud," and many commercial companies are indeed moving their information technology into that environment. Scientific data centers funded by the NSF, however, face constraints of their own. Funding is limited, and managing a data center with cloud technology can be costly. Government-funded research organizations also typically have much smaller IT staffs than corporations do. The impact of managing IT operations in the cloud is therefore not the same for NSF-funded data centers as it is for large corporations. In the GeoSciCloud project, two medium-size NSF-funded data centers plan to deploy data collections along with cloud-based services in different environments in order to assess feasibility and impact. These environments include:
- Commercial cloud environments such as those offered by Amazon, Google, and Microsoft, and
- NSF-supported large computing facilities that are just beginning to offer services with the characteristics of cloud computing
The operation of the data collections and services in these two cloud environments will be assessed and compared with their operation in the current in-house environments.
This project will thereby help NSF/EarthCube identify the most suitable IT environment in which EarthCube should deploy and support shared infrastructure. Potential gains in reliability and potential cost savings are strong motivating factors.
IRIS and UNAVCO operate data centers with several hundred terabytes of data and services that match our community's needs and requirements. Each organization currently operates its own infrastructure. GeoSciCloud tasks will include moving subsets of our archives, as a test, into commercial cloud and XSEDE cloud environments, where we will compare and contrast several aspects of working in the different infrastructures. GeoSciCloud partners will also deploy key services developed under the GeoWS building block to give domain scientists access to the data sets.
GeoSciCloud will help EarthCube compare and contrast the three environments (XSEDE, commercial cloud, and current infrastructure) through the following activities:
- Gain an understanding of issues related to ingesting large data sets into the cloud and curating the data in a cloud environment.
- Compare processing times for real-world data requests made by practicing domain scientists.
- Test the elasticity of the cloud for large-scale digital signal processing of seismic data and for reprocessing GPS solutions spanning long periods of time.
- Compare the speed of data egress from multiple environments, including tests of higher-throughput transfer systems such as GridFTP.
- Compare overall costs of operating in the three environments.
- Document the best practices that emerge from the GeoSciCloud tests and should be promoted within EarthCube.
- Convert data held in domain-specific formats to more widely used formats such as HDF5 for improved interoperability.
- Test the reliability of streaming real-time data into the cloud.
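As an illustration only, the egress-speed and cost comparisons listed above could be tabulated along the following lines. This is a minimal Python sketch under stated assumptions, not part of the project's tooling: the `TransferTrial` record, its fields, the `summarize` helper, and the sample numbers are all hypothetical placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class TransferTrial:
    """One timed egress test. All fields are hypothetical inputs."""
    environment: str   # e.g. "in-house", "commercial cloud", "XSEDE"
    bytes_moved: int   # size of the data subset that was retrieved
    seconds: float     # wall-clock duration of the transfer
    cost_usd: float    # charges attributed to this trial, if any


def throughput_mb_per_s(trial: TransferTrial) -> float:
    """Return mean egress throughput in megabytes (10**6 bytes) per second."""
    if trial.seconds <= 0:
        raise ValueError("transfer duration must be positive")
    return trial.bytes_moved / 1e6 / trial.seconds


def summarize(trials):
    """Average the per-trial throughput and total the cost per environment."""
    grouped = {}
    for trial in trials:
        grouped.setdefault(trial.environment, []).append(trial)

    summary = {}
    for env, items in grouped.items():
        rates = [throughput_mb_per_s(t) for t in items]
        summary[env] = {
            "mean_mb_per_s": sum(rates) / len(rates),
            "total_cost_usd": sum(t.cost_usd for t in items),
        }
    return summary


# Placeholder numbers chosen only to exercise the code; they are not measurements.
trials = [
    TransferTrial("in-house", 2_000_000_000, 40.0, 0.0),
    TransferTrial("in-house", 1_000_000_000, 25.0, 0.0),
    TransferTrial("commercial cloud", 2_000_000_000, 20.0, 0.5),
]

for env, stats in summarize(trials).items():
    print(f"{env}: {stats['mean_mb_per_s']:.1f} MB/s, "
          f"${stats['total_cost_usd']:.2f}")
```

Recording each trial separately, rather than only a running average, keeps the raw timings available so that the same records could later be regrouped by data set, time of day, or transfer tool.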
GeoSciCloud will also explore providing some infrastructure in support of other EarthCube partners so that multiple data centers can coexist within GeoSciCloud. IRIS and UNAVCO commit to ultimately demonstrating the utility of shared infrastructure, specifically shared infrastructure in a cloud environment, and how it can improve efficiency and economics within EarthCube.