Finalized: Wednesday, April 1, 2015
Author(s): Meng, H., R. Kommineni, Q. Pham, R. Gardner, T. Malik, and D. Thain
Computational reproducibility depends on the ability to not only isolate necessary and sufficient computational artifacts but also to preserve those artifacts for later re-execution. Both isolation and preservation present challenges in large part due to the complexity of existing software and systems as well as the implicit dependencies, resource distribution, and shifting compatibility of systems that result over time, all of which conspire to break the reproducibility of an application. Sandboxing is a technique that has been used extensively in OS environments in order to isolate computational artifacts. Several tools were proposed recently that employ sandboxing as a mechanism to ensure reproducibility. However, none of these tools preserve the sandboxed application for re-distribution to a larger scientific community, aspects that are equally crucial for ensuring reproducibility as sandboxing itself. In this paper, we describe a framework of combined sandboxing and preservation, which is not only efficient and invariant, but also practical for large-scale reproducibility. We present case studies of complex high-energy physics applications and show how the framework can be useful for sandboxing, preserving, and distributing applications. We report on the completeness, performance, and efficiency of the framework, and suggest possible standardization approaches.
Meng, H., R. Kommineni, Q. Pham, R. Gardner, T. Malik, and D. Thain, 2015: An invariant framework for conducting reproducible computational science. Journal of Computational Science, 9, 137–142, doi:10.1016/j.jocs.2015.04.012
This material is based upon work supported by the National Science Foundation under Grant No. 1343813, 1343816. Opinions, findings, conclusions or recommendations expressed are those of the authors and do not reflect the views of the NSF.