Your data are unruly
Talk to any researcher and most will tell you they spend too much time ‘wrangling’ their data. They must tediously reformat, visualize, organize, and essentially herd their data just to be able to use it. In fact, an estimated 80% of a scientist’s working time is devoted to this wrangling. Finding quicker, more efficient ways of doing it will let researchers get to the far more interesting work of deriving meaning from the outcomes of their scientific analysis. It will also bring us closer to reusable and accessible data, which are goals of both EarthCube and the FAIR data principles.
Many of EarthCube’s projects aim to enable just that. They wrangle and herd large, complex, and often disparate datasets. Among other things, EarthCube tools let scientists access, organize, and reformat data more quickly. This brings the data closer to the FAIR data principles on the researchers’ behalf, so that they don’t have to devise their own methods and then spend time implementing them by hand. To follow the “herding” metaphor, the tools act like a sheepdog that quickly gets all the sheep in one place so that the rancher can do the shearing.
But every scientist has a unique workflow and unique data to herd, and so needs a slightly different set of tools. To help them, EarthCube’s challenge is to understand their workflow (the herd) and to know which dogs (the EarthCube tools) are available.
How do they do it?
ASSET (Accelerating Scientific workflowS using EarthCube Technologies) is an EarthCube project that acts as a catalyst for scientists to map their workflows. The ASSET team connects scientists with the most appropriate data tools that EarthCube has to offer, and uses that process and the knowledge gained from it to improve both the workflows and the tools.
Using a prototype of the ASSET workflow sketching tool, unveiled at the recent ASSET Clinic (August 1-3, 2018; Boulder, Colorado), researchers can get a bird’s eye view of their own operations. The tool provides both the guidance and the freedom needed to methodically assess one’s unique workflow and visualize it as a tangible, intelligible map. It also includes icons for EarthCube tools. This aspect is key because it allows the tools’ icons to be embedded within the workflow sketch. A few workshop participants noted that they had reached profound new perspectives simply by using the sketching tool to describe their processes.
Once a workflow is mapped out, the sketch can be assessed easily. It can be used to identify areas where the workflow could be enhanced, and to answer questions such as: Where can the workflow be made faster, better, and more interoperable? Would those changes bring it closer to the FAIR data principles, and if so, how? The answers lead to the matchmaking process.
Match made in a workshop
ASSET uses the sketches to determine which EarthCube tools a researcher could use to reduce their personal ‘data wrangling’ load. The ASSET team is made up of developers from the EarthCube community. They know the tools intimately enough to understand where each one fits into a given workflow. Combining their knowledge and experience with the sketches, they can see how tool and workflow can help each other succeed. Then they can pair the two.
This isn’t the first time the ASSET team has exhibited its matchmaking skills. Mike Daniels, one of the team’s leaders and a member of the EarthCube Leadership Council, led a breakout session at the 2018 EarthCube All Hands Meeting that showcased ASSET. The session followed an inspired ‘speed-dating’ structure through which ASSET helped pair science teams with data expert teams based on their tools. ASSET’s ability to understand each of these groups and to pair them up effectively further demonstrates the team’s grasp of the nuance present in both scientific workflows and data tools, as well as its dedication to creatively forming effective alliances.
In it for the long haul: a perpetual self-learning system
The people behind ASSET make the lives of researchers easier by pairing them with the right tools. But they do more than matchmake. ASSET intends to improve the reproducibility and efficiency of research overall, in keeping with the “R” (Reusable) in the FAIR data principles. The ASSET process is designed to learn from its own operation. By collecting more and more workflows, the team will form a new perspective on the world of scientific data analysis. They’ll identify the most common problems in data wrangling and determine which ones need the most complex solutions. They’ll also pinpoint which tools are the most useful, where the gaps are, and what future solutions will need to look like.
At their workshop in August, ASSET collected the mapped workflows of 30 researchers with differing backgrounds. The team will use these workflows as a pilot framework against which to assess various EarthCube tools. This assessment will work in the opposite direction from the sketching tool: it will take EarthCube data tools and evaluate them to determine where and to whom they can be most useful.
Giving it a modern look
ASSET’s next step is to make the interface available as an application. It will be part of the Global Risk, Resilience, and Impacts Toolbox (GRRIT), where scientists can share it and easily use it.
These new capabilities will continuously improve the efficiency of future scientific analysis, especially if used with the FAIR data principles in mind. For many projects, this could reduce the amount of data wrangling required in the first place. By reducing time-to-science, the FAIR data principles and the EarthCube tools that get us to them will give scientists more time to do meaningful analysis. They will allow for more accurate, clearer experiments. They will simplify overly dense and complex methods, making research more reproducible across science. These benefits are not specific to the geoscience community.
Sharing what works is what works
Once ASSET has come into its own, it can be customized for any field with large data analysis demands. ASSET’s momentum is propelling it through innovation. It is preparing our young scientists and our infrastructure for the scientific journals of the future.
Come along for the ride and stay in the loop
Working intimately with research workflows is what has made ASSET so useful. The team dove deep into EarthCube and asked themselves, “Are we getting the tools to the people who need them?” Then they found a way to make sure that is happening. By sharing their work through the GRRIT toolbox, they’ve also made those tools accessible to you.
(Main contact): Cindy Bruyère, National Center for Atmospheric Research
Scott Peckham, University of Colorado Boulder, Institute of Arctic and Alpine Research
Yolanda Gil, University of Southern California Information Sciences Institute
Mike Daniels, National Center for Atmospheric Research
James Done, National Center for Atmospheric Research