Project Jupyter Detailed in Recent IEEE Journal Article
By: Matt Asay & Kimberly Mann Bruch
Longtime EarthCube community members Brian Granger and Fernando Pérez have recently published "Jupyter: Thinking and Storytelling With Code and Data" in IEEE's Computing in Science and Engineering journal. The article details the open-source Jupyter software as well as sub-projects such as JupyterLab, JupyterHub, and Jupyter Widgets.
In particular, the article describes how Jupyter is much more than software, and how it enables both interactive computing and computational narratives.
Jupyter has a layered ecosystem that
1) is founded on a diverse community of users and contributors who
2) establish open standards and protocols, by
3) building extensible software, which is
4) deployed in services and enables the authoring and sharing of content. These layers build on each other, are each irreplaceable, and drive innovation.
Credit: Pérez and Granger
"What we are seeing today in the acceptance of open source software, which falls under the umbrella of open science, is that we never know which other parts of the scientific world may have a common need with us," said Pérez, a faculty scientist at UC Berkeley and Lawrence Berkeley National Laboratory. "By building tools that can be shared openly and that are interoperable, we accelerate the cycle of discovery; from a purely pragmatic stance, we want better science faster and we want better discoveries that have impact."
Jupyter has revolutionized data science, and it started with a chance meeting between two students
Jupyter makes it easy for data scientists to collaborate, and the open source project's history reflects this kind of communal effort.
Fernando Pérez and Brian Granger met on their first day of graduate school at the University of Colorado Boulder. Years later, in 2004, they discussed the idea of creating a web-based notebook interface for IPython, which Pérez had started in 2001. That idea eventually became Jupyter, but even then, they had no idea how much of an impact it would have within academia and beyond. All they cared about was "putting it to immediate use with our students in doing computational physics," as Granger noted.
These things take time
Today Pérez is a professor at the University of California, Berkeley, and Granger is a senior principal at AWS, but in 2004 Pérez was a postdoctoral researcher in Applied Math at CU Boulder, and Granger was a new professor in the Physics Department at Santa Clara University. As mentioned, they first met as students in 1996, and both had been busy in the interim. In 2001 Pérez started dabbling in Python and, in what he calls a "thesis procrastination project," he wrote the first IPython over a six-week stretch: a 259-line script now available on GitHub ("Interactive execution with automatic history, tries to mimic Mathematica's prompt system").
In 2004 Pérez visited Granger in Santa Clara, where they discussed open source, interactive computing, and the idea of building a web-based notebook. The notion came into focus as an extension of parallel computing work Granger had been doing in Python and of Pérez's work on IPython.
Years (and a great deal of activity) later, in 2009, Pérez was back in California, this time visiting Granger and his family at their home in San Luis Obispo, where Granger was now a professor. It was spring break, and the two spent March 21-24 collaborating in person to complete the first prototype IPython kernel with tab completion, asynchronous output and support for multiple clients.
By 2014, after a great deal of collaboration between the two and many others, Pérez, Granger and additional IPython developers co-founded Project Jupyter and rebranded the IPython Notebook as the Jupyter Notebook to better reflect the project's expansion outwards from Python to a range of other languages including R and Julia. Pérez and Granger continue to co-direct Jupyter today.
Theory of scientific revolutions
"What we really couldn't have foreseen is that the rest of the world would wake up to the value of data science and machine learning," Granger stressed. It wasn't until 2014 or so, he went on, that they "woke up" and found themselves in the "middle of this new explosion of data science and machine learning." They just wanted something they could use with their students. They got that, but in the process they also helped to foster a revolution in data science.
How? Or, rather, why has Jupyter helped to unleash so much progress in data science? As Rick Lamers explained:
Jupyter Notebooks are great for hiding complexity by allowing you to interactively run high level code in a contextual environment, centered around the specific task you are trying to solve in the notebook. By ever increasing levels of abstraction data scientists become more productive, being able to do more in less time. When the cost of trying something is reduced to almost zero, you automatically become more experimental, leading to better results that are difficult to achieve otherwise.
Data science is...science; therefore, anything that helps data scientists to iterate and explore more, be it elastic infrastructure or Jupyter Notebooks, can foster progress. Through Jupyter, that progress is happening across the industry in areas like data cleaning and transformation, numerical simulation, exploratory data analysis, data visualization, statistical modeling, machine learning and deep learning. It's amazing how much has come from a chance encounter in a doctoral program back in 1996.
EarthCube supports Project Jupyter via National Science Foundation awards 1928406 and 1928374.
EarthCube is a community-driven activity sponsored by the National Science Foundation to transform research in the academic geosciences community. EarthCube aims to create a well-connected environment to share data and knowledge in an open, transparent, and inclusive manner, thus accelerating our ability to better understand and predict the Earth’s systems. EarthCube membership is free and open to anyone in the Geosciences, as well as those building platforms to serve the Earth Sciences. The EarthCube Office is led by the San Diego Supercomputer Center (SDSC) on the UC San Diego campus.
Kimberly Mann Bruch, San Diego Supercomputer Center Communications
Lynne Schreiber, San Diego Supercomputer Center EarthCube Office
San Diego Supercomputer Center: https://www.sdsc.edu/
UC San Diego: https://ucsd.edu/
National Science Foundation: https://www.nsf.gov/