Jupyter Community Workshop:
Jupyter for Science User Facilities and High Performance Computing
11-13 June 2019, Berkeley, California
This is a three-day workshop for Jupyter developers, high-performance computing (HPC) engineers, and staff from experimental/observational science (EOS) facilities. Its main purpose is to foster a new collaborative community that can make Jupyter the preeminent interface for managing EOS workflows and data analytics at HPC centers. EOS scientists need Jupyter to work well at their facilities and HPC centers, and this workshop will help us address the technical, sociological, and policy challenges involved. The workshop will include presentations, posters, and two half-day hackathon/breakout sessions for collaboration.
During the workshop, participants will be invited to begin collaborating on a survey white paper that documents the current state of the art in Jupyter deployments at various facilities and HPC centers. The document will include deployment descriptions, maintenance and user support strategies, security discussions, use cases, and lessons learned. A forward-looking summary at the end of the white paper will tie together common threads across facilities and highlight areas for future research, development, and implementation. We aim to complete the paper and publish it on arXiv within three months of the end of the workshop.
Advances in technology at EOS facilities (e.g., telescopes, particle accelerators, light sources, genome sequencers), in robust high-bandwidth global networking, and in HPC have resulted in exponential growth in the data that scientists must collect, manage, and understand. Interpreting these data streams requires computational and storage resources far exceeding those available on laptops, workstations, or university department clusters. Funding agencies increasingly look to HPC centers to address the growing and changing data needs of their scientists, since these institutions are uniquely equipped to provide the resources needed for extreme-scale science. At the same time, scientists seek new ways to seamlessly and transparently integrate HPC into their EOS workflows.
Jupyter’s software ecosystem provides many of the missing pieces scientists need to manage HPC-enabled EOS workflows in real time, at both small and large scales. Staff and researchers at HPC and EOS facilities have begun developing open source components and best practices to adapt Jupyter, JupyterHub, and JupyterLab to their specific computational environments. Jupyter has gained a foothold at these institutions, but how can its position there gain acceptance, expand, and grow stronger? Realizing the vision of interactive supercomputing for data-intensive science requires:
- Building community around Jupyter at HPC centers and EOS facilities. We need to share code, knowledge, best practices, and lessons learned in order to make Jupyter a viable interface to HPC/EOS facility resources. We cannot afford for each facility to do this work separately. Coming at this point in time, this workshop should kick off new inter-facility collaborations around Jupyter.
- Developing and sharing innovative reusable tools that enable Jupyter to function well and within guidelines for appropriate use in HPC/EOS facility computing environments. At the same time, these tools should be easily customizable so that each facility can expose its unique resources to Jupyter users. We must also seek out technologies that can help HPC centers and user facilities support and complement Jupyter.
- Socializing Jupyter as an emerging primary way of interacting with HPC/EOS facility resources over the next decade. Developing persuasive, evidence-based talking points around Jupyter as a secure and productive use model, and highlighting major advances enabled by Jupyter, should help project and facility management make informed decisions about supporting Jupyter for their users.
Key Topics for Discussion and Collaboration
- What are the key reusable, community-managed Jupyter software components (spawners, authenticators, contents managers, kernel managers, extensions, widgets, etc.) that help Jupyter adapt to HPC and EOS facility use cases?
- In what ways can Jupyter streamline access to centralized data stores and to HPC resources and services such as manycore architectures, data management, data transfer, and efficient I/O?
- Is there an abstract model of HPC and EOS facility resources, usable by Jupyter, that is general enough to be widely reusable but also easily extensible to leverage each center’s unique capabilities and features?
- What components are missing from the Jupyter ecosystem for HPC/EOS that could be sketched out or even prototyped during the workshop hackathons?
- What technologies (containers, software defined networking, edge services) best support Jupyter in HPC centers and how can we facilitate their adoption, deployment, and maintenance?
- Keeping open standards, protocols, and architecture foremost in mind, what relationships with industry partners exist or are needed to help meet our needs around Jupyter?
- What deployment strategies and use cases have worked? Which ones have failed and what can we learn from those experiences?
- How can staff who support Jupyter at HPC centers help one another in their efforts? What training is needed to keep Jupyter alive and well in HPC?
- What work is needed to ensure that Jupyter use complies with security and appropriate use policies at HPC centers (e.g., securing kernel messages, SSL, “instrumented” Jupyter)? Could Jupyter itself be an avenue for promoting or encouraging security and appropriate use?
- What tools, widgets, and magic commands are needed to streamline the user experience around scalable Jupyter (e.g., Dask, IPyParallel, MPI) and eliminate SSH-tunnel hacks?
- What are best practices at HPC centers and EOS facilities for managing JupyterHub plug-ins and customized deployments (spawners, authenticators, extensions, widgets)?