Difference between revisions of "EGI Notebooks"
Revision as of 15:15, 8 February 2018
The further you go in data analysis, the clearer it becomes that the most suitable tool for coding and visualizing is not plain code, an SQL IDE, or even simplified data manipulation diagrams (also known as workflows or jobs). At some point you realize that you need a mix of all of these. That is what "notebook" platforms provide, with Jupyter being the most popular notebook software available.
EGI Jupyter is an 'as a Service' environment based on the Jupyter technology, providing a browser-based, scalable tool for interactive analysis of data. The environment provides users with notebooks where they can combine text, mathematics, computations and rich media output. EGI Jupyter is a multi-user service and can scale to multiple servers based on the EGI cloud service.
Unique Features
EGI Jupyter provides the well-known Jupyter interface for notebooks with the following added features:
- Integration with EGI Check-in for authentication: log in with any EduGAIN or social account (e.g. Google, Facebook).
- Persistent storage associated with each user, available in the notebook environment.
- Customisable with new notebook environments: expose any existing notebook to your users.
- Runs on the EGI e-Infrastructure, so you can easily use EGI compute and storage from your notebooks.
Service Modes
We offer different service modes depending on your needs:
- Individual users can use the centrally operated service from EGI. After a lightweight approval, users can log in, then write, run and re-run notebooks. Notebooks can use storage and compute capacity from the access.egi.eu Virtual Organisation. Request access via the EGI marketplace.
- User communities can have their own customised EGI Jupyter service instance. EGI offers consultancy and support, and can also operate the setup. Contact support@egi.eu to make an arrangement. A community-specific setup allows the community to:
  - use the community's own Virtual Organisation (i.e. federated compute and storage sites) for Jupyter
  - add custom libraries into Jupyter (e.g. discipline-specific analysis libraries)
  - have fine-grained control over who can access the instance (based on the information available to the EGI Check-in AAI service).
- (Under development) BinderHub mode, which recreates notebooks from existing repositories, making the code immediately reproducible by anyone, anywhere. While under development, this option has no persistent storage and does not require authentication; work is ongoing to integrate it with the modes described above. An alpha instance is available at https://binderhub.fedcloud-tf.fedcloud.eu
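As a rough illustration of the BinderHub mode, the sketch below builds a launch URL for a GitHub repository, using the `/v2/gh/<org>/<repo>/<ref>` path layout that BinderHub uses for GitHub sources. The function name and the example organisation and repository are hypothetical, and whether the alpha instance accepts a given repository is not something this page confirms.

```python
from urllib.parse import quote


def binder_launch_url(host: str, org: str, repo: str, ref: str) -> str:
    """Build a BinderHub launch URL for a GitHub repository.

    host: BinderHub host name, e.g. the alpha instance mentioned above.
    org, repo: GitHub organisation (or user) and repository name.
    ref: branch, tag or commit to build.
    """
    # Percent-encode each path segment so unusual characters cannot break the URL.
    segments = [quote(part, safe="") for part in (org, repo, ref)]
    return f"https://{host}/v2/gh/" + "/".join(segments)


# "example-org/example-repo" is a placeholder, not a real repository.
print(binder_launch_url("binderhub.fedcloud-tf.fedcloud.eu",
                        "example-org", "example-repo", "master"))
```

Opening the resulting URL in a browser asks BinderHub to build an image from the repository and start a notebook server from it.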
Data Management
Persistent storage for the notebooks is available at /persistent, linked from the notebook's home directory. It is backed by an NFS server, managed as a persistent volume in Kubernetes and automatically mounted in every notebook that users create. Access to other kinds of persistent storage can be arranged, especially for community-specific instances, which can be tailored to your specific needs and available storage systems.
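A minimal sketch of using the persistent area from a notebook might look as follows. Only the /persistent path comes from this page; the `results` sub-directory, the helper name and the JSON layout are illustrative assumptions.

```python
import json
from pathlib import Path


def save_result(result: dict, name: str, base: Path = Path("/persistent")) -> Path:
    """Write a result dictionary as JSON under the persistent area.

    base defaults to /persistent (the mount point described above);
    the 'results' sub-directory is an arbitrary choice for this example.
    """
    target_dir = base / "results"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.json"
    target.write_text(json.dumps(result, indent=2))
    return target
```

Files written this way should still be there the next time your notebook server starts, unlike files stored elsewhere in the container.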
Getting your data in
Your notebooks have full outgoing internet connectivity, so you can connect to any external service to bring data in for analysis. We are evaluating integration with EOSC-hub services to facilitate access to input data, with EGI DataHub as the first target. Please contact support@egi.eu if you are interested in other I/O integrations.
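Because outgoing connectivity is unrestricted, bringing data in can be as simple as downloading a file with the Python standard library. The sketch below assumes the data is reachable at a plain URL; the helper name and the example URL in the comment are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url: str, dest: Path) -> Path:
    """Download url to dest, streaming so large files are not held in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (placeholder URL, not a real dataset):
# fetch("https://data.example.org/input.csv", Path("/persistent/input.csv"))
```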
Deposit output data
As with input data, you can connect to any external service to deposit the output of your notebooks.
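How you deposit output depends entirely on the target service. As one hedged example, the sketch below prepares an HTTP PUT upload with a bearer token, as accepted by some WebDAV-style or object-storage endpoints. The helper name, the URL and the token handling are assumptions; check the documentation of your storage service for its actual API.

```python
import urllib.request
from pathlib import Path


def build_upload_request(path: Path, url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP PUT request that uploads a file."""
    return urllib.request.Request(
        url,
        data=path.read_bytes(),  # whole file in memory; fine for small outputs
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


# Sending the request (placeholder URL and token):
# req = build_upload_request(Path("/persistent/results/run1.json"),
#                            "https://storage.example.org/run1.json", "MY_TOKEN")
# urllib.request.urlopen(req)
```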
Access to other services
Notebooks running on EGI Jupyter can access other existing computing and storage services. The centrally operated EGI Jupyter instance uses resources from the access.egi.eu Virtual Organisation. We welcome suggestions on which services you would like to access, so that we can create guidelines and extend the service with tools that ease these tasks.
Bring your custom notebooks
Adding new notebooks to EGI Jupyter only requires a working Docker image, accessible from a public repository, that follows these rules:
- It must install JupyterHub v0.8
- It must not run as user root; a user with uid 1000 is recommended
- It must use $HOME as the notebook directory
If you have such an image, let us know so we can add it to the configuration.
Once Binder integration is complete, you will be able to import any notebook just by providing the URL of a repository that contains your notebook.
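Before submitting an image, you may want to check it against the three rules above. The following is only a sketch of such a self-check, not an official validation tool: the function name, its parameters and the messages are hypothetical.

```python
import os


def check_image_rules(uid: int, hub_version: str, notebook_dir: str, home: str) -> list:
    """Return a list of rule violations (empty if the image looks fine)."""
    problems = []
    # Rule 1: JupyterHub v0.8 must be installed (any 0.8.x release).
    if hub_version.split(".")[:2] != ["0", "8"]:
        problems.append(f"JupyterHub {hub_version} found, v0.8 expected")
    # Rule 2: the container must not run as root (uid 1000 is recommended).
    if uid == 0:
        problems.append("image runs as root")
    # Rule 3: the notebook directory must be $HOME.
    if os.path.normpath(notebook_dir) != os.path.normpath(home):
        problems.append(f"notebook directory {notebook_dir} is not $HOME ({home})")
    return problems


print(check_image_rules(1000, "0.8.1", "/home/jovyan", "/home/jovyan"))
```

Inside a running container, the inputs could be taken from `os.getuid()`, `os.environ["HOME"]` and the installed `jupyterhub` package version.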
Early adopter communities
- OpenDreamKit: Open Digital Research Environment Toolkit for the Advancement of Mathematics
- AGINFRA+: Accelerating user-driven e-infrastructure innovation in Food and Agriculture
- IFREMER: Marine sciences
Technology Stack
The EGI setup is based on the following components:
- Kubernetes as container orchestration platform running on top of EGI Federated Cloud resources
- JupyterHub with custom EGI Check-in OAuth authentication and the Kubernetes Spawner
- BinderHub to build and reproduce notebooks
- Traefik as HTTP proxy
The enolfc/egi-jupyterhub GitHub repository contains detailed configuration information on the existing setup on EGI resources.
Next steps
We are looking into:
- Prometheus-based monitoring for the Kubernetes cluster and JupyterHub
- Integration with EGI accounting to report usage of resources
- Complete Binder integration with the regular JupyterHub so users can have persistent notebooks created from external repositories
- Integration with storage services, with EGI DataHub as the first target