The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations@egi.eu.

EGI Notebooks

From EGIWiki
Revision as of 12:34, 17 September 2018 by Enolfc

The deeper you go into data analysis, the more you understand that the most suitable tool for coding and visualizing is not pure code, an SQL IDE, or even simplified data manipulation diagrams (aka workflows or jobs). At some point you realize that you need a mix of all of these: that’s what “notebook” platforms are, with Jupyter being the most popular notebook software out there.

EGI Notebooks is an 'as a Service' environment based on the Jupyter technology, offering a browser-based, scalable tool for interactive data analysis. The EGI Notebooks environment provides users with notebooks where they can combine text, mathematics, computations and rich media output. EGI Notebooks is a multi-user service and can scale to multiple servers based on the EGI cloud service.

Unique Features

EGI Notebooks provides the well-known Jupyter interface for notebooks with the following added features:

  • Integration with EGI Check-in for authentication: log in with any eduGAIN or social account (e.g. Google, Facebook).
  • Persistent storage associated with each user, available in the notebooks environment.
  • Customisable with new notebook environments: expose any existing notebook to your users.
  • Runs on the EGI e-Infrastructure, so notebooks can easily use EGI compute and storage.

Service Modes

We offer different service modes depending on your needs:

  • Individual users can use the centrally operated service from EGI. After a lightweight approval, users can log in, write, play and re-play notebooks. Notebooks can use storage and compute capacity from the access.egi.eu Virtual Organisation. Request access via the EGI Marketplace.
  • User communities can have their own customised EGI Notebooks service instance. EGI offers consultancy and support, and can also operate the setup. Contact support@egi.eu to make an arrangement. A community-specific setup allows the community to:
    • use the community's own Virtual Organisation (i.e. federated compute and storage sites) for Jupyter
    • add custom libraries into Jupyter (e.g. discipline-specific analysis libraries)
    • have fine-grained control over who can access the instance (based on the information available to the EGI Check-in AAI service).
  • (under development) BinderHub mode, which allows users to recreate notebooks from existing repositories, making the code immediately reproducible by anyone, anywhere. While under development, this option has no persistent storage and does not require authentication; work is ongoing to integrate it with the modes described above. An alpha instance is available at https://binderhub.fedcloud-tf.fedcloud.eu
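For a community instance, "who can access" is ultimately a check on the attributes Check-in releases about a user. The sketch below is a hypothetical Python check, assuming the OpenID Connect userinfo contains an `eduperson_entitlement` list and that VO membership appears as a URN of the form `urn:mace:egi.eu:group:<vo>:role=member#aai.egi.eu`; verify the exact claim name and format against your own Check-in configuration before relying on it.

```python
def is_member(userinfo: dict, vo: str, role: str = "member") -> bool:
    """Return True if any entitlement grants `role` in the group `vo`.

    Assumption: entitlements follow the pattern
    urn:mace:egi.eu:group:<vo>:role=<role>#<authority>.
    """
    expected = f"urn:mace:egi.eu:group:{vo}:role={role}"
    entitlements = userinfo.get("eduperson_entitlement", [])
    # Strip the optional "#authority" suffix before comparing.
    return any(e.split("#", 1)[0] == expected for e in entitlements)


userinfo = {
    "sub": "example-user",
    "eduperson_entitlement": [
        "urn:mace:egi.eu:group:vo.example.org:role=member#aai.egi.eu"
    ],
}
print(is_member(userinfo, "vo.example.org"))
```

Matching on the full group path (rather than a substring) avoids accidentally admitting members of a similarly named VO.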

Data Management

1 GB of persistent storage for the notebooks is available at /persistent, linked from the notebook home directory. Please note that files stored in any other folder will be lost when your notebook server is closed (which can happen after more than 1 hour without activity!). There is no space limit set for other folders.
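Because only /persistent survives a server shutdown, it can help to copy results there explicitly and keep an eye on the quota. This is a minimal Python sketch, assuming the folder is reachable as `~/persistent` through the home-directory link and treating the 1 GB quota as 1 GiB; `used_bytes` and `save_persistent` are illustrative helper names, not part of the service.

```python
import shutil
from pathlib import Path

# Assumption: /persistent is reachable as ~/persistent through the link in the
# home directory; change PERSISTENT if your instance mounts it elsewhere.
PERSISTENT = Path.home() / "persistent"
QUOTA_BYTES = 1024 ** 3  # the 1 GB default quota, treated here as 1 GiB


def used_bytes(root: Path) -> int:
    """Total size in bytes of all regular files below `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def save_persistent(src: Path, root: Path = PERSISTENT) -> Path:
    """Copy `src` into persistent storage unless that would exceed the quota."""
    if used_bytes(root) + src.stat().st_size > QUOTA_BYTES:
        raise OSError(f"copying {src.name} would exceed the persistent quota")
    dest = root / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest
```

For example, `save_persistent(Path("results.csv"))` keeps a result file after the server is culled. The quota check here is only a client-side courtesy; the storage itself enforces the real limit.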

If you need to increase your persistent storage space, open a GGUS ticket to the EGI Notebooks Support Unit.

Community-specific instances can be given access to other kinds of persistent storage, tailored to your specific needs and the available storage systems.

Getting your data in

Your notebooks have full outgoing internet connectivity, so you can connect to any external service to bring data in for analysis. We are evaluating integration with EOSC-hub services to facilitate access to input data, with EGI DataHub as the first target. Please contact support@egi.eu if you are interested in other I/O integrations.
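Since outgoing connectivity is unrestricted, the Python standard library is enough to pull a dataset into a notebook. A minimal sketch; the helper name `fetch` and the URL in the usage line are placeholders.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream `url` to `dest` without loading the whole payload into memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks
    return dest
```

A call such as `fetch("https://example.org/dataset.csv", Path.home() / "persistent" / "dataset.csv")` stores the download where it survives a server restart; writing anywhere else is fine for scratch data.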

Deposit output data

As with input data, you can connect to any external service to deposit the notebook outputs.
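Depositing output usually comes down to an authenticated HTTP upload. The sketch below uses only the Python standard library and assumes a target that accepts HTTP PUT with a bearer token; real repositories and object stores each have their own APIs, and the URL, token and helper names here are placeholders.

```python
import urllib.request
from typing import Optional


def build_upload(url: str, payload: bytes, token: Optional[str] = None,
                 content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Prepare an HTTP PUT that uploads `payload` to `url`."""
    headers = {"Content-Type": content_type}
    if token:
        # Bearer tokens are a common scheme; your target service may differ.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


def upload(req: urllib.request.Request, timeout: float = 60.0) -> int:
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending makes it easy to inspect what will be uploaded before any data leaves the notebook.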

Access to other services

Notebooks running on EGI can access other existing computing and storage services. The centrally operated EGI Notebooks instance uses resources from the access.egi.eu Virtual Organisation. We welcome suggestions on which services you would like to access, so that we can create guidelines and extend the service with tools that ease these tasks.

Bring your custom notebooks

Adding a new notebook environment to the service only requires a working Docker image, accessible from a public repository, that follows these rules:

  1. It must install JupyterHub v0.9
  2. It must not run as user root, user with uid 1000 is recommended
  3. It must use $HOME as notebook directory

If you have such an image, let us know so we can add it to the configuration.
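Before submitting an image, the rules above can be partly checked locally. This Python sketch is a hypothetical pre-flight check: it takes the JSON printed by `docker image inspect IMAGE` and the version printed by `docker run --rm IMAGE jupyterhub --version`. The third rule can only be approximated, on the assumption that the notebook server starts in the image's working directory.

```python
import json
from typing import List


def check_image(inspect_json: str, jupyterhub_version: str) -> List[str]:
    """Return the rule violations found for one image (empty list = passed)."""
    config = json.loads(inspect_json)[0]["Config"]
    problems = []
    # Rule 1: JupyterHub 0.9.x must be installed in the image.
    if jupyterhub_version.strip().split(".")[:2] != ["0", "9"]:
        problems.append(f"JupyterHub {jupyterhub_version.strip()} found, 0.9 expected")
    # Rule 2: the image must not run as root; an empty User means root.
    user = (config.get("User") or "root").split(":")[0]
    if user in ("root", "0"):
        problems.append("image runs as root (uid 1000 is recommended)")
    # Rule 3 (heuristic): the working directory should be $HOME, assuming the
    # notebook server starts in its working directory.
    env = dict(e.split("=", 1) for e in config.get("Env") or [])
    home = env.get("HOME")
    workdir = config.get("WorkingDir", "")
    if home and workdir.rstrip("/") != home.rstrip("/"):
        problems.append(f"WorkingDir {workdir!r} is not $HOME ({home})")
    return problems
```

An empty list means the image passed these checks; it does not replace testing the image against a JupyterHub deployment.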

Once Binder integration is complete, you will be able to import any notebook just by providing the URL of a repository that contains it.
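BinderHub launches GitHub repositories through URLs of the form `/v2/gh/<owner>/<repo>/<ref>`. The Python sketch below builds such a link for the alpha instance mentioned under Service Modes; the helper name, the `master` default and the optional `filepath` query parameter (used to open a specific notebook) are assumptions to check against the BinderHub version actually deployed, and the alpha instance may no longer be online.

```python
from urllib.parse import quote, urlencode

BINDERHUB = "https://binderhub.fedcloud-tf.fedcloud.eu"  # alpha instance


def binder_url(owner: str, repo: str, ref: str = "master",
               filepath: str = "", hub: str = BINDERHUB) -> str:
    """Build a BinderHub launch URL for a GitHub repository."""
    # Branch names may contain "/", which must be percent-encoded in the ref.
    url = f"{hub.rstrip('/')}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    if filepath:
        url += "?" + urlencode({"filepath": filepath})
    return url
```

For instance, `binder_url("example-org", "example-repo", filepath="index.ipynb")` yields a link that rebuilds the (hypothetical) repository's environment and opens `index.ipynb`.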

Early adopter communities and Customisations

AGINFRA+/D4Science

An instance of EGI Notebooks is deployed for the AGINFRA+ project and made available via selected D4Science VREs. Besides the features described above, this instance has been further customised to support:

  • Embedding the EGI Notebooks interface into the community web portal, with no separate browser window or tab needed to access the notebooks functionality.
  • AAI integration between Notebooks and the community portal for single sign-on. Enabled users are automatically recognised by the notebooks.
  • Access to other VRE services from the notebooks, using the user's personal token, which is readily available in the notebooks environment.
  • File sharing between notebooks and the community web portal file space.
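Calling another VRE service from a notebook then amounts to attaching the personal token to each request. The variable name and header format used by the AGINFRA+/D4Science instance are not documented here, so `VRE_TOKEN` and a bearer `Authorization` header in this Python sketch are hypothetical placeholders.

```python
import os
import urllib.request

# Hypothetical names: check your instance for the actual variable and header.
TOKEN_ENV = "VRE_TOKEN"


def vre_request(url: str, env_var: str = TOKEN_ENV) -> urllib.request.Request:
    """Build a request to a VRE service authenticated with the personal token."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"no personal token found in ${env_var}")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Reading the token from the environment at call time keeps it out of the notebook itself, so notebooks can be shared without leaking credentials.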

Other communities

  • OpenDreamKit: Open Digital Research Environment Toolkit for the Advancement of Mathematics
  • AGINFRA+: Accelerating user-driven e-infrastructure innovation in Food and Agriculture
  • IFREMER: Marine sciences

Service Architecture

Technology Stack

The EGI setup is based on the following components:

The enolfc/egi-jupyterhub GitHub repository contains detailed configuration information on the existing setup at EGI resources.

Next steps

We are looking into:

  • Prometheus-based monitoring for the Kubernetes cluster and the JupyterHub
  • Integration with EGI accounting to report usage of resources
  • Complete Binder integration with the regular JupyterHub so users can have persistent notebooks created from external repositories
  • Integration with storage services, EGI DataHub as first target