The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "EGI Notebooks"

From EGIWiki
(Replaced content with "{{Template:Block-comment | name=Documentation moved! | text=The EGI Notebooks documentation is now available at https://egi-notebooks.readthedocs.io/. }}")
 
{{Template:Block-comment
| name=Documentation moved!
| text=The EGI Notebooks documentation is now available at https://egi-notebooks.readthedocs.io/.
}}

The further you go in data analysis, the clearer it becomes that the most suitable tool for coding and visualising is not a pure code editor, an SQL IDE, or even simplified data manipulation diagrams (aka workflows or jobs). At some point you realise that you need a mix of all of these. That is what "notebook" platforms are, with Jupyter being the most popular notebook software out there.

EGI Notebooks is an 'as a Service' environment based on the [http://jupyter.org/ Jupyter technology], offering a browser-based, scalable tool for interactive data analysis. The EGI Notebooks environment provides users with notebooks where they can combine text, mathematics, computations and rich media output. EGI Notebooks is a multi-user service and can scale to multiple servers based on the [[Federated_Cloud_user_support#What_is_the_EGI_cloud.3F|EGI cloud service]].

= Unique Features =
 
EGI Notebooks provides the well-known Jupyter interface for notebooks with the following added features:
* Integration with EGI Check-in for authentication: log in with any EduGAIN or social account (e.g. Google, Facebook).
* Persistent storage associated with each user, available in the notebooks environment.
* Customisable with new notebook environments: expose any existing notebook to your users.
* Runs on the EGI e-Infrastructure, so you can easily use EGI compute and storage from your notebooks.
 
= Service Modes =
 
We offer different service modes depending on your needs:
 
* Individual users can use the centrally operated service from EGI. After a lightweight approval, users can log in and write, run and re-run notebooks. Notebooks can use storage and compute capacity from the access.egi.eu Virtual Organisation. Request access via the [https://marketplace.egi.eu/applications-on-demand-beta/65-jupyter.html EGI marketplace].
 
* User communities can have their own customised EGI Notebooks service instance. EGI offers consultancy and support, and can also operate the setup. Contact support@egi.eu to make an arrangement. A community-specific setup allows the community to:
** use the community's own Virtual Organisation (i.e. federated compute and storage sites) for Jupyter
** add custom libraries into Jupyter (e.g. discipline-specific analysis libraries)
** have fine-grained control over who can access the instance (based on the information available to the EGI Check-in AAI service).
 
* '''''(under development)''''' BinderHub mode, which recreates notebooks from existing repositories, making the code immediately reproducible by anyone, anywhere. While under development, this option has no persistent storage and does not require authentication; work is ongoing to integrate it with the modes described above. An alpha instance is available at https://binderhub.fedcloud-tf.fedcloud.eu
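
As an illustration of the BinderHub mode, the sketch below builds a launch URL using the general BinderHub URL scheme for GitHub repositories (<code>/v2/gh/&lt;org&gt;/&lt;repo&gt;/&lt;ref&gt;</code>). The host is the alpha instance mentioned above; the repository, branch and file names are placeholders chosen for the example, and the alpha instance may not accept every repository.

```python
from urllib.parse import quote, urlencode

# Host of the alpha instance named above; any other BinderHub deployment
# could be substituted.
BINDERHUB_HOST = "https://binderhub.fedcloud-tf.fedcloud.eu"


def binder_launch_url(org, repo, ref="master", filepath=None, host=BINDERHUB_HOST):
    """Build a BinderHub launch URL for a GitHub repository."""
    parts = [quote(part, safe="") for part in (org, repo, ref)]
    url = f"{host.rstrip('/')}/v2/gh/{'/'.join(parts)}"
    if filepath:
        # Optional: open a specific notebook once the server has started.
        url += "?" + urlencode({"filepath": filepath})
    return url


# Placeholder repository and notebook names, for illustration only.
print(binder_launch_url("example-org", "example-repo", filepath="index.ipynb"))
```

Opening the resulting URL in a browser asks BinderHub to build the repository and start a notebook server from it.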
 
= Data Management =
 
1 GB of persistent storage for the notebooks is available at <code>/persistent</code>, linked from the notebook's home directory. '''Please note that files stored in any other folder will be lost when your notebook server is closed (which can happen if there is no activity for more than 1 hour!)'''. No space limit is set for other folders.
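
Because only files under <code>/persistent</code> survive a server shutdown, it can help to route all output through a small helper. The following is a minimal sketch, not part of the service: the helper names and the <code>NOTEBOOK_PERSISTENT_DIR</code> override are assumptions made for the example.

```python
import os
from pathlib import Path

# /persistent is the folder named above. The environment variable override is
# a hypothetical convenience for trying the helper outside the service.
PERSISTENT_ROOT = Path(os.environ.get("NOTEBOOK_PERSISTENT_DIR", "/persistent"))


def persistent_path(relative, root=PERSISTENT_ROOT):
    """Return a path under the persistent folder, refusing paths that escape it."""
    root = Path(root).resolve()
    target = (root / relative).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{relative!r} is outside {root}")
    return target


def save_text(relative, text, root=PERSISTENT_ROOT):
    """Write text to a file under the persistent folder, creating parent folders."""
    target = persistent_path(relative, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

In a notebook, <code>save_text("results/run1.csv", csv_text)</code> would then write to <code>/persistent/results/run1.csv</code> rather than to a folder that is discarded when the server stops.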
 
If you need more persistent storage space, open a [https://ggus.eu GGUS ticket to the EGI Notebooks Support Unit].
 
Community-specific instances can be given access to other kinds of persistent storage, tailored to your specific needs and the available storage systems.
 
== Getting your data in ==
 
Your notebooks have full outgoing internet connectivity, so you can connect to any external service to bring data in for analysis. We are evaluating integration with EOSC-hub services to facilitate access to input data, with EGI DataHub as the first target. Please contact support@egi.eu if you are interested in other I/O integrations.
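
Since outgoing connectivity is unrestricted, the simplest way to bring data in is to download it from within a notebook. The sketch below uses only the Python standard library; the function name is illustrative, and the URL and destination in the usage comment are placeholders.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to(url, destination):
    """Stream the content at `url` into the file `destination` and return its path."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large inputs are not held in memory all at once.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Example call with a placeholder URL, storing the file in the persistent folder:
# fetch_to("https://example.org/data/input.csv", "/persistent/data/input.csv")
```

Saving downloads under <code>/persistent</code> avoids fetching them again after the notebook server has been restarted.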
 
== Deposit output data ==
 
As with input data, you can connect to any external service to deposit your notebooks' output.
 
== Interfacing with EUDAT B2DROP ==
The Notebooks service is interoperable with the EUDAT B2DROP service, allowing users to access files stored under their B2DROP account from EGI Notebooks. To use this feature, sign up for a B2DROP account, upload files into it, then register the account in your EGI Notebooks session. A user guide describing these steps is coming soon.
 
= Access to other services =
 
Notebooks running on EGI can access other existing computing and storage services. The centrally operated EGI Notebooks instance uses resources from the access.egi.eu Virtual Organisation. We welcome suggestions on which services you would like to access, so that we can create guidelines and extend the service with tools that ease these tasks.
 
= Bring your custom notebooks =
 
Adding new notebooks to the service only requires a working Docker image, accessible from a public repository, that follows these rules:
# It must install <code>JupyterHub v0.9</code>
# It must '''not''' run as user <code>root</code>; a user with uid 1000 is recommended
# It must use <code>$HOME</code> as notebook directory
 
If you have such an image, let us know so we can add it to the configuration.
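
Before submitting an image, the three rules can be checked from inside a running container. The sketch below is an informal self-check, not a tool provided by the service; it assumes that "v0.9" means any 0.9.x release, and the function name is made up for the example.

```python
import os


def check_notebook_rules(uid, jupyterhub_version, notebook_dir, home):
    """Return a list of rule violations; an empty list means all rules pass."""
    problems = []
    # Rule 1: JupyterHub v0.9 (assumed here to mean any 0.9.x release).
    if jupyterhub_version.split(".")[:2] != ["0", "9"]:
        problems.append(f"JupyterHub {jupyterhub_version} found, expected 0.9.x")
    # Rule 2: must not run as root (uid 1000 is recommended, not enforced here).
    if uid == 0:
        problems.append("container runs as root")
    # Rule 3: the notebook directory must be $HOME.
    if os.path.normpath(notebook_dir) != os.path.normpath(home):
        problems.append(f"notebook directory {notebook_dir} is not $HOME ({home})")
    return problems


# Inside the container the inputs could be gathered like this:
# import jupyterhub
# check_notebook_rules(os.getuid(), jupyterhub.__version__,
#                      os.getcwd(), os.environ["HOME"])
```

The commented call assumes the notebook server was started in its notebook directory, so that <code>os.getcwd()</code> reflects it.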
 
Once Binder integration is complete, you will be able to import any notebook just by providing the URL of a repository that contains it.
 
= Getting support =
 
You can use the [https://ggus.eu EGI Helpdesk] to contact us for support or any additional service requests.
 
= Early adopter communities and Customisations =
 
== AGINFRA+/D4Science ==
 
An instance of EGI Notebooks is deployed for the [http://plus.aginfra.eu/ AGINFRA+] project and made available via selected [https://www.d4science.org/ D4Science VREs]. Besides the features described above, this instance has been further customised to support:
* Embedding the EGI Notebooks interface into the community web portal, so no separate browser window or tab is needed to access the notebooks functionality.
* Integration of AAI between Notebooks and the community portal for single sign-on. Enabled users are automatically recognised by the notebooks.
* Access to other VRE services from the notebooks using the user's personal token, which is easily available in the notebooks environment.
* File sharing between notebooks and the community web portal file space.
 
== Other communities ==
* [http://opendreamkit.org/ OpenDreamKit]: Open Digital Research Environment Toolkit for the Advancement of Mathematics
* [http://www.aginfra.eu/ AGINFRA+]: Accelerating user-driven e-infrastructure innovation in Food and Agriculture
* IFREMER: Marine sciences
 
= Service Architecture =
 
The EGI Notebooks service relies on the following technologies to provide its functionality:
 
* [https://github.com/jupyterhub/jupyterhub JupyterHub] with a custom [https://github.com/enolfc/oauthenticator EGI Check-in OAuth authenticator], configured to spawn pods on Kubernetes.
* [https://kubernetes.io/ Kubernetes] as the container orchestration platform, running on top of [[EGI Federated Cloud]] resources. Within the service it is in charge of managing the allocated resources and providing the right abstraction to deploy the containers that build the service. Resources are provided by EGI Federated Cloud providers, including persistent storage for users' notebooks.
* A certificate authority (CA) to issue recognised certificates for the HTTPS server.
* [https://prometheus.io/ Prometheus] for monitoring resource consumption.
* Specific EGI hooks for [https://github.com/EGI-Foundation/egi-notebooks-monitoring monitoring] and [https://github.com/EGI-Foundation/egi-notebooks-accounting accounting].
* VO-specific storage, big data facilities or any other tools that can be plugged into the notebooks environment can be added to community-specific instances.
 
 
[[File:EGI_Notebooks_Stack.png|center|650px|EGI Notebooks Architecture]]
 
== Kubernetes usage ==
 
A Kubernetes (k8s) cluster deployed at a resource provider is in charge of managing the containers that provide the service. This cluster has:
* 1 master node that manages the whole cluster.
* 1 or more edge nodes with a public IP and corresponding public DNS name (notebooks.egi.eu), where a k8s ingress HTTP reverse proxy redirects user requests to the other components of the service. The HTTP server has a valid certificate from a CA recognised by most browsers (e.g. DigiCert).
* 1 or more nodes that host the JupyterHub server and the notebook servers where users run their notebooks. The Hub is deployed using the [https://jupyterhub.github.io/helm-chart/ JupyterHub helm charts].
* Persistent storage managed via NFS, exposed in the notebooks as user space.
* Prometheus, which monitors resource usage and is queried by the accounting plugin to produce accounting records.
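
To give an idea of what querying Prometheus for resource usage can look like, the sketch below builds a request URL for Prometheus' standard instant-query endpoint (<code>/api/v1/query</code>). It is not the query used by the EGI accounting plugin: the server address, the <code>notebooks</code> namespace and the choice of metric are assumptions, and the label names exposed by Kubernetes vary between versions.

```python
from urllib.parse import urlencode


def prometheus_query_url(base_url, promql, time=None):
    """Build a URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds or RFC 3339).
        params["time"] = time
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# CPU seconds consumed per pod over the last hour, in a hypothetical
# "notebooks" namespace. Metric and label names are assumptions.
CPU_PER_POD = (
    "sum by (pod) (increase("
    'container_cpu_usage_seconds_total{namespace="notebooks"}[1h]))'
)

print(prometheus_query_url("http://prometheus.example:9090", CPU_PER_POD))
```

An HTTP GET on the resulting URL returns a JSON document with one value per pod, which a plugin could then turn into accounting records.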
 
All communication with the user goes via HTTPS, and the service only needs a publicly accessible entry point (a public IP with a resolvable name).
 
Monitoring and accounting are provided by hooking into the respective monitoring and accounting EGI services.
 
There are no specific hardware requirements and the whole environment can run on commodity virtual machines.
 
== Ideas for future development ==
* Provide a way to parametrise and execute notebooks like https://github.com/nteract/papermill

Latest revision as of 15:46, 17 May 2019
