Long-tail of science

From EGIWiki
Revision as of 12:36, 19 November 2014 by Tsz (talk | contribs) (User Management Portal)



Coordinator: Peter Solagna/EGI.eu

Meetings page: http://indico.egi.eu/indico/categoryDisplay.py?categId=36

Mailing list: long-tail-pilot@mailman.egi.eu


Overview

Motivation

The goal of this activity is to evaluate technologies for implementing a set of services that make it easier for users of the long tail of science to access EGI resources. The services should be used both by EGI central user support, to support new users who are approaching EGI, and by the NGIs, which would reuse these services to serve their local users with their own resources.

Mandate

The work group will be active in the last months of EGI-InSPIRE. The target of this activity is to start the design of the services that will be deployed in production during EGI-Engage.

Objectives

The capabilities that the long tail of science platform has to support are:

  • Zero-barrier access: any user who carries out relevant research can get a start-up resource allocation
  • 100% coverage: anyone with internet access can become a user
  • User-centric: User support for platform users is available through the NGIs
  • Realistic: Reuse existing technology building blocks as much as possible, require minimal new development
  • Secure: Provide acceptable level of tracking of users and user activities (Not necessarily f2f vetting)
  • Scalable: Can scale up to support a large number of resource providers, technology providers, use cases and users
  • Valuable: Results in tangible outcomes

Milestones/Timeline

TBD

Members

How to Join

If interested, please contact Peter Solagna peter.solagna@egi.eu

Technical Information

The following table lists the services and technologies currently available in the NGIs participating in the work group that can be used in the long tail of science platform. The possible types of contribution are:

  • User management
  • Credential management
  • Resources provisioning for the pilot
  • ... (please, feel free to add)
Institution Type of contribution Description
INFN eTokenServer The eTokenServer is a standard-based solution developed by INFN for central management of robot certificates and provisioning of proxies to get seamless and secure access to computing e-Infrastructures, based on local, Grid and Cloud middleware supporting the X.509 standard for authorization. The business logic of the servlet has been conceived to provide "resources" (e.g., full-legacy and RFC-3820 proxies) in a "web-manner" which can be consumed by simple users, client applications and by portals and Science Gateways.

Starting from release v2.0.1, the eTokenServer can account for the users of robot certificates (RFC proxies only). This is key for accounting and auditing the usage of e-Infrastructures.

The Catania Grid & Cloud Engine The Catania Grid & Cloud Engine offers seamless access to local HPC, Grid and Cloud infrastructures, exploiting a common set of APIs based on JSAGA.

JSAGA implements the OGF SAGA standard for the Java language and it addresses many different distributed infrastructures through a set of adaptors. Currently, adaptors are available for several middleware (gLite/EMI (Grid), rOCCI (Cloud), ssh (HPC), etc.) while new adaptors can easily be written to target new infrastructures. The Catania Grid & Cloud engine also provides a set of RESTful APIs allowing software developers to access distributed infrastructure from many different web portal engines or even mobile devices, such as smartphones and tablets. It is worth noting that the Catania Grid & Cloud Engine interacts with another module of the CSGF called the User Tracking Database, to ensure compliance with both the EGI VO Portal Policy (https://documents.egi.eu/public/ShowDocument?docid=80) and the EGI Grid Security Traceability and Logging Policy (https://documents.egi.eu/public/ShowDocument?docid=81).

Support to both eduGAIN-compliant and STORK Identity Federations Liferay plugins have been developed to allow SAML 2.0 based federated authentication. Science Gateways developed with the CSGF can be configured as Service Providers of both eduGAIN-compliant Identity Federations and, more recently, of the STORK Federation (https://www.eid-stork.eu/), promoted by the European Commission as a platform for cross-border e-ID trust.

The integration of Science Gateways in the STORK federation is the result of joint work carried out by INFN and the Politecnico di Torino and is demonstrated in a short video that can be watched at http://youtu.be/GmYOMn8Lsw4. As of today, no other Science Gateway framework in the world is known to be compatible with STORK-based authentication. Concerning eduGAIN-compliant Identity Federations, INFN operates the GridP “catch-all” federation (http://gridp.garr.it) that includes both “catch-all” Identity Providers (e.g., http://idpopen.garr.it and http://idpsocial.garr.it) for “homeless” users and Identity Providers that do not belong to any official federation (e.g., the EGI SSO).

General purpose Web Portal INFN has developed a general purpose Grid Portal (IGP), based on Liferay, which provides a web graphical user interface access to Grid job submission, workflow definition and data management. It is also interfaced with external Infrastructure as a Service (IaaS) frameworks for the dynamic provisioning of computing resources.
Authentication service The CASShib service provides a SAML-based authentication mechanism for the Liferay portal based on the eduGAIN Federation. This service is deployed in a Tomcat container and can optionally run on a separate server machine. It translates SAML Assertions (retrieved from a Shibboleth service provider that is part of the eduGAIN federation) into CAS Assertions. The CAS Assertions can then be used in the Liferay portal for user authentication and the user session.
IGP Registration portlet Based on the Liferay framework, this portlet provides a mechanism to register an authenticated user by retrieving the required information from an IdP (eduGAIN Federation), an X.509 certificate and VOMS. If a valid certificate is not available, the portlet requests a new certificate from an on-line CA (see item 4) on behalf of the user. To improve security and preserve the necessary information, the X.509 certificate is only used to create a proxy and is then destroyed; the proxy certificate is stored in a dedicated MyProxy server.
On-line CA related services The on-line CA is a service based on EJBCA (http://www.ejbca.org). This service, for security reasons, runs on a dedicated server in a JBoss container. To access the on-line CA, the registration portlet uses the on-line CA Bridge service deployed in a tomcat container running on a separate machine. The on-line CA Bridge communicates with the on-line CA in a secure way through the EJBCA APIs. When the portal requests a certificate on behalf of the user, the on-line CA Bridge retrieves the user information from the eduGAIN federation using the SSO mechanism through the Shibboleth service running on the server. The on-line CA Bridge also provides a user interface that can be accessed directly by the users via a web browser; in this case the certificate will be installed directly in the browser.
IGP Authorization portlet Based on the Liferay framework, this portlet allows managing proxy and VOMS extensions for each user. These credentials are then used by the portal for job submission and data management.
IGP Job submission portlets 1. Workflow submission performed by a subset of WS-PGRADE portlets (workflow creation, submission and management), wrapping them through the gUSE component that can execute acyclic workflows.

2. Simple submission through a portlet that allows users to: build their own JDL; save JDLs as templates for sharing and reusing purposes; show the list of submitted jobs; monitor the job state during its execution; retrieve the output; resubmit an ended job; view log files.

3. The Cloud portlet, allowing access to different Cloud resources (WNoDeS, OpenStack and OpenNebula) by instantiating new virtual machines on the available resources. The portlet can manage the users’ existing ssh keys, or can create a new pair if needed. The portlet interacts with the federated resources through a python command-line interface that simplifies the communication by wrapping the OCCI protocol.

IGP Data Management service A service to transfer files between Grid and local resources has been designed and integrated in the IGP.

This service allows users to easily upload files to the Grid in two ways: via a web browser for local files, or via an external server using different protocols (http, webdav, ftp, sftp). The data management interface is based on a specific plugin for the Pydio tool (http://pyd.io) that interacts with the user, the local storage element and the Grid storage element, and moves files between them.

CYFRONET User Portal: Registration Registration is initiated by the user, who provides his/her data through an on-line form. The e-mail address is confirmed through a validation link. The account application is manually approved by a Vetting Person. Vetting in PL-Grid consists of confirming the person's affiliation with a scientific institute. After this step the user can apply for access to the services offered by the infrastructure.
User Portal: Getting certificates (integration with on-line CA) Once registered, the user can apply for an X.509 certificate from an on-line CA. The on-line CA server, based on EJBCA, issues personal certificates. The user can obtain and revoke a certificate.
User Portal: Application for access to a service. In order to get access to a service the user must apply for it. A Service Administrator approves the requests. When a request is approved, some backend actions are executed. The portal is able to communicate with VOMS, UVOS (UNICORE) and LDAP services.
User Portal: Initial allocation Upon registration the user gets an initial allocation. The initial allocation allows the user to utilize the service, but only in a limited scope: 1000 compute hours and 40 GB of storage for 6 months. The initial allocation is automatically renewed every six months.
Open ID service provider. The user database is interfaced with an OpenID provider. Other services that need to authenticate or authorize a user can do so through OpenID. A user can be authenticated through OpenID by providing an X.509 certificate. Optionally, the OpenID service can pass an X.509 proxy to the requesting service if the user provides the password for the X.509 key.
GRNET
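
The initial allocation described in the CYFRONET rows (1000 compute hours and 40 GB of storage for 6 months, renewed automatically) can be pictured as a small quota record. The following is only a minimal sketch, not PL-Grid code: the class, field and method names are hypothetical, the six-month period is approximated as 182 days, and resetting the usage counter on renewal is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Limits quoted in the table above; everything else is a hypothetical illustration.
COMPUTE_HOURS = 1000
STORAGE_GB = 40
PERIOD = timedelta(days=182)  # "6 months", approximated


@dataclass
class InitialAllocation:
    start: date
    used_hours: float = 0.0
    used_storage_gb: float = 0.0

    @property
    def end(self) -> date:
        return self.start + PERIOD

    def renew_if_expired(self, today: date) -> None:
        """Automatic renewal: open a fresh period once the current one has ended.

        Assumption: the compute-hour counter restarts with each period.
        """
        while today >= self.end:
            self.start = self.end
            self.used_hours = 0.0

    def charge(self, hours: float, today: date) -> bool:
        """Record compute usage; refuse it if it would exceed the limit."""
        self.renew_if_expired(today)
        if self.used_hours + hours > COMPUTE_HOURS:
            return False
        self.used_hours += hours
        return True

    def can_store(self, extra_gb: float) -> bool:
        """Check a storage request against the 40 GB limit."""
        return self.used_storage_gb + extra_gb <= STORAGE_GB
```

A portal could call `charge()` before accepting a job and `can_store()` before accepting an upload; how the real portal enforces the limits is not described in the table.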

Technical discussion and architecture details

User Management Portal (UMP)

  • Registration of the user, including a form for providing information about the user's institution, field of research and the purposes of his/her activity on EGI resources.
    • The request must be approved by authorized users.
  • User registry. The UMP will be a registry of the users who are accessing, or accessed, EGI through the long tail of science platform.
  • User authentication
    • The UMP must support a catch-all IdP for homeless users (use of EGI SSO?)
    • The UMP should consider the possibility of integrating external IdPs.
    • The other services of the long tail of science platform should get the user information from the UMP. This ensures that users are associated with uniform identifiers, assigned only by the UMP, to facilitate accounting and authorization.
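
The registration, approval and registry functions listed above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the UMP design: the class and field names are hypothetical, and using a random UUID as the uniform identifier is only one possible choice.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Registration:
    """Data collected by the registration form (field names are hypothetical)."""
    email: str
    institution: str
    research_field: str
    purpose: str
    uid: Optional[str] = None  # set only once the request has been approved


class UserRegistry:
    """Hypothetical in-memory stand-in for the UMP user registry."""

    def __init__(self, approvers):
        self.approvers = set(approvers)  # users authorized to approve requests
        self.pending = {}                # email -> Registration awaiting approval
        self.users = {}                  # UID -> approved Registration

    def submit(self, registration):
        self.pending[registration.email] = registration

    def approve(self, email, approver):
        """Approve a pending request and assign the identifier.

        Only the registry creates UIDs, so every other service sees the
        same identifier for the same user.
        """
        if approver not in self.approvers:
            raise PermissionError(f"{approver} may not approve requests")
        registration = self.pending.pop(email)
        registration.uid = uuid.uuid4().hex
        self.users[registration.uid] = registration
        return registration.uid

    def lookup(self, uid):
        return self.users[uid]
```

Other services of the platform would call something like `lookup(uid)` instead of keeping their own copy of the user data.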

As shown in the following figure, the UMP must act as a service proxy between the science gateways and the identity providers, whether EGI SSO or another IdP (e.g. eduGAIN federations). In this way the UMP can control access to the infrastructure for long tail of science users. The UMP acts as the unique IdP for the science gateways.

This architecture also allows the UMP to be the service provider that needs to be authorized by the IdPs.

User Management Portal Architecture


Once a user's request is authorized on the User Management Portal, the user is redirected to one or several science gateways where he/she can run computational tasks or manage data on the grid. A possible workflow to access resources could be the following:

  1. The user accesses the Science Gateway (SG).
  2. The SG redirects the request to the UMP.
  3. The UMP redirects the request to the IdP that holds the credentials of the user (e.g. EGI SSO).
  4. The user authenticates with his/her IdP.
  5. The IdP provides the UMP with an assertion containing some attributes about the user (e.g. the user's email address).
  6. The UMP replies to the SG, adding more attributes including the Unique Identifier (UID) that identifies the user in the UMP registry and is unique for every user of the LTOS platform.
  7. The SG uses the UID to request a credential that can be uniquely associated with the individual user.
  8. The credential is used to access EGI resources.
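
The steps above can be sketched as plain function calls, one per hop. This is a simulation only: no SAML or HTTP redirect is involved, the user table, the UID value and all function names are hypothetical, and the hashed token merely stands in for whatever credential the credential service would issue.

```python
import hashlib

# Hypothetical stand-ins for the real services.
IDP_USERS = {"alice": {"password": "secret", "email": "alice@example.org"}}
UMP_REGISTRY = {"alice@example.org": "ltos-0001"}  # email -> UID assigned by the UMP


def idp_authenticate(username, password):
    """Steps 4-5: the IdP checks the user and returns an assertion."""
    user = IDP_USERS.get(username)
    if user is None or user["password"] != password:
        raise PermissionError("authentication failed at the IdP")
    return {"email": user["email"]}


def ump_login(username, password):
    """Steps 3-6: the UMP delegates to the IdP, then adds its own UID."""
    assertion = idp_authenticate(username, password)
    uid = UMP_REGISTRY.get(assertion["email"])
    if uid is None:
        raise PermissionError("user is not registered in the UMP")
    return {**assertion, "uid": uid}


def request_credential(uid):
    """Step 7: obtain a credential that maps back to exactly one UID.

    Placeholder only: a hash is not a real credential.
    """
    token = hashlib.sha256(uid.encode()).hexdigest()[:16]
    return {"uid": uid, "token": token}


def science_gateway_session(username, password):
    """Steps 1-2 and 7-8 as seen from the SG."""
    attributes = ump_login(username, password)          # SG hands over to the UMP
    credential = request_credential(attributes["uid"])  # SG asks for a per-user credential
    return f"job submitted for {credential['uid']}"     # step 8, simulated
```

In this sketch the SG never talks to the IdP directly; it only sees the attributes returned by the UMP, which is what lets the UMP act as the single IdP for all gateways.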

User Management Portal workflow

Credential services

Policy changes

Accounting of usage

How can we account for the resource usage of an LTOS user? Should this be done through the science gateways, or through the EGI accounting?
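
Whichever component ends up producing the usage records, per-user accounting reduces to grouping records by the UID assigned by the UMP. The sketch below assumes a hypothetical record layout (`uid`, `cpu_hours`) and does not reflect the format of any existing EGI accounting record.

```python
from collections import defaultdict


def usage_per_user(records):
    """Sum the CPU hours of each user, keyed by the UMP-assigned UID."""
    totals = defaultdict(float)
    for record in records:
        totals[record["uid"]] += record["cpu_hours"]
    return dict(totals)


# Hypothetical records, e.g. one per finished job.
example_records = [
    {"uid": "ltos-0001", "cpu_hours": 12.5},
    {"uid": "ltos-0002", "cpu_hours": 3.0},
    {"uid": "ltos-0001", "cpu_hours": 7.5},
]
print(usage_per_user(example_records))
```

The same grouping works whether the records are collected at the science gateways or extracted from the infrastructure accounting, as long as each record carries the UID.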

Timeline for deployment

User Management Portal

  • Week 24-28.11:
    • clarify user story (input for CYFRONET) [?]
  • Week 1-5.12:
    • wireframe draft (?) [CYFRONET]
    • (implementation internal [CYFRONET])
  • Week 8-12.12:
    • (implementation internal [CYFRONET])
  • Week 15-19.12: very early portal prototype [CYFRONET]

User credential service

Scientific gateway

References