The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Federated Cloud Architecture"

imported>Enolfc
(Replaced content with "{{Fedcloud_Menu}} {{TOC_right}} {| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;" |- |...")
Tag: Replaced
 
{{TOC_right}}


= Federation Model =
{| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;"
 
The EGI Federated Cloud is a multi-national cloud system that integrates community, private and/or public clouds into a scalable computing platform for research. The Federation pools services from a heterogeneous set of cloud providers using a single authentication and authorisation framework that allows the portability of workloads across multiple providers and enables bringing computing to data. The current implementation is focused on IaaS services, but the same model can be applied to the PaaS and SaaS layers.
 
Each resource centre of the federated infrastructure operates a Cloud Management Framework (CMF) according to its own preferences and constraints and joins the federation by integrating this CMF with components of the EGI Federation and Collaboration Services. All services provided by the CMFs must at least be integrated with EGI AAI so users can access services with a single identity; integration with other components and the APIs to be provided are agreed by the community the resource centre provides services to.
 
EGI follows a Service Integration and Management (SIAM) approach to manage the federation with processes that cover the different aspects of the IT Service Management. Providers in the federation keep complete control of their services and resources. EGI VO OLAs establish a reliable, trust-based communication channel between the Customer and the providers to agree on the services, their levels and the types of support. The EGI VO OLAs are not legal contracts but, as agreements, they outline the clear intentions to collaborate and support research.
 
== Federated IaaS ==
 
The EGI Federated Cloud Infrastructure as a Service (IaaS) resource centres deploy a Cloud Management Framework (CMF) that provides users with an API-based service for managing Virtual Machines, the associated Block Storage that enables persistence, and the Networks that connect the Virtual Machines to each other and to third-party resources.
 
These capabilities must be provided via community agreed APIs (OpenStack and/or OCCI are supported at the moment) that can be integrated with EGI Check-in for authentication and authorisation of users. Moreover, the provider must:
* Register in the GOCDB, to record information about the topology of the e-infrastructure.
* Provide accounting records to be collected and aggregated by the EGI accounting repository.
* Provide dynamic information about the endpoints so they can be discovered by users and other clients of the federation.
* Allow monitoring activities that regularly check the service availability and reliability.
* Integrate with the VM image synchronisation so users can find their software ready to be used at the providers.
 
[[Image:Federated_Cloud_IaaS_Model.png|thumb|center|600px|Federated Cloud Model]]
 
Users and Community platforms built on top of the EGI Federated Cloud IaaS have several ways of interacting with the cloud providers:
* Directly using the IaaS APIs to manage individual resources. This option is recommended for pre-existing use cases with requirements on specific APIs.
* Leveraging IaaS Federated Access Tools that allow managing the complexity of dealing with different providers in a uniform way. These tools include:
**IaaS provisioning systems that allow defining infrastructure as code and managing and combining resources from different providers, thus enabling the portability of application deployments between them (e.g. IM or Terraform);
**Cloud brokers that provide matchmaking of workloads to available providers (e.g. the INDIGO-DataCloud Orchestrator); and
**Cloud Management Software that provides a unified console for accessing resources and deploying workloads following a set of user-defined policies (e.g. Scalr or RightScale).
* Using the AppDB VMOps dashboard, a web-based GUI that simplifies the management of VMs on any provider of the EGI infrastructure. AppDB VMOps in turn relies on the Infrastructure Manager, a federated IaaS provisioning tool documented in the aforementioned wiki.
 
Currently, EGI supports providers running OpenStack, OpenNebula or Synnefo Cloud Management Frameworks via a set of technology components that interact whenever possible using the public interfaces of these CMFs, thus minimising the impact on operations of the site. A detailed listing of tools and how they interact with the underlying infrastructure is available at the [[Federated Cloud Technology]].
 
= Technology =
 
EGI Federated Cloud provides the services and technologies to create federations of clouds (community, private or public clouds) that operate according to the preferences, choices and constraints set by their members and users. The EGI Cloud Federations are modelled around the concept of an abstract Cloud Management stack subsystem that is integrated with components of the EGI Core Infrastructure and that provides a set of agreed uniform interfaces within the community it provides services to. This section describes the technical solutions provided by EGI to create such federations.
 
== Single Sign-On for users ==
 
SSO ensures that users of the federation need to register for access only once before they can use the federated services. Single sign-on is increasingly implemented in the form of identity federations in both industry and academia. Within EGI, research communities are generally identified and, for the purpose of using EGI resources, managed through “Virtual Organisations” (VOs).
 
=== OpenID Connect ===
 
Cloud providers of the EGI Cloud must support authentication with OAuth 2.0 tokens provided by the Check-in OpenID Connect Identity Provider. Support builds on the [[AAI_guide_for_SPs|generic support for Service Providers of Check-in]], with detailed configuration provided in the [https://egi-federated-cloud-integration.readthedocs.io/en/latest/openstack.html#openid-connect-support EGI Cloud integration manual].
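
As an illustration of this token-based flow, the following Python sketch builds the request a client could send to exchange a Check-in access token for a Keystone token through the Keystone OS-FEDERATION API. The Keystone URL, the identity provider name (<code>egi.eu</code>) and the protocol name (<code>openid</code>) are assumptions made for this example; the actual values depend on how each provider has configured its OpenID Connect support.

```python
from urllib.parse import quote


def federation_auth_request(keystone_url, access_token,
                            identity_provider="egi.eu", protocol="openid"):
    """Return (url, headers) for a Keystone federated authentication call.

    identity_provider and protocol are assumed names: each site chooses
    its own values when it configures OpenID Connect support.
    """
    url = "{}/v3/OS-FEDERATION/identity_providers/{}/protocols/{}/auth".format(
        keystone_url.rstrip("/"),
        quote(identity_provider, safe=""),
        quote(protocol, safe=""),
    )
    # The OAuth 2.0 access token is sent as a standard Bearer credential.
    headers = {"Authorization": "Bearer {}".format(access_token)}
    return url, headers


# Hypothetical endpoint and placeholder token, for illustration only.
url, headers = federation_auth_request(
    "https://keystone.example.org:5000/", "ACCESS_TOKEN")
print(url)
```

The sketch only prepares the request. When it is sent with an HTTP client and accepted, Keystone returns an unscoped token in the <code>X-Subject-Token</code> response header.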
 
=== Legacy VOMS / X.509 certificates ===
 
EGI Cloud can support those users still using X.509 certificates extended with VO attributes (e.g. acknowledging that the user is a member of the VO) in a so-called VOMS proxy. This VOMS proxy certificate is used in subsequent calls to the cloud endpoints, which map the certificate and VO information to each cloud management framework's authentication and authorisation mechanisms via the integration modules for VOMS authentication. Configuring these modules into a provider’s cloud installation will allow members of these VOs to access the cloud.
 
Generic information about how to configure VOMS support for the supported Cloud Management Frameworks is available at [[MAN10]]. Information on how to add support for a new Virtual Organisation on the EGI Federated Cloud can be found at [[HOWTO16]].
 
=== Implementations ===
 
; [https://github.com/EGI-FCTF/fctf-perun OpenNebula Perun integration]
: rOCCI-server maps the certificate and VO information to local users. Local users need to have been created in advance, which is triggered by regular synchronizations of the OpenNebula installation with Perun.
 
;[https://github.com/IFCA/Keystone-VOMS Keystone-VOMS]
:Plugin for OpenStack Keystone to enable VOMS authentication. Allows users to get tokens which can be used to access any of the OpenStack services (including the OCCI interface). Users are generated on the fly in Keystone, so it does not need regular synchronization with the VO Management server Perun.
 
== IaaS Interfaces ==
 
Cloud systems must provide a set of interfaces through which users and user applications can interact with the services offered. In case of an IaaS cloud federation these interfaces offer compute, storage and network management capabilities. The interfaces can be harmonised across all participating cloud providers - in which case the providers are responsible for implementing the agreed standard - or can be native at the different sites. In this latter case, libraries or portals can hide heterogeneity from the users and can translate user requests to diverse native formats.
 
=== OpenStack ===
 
OpenStack sites of the EGI Federated Cloud provide access through the native OpenStack API. The OpenStack API documentation is available at [http://developer.openstack.org/ OpenStack developer pages]. EGI Federated Cloud supports the usage of the [http://developer.openstack.org/api-ref-compute-v2.1.html Compute (nova) v2.1 API]. The [[Federated_Cloud_APIs_and_SDKs|Federated Cloud APIs and SDKs page]] describes how to use this API in the EGI resources.
 
=== OCCI ===
 
The [http://occi-wg.org/ Open Cloud Computing Interface (OCCI)] is a RESTful Protocol and API designed to facilitate interoperable access to, and query of, cloud-based resources across multiple resource providers and heterogeneous environments. The formal specification is maintained and actively worked on by OGF’s OCCI-WG.
 
The [[Federated_Cloud_VM_Management|VM Management]] scenario page contains further details on the support for OCCI on different Cloud Management Stacks.
 
==== Implementations ====
 
;[https://github.com/gwdg/rOCCI rOCCI - A Ruby OCCI Framework]
:Provides OCCI support for various Cloud Management Frameworks, including OpenNebula
 
;[https://github.com/openstack/ooi ooi]
:OCCI for OpenStack Interface, provides OCCI support for the most recent versions of OpenStack
 
;[https://github.com/EGI-FCTF/occi-os OCCI-OS]
:OpenStack OCCI interface, now deprecated.
 
== Virtual Machine Image management ==
 
In a distributed, federated Cloud infrastructure, users often need to manage and distribute their VM Images efficiently across multiple resource providers. Users need a catalogue of Virtual Machine images (VMIs) that are usable on the IaaS cloud provider sites and encapsulate software configurations that are useful and relevant for the given community (typically pre-configured scientific models and algorithms). To maximise usability of VMIs across cloud sites, the images should be in a format that is supported at every federation member site (or that can at least be converted to such a format). Users also need a system that automatically replicates VMIs from the VMI catalogue to the federation member sites and removes them when needed. Automated replication can ensure consistency of capabilities across sites and is very often coupled with a VMI vetting process to ensure that only properly working and relevant VMIs are replicated to the cloud sites of the community.
 
=== AppDB Cloud MarketPlace ===
 
The [https://appdb.egi.eu/ EGI AppDB] service has been extended to a Virtual Appliance Marketplace. This brings about a new category of software entries, called Virtual Appliances (VAs), which are, for all practical purposes, clean-and-mean virtual machine images designed to run on a virtualization platform and to provide a software solution out of the box, ready to be used within the EGI Federated Cloud infrastructure with minimal or no set-up.
 
=== VM Image distribution ===
 
AppDB's Virtual Appliance Marketplace provides the ground for managing and publishing versioned repositories of virtual appliances using HEPiX image lists.
 
Research Communities ultimately create and update VM Images (or delegate this functionality). The Images themselves are stored in Appliance repositories that are provided and managed elsewhere, typically by the Research Community itself. A representative of the Research Community then generates a VM Image list (or updates an existing one) using the AppDB user interface. Federated Cloud Resource Providers then subscribe to changes in VM Image lists by regularly downloading the list from AppDB and comparing it against local copies. New and updated VM Images are downloaded from the appliance repository referenced in the VM Image list into a local staging cache and, where required, made available for further examination and assessment.
 
Ultimately, Cloud resource Providers will make VM Images available for immediate instantiation by the Research Community.
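
The comparison step of this subscription workflow can be sketched as follows. This is a simplified illustration and not the logic of any specific synchronisation tool: each image list is reduced to a hypothetical mapping from image identifier to version string, whereas real HEPiX image lists carry more metadata and are signed.

```python
def plan_image_sync(published, local):
    """Compare a published image list with the local cache.

    Both arguments map an image identifier to its version string.
    Returns the identifiers to download, to update and to remove.
    """
    new = sorted(i for i in published if i not in local)
    updated = sorted(i for i in published
                     if i in local and published[i] != local[i])
    removed = sorted(i for i in local if i not in published)
    return {"download": new, "update": updated, "remove": removed}


# Hypothetical identifiers and versions, for illustration only.
published = {"ubuntu-20.04": "2020.02.01", "centos-7": "2020.01.15"}
local = {"centos-7": "2019.12.01", "debian-9": "2019.11.20"}
plan = plan_image_sync(published, local)
print(plan)
```

A provider would run such a comparison after each periodic download of the list, then fetch the new and updated images into its staging cache.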
 
==== Implementations ====
; [https://github.com/the-cloudkeeper-project/cloudkeeper cloudkeeper]
: Provides automated synchronisation between AppDB and OpenStack/OpenNebula providers.
 
== Service Registry: GOCDB ==
 
The Service Registry contains general information about the providers participating in the infrastructure and their capabilities. The registry provides the ‘big picture view’ of the federation for both human users and online services (such as service monitors).
 
EGI’s central service catalogue is used to catalogue the static information of the production infrastructure topology. The service is provided using the GOCDB tool that is developed and deployed within EGI. To allow Resource Providers to expose Cloud resources to the production infrastructure, a number of service types are available:
* <code>org.openstack.nova</code>
* <code>org.openstack.swift</code>
* <code>eu.egi.cloud.accounting</code>
* <code>eu.egi.cloud.vm-management.occi</code>
* <code>eu.egi.cloud.vm-metadata.marketplace</code>
 
All providers '''must''' enter cloud service endpoints to GOCDB in order to enable integration with other operational tools.
 
Further information about GOCDB can be found on the following page: [[GOCDB/Input System User Documentation]].
 
== Information Discovery ==
 
The information system provides a real-time view about the actual capabilities and load of federation participants. The information system can be used by both human users and online services.
 
=== BDII and GlueSchema ===
 
Users and tools can discover the available resources in the infrastructure by querying EGI information discovery services. The common information system deployed at EGI is based on the Berkeley Database Information Index (BDII) with a hierarchical structure distributed over the whole infrastructure.
 
The information system is structured in three levels. The services publish their information (e.g. specific capabilities, total and available capacity, or the user communities supported by the service) using an OGF recommended standard format, [http://www.ogf.org/documents/GFD.147.pdf GLUE2]. The information published by the services is collected by a Site-BDII, a service deployed in every site in EGI. The Site-BDIIs are queried by the Top-BDIIs, the national or regional level of the hierarchy, which contain the information of all the sites available in the infrastructure and their services. NGIs usually provide an authoritative instance of Top-BDII, but every Top-BDII, if properly configured, should contain the same set of information.
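
The three-level aggregation can be pictured with the toy Python sketch below. It only models the flow of records from services to a Site-BDII and on to a Top-BDII; real BDIIs are LDAP servers publishing GLUE2 objects, and the record fields, site names and endpoints used here are hypothetical.

```python
def site_bdii(service_records):
    """Collect the records published by every service of one site."""
    return [dict(record) for record in service_records]


def top_bdii(sites):
    """Merge the records of several Site-BDIIs, tagging each with its site."""
    merged = []
    for site_name, records in sorted(sites.items()):
        for record in records:
            merged.append(dict(record, site=site_name))
    return merged


# Hypothetical sites and endpoints, for illustration only.
sites = {
    "SITE-A": site_bdii([
        {"endpoint": "https://a.example.org:5000/v3",
         "interface": "org.openstack.nova"},
    ]),
    "SITE-B": site_bdii([
        {"endpoint": "https://b.example.org:11443",
         "interface": "eu.egi.cloud.vm-management.occi"},
    ]),
}
view = top_bdii(sites)
print([record["site"] for record in view])
```

Because every Top-BDII merges the same Site-BDII contents, two properly configured instances produce the same view.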
 
Resource Providers must provide a Site-BDII endpoint that publishes information on the available resources following the GLUE2 schema. Even though the GLUE2 schema defines generic computing and storage entities, it was originally developed for Grid resources and can only partially represent the information needed by Cloud users. Thus, the EGI Federated Cloud is working within the GLUE2 WG at OGF to profile and extend the schema to represent Cloud Computing, Storage and, in the future, Platform and Software services. The proposed extensions are currently under discussion at the WG.
EGI provides an implementation for service-level information that supports OpenStack and OpenNebula; Synnefo support is currently being added. The information is published in a different subtree (<code>Glue2GroupID=cloud</code>) so it can coexist with grid information and is easily discoverable by users.
 
Information available for each provider:
* Cloud computing resources
* Service endpoint
* Capabilities provided by the service, such as: virtual machine management or snapshot taking. The labels that identify the capabilities are agreed within the taskforce.
* Interface, the type of interface – e.g. webservice or webportal – and the interface name and version, for example OCCI 1.2.0
* User authentication and authorization profiles supported by the service, e.g. X.509 certificates
* Virtual machines images made available by the cloud provider
* Resource templates (number of cores and physical memory) allocable in a virtual machine.
 
=== Implementations ===
 
; [https://github.com/EGI-FCTF/cloud-info-provider Cloud information provider]
:Generates Glue 2 information by querying the Cloud Management Framework.
 
== Accounting ==
 
Federated Accounting provides an integrated view about resource/service usage: it pulls together usage information from the federated sites and services, integrates the data and presents them in such a way that both individual users as well as whole communities can monitor their own resource/service usage across the whole federation.
 
=== Cloud Usage Record  ===
 
EGI Federated Cloud has agreed on a Cloud Usage Record, which inherits from the [https://www.ogf.org/documents/GFD.98.pdf OGF Usage record], that defines the data that resource providers must send to EGI’s central Accounting repository.
 
Version 0.4 of the Cloud Accounting Usage Record was agreed at the FedCloud Face to Face in Amsterdam in January 2015. The definition with comments can be found [[Media:Cloud_Accounting_Usage_Record_Schema_v0.4-final.pdf|here]]. A summary table of the format is shown below:
 
<br>
 
{| style="border:1px solid black; text-align:left;" class="wikitable sortable" cellspacing="0" cellpadding="5"
|- style="background:lightgray;"
! style="border-bottom:1px solid black;" | Cloud Usage Record Property
! style="border-bottom:1px solid black;" | Type
! style="border-bottom:1px solid black;" | Null
! style="border-bottom:1px solid black;" | Definition
|-
| VMUUID
| varchar(255)
| No
| Virtual Machine's Universally Unique Identifier: a concatenation of CurrentTime, SiteName and MachineName
|-
| SiteName
| varchar(255)
| No
| GOCDB SiteName - GOCDB now has cloud service types and a cloud-only site is allowed.
|-
| CloudComputeService (NEW)
| varchar(255)
|
| Name identifying cloud resource within the site. Allows multiple cloud resources within a site. i.e. a level of granularity.
|-
| MachineName
| varchar(255)
| No
| VM Id - the site name for the VM
|-
| LocalUserId
| varchar(255)
|
| Local user name
|-
| LocalGroupId
| varchar(255)
|
| Local group name
|-
| GlobalUserName
| varchar(255)
|
| Global identity of user (certificate DN)
|-
| FQAN
| varchar(255)
|
| Use if VOs part of authorization mechanism
|-
| Status
| varchar(255)
|
| Completion status - completed, started or suspended
|-
| StartTime
| datetime
|
| Must be set when Status = started
|-
| EndTime
| datetime
|
| Set to NULL until Status = completed
|-
| SuspendDuration
| datetime
|
| Set when Status = suspended (Timestamp)
|-
| WallDuration
| int
|
| WallClock time - actual time used
|-
| CpuDuration
| int
|
| CPU time consumed (Duration)
|-
|-
| CpuCount
| int
|
| Number of CPUs allocated
|-
| NetworkType
| varchar(255)
|
| Needs clarifying
|-
| NetworkInbound
| int
|
| GB received
|-
| NetworkOutbound
| int
|
| GB sent
|-
| PublicIPCount (NEW)
| int
|
| Number of public IP addresses assigned to VM '''Not used'''. See [[#Public_IP_Usage_Record|Public IP Usage Record]]
|-
| Memory
| int
|
| Memory allocated to the VM
|-
| Disk
| int
|
| Size in GB allocated to the VM
|-
| BenchmarkType (NEW)
| varchar(255)
|
| Name of benchmark used for normalization of times (eg HEPSPEC06)
|-
| Benchmark (NEW)
| Decimal
|
| Value of benchmark of VM using ServiceLevelType benchmark
|-
| StorageRecordId
| varchar(255)
|
| Link to other associated storage record. Need to check feasibility.
|-
| ImageId
| varchar(255)
|
| Every image has a unique ID associated with it.  
*For images from the EGI FedCloud AppDB this should be VMCATCHER_EVENT_AD_MPURI
*For images from other repositories it should be a vmcatcher equivalent
*For local images - local identifier of the image
 
|-
| CloudType
| varchar(255)
|
| Type of cloud infrastructure: OpenNebula; OpenStack; Synnefo; etc.
|}
|}
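
The status-related rules in the table can be sketched as a small validation routine. This is an illustrative subset and not the APEL implementation: the class name, the attribute names and the choice of fields are assumptions made for this example, and only the properties marked as non-null plus the Status, StartTime and EndTime rules are checked.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

VALID_STATUS = {"started", "suspended", "completed"}


@dataclass
class CloudUsageRecord:
    """Subset of the Cloud Usage Record properties listed in the table."""
    vm_uuid: str
    site_name: str
    machine_name: str
    status: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None

    def problems(self):
        """Return a list of rule violations (empty when the record is valid)."""
        found = []
        # VMUUID, SiteName and MachineName are the properties marked Null = No.
        for name in ("vm_uuid", "site_name", "machine_name"):
            if not getattr(self, name):
                found.append(name + " must not be empty")
        if self.status not in VALID_STATUS:
            found.append("unknown status: {!r}".format(self.status))
        if self.status == "started" and self.start_time is None:
            found.append("start_time must be set when status is started")
        if self.status != "completed" and self.end_time is not None:
            found.append("end_time must stay unset until status is completed")
        return found


# Hypothetical values, for illustration only.
ok = CloudUsageRecord("uuid-1", "SITE-A", "vm-1", "started",
                      start_time=datetime(2020, 3, 3, 15, 42))
bad = CloudUsageRecord("uuid-2", "SITE-A", "vm-2", "started")
print(ok.problems())
print(bad.problems())
```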


=== Public IP Usage Record  ===
EGI Federated Cloud has agreed on an IP Usage Record. The format uses many of the same fields as the Cloud Usage Record. The Usage Record should be a "snapshot" of the number of IPs currently assigned to a user. A table defining v0.2 of the format is shown below:
{| style="border:1px solid black; text-align:left;" class="wikitable sortable" cellspacing="0" cellpadding="5"
|- style="background:lightgray;"
! style="border-bottom:1px solid black;" | Cloud Usage Record Property
! style="border-bottom:1px solid black;" | Type
! style="border-bottom:1px solid black;" | Null
! style="border-bottom:1px solid black;" | Definition
! style="border-bottom:1px solid black;" | Notes
|-
| MeasurementTime
| DateTime
| No
| The time the usage was recorded
| In the message format, this must be a UNIX timestamp, i.e. the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.
|-
| SiteName
| Varchar(255)
| No
| The GOCDB site assigning the IP
|
|-
| CloudComputeService
| Varchar(255)
| Yes
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| CloudType
| Varchar(255)
| No
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| LocalUser
| Varchar(255)
| No
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| LocalGroup
| Varchar(255)
| No
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| GlobalUserName
| Varchar(255)
| No
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| FQAN
| Varchar(255)
| No
| See [[#Cloud_Usage_Record|Cloud Usage Record]]
|
|-
| IPVersion
| Byte
| No
| 4 or 6
|
|-
| IPCount
| int(11)
| No
| The number of IP addresses of IPType that this user currently has assigned to them
|
|}
A JSON schema defining a valid Public IP Usage message can be found at [https://github.com/apel/apel/blob/9476bd86424f6162c3b87b6daf6b4270ceb8fea6/apel/db/__init__.py].
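
As a rough sketch of how a single record with the fields above could be assembled, the Python example below converts the measurement time to a UNIX timestamp and checks the IP version. The function name and the sample values are hypothetical, and the envelope that wraps records in an actual message is not shown; the linked schema remains the authoritative definition.

```python
import json
from datetime import datetime, timezone


def ip_usage_record(measured_at, site, cloud_type, local_user, local_group,
                    global_user, fqan, ip_version, ip_count):
    """Build one Public IP Usage Record as a plain dictionary."""
    if ip_version not in (4, 6):
        raise ValueError("IPVersion must be 4 or 6")
    if measured_at.tzinfo is None:
        raise ValueError("measured_at must be timezone-aware")
    return {
        # MeasurementTime is sent as a UNIX timestamp (seconds since epoch).
        "MeasurementTime": int(measured_at.timestamp()),
        "SiteName": site,
        "CloudType": cloud_type,
        "LocalUser": local_user,
        "LocalGroup": local_group,
        "GlobalUserName": global_user,
        "FQAN": fqan,
        "IPVersion": ip_version,
        "IPCount": ip_count,
    }


# Hypothetical values, for illustration only.
record = ip_usage_record(
    datetime(2020, 3, 3, 15, 42, tzinfo=timezone.utc), "SITE-A", "OpenStack",
    "user-1", "group-1", "/DC=org/DC=example/CN=Jane Doe",
    "/vo.example.org/Role=NULL", 4, 2)
print(json.dumps(record, indent=2))
```

Requiring a timezone-aware datetime avoids silently interpreting a naive value in the local time zone of the machine that generates the record.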
=== APEL and accounting portal ===
Once generated, records are delivered via the network of EGI message brokers to the central accounting repository using APEL SSM (Secure STOMP Messenger) provided by STFC. SSM client packages can be obtained at https://apel.github.io. A Cloud Accounting Summary Usage Record has also been defined and summaries created on a daily basis from all the accounting records received from the Resource Providers are sent to the EGI Accounting Portal. The [http://accounting-devel.egi.eu/egi.php EGI Accounting Portal] also runs SSM to receive these summaries and provides a web page displaying different views of the Cloud Accounting data received from the Resource Providers.
==== Implementations ====
; [https://github.com/EGI-FCTF/oneacct_export oneacct_export]
: OpenNebula Accounting probe
; [https://github.com/IFCA/caso cASO]
: OpenStack Accounting Probe
== EGI A/R Monitoring ==
The participating providers may share certain operational tools and practices at the level of the federation, for example use a shared system to collect availability and reliability statistics about their site, or to share and respond to security alerts. 
Services in the EGI infrastructure are monitored via [https://argoeu.github.io/ ARGO]. Specific probes to check functionality and availability of services must be provided by service developers. The current set of probes used for monitoring cloud resources consists of:
* OCCI probes (eu.egi.cloud.OCCI-VM and eu.egi.cloud.OCCI-Context): OCCI-VM creates an instance of a given image by using OCCI, checks its status and deletes it afterwards. OCCI-Context checks that the OCCI interface correctly supports the standard and the FedCloud contextualization extension.
* Accounting probe (eu.egi.cloud.APEL-Pub): Checks if the cloud resource is publishing data to the Accounting repository
* TCP checks (org.nagios.Broker-TCP, org.nagios.CDMI-TCP, org.nagios.OCCI-TCP and org.nagios.CloudBDII-Check): Basic TCP checks for services.
* VM Marketplace probe (eu.egi.cloud.AppDB-Update): gets a predetermined image list from AppDB and checks its update interval.
* Perun probe (eu.egi.cloud.Perun-Check): connects to the server and checks the status by using internal Perun interface
Probes for the image synchronization mechanism are currently under development. More information on cloud probes can be found at [[Cloud SAM tests]].
Currently a [https://cloudmon.egi.eu/nagios central instance] specific to the activities of the EGI Federated Clouds Task has been deployed for monitoring the test bed. Results of cloud probes are visible on the [http://mon.egi.eu/myegi/sa/ central SAM interface] under the profiles <code>ch.cern.sam-CLOUD-MON</code> and <code>ch.cern.sam-CLOUD-MON_CRITICAL</code>.
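
The basic TCP checks listed above amount to testing whether a connection to the service port can be opened. The Python sketch below illustrates that idea only; it is not the code of the Nagios probes, and the function name and timeout values are assumptions. The demonstration runs against a throwaway local listener, so no real site is contacted.

```python
import socket


def tcp_check(host, port, timeout=5.0):
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Open a temporary listener on a free local port chosen by the system.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

reachable = tcp_check("127.0.0.1", port)
listener.close()
# Once the listener is closed, the same port should refuse connections.
unreachable = tcp_check("127.0.0.1", port, timeout=1.0)
print(reachable, unreachable)
```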
= Roadmap =
The TCB-Cloud board defines the roadmap for the technical evolution of the EGI Cloud. All the components are continuously maintained to:
* Improve their programmability, providing complete API specifications in an adequate format to facilitate the generation of clients (e.g. following the OpenAPI initiative and Swagger).
* Lower the barriers to integrate and operate resource centres in the federation by a) minimizing the number of components used; b) contributing code to upstream distributions; and c) using only public APIs of the Cloud Management Frameworks.
Currently the EGI FedCloud TaskForce is focused on moving to a central operations model, where providers only need to integrate their system with EGI Check-in but do not need to deploy and configure the different tools (accounting, discovery, VMI management, etc.) locally but delegate this to a central EGI team.
The table below summarises the main activities planned for the short term:
{| style="border:1px solid black; text-align:left;" class="wikitable sortable" cellspacing="0" cellpadding="5"
|- style="background:lightgray;"
! style="border-bottom:1px solid black;" | Component
! style="border-bottom:1px solid black;" | Evolution
|-
| '''Cloud information provider'''
|
* Use messaging system for information transport.
* Implementation of GlueSchema 2.1
|-
| '''CloudKeeper'''
|
* Move to a VO-scoped model for central operations
|-
| '''AppDB Cloud Marketplace'''
|
* Integration with security tools, better control of images
* Endorser dashboard
* Security dashboard
* VO-wide image list dashboard
* Docker repository
|-
| '''AppDB VMOps'''
|
* Improve support for OpenID Connect
* Improve support for native APIs
|-
| '''AAI'''
|
* Remove X.509 for users and sites
* Complete support for OIDC at the providers and clients
|-
| '''Accounting'''
|
* Public IP address accounting
* Storage accounting
|}


[[Category:Federated_Cloud]]

Latest revision as of 15:42, 3 March 2020