The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

GOCDB/Release4/XML Input


Latest revision as of 12:33, 18 December 2012


Overview

The XML input module is used to insert PROM objects and links between PROM objects directly into the database. XML Input is most commonly used for CRUD operations that cannot easily be accomplished in the web portal and for populating a new GOCDB system with data. The properties of the objects and their relationships to each other are reflected in the structure of an XML input file (a nested/hierarchical structure is used to insert data according to established parent-child object relationships). Data for many objects and links can be defined within a single input file.

XML Input File Formatting Rules

The formatting rules of an input XML document are as follows:

  • Element names and attribute values ARE case-sensitive.
  • The root element name is NOT significant (e.g. <results> is used for the Sample and Seed data).
  • The root element can contain one or more child Object Elements (e.g. <service_endpoint>) to be inserted into the database.
  • An Object element MUST be named after an object that is already declared using the <object_type_name> value in the ‘$GOCDB_HOME/config/gocdb_schema.xml’ file. (In turn, the <abstract_class> that encloses the <object_type_name> is used to map the object and its properties directly to a database table – see ‘gocdb_schema.xml’ for more information).
  • Object elements MUST have a ‘grid_id’ attribute of type integer. This value SHOULD always be ‘0’ to signify that the inserted object(s) belong to that grid. When inserting nested parent-child objects, the grid_id value MUST be identical.
  • An Object element MUST contain one or more child elements that MUST be of the following type (ordering is NOT significant and multiples of both types ARE allowed):
    • a) A Property Element to define a property of the object (e.g. <SHORT_NAME> which contains a site's name). Since this is a leaf element, it MUST NOT nest child elements. The property element’s name MUST match one of the defined field names for this object type in the gocdb_schema.xml file. Property elements do NOT specify the ‘grid_id’ attribute. Each property element will be inserted into a single field (row+column combination) in the database.
    • b) A nested child Object element(s).
  • A nested child Object element can only have a SINGLE parent element in that particular nesting hierarchy/branch. If an Object has many parents, each parent-child relationship needs to be defined directly beneath the root element as a new parent-child branch (this is because a child object can only be enclosed by a single parent).
  • A property element MAY define the key="primary" attribute to signify that this property value must NOT BE NULL and IS UNIQUE within the database (it effectively marks a PK - primary key property). Important: If an object already exists in the database that defines the same property value, then the properties of that object will be updated with the values given in the XML file rather than a new object being inserted.
  • An Object element MUST NOT define the key="primary" attribute.
  • To UPDATE selected properties of an object, use the key="primary" attribute to identify the object and include only the property elements you want to change. You do not need to provide all property elements; only those specified will be used to update the object.
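
As an illustration of the update rule, the following minimal sketch changes only the description of an endpoint. It assumes that a service_endpoint with the ENDPOINT value 'myce.myurl.comCE' (taken from the example file on this page) already exists in the database; the new description text is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<results> <!-- The root element name is not significant -->
    <service_endpoint grid_id="0">
        <!-- key="primary" identifies the existing object to update -->
        <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        <!-- Only the property elements listed here are used for the update -->
        <DESCRIPTION>YAS-FTP Computing Element (illustrative new description).</DESCRIPTION>
    </service_endpoint>
</results>
```

If no object with that ENDPOINT value exists, the rules above suggest that a new object would be inserted instead, so it is worth checking the primary key value carefully before running an update file.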

Mandatory Relationships

The GOCDB web portal and programmatic interface use implicit definitions of the site and service endpoint entities. These entities are made up of a specific group of PROM objects linked together in a particular way. For example the service endpoint entity as viewed through the web portal and programmatic interface is made up of the following components:

(Note: PROM object types and link types are described in gocdb_schema.xml)

  • A PROM object of type service_endpoint describing the service endpoint itself
  • A link between a service_type PROM object and the above service_endpoint PROM object. The link must be of type 4 meaning that the child service_endpoint has the type defined in the parent service_type.
  • A link of type 6 (parent site provides child endpoint) between the service_endpoint and its hosting site
  • A link of type 43 (child service_endpoint has parent tag) between the service_endpoint and a tag object. (This link is used to define whether a service endpoint is "Visible to EGI" or not).

The full definition for each entity is below:

Site

Service Endpoint

If an entity is not created with all of its required objects and links then it won't appear in the web portal or programmatic interface. For example, inserting a service_endpoint object through XML Input without the supporting objects detailed above will render it invisible to the web portal and the programmatic interface. Conversely, when adding a service endpoint through the web portal, the above objects and links are created based on the user's input from the form.
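
A condensed sketch of the three links that make up a service endpoint entity might look as follows. It assumes that the site 'YAS-FTP', the service_type 'CE' and the tag 'EGI' already exist, and that XML Input derives the link type (6, 4 and 43 respectively) from the parent and child object types as declared in gocdb_schema.xml; the names are borrowed from the example file on this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<results>
    <!-- Parent site provides child endpoint (link type 6) -->
    <site grid_id="0">
        <SHORT_NAME key="primary">YAS-FTP</SHORT_NAME>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
            <HOSTNAME>myce.myurl.com</HOSTNAME>
        </service_endpoint>
    </site>

    <!-- Parent service_type defines the type of the child endpoint (link type 4) -->
    <service_type grid_id="0">
        <name key="primary">CE</name>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        </service_endpoint>
    </service_type>

    <!-- Parent tag marks the child endpoint as visible to EGI (link type 43) -->
    <tag grid_id="0">
        <name key="primary">EGI</name>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        </service_endpoint>
    </tag>
</results>
```

Each parent gets its own branch beneath the root element because a child object can only be enclosed by a single parent within one nesting branch.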

Primary Keys and Cardinality

Implicit primary keys are used to uniquely identify objects within the XML Input module. For example, a site should be uniquely identified by its SHORT_NAME property. Primary keys aren't explicitly defined or enforced within the system.

Primary Keys

{| {{egi-table}}
! PROM Object Type !! Primary Key Field
|-
| site || SHORT_NAME
|-
| service_endpoint || ENDPOINT
|-
| downtime || (None)
|-
| user || CERTIFICATE_DN
|-
| group_type || NAME
|-
| certification_status || NAME
|-
| service_type || NAME
|-
| political_role || NAME
|-
| political_role_type || NAME
|-
| political_role_request || NAME
|-
| political_role_request_denial || NAME
|-
| domain || NAME
|-
| group || NAME
|-
| endpoint_location || URL
|-
| tag || NAME
|-
| admin || CERTIFICATE_DN
|-
| virtual_site || NAME
|}

When creating a link between two objects, XML Input performs a cardinality check. Each link type defined in gocdb_schema.xml has a cardinality property limiting the number of instances of the link. The cardinality options available are:

{| {{egi-table}}
! Cardinality Property !! Name !! Example !! Notes
|-
| n || Many parents to many children || Many service endpoints may be linked to many downtimes (link type 15) ||
|-
| 1 || One parent to many children || One parent timezone can be linked to many child sites (link type 1) || In XML Input, if an existing relationship of this type prevents a new link being created, the existing relationship is revoked then the new one added.
|-
| mix[n] || OBSOLETE || N/A || This link type is now obsolete and isn't enforced
|}
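
To illustrate the behaviour described for cardinality 1, the sketch below nests an existing endpoint under a different site. It assumes, purely for illustration, that the site-to-service_endpoint link (type 6) is declared with cardinality 1; this is not stated on this page and should be checked in gocdb_schema.xml. 'OTHER-SITE' is a hypothetical site name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<results>
    <!-- 'OTHER-SITE' is a hypothetical site used only for this illustration -->
    <site grid_id="0">
        <SHORT_NAME key="primary">OTHER-SITE</SHORT_NAME>
        <!-- The endpoint already exists and is currently linked to another site -->
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        </service_endpoint>
    </site>
</results>
```

Under that assumption, the existing link between the endpoint and its previous site would be revoked and a new link to 'OTHER-SITE' added. With a cardinality of n, the new link would simply be added alongside the existing one.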

Example Explained

The XML input file example below demonstrates the rules defined above.

<?xml version="1.0" encoding="UTF-8"?>
<results> <!-- The root element name is not significant -->
    
    <!-- 
    'service_type' is an Object element. It refers to the value of an 
    <abstract_class><object_type_name> element that is defined in gocdb_schema.xml.
    Object elements have the 'grid_id' attribute. 

     The 'name' and 'description' property elements map to an object's properties. 
     Property elements are leaf elements (they have no children). 

     key="primary" signifies that the name property must be NOT NULL and UNIQUE 
     in the database (it is a Primary key). If a 'service_type' object 
     already exists in the database with a name of 'Site-BDII', its 
     description will be updated and a new service_type will NOT be added.
    -->
    <service_type grid_id="0">
        <name key="primary">Site-BDII</name>
        <description>
            [Site service] This service collects and publishes site's data 
            for the Information System. All sites MUST install one Site-BDII. 
         </description>
    </service_type>
  
    <service_type grid_id="0">
        <name key="primary">CE</name>
        <description>
            [Site service] The LCG Compute Element. Currently the standard 
            CE within the gLite middleware stack. Soon to be replaced by the 
            CREAM CE. 
        </description>
    </service_type>
  
    <!-- 
    'group' is the name of the object type used to define ROCs/NGIs, countries and production statuses.
    --> 
    <group grid_id="0">
        <!-- 
        'name' and 'email' are leaf elements that map to object properties (see above).  
        -->
        <name key="primary">AsiaPacific</name>
        <email>contact@fakeemail.aproc.org</email>
        <!--
        Parent 'group' object is related to the child 'group_type' object. This 
        relationship is defined in gocdb_schema.xml file using the 
        <abstract_class> and <linktype> elements. 
        -->
        <group_type grid_id="0">
            <name key="primary">ROC</name>
            <description>Regional Operation Centre</description>
        </group_type>
        <!-- 
        A parent group (NGI) has a child 'site' object ('site' is nested in 'group'). 
        -->
        <site grid_id="0">
            <!-- The site's SHORT_NAME property uniquely identifies this site -->
            <SHORT_NAME key="primary">YAS-FTP</SHORT_NAME>
            <!-- A site should have a single domain -->
            <domain grid_id="0">
                <name key="primary">myurl.com</name>
            </domain>
            <!-- A site can have one or more child 'service_endpoint' objects -->
            <service_endpoint grid_id="0">
                <!-- A service_endpoint has the following properties -->
                <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
                <HOSTNAME>myce.myurl.com</HOSTNAME>
                <HOST_IP>111.222.333.441</HOST_IP>
                <HOST_OS>RH7</HOST_OS>
                <HOST_ARCH>86x32</HOST_ARCH>
                <HOST_DN>/C=AU/O=YAS/O=host/OU=FTP/CN=myce.myurl.com</HOST_DN>
                <DESCRIPTION>YAS-FTP Computing Element.</DESCRIPTION>
                <PRODUCTION_LEVEL>N</PRODUCTION_LEVEL>
                <IS_MONITORED>N</IS_MONITORED> 
                <!-- A 'service_endpoint' object can define a child 'endpoint_location'. -->
                <endpoint_location grid_id="0">
                    <URL>http://some.endpoint.url.ac.uk/endpoint1</URL>
                    <WSDL>http://some.wsdl.url.ac.uk/wsdl_1</WSDL>
                </endpoint_location>
            </service_endpoint>
            <service_endpoint grid_id="0">
                <ENDPOINT key="primary">mysbdii.myurl.comSite-BDII</ENDPOINT>
                <HOSTNAME>mysbdii.myurl.com</HOSTNAME>
                <HOST_IP>111.222.333.442</HOST_IP>
                <HOST_DN>/C=AU/O=YAS/O=host/OU=FTP/CN=mysbdii.myurl.com</HOST_DN>
                <HOST_OS>RH7</HOST_OS>
                <HOST_ARCH>86x32</HOST_ARCH>
                <DESCRIPTION>YAS-FTP Site BDII</DESCRIPTION>
                <PRODUCTION_LEVEL>N</PRODUCTION_LEVEL>
                <IS_MONITORED>N</IS_MONITORED>                                
                <endpoint_location grid_id="0">
                    <URL>http://some.endpoint.url.ac.uk/endpoint2</URL>
                    <WSDL>http://some.wsdl.url.ac.uk/wsdl_2</WSDL>
                </endpoint_location>
                
            </service_endpoint>
        </site>
    </group>
    
    <!-- 
    If an Object has many parents, each parent-child relationship needs 
    to be defined directly beneath the root element as a new parent-child 
    branch (i.e. start a new nesting branch as a child object can only 
    be enclosed by a single parent). 
    
    In this example, 'service_type' is a parent to 'service_endpoint'. 
    We can't define this relationship in the nesting branch above because 
    'site' is defined as the parent. 
    -->
    <service_type grid_id="0">
        <name key="primary">CE</name>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        </service_endpoint>
    </service_type>
    
    
    <service_type grid_id="0">
        <name key="primary">Site-BDII</name>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">mysbdii.myurl.comSite-BDII</ENDPOINT>
        </service_endpoint>
    </service_type>


    <!--
    On initial insertion of NEW sites and service endpoints, ensure that ALL MANDATORY
    parent-child relationships are inserted/established (e.g. between tag and 
    service_endpoint, and tag and site). This is necessary so that the sites and endpoints 
    appear in the web portal and programmatic interface.
    -->
    <tag grid_id="0">
        <name key="primary">EGI</name>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">myce.myurl.comCE</ENDPOINT>
        </service_endpoint>
        <service_endpoint grid_id="0">
            <ENDPOINT key="primary">mysbdii.myurl.comSite-BDII</ENDPOINT>
        </service_endpoint>
    </tag>
    <tag grid_id="0">
       <name key="primary">EGI</name>
        <site grid_id="0">
           <SHORT_NAME key="primary">YAS-FTP</SHORT_NAME>  
        </site>
    </tag> 
    
   
</results>