USG Using AMGA Metadata Catalog
----
This document provides basic information on how to use the AMGA Metadata Catalog.
<div class="sect2" title="Introduction"><div class="titlepage"><div><div>
=== Introduction ===
</div></div></div>
The <span class="emphasis">''AMGA Metadata Catalog''</span> is the EGI <span class="emphasis">''gLite''</span> service that allows metadata handling on the grid. Its main use is as a "front-end" file metadata service, providing the means to describe and discover the data files required by users and their jobs. It can also be used as a Grid-enabled database by applications that need to structure their data, providing a database-like service that supports Grid Security features (X509 proxies and the VOMS authentication and authorization system). Finally, an additional feature allows existing relational databases to be accessed from a grid environment (<span class="emphasis">''Worker Nodes''</span>, <span class="emphasis">''User Interface''</span>, etc.), which adds Grid Security to existing DBs.
Users and applications can interact with an AMGA server using <span class="emphasis">''command line tools''</span>:
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *<code class="code">mdcli/mdclient</code> available for Scientic Linux flavours (they can be built easily to other platforms) | ||
<code class="code">mdcli/mdclient</code> available for Scientic Linux flavours (they can be built easily to other platforms) | |||
* | *<code class="code">mdjavacli/mdjavaclient</code>, the Java versions of the previous one that allows the interaction from any platform | ||
<code class="code">mdjavacli/mdjavaclient</code>, the Java versions of the previous one that allows the interaction from any platform | </div> | ||
</div> | or through <span class="emphasis">''APIs''</span>, available for: | ||
or through <span class="emphasis">''APIs''</span>, available for: | |||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *C++ | ||
C++ | |||
* | *Java | ||
Java | |||
* | *Python | ||
Python | |||
* | *Perl | ||
Perl | |||
* | *PHP | ||
PHP | </div> | ||
</div> | Some of the <code class="code">mdcli/mdclient</code> command line tools are explained below. | ||
Some of the <code class="code">mdcli/mdclient</code> command line tools are explained below. | </div><div class="sect2" title="AMGA Metadata Basic Concepts"><div class="titlepage"><div><div> | ||
</div><div title="AMGA Metadata Basic Concepts | === AMGA Metadata Basic Concepts === | ||
=== AMGA Metadata Basic Concepts === | </div></div></div> | ||
</div></div></div> | There are certain fundamental concepts which must be understood when dealing with AMGA | ||
There are certain fundamental concepts which must be understood when dealing with AMGA | |||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *<span class="bold">'''Entry'''</span> - it is the representation of the real world entity which we are attaching metadata to in order to describe it | ||
<span class="bold">'''Entry'''</span> - it is the representation of the real world entity which we are attaching metadata to in order to describe it | |||
* | *<span class="bold">'''Attribute'''</span> - key/value pair. It has: | ||
<span class="bold">'''Attribute'''</span> - key/value pair. It has: | |||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *<span class="emphasis">''Type''</span> - The type (int, float, string, etc...) | ||
<span class="emphasis">''Type''</span> - The type (int, float, string, etc...) | |||
** | **<span class="emphasis">''Name/Key''</span> - The name of the attribute | ||
<span class="emphasis">''Name/Key''</span> - The name of the attribute | |||
** | **<span class="emphasis">''Value''</span> - The value of an entry's attribute | ||
<span class="emphasis">''Value''</span> - The value of an entry's attribute | </div> | ||
</div> | *<span class="bold">'''Schema'''</span> - A set of attributes | ||
* | |||
<span class="bold">'''Schema'''</span> - A set of attributes | |||
* | *<span class="bold">'''Collection'''</span> - A set of entries associated with a schema | ||
<span class="bold">'''Collection'''</span> - A set of entries associated with a schema | |||
* | *<span class="bold">'''Metadata'''</span> - The list of attributes (including their values) associated with entries | ||
<span class="bold">'''Metadata'''</span> - The list of attributes (including their values) associated with entries | </div> | ||
</div> | If we want to make an analogy with the RDBMS world, we have the following: | ||
If we want to make an analogy with the RDBMS world, we have the following: | |||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *schema = table schema | ||
schema = table schema | |||
* | *collection = database table | ||
collection = database table | |||
* | *attribute = schema column | ||
attribute = schema column | |||
* | *entry = table row/record | ||
entry = table row/record | </div> | ||
</div> | By analogy with a file system, we have: | ||
By analogy with a file system, we have: | |||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *collection = directory | ||
collection = directory | |||
* | *entry = file | ||
entry = file | </div> | ||
</div> | |||
<span class="bold">'''In the AMGA help and documentation, often directory is used to refer to collection, as file refers to entry'''</span> | <span class="bold">'''In the AMGA help and documentation, often directory is used to refer to collection, as file refers to entry'''</span> | ||
<span class="emphasis">''Example: Metadata for movies''</span> | <span class="emphasis">''Example: Metadata for movies''</span> | ||
Movie files (<span class="emphasis">''entries''</span>) could be saved on Grid <span class="emphasis">''Storage Elements''</span> and registered into a <span class="emphasis">''File Catalogue''</span>. We want to add <span class="emphasis">''metadata''</span> to describe the movie content. A possible <span class="emphasis">''schema''</span> could be: | Movie files (<span class="emphasis">''entries''</span>) could be saved on Grid <span class="emphasis">''Storage Elements''</span> and registered into a <span class="emphasis">''File Catalogue''</span>. We want to add <span class="emphasis">''metadata''</span> to describe the movie content. A possible <span class="emphasis">''schema''</span> could be: | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *<code class="code">Title</code> -- varchar | ||
<code class="code">Title</code> -- varchar | |||
* | *<code class="code">Runtime</code> -- int | ||
<code class="code">Runtime</code> -- int | |||
* | *<code class="code">Cast</code> -- varchar | ||
<code class="code">Cast</code> -- varchar | |||
* | *<code class="code">LFN</code> -- varchar | ||
<code class="code">LFN</code> -- varchar | </div> | ||
</div> | We can use the GUID of the file as the <span class="emphasis">''entry''</span> name. | ||
We can use the GUID of the file as the <span class="emphasis">''entry''</span> name. | |||
A <span class="emphasis">''collection''</span> named <code class="code">movies</code> of an AMGA server could be the repository of the movies' metadata and will allow to find the movies satisfying users' queries. | A <span class="emphasis">''collection''</span> named <code class="code">movies</code> of an AMGA server could be the repository of the movies' metadata and will allow to find the movies satisfying users' queries. | ||
</div><div title="Accessing AMGA from the command line | </div><div class="sect2" title="Accessing AMGA from the command line"><div class="titlepage"><div><div> | ||
=== Accessing AMGA from the command line === | === Accessing AMGA from the command line === | ||
</div></div></div> | </div></div></div> | ||
To start using AMGA from the command line, you need to use either <code class="code">mdcli</code> or <code class="code">mdclient</code> executables. They have to be installed, together with the required libraries, into a <span class="emphasis">''User Interface''</span> (and on the <span class="emphasis">''Worker Nodes''</span> of the sites where you plan to run jobs that will access AMGA). These do not come by default within the standard gLite <span class="emphasis">''UserInterface''</span> and <span class="emphasis">''WorkerNodes''</span> packages (on gLite 3.1 they should be available), so you need to install them manually. | To start using AMGA from the command line, you need to use either <code class="code">mdcli</code> or <code class="code">mdclient</code> executables. They have to be installed, together with the required libraries, into a <span class="emphasis">''User Interface''</span> (and on the <span class="emphasis">''Worker Nodes''</span> of the sites where you plan to run jobs that will access AMGA). These do not come by default within the standard gLite <span class="emphasis">''UserInterface''</span> and <span class="emphasis">''WorkerNodes''</span> packages (on gLite 3.1 they should be available), so you need to install them manually. | ||
You can download RPMs for SLC here: | You can download RPMs for SLC here: | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | *[http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.SLC4.i386.rpm http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.SLC4.i386.rpm] for SLC4 systems | ||
[http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.SLC4.i386.rpm http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.SLC4.i386.rpm] for SLC4 systems | |||
* | *[http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.i386.rpm http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.i386.rpm] for SLC3 systems | ||
[http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.i386.rpm http://amga.web.cern.ch/amga/downloads/glite-amga-cli-1.3.0-1.i386.rpm] for SLC3 systems | </div> | ||
</div> | Once installed, you need to properly configure the configuration file: <code class="code">$HOME/.mdclient.config</code>. A template of this file can be found in <code class="code">$GLITE_INSTALLATION/etc/mdclient.config</code>. This also behaves as a system wide configuration file, useful in a multi-user system (like on Worker Nodes) and it will be read by the AMGA clients if the $HOME/.mdclient.config does not exist. | ||
Once installed, you need to properly configure the configuration file: <code class="code">$HOME/.mdclient.config</code>. A template of this file can be found in <code class="code">$GLITE_INSTALLATION/etc/mdclient.config</code>. This also behaves as a system wide configuration file, useful in a multi-user system (like on Worker Nodes) and it will be read by the AMGA clients if the $HOME/.mdclient.config does not exist. | |||
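The lookup order just described can be sketched in Python as follows. The functions <code class="code">find_mdclient_config</code> and <code class="code">parse_mdclient_config</code> are hypothetical helpers, not part of the AMGA clients; the sketch assumes the file consists of simple <code class="code">Key = Value</code> lines, as in the example below, and is only meant to show which file would be picked up and what it contains.

```python
import os
from pathlib import Path


def find_mdclient_config(env=None):
    """Return the configuration file the AMGA clients would read, following the
    lookup order described above, or None if neither file exists."""
    env = os.environ if env is None else env
    candidates = []
    if "HOME" in env:  # per-user file takes precedence
        candidates.append(Path(env["HOME"]) / ".mdclient.config")
    if "GLITE_INSTALLATION" in env:  # system-wide fallback
        candidates.append(Path(env["GLITE_INSTALLATION"]) / "etc" / "mdclient.config")
    for path in candidates:
        if path.is_file():
            return path
    return None


def parse_mdclient_config(text):
    """Parse 'Key = Value' lines into a dict. Treating '#' lines as comments
    is an assumption made by this sketch."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank, comment and malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


example = """Host = amga.ct.infn.it
Port = 8822
UseGridProxy = 1
VerifyServerCert = 0
"""
cfg = parse_mdclient_config(example)
print(cfg["Host"], int(cfg["Port"]))  # amga.ct.infn.it 8822
```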
The relevant settings in <code class="code">mdclient.config</code> are the following:
<pre class="program">Host = amga.ct.infn.it
Port = 8822
UseGridProxy = 1
VerifyServerCert = 0
</pre>
where
<div class="itemizedlist">
*<span class="emphasis">''Host''</span> defines the AMGA server you want to use. For testing, you can use amga.ct.infn.it (installed as part of the EGEE GILDA training infrastructure).
*<span class="emphasis">''Port''</span> defines the port on which the AMGA server daemon is listening. It should not be changed.
*<span class="emphasis">''Login''</span> defines the credential used to authenticate with the AMGA catalog. You can use:
**your <code class="code">username</code>, if you have requested one from the AMGA server administrator by providing your Grid certificate Distinguished Name (DN)
**<code class="code">NULL</code>, to be authenticated as a generic VO user (provided that your VO is supported by the AMGA server you are going to use)
*The AMGA server in the GILDA training infrastructure supports the following VOs: <code class="code">gilda, eela, eumed, euchina, cometa</code>. Please contact the amga.ct.infn.it sysadmins (tony.calanducci [at] ct.infn.it) to request support for your VO or to get a personal account. A registration page will soon be available at: [https://amga.ct.infn.it:8443/register https://amga.ct.infn.it:8443/register]
*Please take a look at the AMGA documentation [http://amga.web.cern.ch/amga/mdclient_config.html here] for all the available options of the <code class="code">.mdclient.config</code> file.
*The options in the previous example were set up to authenticate you as a generic VO user using your Grid proxy. Be sure to initialize your proxy with the proper command:
</div><pre class="response">-bash-2.05b$ voms-proxy-init --voms gilda | </div><pre class="response">-bash-2.05b$ voms-proxy-init --voms gilda | ||
Enter GRID pass phrase: | Enter GRID pass phrase: | ||
Line 173: | Line 154: | ||
Creating proxy ...................................... Done | Creating proxy ...................................... Done | ||
Your proxy is valid until Sun Feb 3 08:04:46 2008 | Your proxy is valid until Sun Feb 3 08:04:46 2008 | ||
</pre> | </pre> | ||
Once everything has been set up properly, start the AMGA <code class="code">mdclient</code>:
<pre class="response">-bash-2.05b$ mdclient
Connecting to amga.ct.infn.it:8822...
ARDA Metadata Server 1.3.0
Query>
</pre>
This is an interactive shell, similar to <code class="code">mysql</code> or <code class="code">psql</code>, that lets you issue AMGA commands to the server. The other AMGA executable shipped in the RPM, <code class="code">mdcli</code>, provides the same functionality, but it executes only one command at a time, passed as a command line parameter, and then exits immediately. This is useful in bash scripts that run on Worker Nodes and need to interact with the metadata catalog.
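As a sketch of how a job script might drive <code class="code">mdcli</code>, the Python helpers below build and run one <code class="code">mdcli</code> invocation per AMGA command. They assume that <code class="code">mdcli</code> is on the <code class="code">PATH</code>, that the configuration file described above is in place, and that the whole AMGA command can be passed as a single argument; check this against the <code class="code">mdcli</code> documentation for your version. <code class="code">mdcli_argv</code> and <code class="code">run_mdcli</code> are hypothetical names.

```python
import shlex
import subprocess


def mdcli_argv(command):
    """Argument list for a one-shot mdcli call (assumption: the whole AMGA
    command is passed as a single argument)."""
    return ["mdcli", command]


def run_mdcli(command, timeout=60):
    """Run one AMGA command through mdcli and return its raw output lines.
    Raises CalledProcessError if mdcli exits with a non-zero status."""
    result = subprocess.run(
        mdcli_argv(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.splitlines()


# The equivalent, safely quoted line for a bash script on a Worker Node:
print(shlex.join(mdcli_argv("listattr /gilda/movies")))
# mdcli 'listattr /gilda/movies'
```

Because each call is a separate process, a script that issues many commands pays the connection cost each time; the interactive <code class="code">mdclient</code> avoids that when working by hand.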
</div><div title="Basic commands | </div><div class="sect2" title="Basic commands"><div class="titlepage"><div><div> | ||
=== Basic commands === | === Basic commands === | ||
</div></div></div> | </div></div></div> | ||
It is possible to get help anytime on the client just using the <code class="code">'help'</code> command. | It is possible to get help anytime on the client just using the <code class="code">'help'</code> command. | ||
<span class="emphasis">''Try the <code class="code">help</code> command''</span> | <span class="emphasis">''Try the <code class="code">help</code> command''</span> | ||
Line 193: | Line 174: | ||
capabilities commands | capabilities commands | ||
Query> | Query> | ||
</pre> | </pre> | ||
Commands are grouped by topic. You can get the list of valid commands for each topic, typing: <code class="code">help [topic]</code> | Commands are grouped by topic. You can get the list of valid commands for each topic, typing: <code class="code">help [topic]</code> | ||
The list of valid topics is: | The list of valid topics is: | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | * | ||
help | |||
help | |||
* | * | ||
metadata | |||
metadata | |||
* | * | ||
metadata-optional | |||
metadata-optional | |||
* | * | ||
directory | |||
directory | |||
* | * | ||
replication | |||
replication | |||
* | * | ||
entry | |||
entry | |||
* | * | ||
group | |||
group | |||
* | * | ||
acl | |||
acl | |||
* | * | ||
index | |||
index | |||
* | * | ||
schema | |||
schema | |||
* | * | ||
sequence | |||
sequence | |||
* | * | ||
user | |||
user | |||
* | * | ||
view | |||
view | |||
* | * | ||
ticket | |||
ticket | |||
* | * | ||
commands | |||
</div> | commands | ||
</div> | |||
<span class="emphasis">''Try the use of <code class="code">help</code> command with any topic''</span> | <span class="emphasis">''Try the use of <code class="code">help</code> command with any topic''</span> | ||
<code class="code">Query> help entry</code> | <code class="code">Query> help entry</code> | ||
</div><div title="mdclient general commands | </div><div class="sect2" title="mdclient general commands"><div class="titlepage"><div><div> | ||
=== <code class="code">mdclient</code> general commands === | === <code class="code">mdclient</code> general commands === | ||
</div></div></div> | </div></div></div> | ||
The following tables gives a brief description of the general use commands. | The following tables gives a brief description of the general use commands. | ||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
Line 289: | Line 285: | ||
| Change the current collection | | Change the current collection | ||
|} | |} | ||
</div><div title="General commands examples: | </div><div class="sect3" title="General commands examples:"><div class="titlepage"><div><div> | ||
==== General commands examples: ==== | ==== General commands examples: ==== | ||
</div></div></div> | </div></div></div> | ||
<code class="code">Query> whoami</code> | <code class="code">Query> whoami</code> | ||
<pre class="response">>> gilda | <pre class="response">>> gilda | ||
</pre> | </pre> | ||
<code class="code">Query> pwd</code> | <code class="code">Query> pwd</code> | ||
<pre class="response">>> / | <pre class="response">>> / | ||
</pre> | </pre> | ||
<code class="code">Query> cd /gilda</code> | <code class="code">Query> cd /gilda</code> | ||
<code class="code">Query> pwd</code>
<pre class="response">>> /gilda/tony/
</pre>
<code class="code">Query> dir</code>
<pre class="response">>> /gilda/tony/seq2
>> /gilda/tony/20
>> entry
</pre>
<code class="code">Query> listentries</code>
<pre class="response">>> /gilda/tony/aentry
>> /gilda/tony/18
>> /gilda/tony/20
</pre>
<code class="code">Query> cd seconda</code>
<pre class="response">>> /gilda/tony/seconda/2
>> entry
</pre>
<code class="code">Query> rm *</code>
<code class="code">Query> pwd</code>
<pre class="response">>> /gilda/tony/</pre>
<code class="code">Query> rmdir seconda</code>
</div></div><div title="Handling schemas and attributes | </div></div><div class="sect2" title="Handling schemas and attributes"><div class="titlepage"><div><div> | ||
=== Handling schemas and attributes === | === Handling schemas and attributes === | ||
</div></div></div> | </div></div></div> | ||
Once a <span class="emphasis">''collection''</span> has been created, its <span class="emphasis">''schema''</span> should be defined, adding one or more <span class="emphasis">''attributes''</span>. As illustrated in the basic concept section, each attribute is defined by its <span class="emphasis">''name''</span> and its <span class="emphasis">''type''</span>. | Once a <span class="emphasis">''collection''</span> has been created, its <span class="emphasis">''schema''</span> should be defined, adding one or more <span class="emphasis">''attributes''</span>. As illustrated in the basic concept section, each attribute is defined by its <span class="emphasis">''name''</span> and its <span class="emphasis">''type''</span>. | ||
The command to add a new attribute to a collection schema is the following: | The command to add a new attribute to a collection schema is the following: | ||
<code class="code">addattr dir attribute_name type</code> where: | <code class="code">addattr dir attribute_name type</code> where: | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | * | ||
<code class="code">dir</code> is the collection/directory you are adding the attribute to. You can use relative or absolute path to refer to it. | |||
<code class="code">dir</code> is the collection/directory you are adding the attribute to. You can use relative or absolute path to refer to it. | |||
* | * | ||
<code class="code">attribute_name</code> is the name you want to give to the attribute you are adding | |||
<code class="code">attribute_name</code> is the name you want to give to the attribute you are adding | |||
* | * | ||
<code class="code">type</code> specifies what kind of values the attribute is able to contain | |||
</div> | <code class="code">type</code> specifies what kind of values the attribute is able to contain | ||
AMGA valid <span class="emphasis">''attribute types''</span> and their corresponding types used in the internal AMGA back-end are shown in the following table: | </div> | ||
AMGA valid <span class="emphasis">''attribute types''</span> and their corresponding types used in the internal AMGA back-end are shown in the following table: | |||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
|- | |- | ||
! AMGA | ! AMGA | ||
! PostgreSQL | ! PostgreSQL | ||
! MySQL | ! MySQL | ||
! Oracle | ! Oracle | ||
! SQLite | ! SQLite | ||
! Python | ! Python | ||
|- | |- | ||
Line 396: | Line 395: | ||
| float | | float | ||
|- | |- | ||
| varchar(n) | | varchar(n) | ||
| character varying(n) | | character varying(n) | ||
| character varying(n) | | character varying(n) | ||
Line 424: | Line 423: | ||
| float | | float | ||
|} | |} | ||
</div> | </div> | ||
The AMGA server internally uses a relational database to store all the users' metadata. It can use almost any RDBMS that has an ODBC driver; most installations use PostgreSQL or MySQL. If the types listed in the first column are used to define attributes, metadata can easily be moved and replicated among AMGA servers that use different database back-ends. If metadata portability between servers is not a concern, you can also use any data type specific to a given back-end (for example, we have tried the GIS and network data types of PostgreSQL). To find out which database back-end a given AMGA server uses, run the <code class="code">backend</code> command:
<pre class="response">Query> backend
>> PostgreSQL
</pre>
To remove an attribute from a collection schema, use the following command:
<code class="code">removeattr dir attribute_name</code>
To inspect the schema of a given collection (or of an entry), use:
<code class="code">listattr dir/entry</code>
<div title="Schema population example: | <div class="sect3" title="Schema population example:"><div class="titlepage"><div><div> | ||
==== Schema population example: ==== | ==== Schema population example: ==== | ||
</div></div></div> | </div></div></div> | ||
Let's create a <span class="emphasis">''movies''</span> collection and define its schema, adding the following attributes: <span class="emphasis">''title, runtime, | Let's create a <span class="emphasis">''movies''</span> collection and define its schema, adding the following attributes: <span class="emphasis">''title, runtime,'' | ||
cast, LFN, to_remove'' | </span> | ||
cast, LFN, to_remove'' (one of them will be removed):'' | |||
<pre class="response">Query> createdir /gilda/movies | <pre class="response">Query> createdir /gilda/movies | ||
Query> addattr /gilda/movies title varchar(50) | Query> addattr /gilda/movies title varchar(50) | ||
Line 469: | Line 470: | ||
>> LFN | >> LFN | ||
>> varchar | >> varchar | ||
</pre></div></div><div title="Handling entries and metadata | </pre></div></div><div class="sect2" title="Handling entries and metadata"><div class="titlepage"><div><div> | ||
=== Handling entries and metadata === | === Handling entries and metadata === | ||
</div></div></div> | </div></div></div> | ||
Once the schema of a collection has been defined, it is possible to add new <span class="emphasis">''entries''</span>. <span class="bold">'''Each entry must have an <span class="emphasis">''entry name''</span>'''</span>. You can think of entry names as primary keys of a database table. Entry names are unique. According to your purposes, you could have different options. To mention some examples, GUIDs (Globally Unique Identifiers) could be an option if you are adding metadata to files, the final part of JOB IDs ('/' can't be part of entry names) if you are adding metadata to running jobs, or simply an incremental integer number. You may use any appropriate entry name to better describe your entities. If you want to use an incremental integer as entry name, AMGA <span class="bold">'''sequences'''</span> can be very useful. You can define one or more sequences for a given collection, but those will not generate by themselves new numbers unless you explicitly request it. | Once the schema of a collection has been defined, it is possible to add new <span class="emphasis">''entries''</span>. <span class="bold">'''Each entry must have an <span class="emphasis">''entry name''</span>'''</span>. You can think of entry names as primary keys of a database table. Entry names are unique. According to your purposes, you could have different options. To mention some examples, GUIDs (Globally Unique Identifiers) could be an option if you are adding metadata to files, the final part of JOB IDs ('/' can't be part of entry names) if you are adding metadata to running jobs, or simply an incremental integer number. You may use any appropriate entry name to better describe your entities. If you want to use an incremental integer as entry name, AMGA <span class="bold">'''sequences'''</span> can be very useful. You can define one or more sequences for a given collection, but those will not generate by themselves new numbers unless you explicitly request it. | ||
<pre class="command">Query> help sequence | <pre class="command">Query> help sequence | ||
Line 481: | Line 482: | ||
>> Deletes a sequence. | >> Deletes a sequence. | ||
Query> | Query> | ||
</pre><div title="Sequence examples | </pre><div class="sect3" title="Sequence examples"><div class="titlepage"><div><div> | ||
==== Sequence examples ==== | ==== Sequence examples ==== | ||
</div></div></div> | </div></div></div> | ||
Create a sequence for the <span class="emphasis">''movies''</span> collection and get the next sequence id: | Create a sequence for the <span class="emphasis">''movies''</span> collection and get the next sequence id: | ||
<pre class="response">Query> pwd | <pre class="response">Query> pwd | ||
>> /gilda/movies/ | >> /gilda/movies/ | ||
Line 497: | Line 498: | ||
Query> sequence_next /gilda/movies/id | Query> sequence_next /gilda/movies/id | ||
>> 3 | >> 3 | ||
</pre> | </pre> | ||
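The two naming strategies mentioned above can be sketched in Python. <code class="code">entry_name_from_job_id</code> takes the final path component of a job ID (the job ID shown is a made-up example) so that no '/' ends up in the entry name, and <code class="code">Sequence</code> is a toy counter that, like an AMGA sequence, produces a number only when one is explicitly requested. Its start value and increment are assumptions, and it does not talk to an AMGA server.

```python
import itertools
from urllib.parse import urlparse


def entry_name_from_job_id(job_id):
    """Return the final path component of a job ID, usable as an entry name
    because it contains no '/'."""
    name = urlparse(job_id).path.rstrip("/").rsplit("/", 1)[-1]
    if not name:
        raise ValueError(f"no usable entry name in {job_id!r}")
    return name


class Sequence:
    """Toy stand-in for an AMGA sequence: a number is produced only when
    next_value() is explicitly called (cf. sequence_next)."""

    def __init__(self, start=1, increment=1):  # assumed defaults
        self._counter = itertools.count(start, increment)

    def next_value(self):
        return next(self._counter)


job_id = "https://wms.example.org:9000/aBcD1234efGH"  # hypothetical job ID
print(entry_name_from_job_id(job_id))  # aBcD1234efGH

ids = Sequence()
print(ids.next_value(), ids.next_value(), ids.next_value())  # 1 2 3
```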
Once we have decided how to handle entry names, we can start <span class="bold">'''adding or removing entries'''</span>. Four commands are available for this purpose:
<div class="informaltable">
{| border="1"
| <span class="emphasis">''Removes entries matching pattern/condition''</span>
|}
</div></div><div class="sect3" title="Entry creation and deletion examples"><div class="titlepage"><div><div>
==== Entry creation and deletion examples ====
</div></div></div>
Let's add two entries with valid attributes and three empty entries, then delete the last two:
<pre class="response">Query> pwd
>> /gilda/movies/
| <span class="emphasis">''Removes entries matching pattern/condition''</span>
|}
</div>
There are three more useful commands for handling the value of attributes: | There are three more useful commands for handling the value of attributes: | ||
<div class="informaltable"> | <div class="informaltable"> | ||
{| border="1" | {| border="1" | ||
Line 581: | Line 582: | ||
| <span class="emphasis">''Sets the attribute to NULL for all entries matching entry pattern.''</span> | | <span class="emphasis">''Sets the attribute to NULL for all entries matching entry pattern.''</span> | ||
|} | |} | ||
</div> | </div> | ||
Let's use the previous command to set and get entry attributes'values: | Let's use the previous command to set and get entry attributes'values: | ||
<pre class="response">Query> pwd | <pre class="response">Query> pwd | ||
>> /gilda/movies/ | >> /gilda/movies/ | ||
Line 610: | Line 611: | ||
>> lfn:/grid/gilda/movies/armageddon.mov | >> lfn:/grid/gilda/movies/armageddon.mov | ||
>> Bruce Willis, Ben Affleck | >> Bruce Willis, Ben Affleck | ||
</pre></div></div><div title="Querying metadata | </pre></div></div><div class="sect2" title="Querying metadata"><div class="titlepage"><div><div> | ||
=== Querying metadata === | === Querying metadata === | ||
</div></div></div> | </div></div></div> | ||
Finally, after we have created a collection, defined its schema, added entries with their attribute values to it, we can issue a query to get back the information we need. | Finally, after we have created a collection, defined its schema, added entries with their attribute values to it, we can issue a query to get back the information we need. | ||
The most used command to issue queries is <span class="bold">'''<code class="code">selectattr</code>'''</span>. Its syntax is as follows: | The most used command to issue queries is <span class="bold">'''<code class="code">selectattr</code>'''</span>. Its syntax is as follows: | ||
<pre class="programlisting">selectattr collection_name:attribute_name... condition | <pre class="programlisting">selectattr collection_name:attribute_name... condition | ||
</pre> | </pre> | ||
which returns the values of given attributes for all files matching the condition where: | which returns the values of given attributes for all files matching the condition where: | ||
<div class="itemizedlist"> | <div class="itemizedlist"> | ||
* | * | ||
<code class="code">collection_name</code> specifies the path of the attribute's collection we want to print out. If it's in current collection, the '.' (dot) is mandatory. If more than one attribute will follow and they are in the same collection of the first one, the collection_name can be omitted | |||
<code class="code">collection_name</code> specifies the path of the attribute's collection we want to print out. If it's in current collection, the '.' (dot) is mandatory. If more than one attribute will follow and they are in the same collection of the first one, the collection_name can be omitted | |||
* | * | ||
<code class="code">attribute_name</code> specifies the attribute whose values we want to print out | |||
<code class="code">attribute_name</code> specifies the attribute whose values we want to print out | |||
* | * | ||
<code class="code">condition</code> specifies a condition on attributes to filter the result set. Logical (and/or/not), comparison, aggregation operators can be used. Joins (inner, outer, left, right) between schemas are allowed. Limit,order,distinct,group are also available. [http://amga.web.cern.ch/amga/queries.html Here] you can find a summary of all the available operators and options. If you don't want to give any condition, use a pair of '' (single quotes). | |||
</div> | <code class="code">condition</code> specifies a condition on attributes to filter the result set. Logical (and/or/not), comparison, aggregation operators can be used. Joins (inner, outer, left, right) between schemas are allowed. Limit,order,distinct,group are also available. [http://amga.web.cern.ch/amga/queries.html Here] you can find a summary of all the available operators and options. If you don't want to give any condition, use a pair of ''(single quotes).'' | ||
A simpler query command is <span class="bold">'''<code class="code">find</code>'''</span>: | </div> | ||
A simpler query command is <span class="bold">'''<code class="code">find</code>'''</span>: | |||
<code class="code">find pattern condition</code> | <code class="code">find pattern condition</code> | ||
It returns only the names of the entries that match the pattern and satisfy the condition. | It returns only the names of the entries that match the pattern and satisfy the condition. | ||
<div title="Some examples of queries: | <div class="sect3" title="Some examples of queries:"><div class="titlepage"><div><div> | ||
==== Some examples of queries: ==== | ==== Some examples of queries: ==== | ||
</div></div></div> | </div></div></div> | ||
To print the titles and the LFNs of all the movies whose runtime is greater than 80 minutes: | To print the titles and the LFNs of all the movies whose runtime is greater than 80 minutes: | ||
<pre class="response">Query> selectattr /gilda/movies:title LFN 'runtime > 80' | <pre class="response">Query> selectattr /gilda/movies:title LFN 'runtime > 80' | ||
>> Spiderman 3 | >> Spiderman 3 | ||
Line 645: | Line 649: | ||
>> Armageddon | >> Armageddon | ||
>> lfn:/grid/gilda/movies/armageddon.mov | >> lfn:/grid/gilda/movies/armageddon.mov | ||
</pre> | </pre> | ||
To print the titles and the runtime of the movies where Julia Roberts performed: | To print the titles and the runtime of the movies where Julia Roberts performed: | ||
<pre class="response">Query> pwd | <pre class="response">Query> pwd | ||
>> / | >> / | ||
Line 655: | Line 659: | ||
>> Pretty Woman | >> Pretty Woman | ||
>> 95 | >> 95 | ||
</pre> | </pre> | ||
To issue the last query example using find: | To issue the last query example using find: | ||
<pre class="response">Query> find /gilda/movies/ 'like(cast, "Julia%")' | <pre class="response">Query> find /gilda/movies/ 'like(cast, "Julia%")' | ||
>> 5 | >> 5 | ||
Line 663: | Line 667: | ||
>> Pretty Woman | >> Pretty Woman | ||
>> 95 | >> 95 | ||
</pre></div></div><div title="More documentation | </pre></div></div><div class="sect2" title="More documentation"><div class="titlepage"><div><div> | ||
=== More documentation === | === More documentation === | ||
</div></div></div><div class="itemizedlist"> | </div></div></div><div class="itemizedlist"> | ||
* | *[http://amga.web.cern.ch/amga/ AMGA homepage] | ||
[http://amga.web.cern.ch/amga/ AMGA homepage] | |||
* | *[http://amga.web.cern.ch/amga/downloads/amga-manual_1_3_0.pdf AMGA User's and Administrator's Manual(PDF)] | ||
[http://amga.web.cern.ch/amga/downloads/amga-manual_1_3_0.pdf AMGA User's and Administrator's Manual(PDF)] | |||
* | *[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn From the GILDA Twiki: Metadata - Introduction to AMGA] | ||
[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn From the GILDA Twiki: Metadata - Introduction to AMGA] | |||
* | *[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAAdv From the GILDA Twiki: AMGA Advanced usage] | ||
[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAAdv From the GILDA Twiki: AMGA Advanced usage] | |||
* | *[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess From the GILDA Twiki: Accessing pre-existing databases through AMGA] | ||
[https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess From the GILDA Twiki: Accessing pre-existing databases through AMGA] | </div></div> | ||
</div></div> | [[Category:Operations_Manuals]] |
Latest revision as of 17:28, 10 January 2013
This document has the goal of providing basic information on the usage of the AMGA Metadata Catalog.
Introduction
The AMGA Metadata Catalog is the EGI gLite service that allows metadata handling on the grid. Its main use is as a front-end file metadata service, providing a means of describing and discovering the data files required by users and their jobs. It can also be used as a Grid-enabled database for applications that need to structure their data, providing a database-like service that supports Grid security features (X.509 proxies and the VOMS authentication and authorization system). Finally, an additional feature allows existing relational databases to be accessed from a grid environment (Worker Nodes, User Interface, etc.), which adds Grid security to existing databases.
Users and applications can interact with an AMGA server using command-line tools:
- mdcli/mdclient, available for Scientific Linux flavours (they can easily be built for other platforms)
- mdjavacli/mdjavaclient, the Java versions of the previous tools, which allow interaction from any platform
or through APIs, available for:
- C++
- Java
- Python
- Perl
- PHP
The usage of the mdcli/mdclient command-line tools is explained below.
AMGA Metadata Basic Concepts
There are certain fundamental concepts which must be understood when dealing with AMGA:
- Entry - the representation of the real-world entity to which we attach metadata in order to describe it
- Attribute - a key/value pair. It has:
  - Type - The type (int, float, string, etc.)
  - Name/Key - The name of the attribute
  - Value - The value of an entry's attribute
- Schema - A set of attributes
- Collection - A set of entries associated with a schema
- Metadata - The list of attributes (including their values) associated with entries
If we want to make an analogy with the RDBMS world, we have the following:
- schema = table schema
- collection = database table
- attribute = schema column
- entry = table row/record
By analogy with a file system, we have:
- collection = directory
- entry = file
In the AMGA help and documentation, "directory" is often used to mean collection, and "file" to mean entry.
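To make the vocabulary concrete, here is a minimal Python sketch of how the concepts relate: a collection owns a schema (attribute name to type) and a set of uniquely named entries, each holding a value (or NULL) for every attribute of the schema. The class and method names are hypothetical illustrations, not part of any AMGA API, and the behaviour of adding an attribute to a collection that already has entries is an assumption of this model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Collection:
    """Hypothetical model of an AMGA collection (analogous to a directory or a table)."""

    path: str
    # Schema: attribute name -> type, e.g. {"runtime": "int"}
    schema: Dict[str, str] = field(default_factory=dict)
    # Entries: entry name -> metadata (attribute name -> value, None standing for NULL)
    entries: Dict[str, Dict[str, Optional[Any]]] = field(default_factory=dict)

    def add_attribute(self, name: str, type_: str) -> None:
        """Extend the schema; in this model existing entries get None for the new attribute."""
        self.schema[name] = type_
        for metadata in self.entries.values():
            metadata.setdefault(name, None)

    def add_entry(self, name: str, **values: Any) -> None:
        """Add an entry; names are unique (like primary keys) and values must fit the schema."""
        if name in self.entries:
            raise KeyError(f"entry {name!r} already exists")
        unknown = set(values) - set(self.schema)
        if unknown:
            raise KeyError(f"attributes not in schema: {sorted(unknown)}")
        self.entries[name] = {attr: values.get(attr) for attr in self.schema}


movies = Collection("/gilda/movies")
movies.add_attribute("title", "varchar(50)")
movies.add_attribute("runtime", "int")
movies.add_entry("4", title="Spiderman 3", runtime=120)
movies.add_entry("6")                    # an empty entry: every attribute is None
movies.add_attribute("cast", "text")     # existing entries get cast=None
```

The dictionary-of-dictionaries layout mirrors the RDBMS analogy: the outer keys play the role of row keys and the inner keys the role of columns.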
Example: Metadata for movies
Movie files (entries) could be saved on Grid Storage Elements and registered in a File Catalogue. We want to add metadata to describe the movie content. A possible schema could be:
- Title -- varchar
- Runtime -- int
- Cast -- varchar
- LFN -- varchar
We can use the GUID of the file as the entry name.
A collection named movies on an AMGA server could serve as the repository of the movies' metadata, allowing users to find the movies that satisfy their queries.
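Following the RDBMS analogy, the movie schema can be pictured as a table whose primary key is the entry name. This sketch uses Python's built-in sqlite3 purely as an analogy (it is not how AMGA lays out its back-end tables), and uuid4 stands in for the GUID that a File Catalogue would assign to the file.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Collection "movies" ~ table, attributes ~ columns, entry name ~ primary key.
# "cast" is quoted because CAST is an SQL keyword.
conn.execute(
    'CREATE TABLE movies ('
    ' entry TEXT PRIMARY KEY,'   # entry name: here, the file's GUID
    ' title VARCHAR,'
    ' runtime INT,'
    ' "cast" VARCHAR,'
    ' lfn VARCHAR)'
)

guid = str(uuid.uuid4())  # stand-in for the GUID a File Catalogue would assign
conn.execute(
    'INSERT INTO movies (entry, title, runtime, "cast", lfn) VALUES (?, ?, ?, ?, ?)',
    (guid, "Pretty Woman", 95, "Julia Roberts, Richard Gere",
     "lfn:/grid/gilda/movies/prettywoman.mov"),
)

# Reusing the same entry name violates uniqueness, as AMGA entry names must be unique.
try:
    conn.execute("INSERT INTO movies (entry) VALUES (?)", (guid,))
except sqlite3.IntegrityError as exc:
    print("duplicate entry rejected:", exc)
```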
Accessing AMGA from the command line
To start using AMGA from the command line, you need either the mdcli or the mdclient executable. They must be installed, together with the required libraries, on a User Interface (and on the Worker Nodes of the sites where you plan to run jobs that access AMGA). They are not included by default in the standard gLite User Interface and Worker Node packages (although on gLite 3.1 they should be available), so you may need to install them manually.
You can download RPMs for SLC here:
Once the clients are installed, you need to set up the configuration file $HOME/.mdclient.config properly. A template of this file can be found in $GLITE_INSTALLATION/etc/mdclient.config. This template also acts as a system-wide configuration file, useful on a multi-user system (such as a Worker Node): the AMGA clients read it if $HOME/.mdclient.config does not exist.
The relevant settings in mdclient.config are the following:
Host = amga.ct.infn.it
Port = 8822
Login = NULL
PermissionMask = rwx
GroupMask = r-x
Home = /gilda
UseSSL = yes
AuthenticateWithCertificate = 1
UseGridProxy = 1
VerifyServerCert = 0
where:
- Host defines the AMGA server you want to use. For testing, you can use amga.ct.infn.it (installed as part of the EGEE GILDA training infrastructure).
- Port defines the port on which the AMGA server daemon is listening. It should not be changed.
- Login defines the credential used to authenticate with the AMGA catalog. You can use:
  - your username, if you have requested one from the AMGA server administrator by providing your Grid certificate Distinguished Name (DN), or
  - NULL, to be authenticated as a generic VO user (provided that your VO is supported by the AMGA server you are going to use).
The AMGA server in the GILDA training infrastructure supports the following VOs: gilda, eela, eumed, euchina, cometa. Please contact the amga.ct.infn.it sysadmins (tony.calanducci [at] ct.infn.it) to request support for your VO or to get a personal account. Soon a registration page will be available at: https://amga.ct.infn.it:8443/register
See the AMGA documentation for all the available options of the .mdclient.config file.
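The lookup order described above ($HOME/.mdclient.config first, then the system-wide template under $GLITE_INSTALLATION/etc) can be sketched in Python. This sketch assumes the file consists of simple `Key = value` lines with optional `#` comments; the real client may accept more syntax, so treat the parser as an illustration only. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def find_config(home: Optional[str] = None, glite: Optional[str] = None) -> Optional[Path]:
    """Return $HOME/.mdclient.config if it exists, else the system-wide
    $GLITE_INSTALLATION/etc/mdclient.config, else None."""
    home = os.environ.get("HOME", "") if home is None else home
    glite = os.environ.get("GLITE_INSTALLATION", "") if glite is None else glite
    candidates = []
    if home:
        candidates.append(Path(home) / ".mdclient.config")
    if glite:
        candidates.append(Path(glite) / "etc" / "mdclient.config")
    for path in candidates:
        if path.is_file():
            return path
    return None


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'Key = value' lines; blank lines and '#' comments are skipped (assumed syntax)."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


path = find_config()
if path is not None:
    config = parse_config(path.read_text())
    print(config.get("Host"), config.get("Port"))
```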
The settings in the mdclient.config example above authenticate you as a generic VO user using your Grid proxy. Be sure to initialize your proxy with the proper command:
-bash-2.05b$ voms-proxy-init --voms gilda
Enter GRID pass phrase:
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN Catania/CN=Tony Calanducci/Email=tony.calanducci@ct.infn.it
Creating temporary proxy ............................................................................. Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it] "gilda" Done
Creating proxy ...................................... Done
Your proxy is valid until Sun Feb 3 08:04:46 2008
Once everything has been set up properly, start the AMGA mdclient:
-bash-2.05b$ mdclient
Connecting to amga.ct.infn.it:8822...
ARDA Metadata Server 1.3.0
Query>
This is an interactive shell, similar to mysql or psql, that lets you issue AMGA commands to the server. The other AMGA executable that comes with the RPM, mdcli, provides the very same functionality, but it executes only one command at a time, passed as a command-line parameter, and exits immediately. This is quite useful in bash scripts run on Worker Nodes that need to interact with the metadata catalog.
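Because mdcli runs a single command and exits, a job script can call it and collect the response lines. The sketch below assumes that each response value is printed on its own line, possibly prefixed with `>> ` as in the mdclient transcripts on this page, and that the whole command is passed as one argument; check the AMGA manual for the exact invocation and output format on your installation. The helper names are hypothetical.

```python
import subprocess
from typing import List


def parse_response(output: str) -> List[str]:
    """Strip the '>> ' prefix shown on response lines; empty values are kept as ''."""
    values = []
    for line in output.splitlines():
        if line.startswith(">>"):
            line = line[3:] if line.startswith(">> ") else line[2:]
        values.append(line)
    return values


def group(values: List[str], width: int) -> List[List[str]]:
    """Split a flat response into rows of `width` fields (e.g. entry name + attributes)."""
    if width <= 0 or len(values) % width:
        raise ValueError("response length is not a multiple of the row width")
    return [values[i:i + width] for i in range(0, len(values), width)]


def run_mdcli(command: str) -> List[str]:
    """Run a single AMGA command through mdcli and return its response values.

    Assumptions: mdcli is on PATH and the whole command is passed as one argument.
    """
    result = subprocess.run(["mdcli", command], capture_output=True, text=True, check=True)
    return parse_response(result.stdout)


# Example (needs a configured client and a valid proxy):
# for entry, title in group(run_mdcli("getattr /gilda/movies/ title"), 2):
#     print(entry, title)
```

Grouping is needed because commands such as getattr print the entry name followed by one line per requested attribute.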
Basic commands
You can get help at any time in the client by using the help command.
Try the help command:
Query> help
>> help [topic]
>> Displays help on a command or a topic.
>> Valid topics are: help metadata metadata-optional directory replication constraints entry group acl index schema sequence user view site replicas ticket / capabilities commands
Query>
Commands are grouped by topic. You can get the list of valid commands for each topic by typing: help [topic]
The list of valid topics is:
- help
- metadata
- metadata-optional
- directory
- replication
- entry
- group
- acl
- index
- schema
- sequence
- user
- view
- ticket
- commands
Try the help command with any topic:
Query> help entry
mdclient general commands
The following table gives a brief description of the general-use commands.
Command | Description |
---|---|
createdir path [options] | Create a new collection. It can inherit (using the inherit option) the schema associated with the upper-level collection |
rm pattern | Remove the entries corresponding to the given pattern |
dir collection | List the content (entries, subcollections, sequences, indexes) of the given collection |
listentries collection | List only the entries of the given collection |
stat pattern | Show the statistics of an entry or collection |
chown file owner | Change the ownership of an entry or collection |
chmod file rights | Change the access mode of an entry or collection |
rmdir collection | Remove a collection |
dump collection | Make a recursive dump starting from a given collection (the default is '/') |
pwd | Print the current collection |
whoami | Print the current user |
cd collection | Change the current collection |
General commands examples:
Query> whoami
>> gilda
Query> pwd
>> /
Query> cd /gilda
Query> cd tony
Query> pwd
>> /gilda/tony/
Query> dir
>> /gilda/tony/seq2
>> sequence
>> /gilda/tony/seconda
>> collection
>> /gilda/tony/v1
>> collection
>> /gilda/tony/v2
>> collection
>> /gilda/tony/view1
>> view
>> /gilda/tony/aentry
>> entry
>> /gilda/tony/14
>> entry
>> /gilda/tony/15
>> entry
>> /gilda/tony/16
>> entry
>> /gilda/tony/17
>> entry
>> /gilda/tony/18
>> entry
>> /gilda/tony/20
>> entry
Query> listentries
>> /gilda/tony/aentry
>> /gilda/tony/14
>> /gilda/tony/15
>> /gilda/tony/16
>> /gilda/tony/17
>> /gilda/tony/18
>> /gilda/tony/20
Query> cd seconda
Query> dir
>> /gilda/tony/seconda/2 >> entry
Query> rm *
Query> cd ..
Query> pwd
>> /gilda/tony/
Query> rmdir seconda
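Since collections behave like directories, the absolute and relative paths given to cd and the other general commands resolve much like file-system paths. The sketch below mimics that resolution with Python's posixpath module; it is an approximation of the client's behaviour, not its implementation, and the trailing slash simply mirrors what pwd prints in the session above. The resolve function is hypothetical.

```python
import posixpath


def resolve(current: str, target: str) -> str:
    """Resolve `target` (absolute or relative) against the current collection.

    Approximation using POSIX path rules; a trailing '/' is appended because
    pwd prints collections that way.
    """
    path = posixpath.normpath(posixpath.join(current, target))
    return path if path.endswith("/") else path + "/"


cwd = "/"
for step in ["/gilda", "tony", "seconda", ".."]:
    cwd = resolve(cwd, step)
    print("cd", step, "->", cwd)
```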
Handling schemas and attributes
Once a collection has been created, its schema should be defined by adding one or more attributes. As illustrated in the basic concepts section, each attribute is defined by its name and its type.
The command to add a new attribute to a collection schema is the following:
addattr dir attribute_name type
where:
- dir is the collection/directory you are adding the attribute to. You can use a relative or absolute path to refer to it.
- attribute_name is the name you want to give to the attribute you are adding.
- type specifies what kind of values the attribute can contain.
AMGA valid attribute types and their corresponding types used in the internal AMGA back-end are shown in the following table:
AMGA | PostgreSQL | MySQL | Oracle | SQLite | Python |
---|---|---|---|---|---|
int | integer | int | number(38) | int | int |
float | double precision | double precision | float | float | float |
varchar(n) | character varying(n) | character varying(n) | varchar2(n) | varchar(n) | string |
timestamp | timestamp w/o TZ | datetime | timestamp(6) | unsupported | time(unsupported) |
text | text | text | long | text | string |
numeric(p,s) | numeric(p,s) | numeric(p,s) | numeric(p,s) | numeric(p,s) | float |
Internally, the AMGA server uses a relational database to store all the users' metadata. It can use almost any RDBMS that has an ODBC driver; most installations use PostgreSQL or MySQL. If the types listed in the first column are used to define attributes, metadata can easily be moved and replicated among AMGA servers that use different DB back-ends. If you don't care about metadata portability between servers, you can also use any data type specific to a given DB back-end (we have tried the GIS and network data types of PostgreSQL, for example). To find out which database back-end a given AMGA server is using, use the backend command:
Query> backend
>> PostgreSQL
To remove an attribute from a collection schema, the following command is used:
removeattr dir attribute_name
To inspect the schema of a given collection (or of an entry), use:
listattr dir/entry
Schema population example:
Let's create a movies collection and define its schema, adding the following attributes: title, runtime, cast, LFN and to_remove (the last one will then be removed):
Query> createdir /gilda/movies
Query> addattr /gilda/movies title varchar(50)
Query> addattr /gilda/movies runtime int
Query> addattr /gilda/movies cast text
Query> addattr /gilda/movies LFN varchar
Query> addattr /gilda/movies to_remove float
Query> listattr /gilda/movies
>> title
>> varchar(50)
>> runtime
>> int
>> cast
>> text
>> LFN
>> varchar
>> to_remove
>> float
Query> cd /gilda/movies
Query> removeattr . to_remove
Query> listattr .
>> title
>> varchar(50)
>> runtime
>> int
>> cast
>> text
>> LFN
>> varchar
Handling entries and metadata
Once the schema of a collection has been defined, it is possible to add new entries. Each entry must have an entry name. You can think of entry names as the primary keys of a database table: they are unique. Depending on your purposes, there are different options. For example, GUIDs (Globally Unique Identifiers) could be used if you are adding metadata to files, the final part of job IDs ('/' cannot be part of an entry name) if you are adding metadata to running jobs, or simply an incremental integer. You may use any entry name that best describes your entities. If you want to use an incremental integer as the entry name, AMGA sequences can be very useful. You can define one or more sequences for a given collection, but they do not generate new numbers by themselves unless you explicitly request them.
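For example, deriving entry names from job IDs amounts to keeping the part after the last '/', and rejecting any name that would still contain a '/'. The job ID below is a made-up placeholder, the helper names are hypothetical, and treating an empty name as invalid is an assumption of this sketch.

```python
def validate_entry_name(name: str) -> None:
    """Reject empty names and names containing '/' (the path separator)."""
    if not name or "/" in name:
        raise ValueError(f"invalid entry name: {name!r}")


def entry_name_from_job_id(job_id: str) -> str:
    """Use the final part of a job ID (after the last '/') as the entry name."""
    name = job_id.rstrip("/").rsplit("/", 1)[-1]
    validate_entry_name(name)
    return name


# Hypothetical job ID, for illustration only.
print(entry_name_from_job_id("https://wms.example.org:9000/a1B2c3D4e5"))
```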
Query> help sequence
>> sequence_create name dir [increment] [start value]
>> Creates a new sequences with given name in the given directory.
>> sequence_next sequence
>> Gets the next value from a sequence.
>> sequence_remove sequence
>> Deletes a sequence.
Query>
Sequence examples
Create a sequence for the movies collection and get the next sequence id:
Query> pwd
>> /gilda/movies/
Query> sequence_create id /gilda/movies
Query> dir
>> /gilda/movies/id
>> sequence
Query> sequence_next /gilda/movies/id
>> 1
Query> sequence_next /gilda/movies/id
>> 2
Query> sequence_next /gilda/movies/id
>> 3
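The sequence behaviour in this session (a counter that advances only when sequence_next is called) can be modelled with a small class. This is a hypothetical in-memory sketch that mirrors the [increment] and [start value] parameters listed by help sequence, not AMGA's implementation; defaults of 1 for both are an assumption consistent with the first value returned above.

```python
class Sequence:
    """Hypothetical in-memory counter mirroring sequence_create / sequence_next."""

    def __init__(self, name: str, increment: int = 1, start: int = 1) -> None:
        if increment == 0:
            raise ValueError("increment must be non-zero")
        self.name = name
        self.increment = increment
        self._next = start

    def next_value(self) -> int:
        """Return the current value and advance; nothing changes until this is called."""
        value = self._next
        self._next += self.increment
        return value


seq = Sequence("id")
print([seq.next_value() for _ in range(3)])
```

Like an AMGA sequence, the object hands out numbers only on request, so values are never consumed unless an entry is about to be created.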
Once you have decided how to handle entry names, you can start adding or removing entries. Four commands are available for that purpose:
Command | Description |
---|---|
addentries entry1... | Adds one or more entries (also across collections) |
addentry entry [attribute_name value]... | Adds one new entry and initializes one or more attributes |
removeentries entry1... | Removes one or more entries (also across collections) |
rm [-rf] pattern [condition] | Removes entries matching pattern/condition |
Entry creation and deletion examples
Let's add 2 entries with valid attributes and 3 empty entries, then delete the last two:
Query> pwd
>> /gilda/movies/
Query> sequence_next id
>> 4
Query> addentry 4 title 'Spiderman 3' runtime 120 cast 'Kirsten Dunst, Tobey Maguire' LFN 'lfn:/grid/gilda/movies/spiderman.mov'
Query> sequence_next id
>> 5
Query> addentry 5 title 'Pretty Woman' runtime 95 cast 'Julia Roberts, Richard Gere' LFN 'lfn:/grid/gilda/movies/prettywoman.mov'
Query> sequence_next id
>> 6
Query> addentries 6 7 8
Query> dir
>> /gilda/movies/id
>> sequence
>> /gilda/movies/4
>> entry
>> /gilda/movies/5
>> entry
>> /gilda/movies/6
>> entry
>> /gilda/movies/7
>> entry
>> /gilda/movies/8
>> entry
Query> removeentries 7 8
Query> dir
>> /gilda/movies/id
>> sequence
>> /gilda/movies/4
>> entry
>> /gilda/movies/5
>> entry
>> /gilda/movies/6
>> entry
There are three more useful commands for handling attribute values:
Command | Description |
---|---|
getattr pattern attribute1 attribute2 ... | Returns the values of the attributes for all files matching pattern |
setattr entry attribute value [attribute value]... | Sets given attributes to specified values for all entries matching entry |
clearattr entry attribute | Sets the attribute to NULL for all entries matching entry pattern. |
Let's use the previous commands to set and get entry attribute values:
Query> pwd
>> /gilda/movies/
Query> getattr * title
>> 4
>> Spiderman 3
>> 5
>> Pretty Woman
>> 6
>>
Query> getattr 6 title
>> 6
>>
Query> setattr 6 title 'Armageddon'
Query> setattr 6 runtime 150 cast 'Bruce Willis, Ben Affleck' LFN 'lfn:/grid/gilda/movies/armageddon.mov'
Query> getattr /gilda/movies/ title LFN cast
>> 4
>> Spiderman 3
>> lfn:/grid/gilda/movies/spiderman.mov
>> Kirsten Dunst, Tobey Maguire
>> 5
>> Pretty Woman
>> lfn:/grid/gilda/movies/prettywoman.mov
>> Julia Roberts, Richard Gere
>> 6
>> Armageddon
>> lfn:/grid/gilda/movies/armageddon.mov
>> Bruce Willis, Ben Affleck
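In SQL terms, setattr and clearattr behave like UPDATE statements on the entries matching a pattern, and getattr like a SELECT. The sketch below is only an analogy using Python's sqlite3, with GLOB playing the role of the entry pattern; it is not the SQL that AMGA actually generates, and set_attr, clear_attr and get_attr are hypothetical names chosen to avoid shadowing Python's built-ins.

```python
import sqlite3
from typing import Any, List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (entry TEXT PRIMARY KEY, title VARCHAR, runtime INT)")
conn.executemany(
    "INSERT INTO movies (entry, title, runtime) VALUES (?, ?, ?)",
    [("4", "Spiderman 3", 120), ("5", "Pretty Woman", 95), ("6", None, None)],
)

# Column names cannot be bound as SQL parameters, so only known attributes are accepted.
ATTRIBUTES = {"title", "runtime"}


def _check(attribute: str) -> None:
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")


def set_attr(pattern: str, attribute: str, value: Any) -> None:
    """Like 'setattr': update the attribute on every entry matching the glob pattern."""
    _check(attribute)
    conn.execute(f"UPDATE movies SET {attribute} = ? WHERE entry GLOB ?", (value, pattern))


def clear_attr(pattern: str, attribute: str) -> None:
    """Like 'clearattr': set the attribute to NULL for the matching entries."""
    set_attr(pattern, attribute, None)


def get_attr(pattern: str, attribute: str) -> List[Tuple[str, Optional[Any]]]:
    """Like 'getattr': (entry name, value) pairs for the matching entries."""
    _check(attribute)
    sql = f"SELECT entry, {attribute} FROM movies WHERE entry GLOB ? ORDER BY entry"
    return conn.execute(sql, (pattern,)).fetchall()


set_attr("6", "title", "Armageddon")
print(get_attr("*", "title"))
```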
Querying metadata
Finally, once we have created a collection, defined its schema and added entries with their attribute values, we can issue queries to get back the information we need.
The most commonly used query command is selectattr. Its syntax is as follows:
selectattr collection_name:attribute_name... condition
which returns the values of the given attributes for all entries matching the condition, where:
- collection_name specifies the path of the collection containing the attribute we want to print out. If it is the current collection, the '.' (dot) is mandatory. If more attributes follow and they belong to the same collection as the first one, collection_name can be omitted for them.
- attribute_name specifies the attribute whose values we want to print out.
- condition specifies a condition on attributes to filter the result set. Logical (and/or/not), comparison and aggregation operators can be used. Joins (inner, outer, left, right) between schemas are allowed, and limit, order, distinct and group are also available. A summary of all the available operators and options is available at http://amga.web.cern.ch/amga/queries.html. If you don't want to give any condition, use a pair of single quotes ('').
A simpler query command is find:
find pattern condition
It returns only the names of the entries that match the pattern and satisfy the condition.
Some examples of queries:
To print the titles and the LFNs of all the movies whose runtime is greater than 80 minutes:
Query> selectattr /gilda/movies:title LFN 'runtime > 80'
>> Spiderman 3
>> lfn:/grid/gilda/movies/spiderman.mov
>> Pretty Woman
>> lfn:/grid/gilda/movies/prettywoman.mov
>> Armageddon
>> lfn:/grid/gilda/movies/armageddon.mov
To print the titles and the runtime of the movies where Julia Roberts performed:
Query> pwd
>> /
Query> cd /gilda/movies
Query> pwd
>> /gilda/movies/
Query> selectattr .:title runtime 'like(cast, "Julia%")'
>> Pretty Woman
>> 95
To run the last query using find (which returns only the entry name) and then retrieve the attributes with getattr:
Query> find /gilda/movies/ 'like(cast, "Julia%")'
>> 5
Query> getattr 5 title runtime
>> 5
>> Pretty Woman
>> 95
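For readers who know SQL, the query commands map roughly onto SELECT statements: selectattr returns attribute values, while find returns only entry names. This sqlite3 sketch reproduces the data of the examples above; the table layout is an assumption for illustration, and AMGA's like(cast, "Julia%") is rendered with SQL's LIKE operator (note that SQLite's LIKE is case-insensitive for ASCII characters by default, and other back-ends may differ).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE movies (entry TEXT PRIMARY KEY, title VARCHAR, runtime INT, "cast" TEXT, lfn VARCHAR)')
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
    [
        ("4", "Spiderman 3", 120, "Kirsten Dunst, Tobey Maguire", "lfn:/grid/gilda/movies/spiderman.mov"),
        ("5", "Pretty Woman", 95, "Julia Roberts, Richard Gere", "lfn:/grid/gilda/movies/prettywoman.mov"),
        ("6", "Armageddon", 150, "Bruce Willis, Ben Affleck", "lfn:/grid/gilda/movies/armageddon.mov"),
    ],
)

# selectattr /gilda/movies:title LFN 'runtime > 80'
long_movies = conn.execute(
    "SELECT title, lfn FROM movies WHERE runtime > 80 ORDER BY entry"
).fetchall()

# find /gilda/movies/ 'like(cast, "Julia%")'  -> entry names only
julia_entries = [row[0] for row in conn.execute(
    'SELECT entry FROM movies WHERE "cast" LIKE ? ORDER BY entry', ("Julia%",))]

# getattr 5 title runtime
details = conn.execute(
    "SELECT entry, title, runtime FROM movies WHERE entry = ?", ("5",)
).fetchone()

print(long_movies, julia_entries, details, sep="\n")
```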