
Difference between revisions of "Agenda-04-07-2011"

From EGIWiki


= Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time  =

* [https://www.egi.eu/indico/conferenceDisplay.py?confId=507 Indico page]
* EVO details [https://www.egi.eu/indico/materialDisplay.py?materialId=0&confId=507 on indico]
** [http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MsMiMI282MDuDl9u9tDt9s Direct EVO link]
** pass: gridops


== 1. Middleware releases and staged rollout  ==

=== 1.1 EMI-1 release status (Cristina)  ===

[https://www.egi.eu/indico/getFile.py/access?contribId=0&resId=0&materialId=slides&confId=507 Slides from Cristina]<br>

* EMI Update 2: '''23.06.2011'''
** CREAM&CEMon v. ''1.13.1''
* EMI Update 3: '''07.07.2011'''
** Storm SE (First release in EMI) v. ''1.7.0''
** L&B v. ''3.0.12''
** glite-proxyrenewal v. ''1.3.21''
** glite-MPI v. ''1.0.1''
** UNICORE UVOS v. ''1.4.2''


=== 1.2. EMI/UMD current status  ===

===== 1.3.1 gLite 3.1 series<br>  =====
===== 1.3.2 gLite 3.2 series  =====
===== 1.3.3 EMI1 - UMD1<br>  =====

==== 1.4 Interoperability (Michaela)  ====


===== UNICORE =====

* Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=498
* New service types added to GOCDB, descriptions updated in last meeting. Now about to add more UNICORE services into GOCDB.
** https://goc.gridops.org/portal/index.php?Page_Type=View_Object&object_id=22973&grid_id=0
** https://goc.egi.eu/portal/index.php?Page_Type=View_Object&object_id=14727&grid_id=0
** Found problems with parsing of some characters in Service Point URL field.
* Unexpected delay in UNICORE Nagios probes integration because the effort needed for the final integration step was misjudged. Deadline for SAM Update-12 release missed. Next deadline is SAM Update-13, which will be released around the end of July.
* NGI_BY had to withdraw their permission to go open source with their UNICORE accounting solution. Hoping to have time to investigate this matter further now, after the first year's EC review.
* UNICORE summit http://www.unicore.eu/summit/2011/ at Nicolaus Copernicus University, Torun, Poland, on July 7th - 8th
* Next meeting second or fourth week of July.
* Further information: [[UNICORE_integration_task_force]]


===== Globus =====

* Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=496
* Reminder to all NGIs to tell their sites to register all their Globus GT5 services in GOCDB, since this is a good time now with the upcoming SAM/Nagios release.
* IGE will officially take over support for Nagios probes, details to be fixed.
* LRZ will be an EA for Globus.
* Looking for a future staged-rollout manager.
* Globus/IGE people now also in EMI ComputeAccounting working group.
* Next meeting second week of July.
* Further information: [[Globus_integration_task_force]]


===== ARC =====

Major problems in operations since this weekend due to flooding of the NBI computer hall in Copenhagen, affecting most NorduGrid infrastructure (GIIS, Mail, SVN, Download) except WWW. The GIIS outage affects BDII services. Services were completely down from Saturday evening until Sunday afternoon. The emergency diesel power supply was flooded as well. Some services are still affected: the one of the four global GIIS servers located in Denmark and e.g. the NDGF-T1 mail server are still down. Possible effect on all sites under http://www.nordugrid.org/monitor/. ARC-CEs in Copenhagen killed. dCache pools in Denmark still kept alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. Weather forecast for Denmark still bad after this worst thunderstorm in history.


=== 2. Operational Issues  ===

==== 2.1 Publishing site information in BDII  ====

Most of the sites in the EGI integrated infrastructure are correctly publishing '''SiteOtherInfo: GRID=EGI'''. There are still sites that are publishing '''GRID=EGEE''' and the Resource infrastructure Provider name as '''EGEE_ROC''' instead of '''EGI_NGI''':
<pre>
GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX
</pre>

Should be:
<pre>
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI
</pre>

''EGEE_ROC'' '''always''' has to be replaced by ''EGI_NGI''. Sites that are publishing both ''GRID=EGEE'' and ''GRID=EGI'' should remove the first attribute.

** Sites still using these values are available from gstat:
*** [http://gstat.egi.eu/gstat/summary/GRID/EGEE/ EGEE sites]
*** [http://gstat.egi.eu/gstat/summary/EGEE_ROC/ALL/ EGEE_ROC sites]
** [https://wiki.egi.eu/wiki/MAN1_How_to_publish_Site_Information Current manual]
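
The attribute replacement described in 2.1 could be checked with a small script. This is only a minimal sketch, assuming the published values are available as plain <code>GlueSiteOtherInfo: KEY=VALUE</code> text lines; the function name <code>migrate_site_other_info</code> is hypothetical and not part of any BDII or gstat tooling. The actual site configuration should follow the manual linked above.

```python
PREFIX = "GlueSiteOtherInfo: "


def migrate_site_other_info(lines):
    """Rewrite EGEE-era GlueSiteOtherInfo lines to their EGI equivalents.

    Hypothetical helper: EGEE_ROC=<x> becomes EGI_NGI=<x>, GRID=EGEE becomes
    GRID=EGI, and duplicate attribute lines produced by the rewrite are dropped.
    Lines that are not GlueSiteOtherInfo attributes pass through unchanged.
    """
    migrated = []
    seen = set()
    for line in lines:
        if not line.startswith(PREFIX):
            migrated.append(line)
            continue
        key, sep, value = line[len(PREFIX):].partition("=")
        if sep and key == "EGEE_ROC":
            key = "EGI_NGI"
        elif key == "GRID" and value == "EGEE":
            value = "EGI"
        new_line = PREFIX + key + sep + value
        # A site publishing both GRID=EGEE and GRID=EGI ends up with one GRID=EGI.
        if new_line not in seen:
            seen.add(new_line)
            migrated.append(new_line)
    return migrated


if __name__ == "__main__":
    old = [
        "GlueSiteOtherInfo: GRID=EGEE",
        "GlueSiteOtherInfo: EGEE_SERVICE=prod",
        "GlueSiteOtherInfo: EGEE_ROC=XXX",
    ]
    for line in migrate_site_other_info(old):
        print(line)
```

The sketch leaves ''EGEE_SERVICE=prod'' untouched, as in the example above, and preserves any other ''GRID='' values a site may publish.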


=== 3. AOB  ===

Revision as of 11:46, 4 July 2011




Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time

1. Middleware releases and staged rollout

1.1 EMI-1 release status (Cristina)

Slides from Cristina

  • EMI Update 2 23.06.2011
    • CREAM&CEMon v. 1.13.1
  • EMI Update 3: 07.07.2011
    • Storm SE (First release in EMI) v. 1.7.0
    • L&B v. 3.0.12
    • glite-proxyrenewal v. 1.3.21
    • glite-MPI v. 1.0.1
    • UNICORE UVOS v. 1.4.2

1.2. EMI/UMD current status

1.3. Staged Rollout (Mario)

1.3.1 gLite 3.1 series
1.3.2 gLite 3.2 series
1.3.3 EMI1 - UMD1

1.4 Interoperability (Michaela)

UNICORE
Globus
  • Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=496
  • Reminder to all NGIs to tell their sites to register all their Globus GT5 services in GOCDB, since this is a good time now with the upcoming SAM/Nagios release.
  • IGE will officially take over support for Nagios probes, details to be fixed.
  • LRZ will be an EA for Globus.
  • Looking for a future staged-rollout manager.
  • Globus/IGE people now also in EMI ComputeAccounting working group.
  • Next meeting second week of July.
  • Further information: Globus integration task force
ARC

Major problems in operations since this weekend due to flooding of the NBI computer hall in Copenhagen, affecting most NorduGrid infrastructure (GIIS, Mail, SVN, Download) except WWW. The GIIS outage affects BDII services. Services were completely down from Saturday evening until Sunday afternoon. The emergency diesel power supply was flooded as well. Some services are still affected: the one of the four global GIIS servers located in Denmark and e.g. the NDGF-T1 mail server are still down. Possible effect on all sites under http://www.nordugrid.org/monitor/. ARC-CEs in Copenhagen killed. dCache pools in Denmark still kept alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. Weather forecast for Denmark still bad after this worst thunderstorm in history.

2. Operational Issues

2.1 Publishing site information in BDII

Most of the sites in the EGI integrated infrastructure are correctly publishing SiteOtherInfo: GRID=EGI. There are still sites that are publishing GRID=EGEE and the Resource infrastructure Provider name as EGEE_ROC instead of EGI_NGI:

GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX

Should Be:

GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI

EGEE_ROC always has to be replaced by EGI_NGI. Sites that are publishing both GRID=EGEE and GRID=EGI should remove the first attribute.

3. AOB

3.1

Next Meeting: