Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time
1. Middleware releases and staged rollout
1.1 EMI-1 release status (Cristina)
- EMI Update 2 23.06.2011
- CREAM&CEMon v. 1.13.1
- EMI Update 3: 07.07.2011
- Storm SE (First release in EMI) v. 1.7.0
- L&B v. 3.0.12
- glite-proxyrenewal v. 1.3.21
- glite-MPI v. 1.0.1
- UNICORE UVOS v. 1.4.2
1.2. EMI/UMD current status
1.3. Staged Rollout (Mario)
1.3.1 gLite 3.1 series
- WMS 3.2.17: installed and in production, waiting for the staged rollout report
1.3.2 gLite 3.2 series
- gLexec: EA tested it, waiting for the staged rollout report
1.3.3 EMI1 - UMD1
- 27 products are in the UMDStore area, which means that staged rollout has been performed, and they will be in the UMD1 release.
- The products missing (at the time of this meeting) and under staged rollout, are: arc-ce, arc-clients and cream (from EMI update 2)
- We are now preparing the release: collecting release notes, issues found in verification and staged rollout, workarounds, etc.
|Staged-rollout||GGUS Tickets||DocDB ID||EA teams|
|RT ticket ID||Product - sw-rel Ticket||Verif||StgRllt||ET (Finish)||Verif||StgRllt||Verif||SR||UMDStore||done||waiting|
|2431||EMI.arc-ce.sl5.x86_64||DONE||OnGoing||5-Jul||71120||608||wait||4 arc EA teams|
|2305||EMI.dpm.sl5.x86_64||DONE||DONE||28-Jun||71205 71353 71357||573||614||OK||2|
|EMI.wms.sl5.x86_64||Rejected||71168 71065 71190||567|
1.4 Interoperability (Michaela)
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=498
- New servicetypes added to GOCDB, descriptions updated in last meeting. Now about to add more UNICORE services into GOCDB.
- Found problems with parsing of some characters in Service Point URL field.
- Integration of the UNICORE Nagios probes is unexpectedly delayed because the effort needed for the final integration step was underestimated. The deadline for the SAM Update-12 release was missed. The next deadline is SAM Update-13, which will be released around the end of July.
- NGI_BY had to withdraw their permission to release their UNICORE accounting solution as open source. Hoping to have time to investigate this matter further now, after the first year's EC review.
- UNICORE summit http://www.unicore.eu/summit/2011/ at Nicolaus Copernicus University, Torun, Poland, on July 7th - 8th
- Next meeting in the second or fourth week of July. http://www.doodle.com/58w93q6xqi32pvqd
- Further information: UNICORE integration task force
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=496
- Reminder to all NGIs to tell their sites to register all their Globus GT5 services in GOCDB, since this is a good time now with the upcoming SAM/Nagios release.
- IGE will officially take over support for Nagios probes, details to be fixed.
- LRZ will be an EA for Globus.
- Looking for a future staged-rollout manager.
- Globus/IGE people now also in EMI ComputeAccounting working group.
- Next meeting second week of July. http://www.doodle.com/dk532e78uerp84nh
- Further information: Globus integration task force
Major problems in operations since this weekend due to flooding of the NBI computer hall in Copenhagen, affecting most NorduGrid infrastructure (GIIS, Mail, SVN, Download) except WWW. The GIIS outage affects BDII services. Services were completely down from Saturday evening until Sunday afternoon. The emergency diesel power was flooded as well. Some services are still affected: the one of the four global GIIS servers that is located in Denmark and, e.g., the NDGF-T1 mail server are still down. Possible effect on all sites under http://www.nordugrid.org/monitor/. ARC-CEs in Copenhagen killed. d-Cache pools in Denmark still kept alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. The weather forecast for Denmark is still bad after this worst thunderstorm in history.
2. Operational Issues
2.1 Publishing site information in BDII
Most of the sites in the EGI integrated infrastructure are correctly publishing GlueSiteOtherInfo: GRID=EGI. There are still sites that are publishing GRID=EGEE and the Resource infrastructure Provider name as EGEE_ROC instead of EGI_NGI:
GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX
The correct attributes are:
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI
EGEE_ROC must always be replaced by EGI_NGI. Sites that are publishing both GRID=EGEE and GRID=EGI should remove the first attribute.
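A site administrator could check the published values with a small script. The following is only a sketch: it assumes the GlueSiteOtherInfo attributes have already been retrieved from the BDII and are available as text with one attribute per line. The function name check_site_other_info and the wording of the messages are illustrative and not part of any EGI tool.

```python
def check_site_other_info(ldif_text):
    """Return a list of problems found in the GlueSiteOtherInfo values.

    Assumes one 'GlueSiteOtherInfo: KEY=VALUE' attribute per line.
    """
    values = {}
    for line in ldif_text.splitlines():
        line = line.strip()
        if not line.startswith("GlueSiteOtherInfo:"):
            continue  # ignore every other attribute
        value = line.split(":", 1)[1].strip()
        key, _, val = value.partition("=")
        values.setdefault(key.strip(), []).append(val.strip())

    problems = []
    grids = values.get("GRID", [])
    if "EGEE" in grids:
        problems.append("GRID=EGEE is published; remove it")
    if "EGI" not in grids:
        problems.append("GRID=EGI is missing")
    if "EGEE_ROC" in values:
        problems.append("EGEE_ROC is published; replace it with EGI_NGI")
    if "EGI_NGI" not in values:
        problems.append("EGI_NGI is missing")
    return problems
```

An empty list means the site publishes the attributes as requested above; a site that publishes both GRID=EGEE and GRID=EGI gets a single message asking it to remove the first one.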
2.2 Batch System survey results
Survey link : The deadline was June 30th 2011, but the survey is still open. It will be closed in the next few days.
- 230 surveys submitted (including information from 238 sites)
- Question: Which batch system are you currently deploying?
- Question: Are you planning to replace your batch system?
2.3 Purging of LB
glite-lb-purge fails on gLite 3.2 LB (https://ggus.eu/tech/ticket_show.php?ticket=67151): even when jobs are purged, the database keeps growing in size. A patch is ready for release in EMI, but it is currently not scheduled for release in gLite 3.2. The proposal is to reassess the impact of the issue, flagged as "less urgent" in GGUS, in order to have the problem fixed in gLite 3.2 too.
3.1 gridops domain decommissioned
The operations tools are no longer reachable through the previous domain *.gridops.org.
All the *.egi.eu aliases are already available; you can find them on the Tools wiki page.
Next Meeting proposal: July 18th, 14:00