Agenda-04-07-2011
Revision as of 12:13, 4 July 2011
Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time
- Indico page
- EVO details on indico
- Direct EVO link
- pass: gridops
1. Middleware releases and staged rollout
1.1 EMI-1 release status (Cristina)
- EMI Update 2: 23.06.2011
- CREAM&CEMon v. 1.13.1
- EMI Update 3: 07.07.2011
- StoRM SE (first release in EMI) v. 1.7.0
- L&B v. 3.0.12
- glite-proxyrenewal v. 1.3.21
- glite-MPI v. 1.0.1
- UNICORE UVOS v. 1.4.2
1.2. EMI/UMD current status
1.3. Staged Rollout (Mario)
1.3.1 gLite 3.1 series
- WMS 3.2.17: installed and in production, waiting for the staged rollout report
1.3.2 gLite 3.2 series
- gLexec: No EA has answered the request yet for staged rollout. (This update is needed to bring in new versions of the LCMAPS-plugins-pep-c and PEP-API.)
1.3.3 EMI1 - UMD1
- 27 products are in the UMDStore area, which means that staged rollout has been performed, and they will be in the UMD1 release.
- The products still missing at the time of this meeting, and currently under staged rollout, are: arc-ce, arc-clients and cream (from EMI Update 2).
- We are now preparing the release: collecting release notes, issues found in verification and staged rollout, workarounds, etc.
RT ticket ID | Product (sw-rel ticket) | Verification | Staged rollout | ET (finish) | GGUS tickets (verification) | GGUS tickets (staged rollout) | DocDB ID (verification) | DocDB ID (staged rollout) | UMDStore | EA teams (done) | EA teams (waiting) |
2269 | EMI.apel.sl5.x86_64 | DONE | DONE | 28-Jun | 551 | 607 | OK | 2 | |||
2431 | EMI.arc-ce.sl5.x86_64 | DONE | OnGoing | 5-Jul | 71120 | 608 | wait | 4 arc EA teams | |||
2493 | EMI.arc-clients.sl5.x86_64 | DONE | OnGoing | 5-Jul | 639 | wait | |||||
EMI.arc-infosys.sl5.x86_64 | OnGoing | 71129 | |||||||||
2303 | EMI.argus.sl5.x86_64 | DONE | DONE | 28-Jun | 572 | 604 | OK | 3 | |||
2270 | EMI.bdii-site.sl5.x86_64 | DONE | DONE | 23-Jun | 552 | 574 | OK | 1 | |||
2271 | EMI.bdii-top.sl5.x86_64 | DONE | DONE | 23-Jun | 553 | 575 | OK | 1 | |||
2343 | EMI.cluster.sl5.x86_64 | DONE | DONE | 28-Jun | 596 | 637 | OK | 1 | |||
2263 | EMI.cream.sl5.x86_64 | DONE | DONE | 28-Jun | 549 | 577 | OK - Superseded | 3 | |||
EMI.dcache.sl5.x86_64 | Not Started | ||||||||||
2300 | EMI.dgas.sl5.x86_64 | DONE | DONE | 28-Jun | 549 | 577 | OK | 1 | |||
2305 | EMI.dpm.sl5.x86_64 | DONE | DONE | 28-Jun | 71205 71353 71357 | 573 | 614 | OK | 2 | ||
2336 | EMI.glexec_wn.sl5.x86_64 | DONE | DONE | 28-Jun | 71569 | 594 | 618 | OK | 1 | ||
2347 | EMI.lb.sl5.x86_64 | DONE | DONE | 28-Jun | 71448 71449 | 597 | 605 | OK | 3 | ||
2342 | EMI.lfc_mysql.sl5.x86_64 | DONE | DONE | 28-Jun | 595 | 636 | OK | 1 | |||
EMI.lfc_oracle.sl5.x86_64 | Rejected | 71593 71607 | |||||||||
2323 | EMI.lsf-utils.sl5.x86_64 | DONE | DONE | 28-Jun | 586 | 577 | OK | 1 | |||
EMI.mpi.sl5.x86_64 | Rejected | 71304 | 566 | ||||||||
2273 | EMI.proxyrenewal.sl5.x86_64 | DONE | DONE | 23-Jun | 558 | 576 | OK | 1 | |||
2315 | EMI.torque-client.sl5.x86_64 | DONE | DONE | 28-Jun | 560 | 617 | OK | 3 | |||
2265 | EMI.torque-server.sl5.x86_64 | DONE | DONE | 23-Jun | 549 | 578 | OK | 1 | |||
2264 | EMI.torque-utils.sl5.x86_64 | DONE | DONE | 23-Jun | 549 | 579 | OK | 1 | |||
2262 | EMI.ui.sl5.x86_64 | DONE | DONE | 5-Jul | 72196 | 543 | 641 | OK | 1 | ||
2284 | EMI.unicore-client.sl5.x86_64 | DONE | DONE | 28-Jun | 539 | 630 | OK | 1 | |||
2285 | EMI.unicore-gateway.sl5.x86_64 | DONE | DONE | 28-Jun | 547 | 631 | OK | 2 | |||
2286 | EMI.unicore-hila.sl5.x86_64 | DONE | DONE | 28-Jun | 550 | 632 | OK | 1 | |||
EMI.unicore-registry.sl5.x86_64 | Rejected | 537 | |||||||||
2289 | EMI.unicore-tsi.sl5.x86_64 | DONE | DONE | 28-Jun | 548 | 634 | OK | 2 | |||
2290 | EMI.unicore-uvos.sl5.x86_64 | DONE | DONE | 28-Jun | 548 | 635 | OK | 1 | |||
2288 | EMI.unicore-ws.sl5.x86_64 | DONE | DONE | 28-Jun | 545 | 629 | OK | 2 | |||
2287 | EMI.unicore-xuudb.sl5.x86_64 | DONE | DONE | 28-Jun | 546 | 633 | OK | 1 | |||
2272 | EMI.voms_mysql.sl5.x86_64 | DONE | DONE | 23-Jun | 554 | 603 | OK | 2 | |||
EMI.voms_oracle.sl5.x86_64 | onHOLD | ||||||||||
EMI.wms.sl5.x86_64 | Rejected | 71168 71065 71190 | 567 | ||||||||
2314 | EMI.wn.sl5.x86_64 | DONE | DONE | 28-Jun | 71198 71167 | 71723 | 560 | 617 | OK | 3 | |
2489 | EMI.cream.sl5.x86_64 | DONE | OnGoing | 5-Jul | 625 | waiting | 1 | ||||
2498 | EMI.unicore-registry.sl5.x86_64 | DONE | DONE | 640 | 642 | OK | 1 |
1.4 Interoperability (Michaela)
UNICORE
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=498
- New service types were added to GOCDB and their descriptions were updated in the last meeting. More UNICORE services are now about to be added to GOCDB.
- https://goc.gridops.org/portal/index.php?Page_Type=View_Object&object_id=22973&grid_id=0
- https://goc.egi.eu/portal/index.php?Page_Type=View_Object&object_id=14727&grid_id=0
- Problems were found with the parsing of some characters in the Service Point URL field.
- Integration of the UNICORE Nagios probes is unexpectedly delayed because the effort needed for the final integration step was underestimated. The deadline for the SAM Update-12 release was missed. The next deadline is SAM Update-13, which will be released around the end of July.
- NGI_BY had to withdraw their permission to go open source with their UNICORE accounting solution. We hope to have time to investigate this matter further now that the first-year EC review is over.
- UNICORE summit http://www.unicore.eu/summit/2011/ at Nicolaus Copernicus University, Torun, Poland, on July 7th - 8th
- Next meeting: second or fourth week of July.
- Further information: UNICORE integration task force
Globus
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=496
- Reminder to all NGIs to tell their sites to register all their Globus GT5 services in GOCDB. Now is a good time, given the upcoming SAM/Nagios release.
- IGE will officially take over support for Nagios probes, details to be fixed.
- LRZ will be an EA for Globus.
- Looking for a future staged-rollout manager.
- Globus/IGE people now also in EMI ComputeAccounting working group.
- Next meeting second week of July.
- Further information: Globus integration task force
ARC
Major problems in operations since this weekend due to flooding of the NBI computer hall in Copenhagen, affecting most of the NorduGrid infrastructure (GIIS, mail, SVN, download) except WWW. With GIIS not working, BDII services are affected. Services were completely down from Saturday evening until Sunday afternoon. The emergency diesel power supply was flooded as well. Some services are still affected: the one of the four global GIIS servers that is located in Denmark and, for example, the NDGF-T1 mail server are still down. All sites listed under http://www.nordugrid.org/monitor/ may be affected. The ARC-CEs in Copenhagen were killed. The dCache pools in Denmark are still alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. The weather forecast for Denmark is still bad after this worst thunderstorm in history.
2. Operational Issues
2.1 Publishing site information in BDII
Most of the sites in the EGI integrated infrastructure are correctly publishing GlueSiteOtherInfo: GRID=EGI. There are still sites that publish GRID=EGEE and give the Resource infrastructure Provider name as EGEE_ROC instead of EGI_NGI:
GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX
Should be:
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI
EGEE_ROC must always be replaced by EGI_NGI. Sites that publish both GRID=EGEE and GRID=EGI should remove the first attribute.
- sites still using these values are available from gstat:
- Current manual
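The renaming rules above can be sketched as a small Python helper. This is only an illustration of the mapping, not a tool used by EGI: the function name `fix_site_other_info` and the list-of-strings input format are assumptions made for the example, and the sketch does not query or modify a BDII.

```python
def fix_site_other_info(values):
    """Rewrite a list of GlueSiteOtherInfo values to the EGI conventions.

    Hypothetical helper: 'values' holds the attribute values only,
    e.g. "GRID=EGEE", without the "GlueSiteOtherInfo:" prefix.
    """
    fixed = []
    for value in values:
        key, _, val = value.partition("=")
        if key == "GRID" and val == "EGEE":
            # GRID=EGEE is dropped; GRID=EGI is added below if missing
            continue
        if key == "EGEE_ROC":
            # EGEE_ROC is always replaced by EGI_NGI, keeping the name
            value = "EGI_NGI=" + val
        if value not in fixed:
            fixed.append(value)
    if "GRID=EGI" not in fixed:
        fixed.append("GRID=EGI")
    return fixed


old = ["GRID=EGEE", "EGEE_SERVICE=prod", "EGEE_ROC=XXX"]
print(fix_site_other_info(old))
# → ['EGEE_SERVICE=prod', 'EGI_NGI=XXX', 'GRID=EGI']
```

A site that already publishes both GRID=EGEE and GRID=EGI ends up with only GRID=EGI, which matches the instruction to remove the first attribute.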
2.2 Batch System survey results
Survey link: The deadline was June 30th 2011, but the survey is still open. It will be closed in the next few days.
- 230 surveys submitted (including information from 238 sites)
- Question: Which batch system are you currently deploying?
Torque/Maui | 151 |
Torque | 40 |
SGE | 20 |
LSF | 18 |
PBS-pro | 13 |
PBS/Moab | 7 |
Slurm | 5 |
Condor | 3 |
Load Leveler | 3 |
- Question: Are you planning to replace your batch system?
No plans | 205 |
SGE | 8 |
Slurm | 8 |
Torque | 4 |
Maui | 3 |
Condor | 2 |
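To put the deployment counts in proportion, the tallies for the first question can be turned into percentage shares. The sketch below uses only the numbers listed above. The helper name `shares` is made up for the example, and taking the shares relative to the total number of answers (rather than the number of surveys or sites) is an assumption, since the counts add up to more than the 230 submitted surveys.

```python
# Answers to "Which batch system are you currently deploying?"
deployed = {
    "Torque/Maui": 151,
    "Torque": 40,
    "SGE": 20,
    "LSF": 18,
    "PBS-pro": 13,
    "PBS/Moab": 7,
    "Slurm": 5,
    "Condor": 3,
    "Load Leveler": 3,
}


def shares(counts):
    """Return each entry's share of all answers, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(sum(deployed.values()))  # → 260
for name, pct in shares(deployed).items():
    print(f"{name}: {pct}%")
```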
3. AOB
3.1
Next Meeting: