Agenda-04-07-2011
Latest revision as of 17:15, 29 November 2012
Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time
- Indico page
- EVO details on indico
- Direct EVO link
- pass: gridops
Minutes: https://www.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=507
1. Middleware releases and staged rollout
1.1 EMI-1 release status (Cristina)
- EMI Update 2: 23.06.2011
- CREAM&CEMon v. 1.13.1
- EMI Update 3: 07.07.2011
- StoRM SE (first release in EMI) v. 1.7.0
- L&B v. 3.0.12
- glite-proxyrenewal v. 1.3.21
- glite-MPI v. 1.0.1
- UNICORE UVOS v. 1.4.2
1.2 EMI/UMD current status
1.3 Staged Rollout (Mario)
1.3.1 gLite 3.1 series
- WMS 3.2.17: installed and in production, waiting for the staged rollout report
1.3.2 gLite 3.2 series
- gLexec: tested by the EA, waiting for the staged rollout report
1.3.3 EMI1 - UMD1
- 27 products are in the UMDStore area, which means that their staged rollout has been completed and they will be included in the UMD1 release.
- The products still missing at the time of this meeting, and currently under staged rollout, are: arc-ce, arc-clients and cream (from EMI Update 2).
- We are now preparing the release: collecting release notes, issues found in verification and staged rollout, workarounds, etc.
Staged-rollout | GGUS Tickets | DocDB ID | EA teams | ||||||||
RT ticket ID | Product - sw-rel Ticket | Verif | StgRllt | ET (Finish) | Verif | StgRllt | Verif | SR | UMDStore | done | waiting |
2269 | EMI.apel.sl5.x86_64 | DONE | DONE | 28-Jun | 551 | 607 | OK | 2 | |||
2431 | EMI.arc-ce.sl5.x86_64 | DONE | OnGoing | 5-Jul | 71120 | 608 | wait | 4 arc EA teams | |||
2493 | EMI.arc-clients.sl5.x86_64 | DONE | OnGoing | 5-Jul | 639 | wait | |||||
EMI.arc-infosys.sl5.x86_64 | OnGoing | 71129 | |||||||||
2303 | EMI.argus.sl5.x86_64 | DONE | DONE | 28-Jun | 572 | 604 | OK | 3 | |||
2270 | EMI.bdii-site.sl5.x86_64 | DONE | DONE | 23-Jun | 552 | 574 | OK | 1 | |||
2271 | EMI.bdii-top.sl5.x86_64 | DONE | DONE | 23-Jun | 553 | 575 | OK | 1 | |||
2343 | EMI.cluster.sl5.x86_64 | DONE | DONE | 28-Jun | 596 | 637 | OK | 1 | |||
2263 | EMI.cream.sl5.x86_64 | DONE | DONE | 28-Jun | 549 | 577 | OK - Superseded | 3 | |||
EMI.dcache.sl5.x86_64 | Not Started | ||||||||||
2300 | EMI.dgas.sl5.x86_64 | DONE | DONE | 28-Jun | 549 | 577 | OK | 1 | |||
2305 | EMI.dpm.sl5.x86_64 | DONE | DONE | 28-Jun | 71205 71353 71357 | 573 | 614 | OK | 2 | ||
2336 | EMI.glexec_wn.sl5.x86_64 | DONE | DONE | 28-Jun | 71569 | 594 | 618 | OK | 1 | ||
2347 | EMI.lb.sl5.x86_64 | DONE | DONE | 28-Jun | 71448 71449 | 597 | 605 | OK | 3 | ||
2342 | EMI.lfc_mysql.sl5.x86_64 | DONE | DONE | 28-Jun | 595 | 636 | OK | 1 | |||
EMI.lfc_oracle.sl5.x86_64 | Rejected | 71593 71607 | |||||||||
2323 | EMI.lsf-utils.sl5.x86_64 | DONE | DONE | 28-Jun | 586 | 577 | OK | 1 | |||
EMI.mpi.sl5.x86_64 | Rejected | 71304 | 566 | ||||||||
2273 | EMI.proxyrenewal.sl5.x86_64 | DONE | DONE | 23-Jun | 558 | 576 | OK | 1 | |||
2315 | EMI.torque-client.sl5.x86_64 | DONE | DONE | 28-Jun | 560 | 617 | OK | 3 | |||
2265 | EMI.torque-server.sl5.x86_64 | DONE | DONE | 23-Jun | 549 | 578 | OK | 1 | |||
2264 | EMI.torque-utils.sl5.x86_64 | DONE | DONE | 23-Jun | 549 | 579 | OK | 1 | |||
2262 | EMI.ui.sl5.x86_64 | DONE | DONE | 5-Jul | 72196 | 543 | 641 | OK | 1 | ||
2284 | EMI.unicore-client.sl5.x86_64 | DONE | DONE | 28-Jun | 539 | 630 | OK | 1 | |||
2285 | EMI.unicore-gateway.sl5.x86_64 | DONE | DONE | 28-Jun | 547 | 631 | OK | 2 | |||
2286 | EMI.unicore-hila.sl5.x86_64 | DONE | DONE | 28-Jun | 550 | 632 | OK | 1 | |||
EMI.unicore-registry.sl5.x86_64 | Rejected | 537 | |||||||||
2289 | EMI.unicore-tsi.sl5.x86_64 | DONE | DONE | 28-Jun | 548 | 634 | OK | 2 | |||
2290 | EMI.unicore-uvos.sl5.x86_64 | DONE | DONE | 28-Jun | 548 | 635 | OK | 1 | |||
2288 | EMI.unicore-ws.sl5.x86_64 | DONE | DONE | 28-Jun | 545 | 629 | OK | 2 | |||
2287 | EMI.unicore-xuudb.sl5.x86_64 | DONE | DONE | 28-Jun | 546 | 633 | OK | 1 | |||
2272 | EMI.voms_mysql.sl5.x86_64 | DONE | DONE | 23-Jun | 554 | 603 | OK | 2 | |||
EMI.voms_oracle.sl5.x86_64 | onHOLD | ||||||||||
EMI.wms.sl5.x86_64 | Rejected | 71168 71065 71190 | 567 | ||||||||
2314 | EMI.wn.sl5.x86_64 | DONE | DONE | 28-Jun | 71198 71167 | 71723 | 560 | 617 | OK | 3 | |
2489 | EMI.cream.sl5.x86_64 | DONE | OnGoing | 5-Jul | 625 | waiting | 1 | ||||
2498 | EMI.unicore-registry.sl5.x86_64 | DONE | DONE | 640 | 642 | OK | 1 |
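The statement that 27 products are in the UMDStore area can be cross-checked against the table above. The following Python sketch is a hypothetical helper, not an EGI tool: it transcribes only the product name (without the EMI. prefix and .sl5.x86_64 suffix) and the UMDStore column of each row, and counts the distinct products with at least one UMDStore entry starting with "OK". Products listed twice (cream and unicore-registry) are counted once, and the superseded cream entry still counts; that this is how the figure of 27 was obtained is an assumption.

```python
# Hypothetical cross-check of the staged-rollout table.
# Each entry is (product, UMDStore column); "" marks an empty UMDStore cell
# (rejected, on hold, not started or still in progress).
ROWS = [
    ("apel", "OK"), ("arc-ce", "wait"), ("arc-clients", "wait"),
    ("arc-infosys", ""), ("argus", "OK"), ("bdii-site", "OK"),
    ("bdii-top", "OK"), ("cluster", "OK"), ("cream", "OK - Superseded"),
    ("dcache", ""), ("dgas", "OK"), ("dpm", "OK"), ("glexec_wn", "OK"),
    ("lb", "OK"), ("lfc_mysql", "OK"), ("lfc_oracle", ""), ("lsf-utils", "OK"),
    ("mpi", ""), ("proxyrenewal", "OK"), ("torque-client", "OK"),
    ("torque-server", "OK"), ("torque-utils", "OK"), ("ui", "OK"),
    ("unicore-client", "OK"), ("unicore-gateway", "OK"), ("unicore-hila", "OK"),
    ("unicore-registry", ""), ("unicore-tsi", "OK"), ("unicore-uvos", "OK"),
    ("unicore-ws", "OK"), ("unicore-xuudb", "OK"), ("voms_mysql", "OK"),
    ("voms_oracle", ""), ("wms", ""), ("wn", "OK"),
    ("cream", "waiting"),        # RT 2489, cream from EMI Update 2
    ("unicore-registry", "OK"),  # RT 2498
]


def products_in_umdstore(rows):
    """Distinct products with at least one UMDStore entry starting with 'OK'."""
    return {product for product, status in rows if status.startswith("OK")}


def products_not_in_umdstore(rows):
    """Distinct products that never reached the UMDStore area in this table."""
    return {product for product, _ in rows} - products_in_umdstore(rows)


if __name__ == "__main__":
    print(len(products_in_umdstore(ROWS)))  # prints 27
    print(sorted(products_not_in_umdstore(ROWS)))
```

Using a set rather than counting rows is what keeps the duplicated cream and unicore-registry entries from being counted twice.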
1.4 Interoperability (Michaela)
UNICORE
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=498
- New service types have been added to GOCDB, with their descriptions updated in the last meeting. More UNICORE services are now about to be added to GOCDB.
- https://goc.egi.eu/portal/index.php?Page_Type=View_Object&object_id=22973&grid_id=0
- https://goc.egi.eu/portal/index.php?Page_Type=View_Object&object_id=14727&grid_id=0
- Problems were found with the parsing of some characters in the Service Point URL field.
- The integration of the UNICORE Nagios probes has been unexpectedly delayed because the effort needed for the final integration step was underestimated. The deadline for the SAM Update-12 release was missed; the next deadline is SAM Update-13, which will be released around the end of July.
- NGI_BY had to withdraw their permission to release their UNICORE accounting solution as open source. We hope to have time to investigate this matter further now that the first-year EC review is over.
- UNICORE summit http://www.unicore.eu/summit/2011/ at Nicolaus Copernicus University, Torun, Poland, on July 7th - 8th
- Next meeting second or fourth week of July. http://www.doodle.com/58w93q6xqi32pvqd
- Further information: UNICORE integration task force
Globus
- Last meeting was https://www.egi.eu/indico/conferenceDisplay.py?confId=496
- Reminder to all NGIs: ask your sites to register all their Globus GT5 services in GOCDB; with the upcoming SAM/Nagios release, now is a good time to do so.
- IGE will officially take over support for the Nagios probes; details are still to be worked out.
- LRZ will be an EA for Globus.
- Looking for a future staged-rollout manager.
- Globus/IGE people now also in EMI ComputeAccounting working group.
- Next meeting second week of July. http://www.doodle.com/dk532e78uerp84nh
- Further information: Globus integration task force
ARC
Major operational problems since this weekend: the NBI computer hall in Copenhagen was flooded, affecting most of the NorduGrid infrastructure (GIIS, mail, SVN, download) except WWW. Because the GIIS was not working, BDII services were affected as well. Services were completely down from Saturday evening until Sunday afternoon; the emergency diesel power supply was flooded too. Some services are still affected: the one of the four global GIIS servers located in Denmark is still down, as is, e.g., the NDGF-T1 mail server. All sites listed under http://www.nordugrid.org/monitor/ may be affected. The ARC-CEs in Copenhagen were killed, while the dCache pools in Denmark were kept alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. The weather forecast for Denmark is still bad after the worst thunderstorm in history.
2. Operational Issues
2.1 Publishing site information in BDII
Most of the sites in the EGI integrated infrastructure are correctly publishing GlueSiteOtherInfo: GRID=EGI. However, some sites are still publishing GRID=EGEE, and publishing the Resource Infrastructure Provider name as EGEE_ROC instead of EGI_NGI:
GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX
Should be:
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI
EGEE_ROC must always be replaced by EGI_NGI. Sites that are publishing both GRID=EGEE and GRID=EGI should remove the former (GRID=EGEE).
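- Sites still publishing these values are listed in gstat:
- EGEE sites: http://gstat.egi.eu/gstat/summary/GRID/EGEE/
- EGEE_ROC sites: http://gstat.egi.eu/gstat/summary/EGEE_ROC/ALL/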
- Current manual
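The rewrite rules of section 2.1 are mechanical enough to express as a function. The following Python sketch is illustrative only: it operates on a plain list of GlueSiteOtherInfo values (the name fix_site_other_info is hypothetical), and it assumes that every EGI site should end up publishing GRID=EGI. How the values are actually changed on a site BDII depends on the site's configuration tooling, which this sketch does not cover.

```python
def fix_site_other_info(values):
    """Rewrite GlueSiteOtherInfo values following the rules in section 2.1 (sketch).

    values: the GlueSiteOtherInfo strings a site publishes, e.g. "GRID=EGEE".
    """
    fixed = []
    for value in values:
        key, _, val = value.partition("=")
        if key == "GRID" and val == "EGEE":
            continue  # GRID=EGEE is dropped; GRID=EGI is published instead
        if key == "EGEE_ROC":
            fixed.append("EGI_NGI=" + val)  # EGEE_ROC is always replaced by EGI_NGI
        else:
            fixed.append(value)
    if "GRID=EGI" not in fixed:
        fixed.append("GRID=EGI")  # assumption: every EGI site publishes GRID=EGI
    return fixed


if __name__ == "__main__":
    old = ["GRID=EGEE", "EGEE_SERVICE=prod", "EGEE_ROC=XXX"]
    for value in fix_site_other_info(old):
        print("GlueSiteOtherInfo: " + value)
```

Applied to the outdated example above, it prints the three "Should be" lines in the same order; for a site that already publishes both GRID=EGEE and GRID=EGI, it simply drops GRID=EGEE.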
2.2 Batch System survey results
Survey link: the deadline was June 30th 2011, but the survey is still open and will be closed in the next few days.
- 230 surveys submitted (including information from 238 sites)
- Question: Which batch system are you currently deploying?
Torque/Maui | 151 |
Torque | 40 |
SGE | 20 |
LSF | 18 |
PBS-pro | 13 |
PBS/Moab | 7 |
Slurm | 5 |
Condor | 3 |
Load Leveler | 3 |
- Question: Are you planning to replace your batch system?
No plans | 205 |
SGE | 8 |
Slurm | 8 |
Torque | 4 |
Maui | 3 |
Condor | 2 |
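To compare the answers as proportions, the counts can be turned into percentages. In this small Python sketch the denominator is the total number of answers to each question, which is an assumption: the answers to the first question add up to 260, more than the 230 surveys (and the 238 sites), so presumably some respondents reported more than one batch system, while the answers to the second question add up exactly to 230.

```python
# Answer counts copied from the two survey tables above.
CURRENT = {
    "Torque/Maui": 151, "Torque": 40, "SGE": 20, "LSF": 18, "PBS-pro": 13,
    "PBS/Moab": 7, "Slurm": 5, "Condor": 3, "Load Leveler": 3,
}
REPLACEMENT = {"No plans": 205, "SGE": 8, "Slurm": 8, "Torque": 4, "Maui": 3, "Condor": 2}
SURVEYS = 230  # number of surveys submitted


def shares(counts):
    """Percentage of the answers to one question, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(CURRENT.values()), sum(REPLACEMENT.values()))  # 260 230
    print(shares(CURRENT)["Torque/Maui"], shares(REPLACEMENT)["No plans"])  # 58.1 89.1
```

With these counts, Torque/Maui accounts for about 58% of the answers to the first question, and about 89% of respondents have no plans to replace their batch system.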
2.3 Purging of LB
glite-lb-purge fails on the gLite 3.2 LB (https://ggus.eu/tech/ticket_show.php?ticket=67151): even when jobs are purged, the database keeps growing in size. A patch is ready for release in EMI, but it is currently not scheduled for release in gLite 3.2. The proposal is to reassess the impact of this issue, which is flagged as "less urgent" in GGUS, so that the problem is fixed in gLite 3.2 as well.
3. AOB
3.1 gridops domain decommissioned
The operations tools are no longer reachable through the previous *.gridops.org domain.
All the *.egi.eu aliases are already available; they are listed on the Tools wiki page.
3.2 Next meeting
Next meeting proposal: July 18th, 14:00