
Agenda-04-07-2011


Detailed agenda: Grid Operations Meeting 04 July 2011 14h00 Amsterdam time

Minutes

1. Middleware releases and staged rollout

1.1 EMI-1 release status (Cristina)

Slides from Cristina

  • EMI Update 2: 23.06.2011
    • CREAM&CEMon v. 1.13.1
  • EMI Update 3: 07.07.2011
    • StoRM SE (first release in EMI) v. 1.7.0
    • L&B v. 3.0.12
    • glite-proxyrenewal v. 1.3.21
    • glite-MPI v. 1.0.1
    • UNICORE UVOS v. 1.4.2

1.2 EMI/UMD current status

1.3. Staged Rollout (Mario)

1.3.1 gLite 3.1 series
  • WMS 3.2.17: installed and in production, waiting for the staged rollout report
1.3.2 gLite 3.2 series
  • gLexec: EA tested it, waiting for the staged rollout report
1.3.3 EMI1 - UMD1
  • 27 products are in the UMDStore area, which means that staged rollout has been performed, and they will be in the UMD1 release.
  • The products still missing (at the time of this meeting), and currently under staged rollout, are: arc-ce, arc-clients and cream (from EMI Update 2).
  • We are now in the process of preparing the release: collecting release notes, issues found in verification and staged rollout, workarounds, etc.


Staged rollout status. Columns, in order: RT ticket ID; Product - sw-rel; Ticket Verif; Ticket StgRllt; ET (Finish); GGUS tickets Verif; GGUS tickets StgRllt; DocDB ID Verif; DocDB ID SR; UMDStore; EA teams done; EA teams waiting
2269 EMI.apel.sl5.x86_64 DONE DONE 28-Jun     551 607 OK 2  
2431 EMI.arc-ce.sl5.x86_64 DONE OnGoing 5-Jul 71120   608 wait   4 arc EA teams
2493 EMI.arc-clients.sl5.x86_64 DONE OnGoing 5-Jul     639 wait    
  EMI.arc-infosys.sl5.x86_64 OnGoing     71129            
2303 EMI.argus.sl5.x86_64 DONE DONE 28-Jun     572 604 OK 3  
2270 EMI.bdii-site.sl5.x86_64 DONE DONE 23-Jun     552 574 OK 1  
2271 EMI.bdii-top.sl5.x86_64 DONE DONE 23-Jun     553 575 OK 1  
2343 EMI.cluster.sl5.x86_64 DONE DONE 28-Jun     596 637 OK 1  
2263 EMI.cream.sl5.x86_64 DONE DONE 28-Jun     549 577 OK - Superseded 3  
  EMI.dcache.sl5.x86_64 Not Started                  
2300 EMI.dgas.sl5.x86_64 DONE DONE 28-Jun     549 577 OK 1  
2305 EMI.dpm.sl5.x86_64 DONE DONE 28-Jun 71205 71353 71357   573 614 OK 2  
2336 EMI.glexec_wn.sl5.x86_64 DONE DONE 28-Jun 71569   594 618 OK 1  
2347 EMI.lb.sl5.x86_64 DONE DONE 28-Jun 71448 71449   597 605 OK 3  
2342 EMI.lfc_mysql.sl5.x86_64 DONE DONE 28-Jun     595 636 OK 1  
  EMI.lfc_oracle.sl5.x86_64 Rejected     71593 71607            
2323 EMI.lsf-utils.sl5.x86_64 DONE DONE 28-Jun     586 577 OK 1  
  EMI.mpi.sl5.x86_64 Rejected     71304   566        
2273 EMI.proxyrenewal.sl5.x86_64 DONE DONE 23-Jun     558 576 OK 1  
2315 EMI.torque-client.sl5.x86_64 DONE DONE 28-Jun     560 617 OK 3  
2265 EMI.torque-server.sl5.x86_64 DONE DONE 23-Jun     549 578 OK 1  
2264 EMI.torque-utils.sl5.x86_64 DONE DONE 23-Jun     549 579 OK 1  
2262 EMI.ui.sl5.x86_64 DONE DONE 5-Jul   72196 543 641 OK 1  
2284 EMI.unicore-client.sl5.x86_64 DONE DONE 28-Jun     539 630 OK 1  
2285 EMI.unicore-gateway.sl5.x86_64 DONE DONE 28-Jun     547 631 OK 2  
2286 EMI.unicore-hila.sl5.x86_64 DONE DONE 28-Jun     550 632 OK 1  
  EMI.unicore-registry.sl5.x86_64 Rejected         537        
2289 EMI.unicore-tsi.sl5.x86_64 DONE DONE 28-Jun     548 634 OK 2  
2290 EMI.unicore-uvos.sl5.x86_64 DONE DONE 28-Jun     548 635 OK 1  
2288 EMI.unicore-ws.sl5.x86_64 DONE DONE 28-Jun     545 629 OK 2  
2287 EMI.unicore-xuudb.sl5.x86_64 DONE DONE 28-Jun     546 633 OK 1  
2272 EMI.voms_mysql.sl5.x86_64 DONE DONE 23-Jun     554 603 OK 2  
  EMI.voms_oracle.sl5.x86_64 onHOLD                  
  EMI.wms.sl5.x86_64 Rejected     71168 71065 71190   567        
2314 EMI.wn.sl5.x86_64 DONE DONE 28-Jun 71198 71167 71723 560 617 OK 3  
2489 EMI.cream.sl5.x86_64 DONE OnGoing 5-Jul     625   waiting 1  
2498 EMI.unicore-registry.sl5.x86_64 DONE DONE       640 642 OK 1  

1.4 Interoperability (Michaela)

UNICORE
Globus
ARC

Major operational problems since this weekend due to flooding of the NBI computer hall in Copenhagen, affecting most of the NorduGrid infrastructure (GIIS, mail, SVN, download) except WWW. The GIIS outage affects BDII services. Services were completely down from Saturday evening until Sunday afternoon; the emergency diesel power was flooded as well. Some services are still affected: the one of the four global GIIS servers located in Denmark and, e.g., the NDGF-T1 mail server are still down. All sites listed under http://www.nordugrid.org/monitor/ may be affected. The ARC-CEs in Copenhagen are out of service. The dCache pools in Denmark have been kept alive. Most other ARC worker nodes are free and working fine, but no new jobs are coming in. The weather forecast for Denmark is still bad after this worst thunderstorm in history.

2. Operational Issues

2.1 Publishing site information in BDII

Most of the sites in the EGI integrated infrastructure are correctly publishing GlueSiteOtherInfo: GRID=EGI. There are still sites that are publishing GRID=EGEE and the Resource infrastructure Provider name as EGEE_ROC instead of EGI_NGI:

GlueSiteOtherInfo: GRID=EGEE
GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGEE_ROC=XXX

Should be:

GlueSiteOtherInfo: EGEE_SERVICE=prod
GlueSiteOtherInfo: EGI_NGI=XXX
GlueSiteOtherInfo: GRID=EGI

EGEE_ROC must always be replaced by EGI_NGI. Sites that are publishing both GRID=EGEE and GRID=EGI should remove the first attribute.
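
As an illustration only, the renaming rule above can be sketched as a small Python function that rewrites a list of published attribute lines. The function name normalize_site_other_info is hypothetical and is not part of any gLite or BDII tool; it only transforms text lines and does not change a site's information provider configuration.

```python
def normalize_site_other_info(lines):
    """Rewrite GlueSiteOtherInfo lines to the EGI convention.

    Hypothetical helper for illustration: GRID=EGEE becomes GRID=EGI,
    EGEE_ROC=<name> becomes EGI_NGI=<name>, and duplicate attribute
    lines produced by the rewrite are dropped.
    """
    prefix = "GlueSiteOtherInfo:"
    result = []
    for line in lines:
        if not line.startswith(prefix):
            # Not a GlueSiteOtherInfo attribute: keep it unchanged.
            result.append(line)
            continue
        value = line[len(prefix):].strip()
        key, _, val = value.partition("=")
        if key == "GRID" and val == "EGEE":
            value = "GRID=EGI"
        elif key == "EGEE_ROC":
            value = "EGI_NGI=" + val
        new_line = prefix + " " + value
        # A site publishing both GRID=EGEE and GRID=EGI ends up with one GRID=EGI.
        if new_line not in result:
            result.append(new_line)
    return result


published = [
    "GlueSiteOtherInfo: GRID=EGEE",
    "GlueSiteOtherInfo: EGEE_SERVICE=prod",
    "GlueSiteOtherInfo: EGEE_ROC=XXX",
    "GlueSiteOtherInfo: GRID=EGI",
]
for attribute in normalize_site_other_info(published):
    print(attribute)
```

EGEE_SERVICE is left untouched, matching the "Should be" example above.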

2.2 Batch System survey results

Survey link: the deadline was June 30th 2011, but the survey is still open. It will be closed in the next few days.

  • 230 surveys submitted (including information from 238 sites)
  • Question: Which batch system are you currently deploying?
    • Torque/Maui: 151
    • Torque: 40
    • SGE: 20
    • LSF: 18
    • PBS-pro: 13
    • PBS/Moab: 7
    • Slurm: 5
    • Condor: 3
    • Load Leveler: 3
  • Question: Are you planning to replace your batch system?
    • No plans: 205
    • SGE: 8
    • Slurm: 8
    • Torque: 4
    • Maui: 3
    • Condor: 2

2.3 Purging of LB

glite-lb-purge fails on gLite 3.2 LB (https://ggus.eu/tech/ticket_show.php?ticket=67151): even when jobs are purged, the database keeps growing in size. A patch is ready for release in EMI, but it is currently not scheduled for release in gLite 3.2. The proposal is to reassess the impact of the issue, flagged as "less urgent" in GGUS, in order to have the problem fixed in gLite 3.2 too.

3. AOB

3.1 gridops domain decommissioned

The operations tools are no longer reachable through the previous domain *.gridops.org.
All the *.egi.eu aliases are already available; you can find them on the Tools wiki page.

3.2 Next meeting

Next meeting proposal: July 18th, 14:00