NGI DE CH Operations Center:Operations Meeting:16032012

Latest revision as of 10:32, 22 March 2012

Operations Meeting Main

Introduction

  • Minutes of last meeting

Announcements

Feb:
VoOps:
Av/Re = 97%
UNI BONN only has 69%

BDII: Av/Re = 99.3%
 
  • Monitoring
https://helpdesk.ngi-de.eu/index.php?mode=ticket_info&ticket_id=1969
https://helpdesk.ngi-de.eu/index.php?mode=ticket_info&ticket_id=1978
High rate of unknown Nagios tests. We also contacted the sites MPI-K, UNI-Dresden, UNI-Karlsruhe and UNI-Siegen by email.
Possible reason: connection problems.
The NGI-DE Nagios was also down during our (KIT, FZK-LCG2) downtime last week. We will recalculate the availability numbers for
March.
  • Staged rollout/updates
UMD:
http://www.eu-emi.eu/emi-1-kebnekaise-updates/-/asset_publisher/Ir6q/content/update-14-16-03-2012
http://www.eu-emi.eu/emi-1-kebnekaise-updates/-/asset_publisher/Ir6q/content/update-13-17-02-2012
  • Survey
Usage and future maintenance of deployed software
Operations/Platform Deployment Survey
For the surveys we need your feedback by email until 20 March.
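
The recalculation mentioned under Monitoring can be pictured with a small sketch. It assumes the commonly used convention that hours with unknown monitoring status (for example while the Nagios instance itself was down) are left out of the calculation, and that scheduled downtime is additionally left out for reliability. The class name, function names and the hour values are hypothetical and only serve as illustration; they are not the official numbers or the official tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusHours:
    """Hours a site spent in each monitoring state (hypothetical model)."""
    up: float              # tests passed
    down: float            # unscheduled failures
    scheduled_down: float  # declared downtime
    unknown: float         # no usable test result, e.g. Nagios unreachable


def availability(h: StatusHours) -> Optional[float]:
    """Fraction of the known period in which the site was up.

    Assumption: unknown hours are excluded from the denominator.
    """
    known = h.up + h.down + h.scheduled_down
    return h.up / known if known else None


def reliability(h: StatusHours) -> Optional[float]:
    """Like availability, but scheduled downtime is excluded as well."""
    expected = h.up + h.down
    return h.up / expected if expected else None


# Made-up example month: the 72 unknown hours do not count against the site.
example = StatusHours(up=600, down=24, scheduled_down=48, unknown=72)
print(f"availability: {availability(example):.1%}")
print(f"reliability:  {reliability(example):.1%}")
```

With this convention, hours in which the monitoring itself was unavailable lower neither figure, which is why a recalculation after a Nagios outage can change the published numbers.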

Round the sites

NGI-DE
  • BMRZ-FRANKFURT (Uni Frankfurt)
  • DESY-HH
  • DESY-ZN
  • FZJuelich
  • Goegrid
  • GSI
  • ITWM (Martin Braun)
 - 09.03 -> ITWM will not be able to attend the phone conference on 9/3/12. There is nothing special to report.
 - 16/3/12 ntr
  • KIT (GridKa, FZK-LCG2, Dimitri Nilsen, Tobias Koenig)
 - 9.03 -> Announcement: downtime 13.03 - 15.03, full site maintenance
 - 16/3/12 The downtime was completed with a two-hour delay. During the downtime one central router was replaced by a new model,
and we also carried out some dCache updates, disk firmware and BIOS upgrades, a tape TSM update, a reinstallation of the complete
cluster and changes to the central power supply. Some LFC and 3D-DB were migrated to Oracle version 11g.
  • KIT (Uni Karlsruhe)
  • LRZ
  • MPI-K
  • MPPMU (Cesare Delle Fratte)
 - ntr
 - A few problems with CREAM and ATLAS jobs. The problems are solved now.
  • RWTH Aachen
  • SCAI (Andre Gemuend)
 - Problems with the SE: the DPM daemon died. We will update EMI-DPM.
 - ROD also filed two tickets (one concerning the DPM daemon problem and one concerning the related CREAM CE) instead of only one
for the DPM daemon:
  • Uni Bonn
  • Uni Dortmund
  • Uni Dresden (Ralph Mueller Pfeeferkorn)
 - 16/3/12 Last week we had a two-day downtime. We updated CREAM and APEL to the EMI release and dCache to version 1.9.12;
the dCache update included the upgrade from SL4 to SL5. Everything seems to be fine now.
 - Problems with the EMI release: after the update the Nagios test sometimes fails with the error message "job could not be
submitted" or something similar. Comment by KIT: We also tested the BDII from EMI in preproduction. The documentation did not
mention that you need 3 GB of RAM.
  • Uni Freiburg
  • Uni Mainz-Maigrid
  • Uni Siegen
  • Uni Wuppertal
SwiNG
  • CSCS
  • PSI
  • Switch

Note: please update your entry at https://wiki.egi.eu/wiki/NGI_DE:Sites if needed.

Status ROD

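Welcome to Daniel Waldmann (LRZ).

  • Any problematic tickets? No
  • Handover of the ROD shift
  • ROD shift schedule https://wiki.egi.eu/wiki/NGI_DE_CH_Operations_Center:Operations_Teams#Shifts_rotation_table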

Week  Start  End    Team
09    27.02  04.03  Team5, LRZ
10    05.03  11.03  Team6, CSCS/NGI_CH
11    12.03  18.03  Team1, DESY
12    19.03  25.03  Team2, FhG
  • Nagios<-->Dashboard issue
(Nagios sends information/notifications to the dashboard, but they are not displayed correctly there.
As a consequence, tickets stay in the dashboard longer than the problem persists.)
 https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=2039
 https://helpdesk.ngi-de.eu/?mode=ticket_info&ticket_id=2038

AOB

 - Reminder: Operations workshop in April. We hope many sites will participate. You can register now. See announcement section above.
 - Short telco meeting before the workshop on 13 April.

If you have additional topics to be discussed during the meeting, please submit them in advance via our email list.