Difference between revisions of "TSA2.5 Deployed Middleware Support Unit"

== TPM cheat sheet ==
#REDIRECT[[EGI DMSU]].
 
 
{| border="1" style="text-align:center" class="wikitable sortable"
 
|+ TPM DMSU Cheat Sheet
 
|-
 
!width="100" align="left" |Comp\Flavor !!width="75"|CESNET !! width="75"|JUELICH !! width="75"|INFN !! width="75"|BADW-LRZ!! width="75"|NDGF
 
|-
 
!align="left"| ARC
 
|  ||  ||  ||  || *
 
|-
 
!align="left"| BDII
 
|  ||  || * ||  || 
 
|-
 
!align="left"| CREAM
 
|  ||  || * ||  || 
 
|-
 
!align="left"| dCache
 
|  ||  ||  ||  || *
 
|-
 
!align="left"| DGAS
 
|  ||  || * ||  || 
 
|-
 
!align="left"| DPM
 
| * ||  ||  ||  || 
 
|-
 
!align="left"| FTS
 
|  ||  || * ||  || 
 
|-
 
!align="left"| GLOBUS
 
|  ||  ||  || * || 
 
|-
 
!align="left"| Gridsite
 
| *  ||  ||  ||  || 
 
|-
 
!align="left"| LB
 
| * ||  ||  ||  || 
 
|-
 
!align="left"| LFC
 
|  ||  || * ||  || 
 
|-
 
!align="left"| MyProxy
 
| * ||  ||  ||  || 
 
|-
 
!align="left"| SGAS
 
|  ||  ||  ||  || *
 
|-
 
!align="left"| STORM
 
|  ||  || * ||  || 
 
|-
 
!align="left"| UNICORE
 
|  || * ||  ||  || 
 
|-
 
!align="left"| VOMS
 
|  * ||  || * ||  || 
 
|-
 
!align="left"| WMS
 
| * ||  || * ||  || 
 
|}
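For scripted ticket routing, the cheat sheet can be expressed as a simple lookup. The following is a minimal sketch only: the dictionary is transcribed from the table above, while the names <code>DMSU_CHEAT_SHEET</code> and <code>partners_for</code> are hypothetical and not part of any DMSU or GGUS tool.

```python
# Component -> responsible partner(s), transcribed from the cheat sheet table.
# The variable and function names are illustrative assumptions.
DMSU_CHEAT_SHEET = {
    "ARC": ["NDGF"],
    "BDII": ["INFN"],
    "CREAM": ["INFN"],
    "dCache": ["NDGF"],
    "DGAS": ["INFN"],
    "DPM": ["CESNET"],
    "FTS": ["INFN"],
    "GLOBUS": ["BADW-LRZ"],
    "Gridsite": ["CESNET"],
    "LB": ["CESNET"],
    "LFC": ["INFN"],
    "MyProxy": ["CESNET"],
    "SGAS": ["NDGF"],
    "STORM": ["INFN"],
    "UNICORE": ["JUELICH"],
    "VOMS": ["CESNET", "INFN"],
    "WMS": ["CESNET", "INFN"],
}


def partners_for(component):
    """Return the partners covering a component (case-insensitive match).

    An empty list means the component is not on the cheat sheet.
    """
    wanted = component.strip().lower()
    for name, partners in DMSU_CHEAT_SHEET.items():
        if name.lower() == wanted:
            return list(partners)
    return []
```

For example, <code>partners_for("voms")</code> returns both CESNET and INFN, since the table marks both partners for VOMS.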
 
 
 
== People working for DMSU  ==
 
 
 
The DMSU people are categorized into two groups:
 
 
 
*Assigners: the prime technical contact person per partner, who also conducts the initial assignment of tickets following the TPM assignment.

*Resolvers: additional experts at each partner to whom the assigner can delegate tasks when needed, either for technical reasons or because another staff member is overloaded.
 
 
 
The list of people, with a short list of their expertise areas relevant for DMSU, follows:
 
 
 
=== CESNET  ===
 
 
 
Assigner: Aleš Křenek, ljocha at ics dot muni dot cz<br> EGI SSO login: ljocha<br> Expertise: LB, DPM
 
 
 
Resolvers:
 
 
 
*Zdeněk Salvet, salvet at ics dot muni dot cz (LB, WMS, MyProxy)
 
*Zdeněk Šustr, sustr4 at cesnet dot cz (LB, WMS)
 
*Daniel Kouřil, kouril at ics dot muni dot cz (Gridsite, MyProxy, VOMS and security in general)
 
 
 
=== JUELICH  ===
 
 
 
''Assigners:'' Rebecca Breu (rbreu), Mathilde Romberg (romberg)
 
 
 
''Expertise:'' UNICORE
 
 
 
''Resolvers:''
 
 
 
*Jason Milad Daivandy
 
 
 
=== INFN  ===
 
 
 
Assigner: Alessandro Paolini (alessandro.paolini at cnaf.infn.it)<br> EGI SSO login: apaolini<br> Expertise: CREAM, LFC, VOMS
 
 
 
Resolvers:
 
 
 
*Andrea Cristofori (andrea.cristofori at cnaf.infn.it) (DGAS, WMS)

*David Rebatto (david.rebatto at mi.infn.it) (CREAM)

*Sara Bertocco (sara.bertocco at pd.infn.it) (CREAM, WMS)

*Sergio Traldi (sergio.traldi at pd.infn.it) (CREAM, WMS)

*Elisabetta Ronchieri (elisabetta.ronchieri at cnaf.infn.it) (STORM)

*Stefano Dal Pra (stefano.dalpra at cnaf.infn.it) (STORM)
 
*Paolo Veronesi (paolo.veronesi at cnaf.infn.it) (BDII, FTS)
 
 
 
=== BADW-LRZ  ===
 
 
 
Assigner: Emmanouil Paisios<br> EGI SSO login: epaisios<br> Expertise: Globus
 
 
 
Resolvers:
 
 
 
*None (Emmanouil is responsible for everything here)
 
 
 
=== NDGF  ===
 
 
 
Assigner: Jens Larsson<br> EGI SSO login: jens<br> Expertise: ARC, dCache
 
 
 
Resolvers:
 
 
 
*Gerd Behrmann (dCache)
 
*Daniel Johansson (ARC)
 
*Henrik Thostrup (SGAS, ARC)
 
 
 
== Ticket handling procedure ==
 
 
 
The first mandatory step of DMSU work on a ticket is to understand the cause of the reported problem. The outcome of the analysis is '''documented with the ticket''', preferably as a response to the user.
 
 
 
The analysis may or may not include a thorough reproduction of the problem; this is left to common sense.
 
 
 
During the analysis DMSU also assesses the priority of the ticket (see below) and adjusts the ''Type of problem'' and ''Ticket category'' fields if necessary.
 
 
 
DMSU expertise should cover most tickets. For the remaining difficult issues, developers (i.e. the 3rd line support) can be involved. However, control of the ticket stays within DMSU, i.e. the ticket is '''not reassigned''' to another support unit.
 
 
 
If the solution of the problem does not require changes in code, documentation, default configuration etc., i.e. no release of anything by the technology provider is needed, '''DMSU closes the ticket'''.
 
 
 
Otherwise, the ticket is '''reassigned''' to the appropriate 3rd line support
 
unit.
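The final step of the procedure above can be summarised as a small decision function. This is a minimal sketch under stated assumptions: the <code>Ticket</code> class, its fields and the <code>Outcome</code> names are hypothetical and do not correspond to any GGUS data model.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CLOSED_BY_DMSU = "closed by DMSU"
    REASSIGNED_TO_3RD_LINE = "reassigned to 3rd line support unit"


@dataclass
class Ticket:
    # Hypothetical fields, used only for this illustration.
    analysis: str = ""                    # outcome of the analysis, documented with the ticket
    needs_provider_release: bool = False  # fix needs a change in code, docs, default config etc.


def final_step(ticket):
    """Decide what happens to a ticket once the DMSU analysis is done."""
    # The analysis is the first mandatory step and must be documented.
    if not ticket.analysis.strip():
        raise ValueError("the analysis must be documented with the ticket first")
    # A release by the technology provider is needed: hand over to 3rd line.
    if ticket.needs_provider_release:
        return Outcome.REASSIGNED_TO_3RD_LINE
    # Otherwise DMSU closes the ticket itself.
    return Outcome.CLOSED_BY_DMSU
```

Consulting developers during the analysis does not change the outcome by itself; in this sketch only the need for a provider release triggers reassignment.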
 
 
 
== Ticket priorities ==
 
 
 
== Followup of tickets with 3rd line support units ==
 
 
 
== DMSU shifts  ==
 
 
 
The main purpose of the DMSU shift is straightforward: to keep things running and to make sure that no important issue is left without a fast reaction.
 
 
 
The shifts are held by groups of people with expertise in different middleware stacks. However, because gLite-related traffic prevails in DMSU, only gLite shifts are formally organized at present; the other stacks are handled on a best-effort basis.
 
 
 
The specific duties of the person on shift are:
 
 
 
*to follow incoming emails from GGUS and be able to react within approx. 2 hours during normal working hours

*to identify "top priority" and "very urgent" issues, not only by the priority set by the submitter but also by using common sense, and to make sure an appropriate expert starts looking into the issue; this includes assigning the ticket to a specific person

*to keep checking that the response time is reasonable, in particular in reaction to further correspondence from the submitter; it should be almost immediate for "top priority", and we can probably afford up to 1 week for "less urgent"
 
 
 
One person holds the shift for one week; the duty is passed to the next person on Monday afternoon.
 
 
 
=== Shift schedule  ===
 
 
 
{| style="width: 173px; height: 119px;"
 
|-
 
| Oct 10
 
| Aleš Křenek
 
|-
 
| Oct 17
 
| Alessandro Paolini
 
|-
 
| Oct 24
 
| Zdeněk Salvet
 
|-
 
| Oct 31
 
| Alessandro Paolini
 
|-
 
| Nov 7
 
| Aleš Křenek
 
|-
 
| Nov 14
 
| Sergio Traldi
 
|-
 
| Nov 21
 
| Zdeněk Salvet
 
|-
 
| Nov 28
 
| Sara Bertocco
 
|-
 
| Dec 5
 
| Zdeněk Salvet
 
|-
 
| Dec 12
 
| INFN
 
|-
 
| Dec 19
 
| Aleš Křenek
 
|-
 
| Dec 26
 
| best effort
 
|-
 
| Jan 2
 
| Aleš Křenek
 
|-
 
| Jan 9
 
| Alessandro Paolini
 
|-
 
| Jan 16
 
| Zdeněk Salvet
 
|-
 
|}
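Finding the person on shift for a given day amounts to a lookup in the sorted list of week start dates. The sketch below shows one possible way to do this; it is an illustration, not an existing DMSU tool. Assumptions: the page does not state the year, so 2011 is assumed because the listed dates fall on Mondays in that year; only the first four rows of the schedule are transcribed; and the handover is simplified to the start of Monday rather than Monday afternoon. The names <code>SCHEDULE</code> and <code>on_shift</code> are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# (week start, person on shift), first rows of the schedule only.
# ASSUMPTION: the year 2011 is not stated on the page.
SCHEDULE = [
    (date(2011, 10, 10), "Aleš Křenek"),
    (date(2011, 10, 17), "Alessandro Paolini"),
    (date(2011, 10, 24), "Zdeněk Salvet"),
    (date(2011, 10, 31), "Alessandro Paolini"),
]


def on_shift(day, schedule=SCHEDULE):
    """Return who holds the shift on `day`, or None if the day is not covered."""
    starts = [start for start, _ in schedule]
    # Index of the last shift starting on or before `day`.
    i = bisect_right(starts, day) - 1
    if i < 0:
        return None
    start, person = schedule[i]
    # Each shift lasts exactly one week.
    if (day - start).days >= 7:
        return None
    return person
```

Days before the first entry or more than a week after the last one yield <code>None</code>, which corresponds to "no shift formally organized" rather than to an error.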
 
 
 
== DMSU Digests ==
 
 
 
Brief description and indexing of issues solved within DMSU that are likely to have broader impact on EGI Operations.
 
 
 
Maintained on a separate page: [[Middleware_issues_and_solutions]]
 
 
 
== Operations Documentation ==
 
 
 
DMSU contributes to the maintenance of the EGI [[Operations_Manuals]], in particular
 
 
 
* [[MAN05]] BDII high-availability
 
* [[WMS_best_practices]]
 
* [[VOMS_Replication]]
 
 
 
 
 
== Systems available for DMSU ==
 
 
 
In order to debug issues and design workarounds, access to some systems is needed. Below is a list of the systems available to DMSU staff, per partner.
 
 
 
=== CESNET ===
 
 
 
'''prague_cesnet_lcg2'''
 
 
 
* # nodes/cores: 20/80
 
* OS: SL 5.2
 
* Batch system: PBSPro
 
* Grid m/w: gLite 3.1
 
* EA Site: Y
 
 
 
https://goc.gridops.org/site/list?id=48
 
 
 
'''{floi1,floi2}.egee.cesnet.cz'''
 
 
 
Virtual machines for experimental services; they can be wiped and reinstalled as required.
 
 
 
* # nodes/cores: 2/2
 
* OS: SL 5.3 x86_64
 
* Batch system: N/A
 
* Scheduler: N/A
 
* Grid m/w and flavour: LB 2.0 (of gLite 3.2)
 
* EA Site: N
 
 
 
=== JUELICH ===
 
 
 
=== INFN ===
 
 
 
* # nodes/cores: a) 4/8 b) 3/6

* OS: a) SL4 x86_32 b) SL5 x86_64

* Batch system: torque/pbs

* Scheduler: maui

* Grid m/w and flavour: a) INFNGRID 3.1 (based on gLite 3.1) b) INFNGRID 3.2 (based on gLite 3.2)

* EA Site: Y (for STORM service) https://goc.gridops.org/site/list?id=95
 
 
 
=== BADW-LRZ ===
 
'''Linux Cluster'''

* # nodes/cores: 938/5532

* OS: SUSE Linux Enterprise Server 10

* Batch system/Scheduler: SGE 6.2

* Grid m/w: Globus 4.0.7

* EA Site: N
 
 
 
=== NDGF ===
 
'''Smokerings'''

* # nodes/cores: 66/528

* OS: CentOS 5.7

* Batch system: Torque (to be replaced by SLURM)

* Scheduler: MOAB (to be replaced by SLURM)

* Grid m/w: ARC 1.1

* EA Site

https://goc.gridops.org/node/list?id=7055071
 
 
 
== Obsolete stuff ==
 
 
 
No longer used; the old links are kept here for reference.
 
 
 
=== Relevant tickets ===
 
 
 
Relevant tickets that passed through DMSU and were assigned to other support units are gathered [[Relevant_tickets|here]].
 
 
 
=== Meetings ===
 
 
 
* [[100603 DMSU Kickoff Meeting, Amsterdam]]
 
* [[100608 DMSU Weekly Assigner Meeting]]
 
* [[100615 DMSU Weekly Assigner Meeting]]
 
* [[100622 DMSU Weekly Assigner Meeting]]
 
* [[100817 DMSU Weekly Assigner Meeting]]
 
* [[100824 DMSU Weekly Assigner Meeting]]
 
* [[100831 DMSU Weekly Assigner Meeting]]
 
* [[100907 DMSU Weekly Assigner Meeting]]
 
* [[100921 DMSU Weekly Assigner Meeting]]
 
* [[100928 DMSU Weekly Assigner Meeting]]
 
* [[101005 DMSU Weekly Assigner Meeting]]
 
* [[101026 DMSU Weekly Assigner Meeting]]
 
* [[110125 DMSU Weekly Assigner Meeting]]
 
 
 
[[Category:DMSU]]
 
[[Category:WP5-SA2]]
 

Latest revision as of 21:18, 1 December 2012
