GGUS:TPM/How to assign a new GGUS ticket

Revision as of 12:27, 2 May 2011 by Wbuehler (talk | contribs) (http://goc.grid.sinica.edu.tw/gocwiki/FTS_transfer_failed)

GGUS-TPM How to assign a new GGUS ticket


Security

Security Incident ticket

This might be a security challenge ticket or a real incident; the flow is the same in both cases.

  • Assignment
    • Assign these tickets to the "Security Management" support unit.
  • A note on the flow
    • After the ticket is assigned to the Security Management unit, the appropriate ROC reassigns the ticket to itself and follows up on the problem.

A user is banned or has authentication problems at a site

Access problem of a single user on one or more sites. A user belonging to a recognized VO has a problem submitting jobs at one or more sites in the grid.

  • Assignment
    1. If the problem description shows that the user is banned at one or more sites
      • Assign the ticket to the ROC, and additionally involve the Security Management GGUS unit using the "involve others" field. Their e-mail address can be found at https://gus.fzk.de/pages/resp_unit_info.php
    2. If a Savannah bug is the cause of the problem
      • Open the Savannah bug if necessary.
      • Add the exact Savannah bug number in "Related issues" and close the GGUS ticket as 'unsolved'.
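
The two cases above can be summarised as a small decision function. This is only a sketch: the `Routing` type, its field names and the function inputs are hypothetical and not part of GGUS. They simply mirror the rules as written, checked in the order they are listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Routing:
    """Hypothetical summary of what the TPM should do with the ticket."""
    assign_to: Optional[str] = None                      # support unit receiving the ticket
    involve_others: List[str] = field(default_factory=list)
    related_issue: Optional[str] = None                  # Savannah bug number, if any
    close_as: Optional[str] = None                       # e.g. "unsolved"


def route_access_problem(is_user_ban: bool,
                         savannah_bug: Optional[str] = None) -> Routing:
    # Case 1: the user is banned at one or more sites.
    if is_user_ban:
        return Routing(assign_to="ROC", involve_others=["Security Management"])
    # Case 2: a Savannah bug is the cause of the problem.
    if savannah_bug is not None:
        return Routing(related_issue=savannah_bug, close_as="unsolved")
    # Neither rule applies: this page gives no guidance, so nothing is decided.
    return Routing()
```

For example, `route_access_problem(True)` yields an assignment to the ROC with "Security Management" listed under the units to involve.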

Replica Management

FTS transfer failed

A file transfer has failed. If the user has not provided it, ask for the verbose output of the command that failed. Most failures are related to SRM failures.

  • Assignment
    1. Check the output to see whether the failure message relates to the FTS web service itself
      • Such failures produce error messages of the type:
        • "Failed to determine the interface version of the service:"
      • Assign the ticket to the ROC responsible for the FTS. Extract the FTS hostname from the endpoint (the value after the -s option) or ask the user which endpoint they used.
    2. Check the output to see whether the failure message relates to an SRM get call
      • Assign the ticket to the ROC responsible for the site where the source file was.
    3. Check the output to see whether the failure message relates to an SRM put call
      • Assign the ticket to the ROC responsible for the site where the destination file was.
  • A note on the flow
    • The ticket should stay with the ROC. If it turns out to be a problem with the underlying mass storage system, it can go to the CASTOR or dCache units.
  • TPM/lcg_cr fails
  • LFC catalogue
  • SRM failed. A buggy version of d-cache/srm/sbin/srm
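
The three assignment checks for a failed FTS transfer can be sketched as a simple text triage. Only the FTS web service message is taken from this page. The SRM get/put markers below are placeholders (an assumption), since the actual wording depends on the client and SRM implementation, and the function names are hypothetical.

```python
import shlex
from typing import Optional
from urllib.parse import urlparse

# Message quoted on this page for failures of the FTS web service itself.
FTS_SERVICE_ERROR = "Failed to determine the interface version of the service:"

# ASSUMPTION: placeholder markers for SRM get/put failures; replace them with
# the strings actually seen in the verbose output.
SRM_GET_MARKERS = ("srm get", "srmget")
SRM_PUT_MARKERS = ("srm put", "srmput")


def fts_host_from_command(command: str) -> Optional[str]:
    """Return the hostname of the endpoint given after the -s option, if any."""
    args = shlex.split(command)
    for i, arg in enumerate(args[:-1]):
        if arg == "-s":
            return urlparse(args[i + 1]).hostname
    return None


def triage_fts_failure(output: str) -> str:
    """Map the verbose output of a failed transfer to the unit to assign."""
    text = output.lower()
    if FTS_SERVICE_ERROR.lower() in text:
        return "ROC responsible for the FTS"
    if any(marker in text for marker in SRM_GET_MARKERS):
        return "ROC responsible for the source site"
    if any(marker in text for marker in SRM_PUT_MARKERS):
        return "ROC responsible for the destination site"
    # No known pattern matched: the TPM has to read the output manually.
    return "undetermined"
```

If the user pasted the full command line, `fts_host_from_command` can pull the FTS host out of the `-s` endpoint URL. It returns `None` when there is no `-s` option or the value is not a URL with a scheme, in which case the user has to be asked which endpoint was used.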

VO software

As of February 2009, if https://savannah.cern.ch/task/?8846 gets implemented, it may become possible to see which applications a VO supports on its VO ID card. For TPMs, we will try to build a list of VO-supported packages per LHC experiment VO.

Alice

  • AliEn, MonaLisa, xrootd, specific services of the experiment software installation.

ATLAS

  • PANDA, Athena, CMT, DQ2, AMI, ATLAS installation tools

CMS

  • PhEDEx, CMSSW, SiteDB, DBS, DataDiscovery, ProdAgent, CRAB, JobRobot

LHCb

  • Gauss Boole Brunel DaVinci ConditionDB SetupProject LHCb CMT SQLite Dirac-install
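
A TPM who sees a package name in a ticket could look it up against the per-VO lists above. The following is a minimal sketch, assuming the lists are kept as plain text exactly as written here; the `VO_SOFTWARE` name and the `vos_listing` helper are hypothetical. It uses a case-insensitive substring match, so very short search terms may match unintentionally.

```python
from typing import Dict, List

# Per-VO software listings, copied from the lists above.
VO_SOFTWARE: Dict[str, str] = {
    "Alice": "AliEn, MonaLisa, xrootd, specific services of the experiment software installation",
    "ATLAS": "PANDA, Athena, CMT, DQ2, AMI, ATLAS installation tools",
    "CMS": "PhEDEx CMSSW SiteDB DBS DataDiscovery ProdAgent CRAB JobRobot",
    "LHCb": "Gauss Boole Brunel DaVinci ConditionDB SetupProject LHCb CMT SQLite Dirac-install",
}


def vos_listing(package: str) -> List[str]:
    """Return the VOs whose listing mentions the given package name."""
    needle = package.lower()
    return [vo for vo, listing in VO_SOFTWARE.items() if needle in listing.lower()]
```

A name that appears under several VOs, such as CMT, is returned for each of them, so the TPM still needs the ticket context to pick the right VO support unit.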

Database (3D project)

  • Streams Monitor Error Report

Installation Upgrade Problem

  • A new release is out and a site is not managing to upgrade successfully

vomrs/voms Problem

SAM-related Problems

  • Who is running What (the SAM Service Map)

Other Unspecified Problem

Operations Manual Update Request

  • Assignment
    • If the ticket requests a change to the Operations Manual, assign it to the COD support unit.


Installed Capacity tickets

In March 2009 LCG will start rolling out a new scheme for publishing information about installed capacity in the information system. This covers things like the total number of CPUs at a site, their ratings using the new HEP benchmark, the total amount of disk space, and the way resources are shared between VOs. This is described in detail in the following document:

and also in a talk at the March GDB:

This will be supported by a new version of YAIM and much of it will be automatic, but sysadmins will have to configure some of the information by hand, so they may submit tickets if anything is unclear. It is hard to predict what will cause problems, but likely candidates are the move to the new benchmark, the handling of internal scaling of CPU and wallclock times in the batch system, the definition of sharing between different VOs, and the definition of installed disk space.

  • Assignment
    • GGUS tickets related to this kind of issue should be assigned to the GLUE support unit.
  • A note on the flow
    • Tickets will probably either be opened by site admins directly, or be redirected to the GLUE unit when site admins find that a ticket asking them to report the Installed Capacity has been assigned to them but they do not know what to do.