Difference between revisions of "NGI CZ:DDM for auger"

 

Latest revision as of 09:10, 24 May 2017

Distributed Data Management for VO auger

Bulk production using DIRAC (since 2016)

Files are registered in the DIRAC File Catalog (DFC). We try to minimize the number of SEs used for long-term storage to improve reliability. Some productions may also be copied to iRODS at CC IN2P3.

Bulk productions until 2015

Files created on the grid are stored on a Storage Element (SE) and registered in the LFC. The production system also records whether the simulation job finished successfully and whether the results are available. When files are consolidated at CC IN2P3, they are first copied to the SRM at Lyon and registered in the LFC under a different name, then copied to SRB and registered in SimDB. The system should be revisited in 2014.

Operational issues

  • 201705 ELOG (https://elog.grid.cesnet.cz/Auger/) is available for Auger Distributed Computing issues.
  • 201607 There were 16 files declared as lost at the praguelcg2 site (golias100.farm.particle.cz). These files were found to be missing during a consistency check of the DPM DB against the content of the disk servers.
  • 201512 Reported loss of 467 files at KIT (gridka-dCache.fzk.de). 286 files had another replica, 181 files were unique. Those 181 unique files were deleted from the LFC.
  • 201407 End of support at the Wuppertal site: all files at grid-se.physik.uni-wuppertal.de must be deleted.
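The 201512 entry splits the lost files into those with a surviving replica and those that were unique. As a rough illustration of that bookkeeping, the sketch below classifies catalogue entries for a given SE. The function name, the dictionary layout and the LFNs are hypothetical and are not taken from the Auger tools; a real check would query the LFC or DFC instead of an in-memory dictionary.

```python
def classify_lost_files(replicas_by_lfn, lost_se):
    """Split the files that had a replica on lost_se into two lists:
    files that still have a replica elsewhere, and files that were unique."""
    recoverable, unique = [], []
    for lfn, hosts in replicas_by_lfn.items():
        if lost_se not in hosts:
            continue  # this file was not affected by the loss
        survivors = [host for host in hosts if host != lost_se]
        if survivors:
            recoverable.append(lfn)
        else:
            unique.append(lfn)
    return recoverable, unique


# Hypothetical catalogue content: LFN -> list of SE host names.
catalog = {
    "/grid/auger/example/a.tar.gz": ["gridka-dCache.fzk.de", "golias100.farm.particle.cz"],
    "/grid/auger/example/b.tar.gz": ["gridka-dCache.fzk.de"],
    "/grid/auger/example/c.tar.gz": ["golias100.farm.particle.cz"],
}
recoverable, unique = classify_lost_files(catalog, "gridka-dCache.fzk.de")
print(len(recoverable), "with another replica,", len(unique), "unique")
```

Only the unique files need to be removed from the catalogue entirely; for the others it is enough to drop the lost replica.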

Bulk copy tools

Tools to copy files registered in the LFC via FTS are documented on GitHub (https://github.com/piratte/auger-FTS-utils).

Tools to copy or delete files registered in the DFC are part of DIRAC.

Bulk deletion tools

delInLFC.py

A Python script for bulk deletions in the LFC. We typically use it when a site reports a loss of files or when a site is decommissioned. It takes as an argument either an SE name (to delete all files on that SE) or the name of a file containing the list of files to be deleted. It also supports a dry run, which deletes nothing and only reports what would be deleted. The script deletes almost 15 files per second (measured during a deletion of 3M files from lfc1.egee.cesnet.cz).
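The quoted rate gives a quick way to estimate how long a bulk deletion will run before starting it. The helper below is only a back-of-the-envelope sketch; its name is hypothetical and the default of 15 files per second is the "almost 15" figure above, so real runs will take somewhat longer.

```python
from datetime import timedelta


def estimated_duration(n_files, files_per_second=15.0):
    """Return the expected wall-clock time for deleting n_files."""
    return timedelta(seconds=n_files / files_per_second)


# The 3M-file deletion mentioned above, at 15 files per second.
duration = estimated_duration(3_000_000)
print(duration)  # 2 days, 7:33:20
```

A run of this length is the reason the usage example starts the script under nohup in the background.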

Examples of usage:

nohup /usr/bin/time python /home/chudoba/lfc/delInLFC.py -f files_to_be_deleted.txt > lfc_deletion.`date +"%Y%m%d%H%M"`.log 2>&1 &

The format of the file files_to_be_deleted.txt:

srm://se-cafpegrid.ugr.es/dpm/ugr.es/home/auger//grid/auger/prod/Photon_gr171/en17.500/th60.70/076701/DAT767016.tar.gz
srm://se-cafpegrid.ugr.es/dpm/ugr.es/home/auger//grid/auger/prod/Photon_gr171/en17.500/th60.70/076712/DAT767122.tar.gz
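Because a malformed line in the list can waste a long run, it may be worth checking the file before passing it to the script. The following is a minimal sketch of such a check, assuming one SRM URL per line as shown above; the function names are hypothetical and this is not part of delInLFC.py.

```python
from urllib.parse import urlsplit


def parse_surl(line):
    """Return (SE host name, path) for one srm:// line, or raise ValueError."""
    surl = line.strip()
    parts = urlsplit(surl)
    if parts.scheme != "srm" or not parts.hostname or not parts.path:
        raise ValueError(f"not an SRM URL: {surl!r}")
    return parts.hostname, parts.path


def group_by_se(lines):
    """Group the paths found in the list by SE host name, skipping blank lines."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue
        host, path = parse_surl(line)
        groups.setdefault(host, []).append(path)
    return groups


sample = [
    "srm://se-cafpegrid.ugr.es/dpm/ugr.es/home/auger//grid/auger/prod/Photon_gr171/en17.500/th60.70/076701/DAT767016.tar.gz\n",
    "srm://se-cafpegrid.ugr.es/dpm/ugr.es/home/auger//grid/auger/prod/Photon_gr171/en17.500/th60.70/076712/DAT767122.tar.gz\n",
]
groups = group_by_se(sample)
for host, paths in groups.items():
    print(host, len(paths))
```

Grouping by host also shows at a glance whether the list touches only the SE that is meant to be cleaned.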

Note: I ran a deletion with the auger production role. Several files were not deleted because they had been written with the sgm role, so the deletion script must be run a second time with the sgm role. Example of how to get a list of the files that were not deleted:

grep ERR lfc_deletion.201605231817.log > lfc1_deletion_list2
sed -i 's#ERR.*srm#srm#' lfc1_deletion_list2
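The same extraction can be done in Python, for instance when the result is fed straight into another script. This sketch mirrors the grep and sed pair: it keeps lines containing ERR and replaces everything from ERR up to the last "srm" with "srm". The log lines in the example are invented for illustration; the real log format of delInLFC.py is not documented here.

```python
import re


def failed_surls(log_lines):
    """Return the SURLs from log lines that contain ERR."""
    result = []
    for line in log_lines:
        if "ERR" in line:
            # Same greedy substitution as sed 's#ERR.*srm#srm#' (first match only).
            result.append(re.sub(r"ERR.*srm", "srm", line.rstrip("\n"), count=1))
    return result


# Hypothetical log content, not the actual output of delInLFC.py.
log = [
    "deleted srm://se.example.org/dpm/example.org/home/auger/file1.tar.gz\n",
    "ERR: cannot delete srm://se.example.org/dpm/example.org/home/auger/file2.tar.gz\n",
]
remaining = failed_surls(log)
print("\n".join(remaining))
```

The resulting list has the same format as files_to_be_deleted.txt, so it can be used as input for the second run with the sgm role.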

Note: Other sources of errors are not covered, because none occurred.

Example of how to produce a list of files to be deleted, based on the SE name:

/usr/bin/time python /home/chudoba/lfc/delInLFC.py -d -s grid-se.physik.uni-wuppertal.de > lfc1_deletion_wupp.`date +"%Y%m%d%H%M"`.log 2>&1
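The two invocations above suggest a command-line interface with -f for a list file, -s for an SE name and -d for the dry run. The sketch below shows how such an interface and a dry-run guard could be written with argparse. It is an assumption based only on those commands, not the actual source of delInLFC.py; the option meanings, the function names and the delete callback are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the options seen in the usage examples (meanings assumed)."""
    parser = argparse.ArgumentParser(description="Bulk deletion sketch")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="list_file", help="file with one SURL per line")
    source.add_argument("-s", dest="se_name", help="delete all files on this SE")
    parser.add_argument("-d", dest="dry_run", action="store_true",
                        help="only report what would be deleted")
    return parser


def run(args, candidates, delete):
    """Report or delete each candidate; delete is called only outside a dry run."""
    deleted = []
    for surl in candidates:
        if args.dry_run:
            print("would delete", surl)
        else:
            delete(surl)
            deleted.append(surl)
    return deleted


args = build_parser().parse_args(["-d", "-s", "grid-se.physik.uni-wuppertal.de"])
calls = []
done = run(args, ["srm://grid-se.physik.uni-wuppertal.de/example/file.tar.gz"], calls.append)
```

Keeping the dry-run check in one place makes it harder for a code path to delete files by accident when the flag is set.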


--Chudoba 08:51, 4 August 2016 (CEST)