VT VAPOR:Progress Report

Progress Report

VT Reporting guidelines

Reporting period: April 7th to 25th 2014 - LAST REPORT

The project was officially terminated on April 15th 2014; as a result, this is the last progress report.

The main developer, Flavien Forestier, finished his contract on April 10th. From now on, the project manager, Franck Michel, will perform maintenance and bug fixing, but no new features will be developed.

Progress

A bug was found in the dark data and lost files management. It has been fixed and the fix deployed to production.

The migration of the VAPOR MySQL database from I3S to the MySQL farm of the IN2P3 Computing Center in Lyon has been tested successfully. Discussions are under way about also using an instance of the Lavoisier data integration service at the IN2P3 Computing Center, in addition to the one at I3S. This would improve stability and performance.


Reporting period: Mar. 24th to April 4th 2014

The project is due to end on April 15th. As mentioned previously, the user community features (users database, user life cycle management) will not be started before the end of the project.

Progress

  • Dark data and lost files handling completed:
    • Development has been completed with improvements of errors management and reporting
    • It is now deployed on the production instance and available for all supported VOs
  • Addition of a home page with links for each supported VO
  • In the JobMonitor tabular view, addition of a new feature to display the errors that occurred for each CE queue during the selected period (in case there was at least one error).

Plans for next period

Start migrating the MySQL database of VAPOR from the I3S instance to a MySQL farm of servers of the IN2P3 Computing Center in Lyon.

The main developer, Flavien Forestier, will finish his contract on April 10th. After this date, the project manager, Franck Michel, will perform maintenance and bug fixing, but no new features will be developed.

Problems we encounter, but can solve

none.

Problems and issues we need external help with

None.

Reporting period: Mar. 10th to 21st 2014

The project is due to end on April 15th. As mentioned previously, the user community features (users database, user life cycle management) will not be started before the end of the project.

Progress

  • Enable support of VOs vo.france-grilles.fr
  • Development of the dark data and lost files handling is still not completed:
    • Continued improvement of errors reporting
    • Discuss bugs in GFAL2 API with developers
    • Complete upgrade to support the gsiftp access protocol instead of srm
  • Apply a list of safety recommendations to the configuration of the web server at I3S

Plans for next period

  • Complete the dark data and lost files management and start using it in production for biomed (Same as in last period)
  • JobMonitor: add the ability to see job errors

Problems we encounter, but can solve

none.

Problems and issues we need external help with

None.

Reporting period: Feb. 24th to Mar. 7th 2014

The project is due to end on April 15th, marked by the departure of the developer, Flavien Forestier. As stated previously in emails to the partners, the user community features (users database, user life cycle management) will not be started before the end of the project in mid-April.

Progress

  • Add support of 3 new VOs: compchem, enm.eu, vlemed
  • Continue the development/improvement of the dark data and lost files handling:
    • Improve errors reporting
    • Integrate new version of GFAL2 API and start checking with gsiftp protocol

Plans for next period

  • Complete the dark data and lost files management and start using it in production for biomed.
  • Enable support of the VO vo.france-grilles.fr.

Problems we encounter, but can solve

none.

Problems and issues we need external help with

None.

Reporting period: Feb. 10th to 21st 2014

Progress

  • Continue the development of dark data and lost files handling:
    • Improve fault tolerance and error reporting, and limit the SE check rate so as not to overload the machine
    • Continue work on GridFTP usage: new version of GFAL2 API to be delivered soon to fix issues.
  • Re-check the results of the JobMonitor against the Nagios alarms to see whether JobMonitor no longer fails where Nagios probes work (follow-up of the issue fixed during the last period).
  • Continue the investigation about high error rates of VAPOR's JobMonitor: several critical issues have been fixed with delivery of 2 versions of JSAGA. Now errors reported by VAPOR seem much more reliable. Some work is on-going about remaining issues.

We finally managed to get in touch with the right person, Maria Alandes Pradillo from CERN, to help us find out how to build the gsiftp URL to access an SE, and when it is applicable or not (mentioned as a problem in the last report).

Plans for next period

  • Complete the dark data and lost files management and start using it in production for biomed.
  • Enable one or 2 other VOs in VAPOR.

Problems we encounter, but can solve

none.

Problems and issues we need external help with

None.

Reporting period: Jan. 27th to Feb. 7th 2014

Progress

  • VO shiwa-workflow.eu now enabled in VAPOR. Enabling this second VO helped uncover a few bugs in the multi-VO management, which should make opening to other VOs easier.
  • Continued the development of the handling of dark data and lost files: compute the difference between LFC and SE files using SE and LFC file dumps, progress on web reports publication.
  • Continued the investigation of the high error rates of VAPOR's JobMonitor against the Nagios alarms, as no significant decrease in errors has been seen following the fixing of several issues during the last periods. New issue found:
    • JSAGA input/output files are not transferred properly on some computing elements. Under investigation.
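
The comparison of LFC and SE file dumps mentioned above can be illustrated with a minimal sketch. It assumes the usual meaning of the terms (dark data: files present on the SE but unknown to the catalog; lost files: catalog entries with no file on the SE) and that each dump is a plain list of storage paths, one per line. The function name, the dump format and the example paths are illustrative assumptions, not VAPOR's actual code.

```python
def compare_dumps(lfc_entries, se_entries):
    """Compare a catalog dump and an SE dump, both given as iterables of paths.

    Returns (dark, lost): paths only on the SE, and paths only in the catalog.
    """
    # Normalise both dumps: strip whitespace and ignore empty lines.
    lfc = {line.strip() for line in lfc_entries if line.strip()}
    se = {line.strip() for line in se_entries if line.strip()}
    dark = sorted(se - lfc)  # on the SE, not registered in the catalog
    lost = sorted(lfc - se)  # registered in the catalog, missing from the SE
    return dark, lost


# Hypothetical dumps for illustration only.
lfc_dump = [
    "/dpm/example.org/home/biomed/a.dat",
    "/dpm/example.org/home/biomed/b.dat",
]
se_dump = [
    "/dpm/example.org/home/biomed/b.dat",
    "/dpm/example.org/home/biomed/c.dat",
]
dark, lost = compare_dumps(lfc_dump, se_dump)
print("dark data:", dark)
print("lost files:", lost)
```

Using sets keeps the comparison linear in the size of the dumps, which matters when an SE holds a very large number of files.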

Plans for next period

  • Continue the development of the handling of dark data and lost files: web reports, GridFTP access.
  • Re-check the results of the JobMonitor against the Nagios alarms to see whether JobMonitor no longer fails where Nagios probes work (follow-up of the issue fixed during the last period).

Problems we encounter, but can solve

We cannot get sufficient information on how to build the URL that gives access to a Storage Element with the gsiftp protocol. This is needed to complete the feature about dark data and lost files. It is surprisingly difficult to find out what information to use from a top BDII to build this URL.

Action: we have had discussions with the GFAL2 API developers about this, but they are not GridFTP experts. A discussion with some French site admins did not help much. Now going to GGUS: https://ggus.eu/ws/ticket_info.php?ticket=101086

Problems and issues we need external help with

None.

Reporting period: Jan. 13th to 24th 2014

Progress

  • Fixed the problem of excessive delay in computing the white list of CEs.
  • Code cleanliness: improvement of comments in the code, clean-up of files and properties.
  • Started the development of the next Data Management feature: handling of dark data and lost files: develop a dump file script i.e. recursively list the content of an SE using the GFAL2-python API.
  • Fixed a long lasting issue of excessive failure rate on JobMonitor: several issues were found:
    • Wrong synchronisation of the machine due to ntp port blocked by firewall.
    • GridFTP ports blocked by firewall.
    • JSAGA generated proxy certificate issue.
    • Java 7 incompatibility: only versions up to JDK 7u5 manage to generate a proxy certificate.
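
The dump file script mentioned above (recursively listing the content of an SE) can be sketched as follows. This is a minimal illustration, not the project's script: the traversal takes the listing and directory-test functions as parameters so that it runs here against an in-memory stand-in. With gfal2-python, one would presumably pass `ctx.listdir` and a helper based on `stat.S_ISDIR(ctx.stat(path).st_mode)`, where `ctx = gfal2.creat_context()`; that wiring, the function name and the example URLs are assumptions.

```python
def dump_tree(root, listdir, is_dir):
    """Yield the path of every file below root.

    listdir(path) returns the entry names of a directory;
    is_dir(path) tells whether a path is a directory.
    An explicit stack is used so deep trees do not hit the recursion limit.
    """
    pending = [root.rstrip("/")]
    while pending:
        current = pending.pop()
        for name in sorted(listdir(current)):
            if name in (".", ".."):
                continue  # some endpoints list these pseudo-entries
            path = current + "/" + name
            if is_dir(path):
                pending.append(path)
            else:
                yield path


# In-memory stand-in for a Storage Element (hypothetical URLs):
# keys are directories, values are their entry names.
FAKE_SE = {
    "srm://se.example.org/biomed": ["user", "readme.txt"],
    "srm://se.example.org/biomed/user": ["a.dat", "b.dat"],
}

files = sorted(
    dump_tree("srm://se.example.org/biomed", FAKE_SE.__getitem__, FAKE_SE.__contains__)
)
for path in files:
    print(path)
```

Keeping the storage access behind two small functions also makes it straightforward to switch the access protocol without touching the traversal logic.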

Plans for next period

  • Continue the development of the handling of dark data and lost files: make a diff between LFC and SE files using SE and LFC file dumps.
  • Re-check the results of the JobMonitor against the Nagios alarms to see whether JobMonitor no longer fails where Nagios probes work (follow-up of the issue fixed during the last period).

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Dec. 9th 2013 to Jan. 10th 2014

Progress

During this long period, rather little happened due to the Christmas vacation (the main developer was on leave from Dec. 9th to Jan. 2nd).

Nevertheless, a first version of VAPOR was finalized and put online on Dec. 22nd 2013. It supports the biomed VO, and the biomed technical support team is now encouraged to use it. The web application is hosted at the Computing Center in Lyon (IN2P3), while the data collection services are hosted on the production server at I3S in Sophia Antipolis.

Other tasks performed:

  • new release of the JobMonitor deployed. It fixes flaws and simplifies the configuration.
  • new release of the Lavoisier data integration service deployed. Comes with several important optimizations and bug fixes.
  • Complete the integration of the Running Ratio feature: initially it required the data to be located on the same server as the webapp, now data is exploited remotely.
  • Misc. other bug fixes and improvements.
  • Important rework of the VAPOR Install and Configuration guide.

Plans for next period

  • Investigate the excessive delay in computing the white list of CEs (2 to 3 minutes depending on parameters).
  • Code cleanliness: improve insufficient comments in the code, clean up unneeded files and properties.
  • Start the development of the next Data Management feature: handling of dark data and lost files.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Nov. 26th to Dec. 9th

Progress

  • Throughout this period, a significant amount of work went into setting up a production-ready set of VAPOR services, including: fine-tuning the log configuration, configuring log rotation, configuring the firewall, and setting up a backup procedure for the VAPOR VM
  • Bug fixing on the scan of full storage elements (sort files of expired/suspended users)
  • Investigation and fixing of a problem of high CPU consumption of the JobMonitoring tool
  • 2 days face-to-face meeting with Operations Portal team (2 and 3 of December), integration of VAPOR with the Operations Portal at the CC IN2P3 (Lyon, France)
    • Decision made on where the VAPOR data production services should be hosted, i.e. at I3S or at CC IN2P3
    • Installation of the VAPOR webapp on the web server of the CC IN2P3 and link from the Operations Portal. Now accessible for VO biomed only from: https://operations-portal.egi.eu/vapor?vo=biomed
    • Address web design and graphical homogeneity issues
    • Start migration to Lavoisier 2.1 data integration service (used as third party tool within VAPOR)

Plans for next period

Very little is expected to happen until the end of the month, as the main developer will be on leave from Dec. 9th to 31st.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Nov. 12th to Nov. 25th 2013

Progress

  • Complete developments on the scan of storage elements (sort files of expired/suspended users)
  • In preparation of the 2 days meeting with Operations Portal team (2 and 3 of December),
    • Configure all VAPOR services for the biomed VO on the dedicated VM deployed recently: this helped fix issues in some of the tools to make them VO independent (some were initially developed for biomed and thus were specific for it): distinguish proxy certificate files, separate log files etc.
    • Refactoring of some configuration files to make them more intuitive, and update documentation appropriately.

Plans for next period

  • Complete the deployment of all current VAPOR services for the VO biomed.
  • 2 days meeting with Operations Portal team (2 and 3 of December), integration of VAPOR with the Operations Portal

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Oct. 30th to Nov. 11th 2013

Progress

  • Continued the developments of VO Data Management features: period mostly dedicated to the scan of Storage Elements based on the file catalog
    • continued the customization and integration of the tool used to scan storage elements filling up.
    • development of web pages to display the reports
  • Continue investigation of the use of GFAL2 python and GFAL FS.

Plans for next period

  • Still a few developments to do on the scan of storage elements (sort files of expired/suspended users)
  • In preparation of the 2 days meeting with Operations Portal team (2 and 3 of December), configure VAPOR services on the dedicated VM deployed recently.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Oct. 16th to 29th 2013

Progress

  • Overall review of the consistency in terms of GUI, styles etc. of the features related to Resource status indicators and statistical reports.
  • Fixing of a few minor issues
  • Deployment of a virtual machine with SL64 and EMI3 UI planned to host monitoring services of VAPOR.
  • Developments: VO Data Management features:
    • Started customization and integration of a tool used to scan storage elements filling up.
    • On-going discussion on the use of GFAL2 python and GFAL FS, together with GFAL2 team and VAPOR's partner CNRS Creatis.
    • Start development of SE consistency checking (dark data, lost files) using the GFAL2 API.

Plans for next period

  • Continue the developments of VO Data Management features.
  • Continue deployment of the virtual machine dedicated to host monitoring services of VAPOR.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Oct. 2nd to 15th 2013

Progress

  • Developments: completed most of the remaining features related to Resource status indicators and statistical reports:
    • report of all resources supporting the VO,
    • report resources not in proper production,
    • report computing elements with the 444444 issue (number of jobs waiting or running is 444444),
    • report storage elements which publish negative space values.

These come in addition to look-and-feel and ergonomic improvements (tooltips, styles, use of cookies to remember the columns selected by the user, etc.).

  • Investigation about the GFAL2 API to address VO Data Management features.
  • Started writing the application architecture description.
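
The two anomaly reports listed above (computing elements with the 444444 issue, and storage elements publishing negative space values) amount to simple filters over published resource data. The sketch below assumes each resource is available as a dictionary; the field names (`id`, `running`, `waiting`, `free`, `used`) and the example hosts are hypothetical and do not reflect the actual BDII attributes or VAPOR's data model.

```python
SENTINEL = 444444  # placeholder value published when job counts are not properly reported


def ces_with_444444(ces):
    """Return the ids of CEs whose running or waiting job count equals 444444."""
    return [ce["id"] for ce in ces if SENTINEL in (ce["running"], ce["waiting"])]


def ses_with_negative_space(ses):
    """Return the ids of SEs publishing a negative free or used space value."""
    return [se["id"] for se in ses if se["free"] < 0 or se["used"] < 0]


# Hypothetical sample data for illustration only.
ces = [
    {"id": "ce01.example.org", "running": 12, "waiting": 3},
    {"id": "ce02.example.org", "running": 0, "waiting": 444444},
]
ses = [
    {"id": "se01.example.org", "free": 1500, "used": 8500},
    {"id": "se02.example.org", "free": -1, "used": 200},
]
print(ces_with_444444(ces))
print(ses_with_negative_space(ses))
```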

Plans for next period

A few minor issues need to be fixed on pages described above, along with an overall review of the consistency in terms of GUI, styles etc. Developments: start development of the VO Data Management features: dealing with storage elements filling up.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Sep. 18th to Oct. 1st 2013

Progress

  • Face to face meeting with the EGI Operation Portal team at the TF13 in Madrid:
    • Demonstration of VAPOR current status
    • Decision to organize a 2-days integration session by end of November/beginning of December to make VAPOR's current status publicly accessible.
  • Developments:
    • Added 2 new web pages to show (1) the current status of resources supporting the VO (CE, SE, WMS, VOMS, LFC), and (2) the list of resources whose status is currently not in production (consolidate BDII/GOCDB statuses)
    • Improvements in the Lavoisier VO-views to consolidate data from the BDII and GOCDB.

Plans for next period

  • Complete the new web pages of the current status of resources supporting the VO and the list of resources whose status is currently not in production.
  • Enrich the look and feel with graphical icons, for instance to indicate tooltips
  • Start writing application architecture description

Problems we encounter, but can solve

The lack of experience of the main developer is being handled and things are improving. Continuous support is being provided.

Problems and issues we need external help with

None.

Reporting period: Sept. 4th to 17th 2013

Progress

  • Continue the customization of the JobMonitor tool by partner IPHC to support a multi-vo environment. Several phone conferences with the developer.
  • Global actions on the web application:
    • Continued to improve the ergonomics and look and feel of web pages: enrich help texts, add customizable tooltips.
    • Tests revealed weaknesses in the application development: significant effort went into improving the safety of the application and adding many more log traces to be able to follow the application's activity.
  • VO Operations, reports of availability of resources:
    • Added some reports in the web pages to show the "running ratio" R(R/W)
    • Almost completed the Lavoisier VO-views to consolidate data from the BDII and GOCDB.
  • VO data management: continuation of the study on VO Data management, very rich discussion with developers of GFAL2 in GGUS ticket #97076

Plans for next period

  • Face to face meeting with the developers of the EGI Operation Portal to discuss future steps.
  • Add 2 new web pages to show (1) the current status of resources supporting the VO (CE, SE, WMS, VOMS, LFC), and (2) the list of resources whose status is currently not in production (consolidate BDII/GOCDB statuses)

Problems we encounter, but can solve

The lack of experience of the main developer is being handled and things are improving, although the code production and quality remain lower than expected. Action: heavy support is being provided.

Problems and issues we need external help with

None.

Reporting period: Aug. 21st - Sept. 3rd

Progress

  • Follow-up of the improvement of the ergonomics and look and feel of web pages.
  • VO Operations, reports of availability of resources:
    • completed the web page and code dedicated to the production of white list of CEs based on job monitoring reports.
    • Writing of VO-views to consolidate the data from the BDII and GOCDB, using the Lavoisier data integration service. Numerous interactions with the developers of the tool.
  • VO data management: continuation of the study on VO Data management (dark data in particular), discussions with sites admins (FR, UK) to refine the procedures

Plans for next period

  • VO Operations, reports of availability of resources: still some reports to add in the web pages to show the "running ratio" R(R/W)

Problems we encounter, but can solve

  • The lack of experience of the main developer is being handled and things are improving, although the code production remains slower than expected. Action: heavy support is being provided to him.
  • Complexity of the Lavoisier data integration service, but the development team is very helpful and reactive.

Problems and issues we need external help with

None.

Reporting period: Aug. 7th to 20th

Progress

  • Work with partner IPHC to upgrade tool JobMonitor to support a multi-vo environment. Several phone conferences.
  • Improvements to the ergonomics of web pages.
  • Improve the management of misc. errors.
  • VO Operations, reports of availability of resources:
    • development of new web pages to report running and waiting jobs, and ratio R(R/W).
  • VO data management:
    • testing of scripts to detect and clean up VO dark data
    • discussions held with sites admins (FR IPHC, UK QMUL) in order to refine the procedure to deal with dark data and lost files. Study of good practices from HEP VOs.

Plans for next period

  • VO Operations management features (resources availability): add new reports on ratio R(R/W), complete the production of the white list of CEs.
  • Continue study on VO Data management (dark data).

Problems we encounter, but can solve

Lack of experience of the main developer in terms of good development practices and application design. This has slowed down the activity, but it is being handled.

Problems and issues we need external help with

None.

Reporting period: July 24th - Aug. 6th

Progress

  • This period was used to make a strong focus on code quality including code review, cleaning up of code, and the first commit of the current version after code and environment cleaning up.
  • Documentation: write a document to describe the application environment and installation procedure.

Development:

  • VO Operations management features
    • resources availability: continue development of reports on running and waiting jobs, and ratio R(R/W).
    • VO data management: start development of scripts to detect and clean up VO dark data.
  • Focus on styles and look and feel of web pages

Plans for next period

Complete VO Operations management features (resources availability): reports on running and waiting jobs, and ratio R(R/W); white list of CEs. Continue work on VO Data management (dark data).

Problems we encounter, but can solve

Lack of experience of the main developer in terms of good development practices and application design. This has slowed down the activity, but it is being handled.

Problems and issues we need external help with

None.

Reporting period: Jun. 26th - Jul. 23rd

Progress

Low activity due to summer vacation period: 2 weeks vacation for Flavien Forestier (developer) and 3.5 weeks for Franck Michel (project manager) => 10 days work for Flavien and 2 days for Franck.

  • VO Operations management features (resources availability) :
    • continue development of web pages to view results of the Job Monitor tool of IPHC partner: chart for the history report, table view by computing element.
    • development of a web page to view results of the CE monitor: view of the number of running and waiting jobs and the ratio R(R/W) (chart)
    • discuss the parameters of the white list of CEs with IPHC.
  • Deployment of the dev and test environment on a virtual machine.

Plans for next period

  • Refine existing pages with better ergonomics and presentation
  • Focus on styles and look and feel of web pages
  • Display a white list of CEs

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: June 19th to 25th

Exceptionally this is a one-week report as I'll be on vacation starting end of this week.

Progress

  • Acquire skills in the Twitter Bootstrap software, Ajax technologies, and JavaScript graphical libraries (Dygraph)
  • Continue the development of web pages for the VO Operations management features about resources availability: chart for the results of the job monitor of IPHC (number of jobs ok, ko, timed out).
  • Further install and configuration of the virtual machine deployed last week to host developments and tests of VAPOR.

Plans for next period

Very little will be done in the next period due to the summer vacations of Franck Michel (back 22 July) and Flavien Forestier (VAPOR developer, back 15 July).

Problems we encounter, but can solve

none.

Problems and issues we need external help with

none.

Reporting period: June 5th to 18th

Progress

  • Work with partner IPHC to upgrade tool JobMonitor to support a multi-vo environment. Several phone conferences.
  • 2 days face-to-face meeting in Lyon (France) between I3S and EGI Operations Portal and VO Operations Dashboard developers team: https://indico.egi.eu/indico/conferenceDisplay.py?confId=1721. Goal: technical discussions on the way to integrate VAPOR developments into the existing VO Operations Dashboard.
  • Also in Lyon, discussion with biomed LFC manager about Data Management procedures to set up.
  • Continued technical phone conferences with partner CNRS IPHC: the existing job monitor tool is being customized to be more general (initially dedicated to biomed).
  • Prototyping of first web pages for the VO Operations management features about resources availability, using both the job monitor of IPHC and the data integrator web service from EGI Operations Portal (Lavoisier).
  • Deployment of a virtual machine at I3S to host developments and tests of VAPOR.

Plans for next period

  • Keep on developing the VO Operations management features: continue integration of job monitor along with appropriate web pages in tabular and chart formats.
  • Start development of web pages to report the evolution of running and waiting jobs in the VO in graphical charts
  • Start using technologies such as Twitter Bootstrap and Ajax to make a good-looking, user friendly and reactive web interface.

Problems we encounter, but can solve

The work on the VO data management procedures has been started in Lyon with discussions with the LFC manager. Further work is postponed until later in the summer.

Problems and issues we need external help with

Reporting period: May 22nd - June 5th

Progress

  • Decision made on the priorities of the developments, following the discussion with partner VOs about the Functional specification of VAPOR features :
    1. VO Operations management > Report GOCDB and BDII status and Monitor resources availability.
    2. VO Operations management > VO Data Management procedures.
    3. Users database implementation.
  • Developments started on point 1: 2 conference calls held during the period with partner CNRS IPHC that develops a tool to monitor CEs.
  • 6 days (2x3 days) of training courses for Franck Michel during this period.

Plans for next period

  • 2 days face to face meeting with VO Operations Dashboard developers team in Lyon (France), to bootstrap developments within a common environment.
  • Will try to organise calls with VO AUGER, who showed interest in VAPOR.
  • Find volunteer among partners to start working on the possible procedures that can be envisaged about data management (priority 2).

Problems we encounter, but can solve

Difficulty to find someone among partners to start working on the VO data management procedures. Call to be organised with partners CNRS Creatis and CNRS IPHC.

Problems and issues we need external help with

None.

Reporting period: May 8th to 21st

Progress

  • 4 national holidays in the last two weeks explain a rather light advance in this period.
  • Work continued on the Functional specification of VAPOR features: restructuring, additions, more in-depth details, additional related material.
  • Conference held with France Grille VO to get their opinion on the features of VAPOR and the possibility that they use it in the future. Meetings list updated: https://indico.egi.eu/indico/categoryDisplay.py?categId=100
  • Conference with AMC can't be done for now due to constraints of AMC. Will be rescheduled in July.
  • Face to face meeting scheduled on 5 and 6 of June with VO Operations Dashboard developers team in Lyon (France): the idea is to bootstrap joint developments.
  • Fix and improve existing tools to be integrated into VAPOR about GOCDB and BDII status report.
  • Self-training of Flavien Forestier continued regarding grid technologies, the Symfony2 framework and related development technologies.

Plans for next period

  • Will try to organise calls with 2 other VOs that showed interest in VAPOR: AUGER
  • Flavien Forestier to continue self training on dev technos, and acquire strong in-depth knowledge about functional features and technical solutions.
  • Contact members of the project to investigate possible technical solutions on data management.
  • Contact biomed members about the tool used to perform CE monitoring.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.

Reporting period: Apr. 22nd - May 7th

Progress

Plans for next period

  • Meetings with partners to continue: call scheduled with France Grille VO. Call with AMC to be scheduled.
  • Flavien Forestier to start self-training on the Symfony2 framework.

Problems we encounter, but can solve

None.

Problems and issues we need external help with

None.