Difference between revisions of "VT Scientific Publications Repository Current Practices"

Revision as of 09:36, 20 July 2012


Collect Best Practices

This is a list of questions for documenting best practices from established processes for collecting scientific publications that were made possible by the use of e-infrastructures.

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
  • What information is collected
    • describe the schema and also collect the values for the enumerations (e.g., list of disciplines, type of publications)
    • describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
  • How do researchers know that they need to provide the information?
  • How do researchers provide the information? (the tool they use)
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?
  • When do researchers provide the data? (e.g., on a yearly basis, as soon as they publish)
  • Limitations/difficulties of the current process
  • Success stories of the current process
  • How many publications are already collected and since when
  • How the information is used


Best Practices

NGI France

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)

All disciplines

At least one author working in a French laboratory or organisation

All infrastructures (production or research infrastructures)

  • What information is collected
    • describe the schema and also collect the values for the enumerations (e.g., list of disciplines, type of publications)

The elements are: publication type, subject, title, author(s) (last name, first name), laboratory with team, address, abstract, full-text language, production date, conference or book title, keywords...

    • describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure

The tool asks for the discipline, the type of publication and keywords, and allows searching on all of them. Menus let users retrieve publications by selection criteria and choose the order in which they are presented. A publication may have several disciplines and keywords.

  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)

No

  • How do researchers know that they need to provide the information?

We advertise by all available means: the NGI "France Grilles" web site, the "France Grilles" newsletter, letters from the director, e-mails to the mailing lists (lab directors, users, operations and site teams...), presentations at the grid days... and of course reminders.

  • How do researchers provide the information? (the tool they use)
  1. They have to reference (or deposit) their publications in a national tool built for the whole research community in France, covering all organisations and universities. The tool is HAL: http://hal.archives-ouvertes.fr/
  2. They have to notify a functional email address or G Romier, giving the HAL identifier or the title of the publication.
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

Any form is accepted, but we ask for an acknowledgement whose wording is advertised on the NGI web site. In HAL, a keyword can also be used to search for the publications.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The NGI imposes no obligations. The organisations and universities impose none either, with a few exceptions.

  • When do researchers provide the data? (e.g., on a yearly basis, as soon as they publish)

It depends on their organisation and context.

  • Limitations/difficulties of the current process

The process relies on voluntary contributions.

  • Success stories of the current process

About 700 publications were added from HAL to the collection within several months. Several researchers systematically referenced their publications so that they would be included in the collection. It is a major success and we hope to convince others soon!

  • How many publications are already collected and since when

As of 9 July 2012: 240 articles, 485 references. However, not all of the relevant publications in HAL are in the collection yet (especially for HEP), so this is not enough to be representative of the work done on, or with the help of, the grid in France. Grid projects have existed in France since 2001, and the collection begins in 2002.

  • How the information is used

It is used to produce indicators for evaluating the impact of the grid and to search for information about grid users.

NGI Netherlands

Input collected by SC from a conversation with Peter Michielse, Deputy Director Policies and Strategy, BiGGrid (NL)

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)

Researchers that use computing resources in the Netherlands

  • What information is collected (Describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications; describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure)

Papers published by the research teams in the previous year

  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)

No

  • How do researchers know that they need to provide the information?
  1. Peter sends a general email (to about 40 researchers) around Christmas time, asking them to provide the information. A reminder is sent in mid-January.
  2. Additionally, all researchers have to report on the outcomes of their research, as per the agreement established when they submit a request to use the computing resources.
  • How do researchers provide the information? (the tool they use)
  1. They reply to the email.
  2. They fill in the outcome report online.
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

They don’t in this instance. They have to fill in an official resource request form before they use the resources.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The researchers are required to report on the outcomes of their research, as part of their ‘contract’ with NCF (the Computer Science arm of the Dutch funding agency, NWO)

  • When do researchers provide the data? (e.g., on a yearly basis, as soon as they publish)
  1. Yearly basis
  2. Whenever they report on the outcome of the research
  • Limitations/difficulties of the current process / Success stories of the current process

Peter says this works quite well

  • How many publications are already collected and since when

Difficult to tell - hundreds

  • How the information is used

To demonstrate the usage of grid computing.

NGI Turkey

  • Scope: identify the scope of collecting publications

Collecting publications is a “measurement technique” that yields measurable metrics on the usage and the scientific results of the e-Infrastructure. NGI-TR aims to collect all scientific output of its users.

  • What information is collected

NGI-TR uses a two-level mechanism for collecting scientific publications. The first level is based on user declaration through a web interface that any user can fill in.

A user with a scientific publication chooses the web form that matches the type of publication.

Publications are classified as “scientific journal”, “conference proceeding”, “thesis” and “other”. Depending on the publication type, the web forms collect the following information:

Scientific Journal: Name of the Journal, Journal ISSN, Authors, Title, Year, Number, Section, Page

Conference Proceeding: Name of the conference, Date and location, Proceeding ISBN, Authors, Title, Number, Section, Page

Thesis: Title, Author, Class (PhD or MS), University, Faculty, Department, Year, Advisor(s)

Other: Authors, Title, Year, Publication / presentation / magazine information
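
The four declaration forms listed above could be modelled as record types along the following lines. This is only an illustrative sketch: the class names, field names and field types are assumptions derived from the form fields named above, not the actual NGI-TR schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JournalArticle:
    journal_name: str
    issn: str
    authors: List[str]
    title: str
    year: int
    number: str = ""
    section: str = ""
    page: str = ""


@dataclass
class ConferencePaper:
    conference_name: str
    date_and_location: str
    isbn: str
    authors: List[str]
    title: str
    number: str = ""
    section: str = ""
    page: str = ""


@dataclass
class Thesis:
    title: str
    author: str
    degree: str  # "PhD" or "MS", the two classes named in the form
    university: str
    faculty: str
    department: str
    year: int
    advisors: List[str] = field(default_factory=list)


@dataclass
class OtherPublication:
    authors: List[str]
    title: str
    year: int
    details: str = ""  # publication / presentation / magazine information


# Labels follow the four classes used by NGI-TR.
TYPE_LABELS = {
    JournalArticle: "scientific journal",
    ConferencePaper: "conference proceeding",
    Thesis: "thesis",
    OtherPublication: "other",
}


def publication_type(record) -> str:
    """Return the NGI-TR classification label for a record."""
    return TYPE_LABELS[type(record)]
```

Keeping one record type per form mirrors the way the web interface asks for different fields depending on the publication type.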

In NGI-TR's experience, relying on user declaration alone is not sufficient. Although the NGI web page provides an open and simple web form for publication declaration, and the NGI sends regular e-mails reminding the user community of the importance of declaring publications, only limited feedback is collected this way. As the second level, NGI-TR uses the Web of Science repository provided by Web of Knowledge to collect the missing publications and to check the publications declared by users.

The “Advanced Search Tool” of the Web of Science returns the list of publications for the given keywords and time span.

The following citation databases are checked for the publication search:

    • Science Citation Index Expanded (SCI-EXPANDED)
    • Social Sciences Citation Index (SSCI)
    • Arts & Humanities Citation Index (A&HCI)
    • Conference Proceedings Citation Index - Science (CPCI-S)
    • Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH)

The resulting list is then processed to match the authors of the publications with the e-Infrastructure users. If a matched user has never used the infrastructure, the related publication is removed from the list.

If the publication contains an “acknowledgement” referring directly to the e-Infrastructure or the NGI, the publication is kept in the list.

If there is no acknowledgement in the publication, the account of the user who authored it is checked. If the account has not been actively used for a long time, the publication is likewise removed from the list.
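
The filtering rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not NGI-TR's actual tooling: the `Candidate` record, the `last_activity` mapping, the acknowledgement keywords and the one-year inactivity threshold are all hypothetical (the text only says "a long time").

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Sequence


@dataclass
class Candidate:
    """A publication returned by the citation-database search."""
    title: str
    authors: List[str]
    acknowledgement: str = ""  # acknowledgement text, empty if none


# Hypothetical values; the source does not define them.
INACTIVITY_LIMIT = timedelta(days=365)
ACK_KEYWORDS = ("ngi-tr", "e-infrastructure")


def keep_publication(
    pub: Candidate,
    last_activity: Dict[str, Optional[date]],
    today: date,
    ack_keywords: Sequence[str] = ACK_KEYWORDS,
) -> bool:
    """Decide whether a candidate publication stays in the list.

    last_activity maps each registered user name to the date the account
    was last used, or None if the account has never been used.
    """
    # Step 1: match authors with users who have actually used the infrastructure.
    active_ever = [
        a for a in pub.authors
        if last_activity.get(a) is not None
    ]
    if not active_ever:
        return False

    # Step 2: an explicit acknowledgement keeps the publication.
    ack = pub.acknowledgement.lower()
    if any(keyword in ack for keyword in ack_keywords):
        return True

    # Step 3: without an acknowledgement, require recent account activity.
    return any(
        today - last_activity[a] <= INACTIVITY_LIMIT
        for a in active_ever
    )
```

In practice the author matching would need to cope with name variants and homonyms, which this sketch ignores by comparing names exactly.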


  • Do you collect the impact factor? If so, how?

No

  • How do researchers know that they need to provide the information?

Under the infrastructure user policy of NGI Turkey, all users should refer to the infrastructure in every scientific product produced by means of the infrastructure, and they have to report each scientific publication to the NGI via the publication submission web tool or by e-mail. When their account is opened, users are informed by the NGI about the user policy and the acknowledgement or reference process, together with acknowledgement samples. In addition, the NGI regularly sends e-mails as a reminder of the process.

  • How do researchers provide the information? (the tool they use)

Users can fill in the web form or send an e-mail to inform the NGI about the publication.

  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

Acknowledgement samples are on the NGI web site.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The NGI user policy document defines the user responsibilities, and the user has to accept this policy online when opening an account.

  • When do researchers provide the data? (e.g., on a yearly basis, as soon as they publish)

As soon as they publish

  • Limitations/difficulties of the current process / Success stories of the current process

It is difficult to get feedback from the users.

  • How many publications are already collected and since when

326 since 2004

  • How the information is used

Infrastructure usage is reported to the funding body every six months.


NGI-DE (Germany)

bwGRiD is a community in Germany and is used as the German example here.

Scope
identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
No process is established at the moment; publications are collected by the communities themselves.
What information is collected
bwGRiD community: standard BibTeX information is collected. The academic discipline is missing at the moment and is difficult to add; fields such as "main academic discipline" and "more academic disciplines" might help.
Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
No.
How do researchers know that they need to provide the information?
(bwGRiD) Researchers are told when they get their account.
How do researchers provide the information? (the tool they use)
(bwGRiD) They send an email with a BibTeX attachment.
How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
(bwGRiD) They can cite a techreport reference in the publication. There is no guideline at the moment.
What are the obligations for researchers? Where are they defined? How do the researchers accept them?
not known.
When do researchers provide the data?
not known.
Limitations/difficulties of the current process
There is no automatic process, and the space for references in journals is limited.
Success stories of the current process
(bwGRiD) Over 220 publications have been collected since 2008: http://www.bw-grid.de/publikationen/
How many publications are already collected and since when
All communities together have produced over 600 publications, collected on community pages since 2005: http://dgi-2.d-grid.de/publikationen.php
How the information is used
Information is used to show the need for and benefit of the German grid infrastructure.
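
The bwGRiD suggestion of adding discipline information to the BibTeX records could look roughly like the sketch below. The field names `main_discipline` and `other_disciplines` are hypothetical (bwGRiD has not defined them), and the entry values are placeholders, not a real publication. Standard BibTeX styles generally ignore fields they do not recognise, so extra fields of this kind should not disturb normal use of the records.

```python
from typing import Dict


def to_bibtex(key: str, entry_type: str, fields: Dict[str, str]) -> str:
    """Render one BibTeX entry, one field per line, in insertion order."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder record with the two hypothetical discipline fields.
    print(to_bibtex(
        "doe2012example",
        "article",
        {
            "author": "Doe, Jane",
            "title": "An Example Title",
            "year": "2012",
            "main_discipline": "Physics",
            "other_disciplines": "Computer Science",
        },
    ))
```

A fixed list of allowed discipline values would still have to be agreed on for such fields to be useful in reporting.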

EGI-InSPIRE

XSEDE

Scope
identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
XSEDE attempts to collect publications from all Principal Investigators (i.e., project leads) who have allocations on the various resources. XSEDE resources are managed by approximately six Service Providers across the U.S. Any researcher at most U.S.-based research institutions can request an allocation. Research from any scientific and engineering domain is eligible for allocations; the results must be publishable in the open literature (i.e., no proprietary commercial work or confidential government research).
What information is collected
At this time, we do not have a formal database for storing publication data, though such a system has been proposed in the past and may be implemented in the not-too-distant future. The current storage "schema" is a list of references (associated with the allocated project number) in a Microsoft Word document, attached as an appendix to the XSEDE quarterly report.
Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
We do not typically distinguish between different types of publications in our current collection process, though all types of publications are generally included. Most of the publication collection effort focuses on the publications by the scientific user community. XSEDE does, however, separately collect publications written by XSEDE-supported staff; these staff publications are typically focused on engineering of the infrastructure.
Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
No.
How do researchers know that they need to provide the information?
As part of the user responsibilities agreement, which all users must acknowledge and accept in the XSEDE portal, users are instructed to acknowledge XSEDE in relevant publications and report those publications to XSEDE. The most common time for a researcher to report such publications is during the allocation process, when they request a renewal of an allocation. The review panel uses publications (submitted, accepted, in progress, as well as published) as part of their evaluation. The allocation policies also state that project leads should submit a final report at the conclusion of each project (though this is not strictly enforced).
How do researchers provide the information? (the tool they use)
The primary tool is submission of a PDF file included as part of their allocation request. Allocation requests are accepted via XSEDE's "POPS" system.
How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
That is up to the researcher. XSEDE does provide sample language that can be used to acknowledge XSEDE support at https://www.xsede.org/how-to-acknowledge-xsede
What are the obligations for researchers? Where are they defined? How do the researchers accept them?
As noted, all users must accept a user responsibilities agreement (a) when they first get their XSEDE user portal account, and (b) once per year thereafter. The user responsibilities are at: https://www.xsede.org/usage-policies
In addition, the policies for allocations further describe obligations associated with use of the resources. The policies are described at: https://www.xsede.org/web/guest/allocation-policies
When do researchers provide the data?
As noted, they typically provide this when they request renewal allocations. Allocations are usually made for 12-month periods; therefore, they provide this information annually. However, different projects renew at different times throughout the year; XSEDE makes large-scale allocations four times per year, so our primary collection frequency is quarterly.
Limitations/difficulties of the current process
The limitations of the current process include the following. The lack of a formal mechanism for capturing publications at the end of a project (i.e., formally collecting a "Final Report") means that we miss many publications from short-term projects, which include many smaller projects. In addition, the lack of a robust database system for storing publication data limits the "cleanliness" of the data set (duplications inevitably exist) as well as the amount of reporting and analysis that we can do. A more challenging issue is verifying whether all of the reported publications do, in fact, acknowledge XSEDE or the source of their computational support.
Success stories of the current process
XSEDE's long history (dating back to the early days of the NSF Supercomputer Centers program) of collecting this data for renewal allocation requests means that users and reviewers recognize the importance of submitting the information. For continuing projects, we (probably) have a very high compliance rate, though this is hard to verify.
How many publications are already collected and since when
XSEDE has access to the lists of publications associated with its reports and TeraGrid reports dating back to 2005. The total number of publications in these documents is approximately 11,000. However, duplicates and other incorrect entries probably reduce that number by a small percentage.
How the information is used
The two current uses are (a) review of allocation requests by the review panel and (b) XSEDE's reporting, notably to its funding agency, the National Science Foundation.
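
The duplicate entries mentioned under the limitations could be reduced with a simple title-based merge once the reference lists are available as data. The sketch below is one possible approach, not an XSEDE tool: it assumes each reference has already been extracted as a (project number, title) pair, and the normalisation rule (lower-casing and collapsing everything except ASCII letters and digits) is an assumption that would miss duplicates whose titles differ in wording.

```python
import re
from typing import Dict, List, Tuple


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace.

    Only ASCII letters and digits are kept, so accented characters
    are treated as separators in this simple sketch.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(
    references: List[Tuple[str, str]],
) -> List[Tuple[str, List[str]]]:
    """Merge references whose normalized titles are identical.

    Returns (first-seen title, list of project numbers) pairs in the
    order in which each distinct title first appears.
    """
    merged: Dict[str, Tuple[str, List[str]]] = {}
    for project, title in references:
        key = normalize_title(title)
        if key not in merged:
            merged[key] = (title, [])
        projects = merged[key][1]
        if project not in projects:
            projects.append(project)
    return list(merged.values())
```

Keeping the project numbers attached to each merged entry preserves the association between publications and allocations that the current Word-document lists record.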