Difference between revisions of "VT Scientific Publications Repository Current Practices"

From EGIWiki
(XSEDE)
Line 22: Line 22:
 
*How the information is used
 
*How the information is used
  
 +
<br>
  
 
== Best Practices  ==
 
== Best Practices  ==
Line 29: Line 30:
 
*Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)<br>
 
*Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)<br>
  
All disciplines<br>
+
All disciplines<br>  
  
 
At least one author working in a French laboratory or organism  
 
At least one author working in a French laboratory or organism  
Line 36: Line 37:
  
 
*What information is collected  
 
*What information is collected  
**describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications)  
+
**describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications)
  
The elements are :
+
The elements are&nbsp;: Publication type, Subject, Title, Author(s) (name, first name), Laboratory with team, address, Abstract, Fulltext language, Production date, Conference or book title, keywords...  
Publication type, Subject, Title, Author(s) (name, first name), Laboratory with team, address, Abstract, Fulltext language, Production date, Conference or book title, keywords...
 
  
 
**describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
 
**describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
Line 75: Line 75:
  
 
*Success stories of the current process
 
*Success stories of the current process
About 700 publications were added to the collection from HAL within several months.
+
 
Several researchers have systematically referenced their publications so that they are included in the collection. It is a major success and we hope to convince others soon!  
+
About 700 publications were added to the collection from HAL within several months. Several researchers have systematically referenced their publications so that they are included in the collection. It is a major success and we hope to convince others soon!  
  
 
*How many publications are already collected and since when
 
*How many publications are already collected and since when
  
today 9 July 2012&nbsp;: 240 articles – 485 references. However, not all existing publications in HAL are in the collection yet (especially those concerning HEP);  
+
today 9 July 2012&nbsp;: 240 articles – 485 references. However, not all existing publications in HAL are in the collection yet (especially those concerning HEP), so this is not enough to be representative of the work done on or with the help of the grid in France. Grid projects have existed in France since 2001, and the collection begins in 2002.  
this is not enough to be representative of the work done on/with the help of the grid in France
 
Grid projects exist since 2001 in France and the collection begins in 2002.  
 
  
 
*How the information is used
 
*How the information is used
  
used to produce indicators to evaluate the impact of the grid and to search for information about the grid users
+
used to produce indicators to evaluate the impact of the grid and to search for information about the grid users  
 +
 
 +
=== NGI Netherlands  ===
  
=== NGI Netherlands ===
 
 
Input collected by SC from a conversation with Peter Michielse, Deputy Director Policies and Strategy, BiGGrid (NL)  
 
Input collected by SC from a conversation with Peter Michielse, Deputy Director Policies and Strategy, BiGGrid (NL)  
  
* Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
+
*Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
Researchers that use computing resources in the Netherlands
+
 
 +
Researchers that use computing resources in the Netherlands  
 +
 
 +
*What information is collected (Describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications; describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure)
 +
 
 +
Papers published by the research teams in the previous year
  
* What information is collected (Describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications; describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure)
+
*Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
Papers published by the research teams in the previous year
 
  
* Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
+
No  
No
 
  
* How do researchers know that they need to provide the information?
+
*How do researchers know that they need to provide the information?
# Peter sends a general email (to about 40 researchers) around Christmas time, asking them to provide the information. Reminder is sent mid-January.
 
# Additionally, all researchers have to report on the outcomes of their research as per the agreement established when they fill a request to use the computing resources.
 
  
* How do researchers provide the information? (the tool they use)
+
#Peter sends a general email (to about 40 researchers) around Christmas time, asking them to provide the information. Reminder is sent mid-January.
# They reply to the email.
+
#Additionally, all researchers have to report on the outcomes of their research as per the agreement established when they fill a request to use the computing resources.
# They fill the outcome report online.
 
  
* How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
+
*How do researchers provide the information? (the tool they use)
They don’t in this instance. They have to fill an official resource request form before  they use the resources.
 
  
* What are the obligations for researchers? Where are they defined? How do the researchers accept them?
+
#They reply to the email.
The researchers are required to report on the outcomes of their research, as part of their ‘contract’ with NCF (the Computer Science arm of the Dutch funding agency, NWO)
+
#They fill the outcome report online.
 +
 
 +
*How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
 +
 
 +
They don’t in this instance. They have to fill an official resource request form before they use the resources.
 +
 
 +
*What are the obligations for researchers? Where are they defined? How do the researchers accept them?
 +
 
 +
The researchers are required to report on the outcomes of their research, as part of their ‘contract’ with NCF (the Computer Science arm of the Dutch funding agency, NWO)  
  
 
*When do researchers provide the data? (e.g., yearly base, as soon as they publish)
 
*When do researchers provide the data? (e.g., yearly base, as soon as they publish)
# Yearly basis
 
# Whenever they report on the outcome of the research
 
  
* Limitations/difficulties of the current process / Success stories of the current process
+
#Yearly basis
Peter says this works quite well
+
#Whenever they report on the outcome of the research
  
* How many publications are already collected and since when
+
*Limitations/difficulties of the current process / Success stories of the current process
Difficult to tell - hundreds
+
 
 +
Peter says this works quite well
 +
 
 +
*How many publications are already collected and since when
 +
 
 +
Difficult to tell - hundreds  
  
 
*How the information is used
 
*How the information is used
To demonstrate the usage of grid computing.
 
  
=== NGI Turkey ===
+
To demonstrate the usage of grid computing.
 +
 
 +
=== NGI Turkey ===
 +
 
 +
*Scope: identify the scope of collecting publications
  
* Scope: identify the scope of collecting publications
+
The collection of publications is a “measurement technique” that yields measurable metrics on the scientific results produced with the e-Infrastructure. NGI-TR aims to collect all scientific products of its users.
  
The collection of publications is a “measurement technique” that yields measurable metrics on the scientific results produced with the e-Infrastructure. NGI-TR aims to collect all scientific products of its users.
+
*What information is collected
  
* What information is collected
+
NGI-TR uses a two-level mechanism for collecting scientific publications. The first level is based on user declaration through a web interface that any user can fill in.
  
NGI-TR uses a two-level mechanism for collecting scientific publications:
+
A user with a scientific publication chooses the web form that matches the type of publication.  
The first level collection is based on the user declaration through a web interface that could be filled by any of the users.
 
  
A user with a scientific publication chooses the web form that matches the type of publication.
+
The publications are classified as “scientific journal”, “conference proceeding”, “thesis” and “other”. The web forms collect the following information, depending on the publication type submitted by the user.  
  
The publications are classified as “scientific journal”, “conference proceeding”, “thesis” and “other”. The web forms collect the following information, depending on the publication type submitted by the user.
+
Scientific Journal: Name of the Journal, Journal ISSN, Authors, Title, Year, Number, Section, Page
  
Scientific Journal: Name of the Journal, Journal ISSN, Authors, Title, Year, Number, Section, Page
+
Conference Proceeding: Name of the conference, Date and location, Proceeding ISBN, Authors, Title, Number, Section, Page  
  
Conference Proceeding: Name of the conference, Date and location, Proceeding ISBN, Authors, Title, Number, Section, Page
+
Thesis: Title, Author, Class (PhD or MS), University, Faculty, Department, Year, Advisor(s). Other: Authors, Title, Year, Publication / presentation / magazine information.
  
Thesis: Title, Author, Class (PhD or MS), University, Faculty, Department, Year, Advisor(s)
+
In NGI-TR's experience, collecting publications through user declaration alone is not sufficient. Although the NGI web page provides an open and simple web form for publication declaration, and the NGI sends regular e-mails reminding the user community of the importance of declaring publications, only limited feedback is collected this way. NGI-TR uses the Web of Science repository provided by Web of Knowledge as the second level, both to collect missing publications and to check the publications declared by users.  
Other: Authors, Title, Year, Publication / presentation / magazine information.
 
  
In NGI-TR's experience, collecting publications through user declaration alone is not sufficient. Although the NGI web page provides an open and simple web form for publication declaration, and the NGI sends regular e-mails reminding the user community of the importance of declaring publications, only limited feedback is collected this way.
+
The “Advanced Search Tool” of the Web of Science gives the publication list regarding indicated keywords and time span.  
NGI-TR uses the Web of Science repository provided by Web of Knowledge as the second level for the collection of the missing publications as well as for checking the declared publications by the users.
 
  
The “Advanced Search Tool” of the Web of Science gives the publication list regarding indicated keywords and time span.
+
The following citation databases are checked for the publication search:
  
The following citation databases are checked for the publication search:
+
Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI), Conference Proceedings Citation Index - Science (CPCI-S), Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH)
  
Science Citation Index Expanded (SCI-EXPANDED)
+
The resulting list is then processed to match the authors of the publications with the e-Infrastructure users. If a matched user has never used the infrastructure, the related publication should be removed from the list.
Social Sciences Citation Index (SSCI)
 
Arts & Humanities Citation Index (A&HCI)
 
Conference Proceedings Citation Index- Science (CPCI-S)
 
Conference Proceedings Citation Index- Social Science & Humanities (CPCI-SSH)
 
  
The resulting list is then processed to match the authors of the publications with the e-Infrastructure users. If a matched user has never used the infrastructure, the related publication should be removed from the list.
+
If the publication contains an “acknowledgement” referring directly to the e-Infrastructure or NGI, the publication should be kept in the list.  
  
If the publication contains an “acknowledgement” referring directly to the e-Infrastructure or NGI, the publication should be kept in the list.
+
If there is no acknowledgement in the publication, the account of the user who authored the publication should be checked. If the account has not been used actively for a long time, the publication should likewise be removed from the list.  
  
If there is no acknowledgement in the publication, the account of the user who authored the publication should be checked. If the account has not been used actively for a long time, the publication should likewise be removed from the list.
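The filtering rules above can be sketched in code. This is only an illustration under stated assumptions, not NGI-TR's actual tooling: the `Candidate` record, its field names, and the one-year `MAX_IDLE` threshold are hypothetical (the text says only "a long time"), and the rules are applied in the order in which they are described.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed threshold: the source only says "a long time".
MAX_IDLE = timedelta(days=365)


@dataclass
class Candidate:
    """A publication found via Web of Science (hypothetical record layout)."""
    title: str
    author_is_user: bool                # an author matches a registered user
    has_acknowledgement: bool           # acknowledges the e-Infrastructure or NGI
    last_account_use: Optional[date]    # None if the account was never used


def keep_publication(pub: Candidate, today: date,
                     max_idle: timedelta = MAX_IDLE) -> bool:
    """Apply the described rules in order and return True to keep the entry."""
    # No matching user, or the matched user never used the infrastructure.
    if not pub.author_is_user or pub.last_account_use is None:
        return False
    # An explicit acknowledgement is enough to keep the publication.
    if pub.has_acknowledgement:
        return True
    # Otherwise keep it only if the account has been active recently.
    return today - pub.last_account_use <= max_idle
```

Calling `keep_publication` on each candidate returned by the keyword search would yield the cleaned list; the threshold should be tuned to whatever "a long time" means locally.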
+
<br>
 +
 
 +
*Do you collect the impact factor? If so, how?
 +
 
 +
No
 +
 
 +
*How do researchers know that they need to provide the information?
  
 +
Under the NGI Turkey infrastructure user policy, all users must refer to the infrastructure in every scientific product produced by means of it, and must report such publications to the NGI via the publication submission web tool or e-mail. When a user account is opened, the NGI informs the user about the user policy and the acknowledgement or reference process, and provides acknowledgement samples; in addition, the NGI regularly sends e-mails as a reminder of the process.
  
* Do you collect the impact factor? If so, how?
+
*How do researchers provide the information? (the tool they use)
No
 
  
* How do researchers know that they need to provide the information?
+
User can fill the web form or send an e-mail to inform the NGI about the publication.
  
Under the NGI Turkey infrastructure user policy, all users must refer to the infrastructure in every scientific product produced by means of it, and must report such publications to the NGI via the publication submission web tool or e-mail. When a user account is opened, the NGI informs the user about the user policy and the acknowledgement or reference process, and provides acknowledgement samples; in addition, the NGI regularly sends e-mails as a reminder of the process.
+
*How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
  
* How do researchers provide the information? (the tool they use)
+
Acknowledgement samples are on the NGI web site.  
User can fill the web form or send an e-mail to inform the NGI about the publication.
 
  
* How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
+
*What are the obligations for researchers? Where are they defined? How do the researchers accept them?
Acknowledgement samples are on the NGI web site.
 
  
* What are the obligations for researchers? Where are they defined? How do the researchers accept them?
+
The NGI user policy document defines the user responsibilities; users must accept this policy online when opening an account.  
The NGI user policy document defines the user responsibilities; users must accept this policy online when opening an account.
 
  
 
*When do researchers provide the data? (e.g., yearly base, as soon as they publish)
 
*When do researchers provide the data? (e.g., yearly base, as soon as they publish)
As soon as they publish
 
  
* Limitations/difficulties of the current process / Success stories of the current process
+
As soon as they publish
It is difficult to get feedback from the users.
 
  
* How many publications are already collected and since when
+
*Limitations/difficulties of the current process / Success stories of the current process
326 since 2004
+
 
 +
It is difficult to get feedback from the users.
 +
 
 +
*How many publications are already collected and since when
 +
 
 +
326 since 2004  
  
 
*How the information is used
 
*How the information is used
The infrastructure usage is reported every six months to the funding body.
 
  
 +
The infrastructure usage is reported every six months to the funding body.
  
=== NGI-DE (Germany) ===
+
<br>
''bwGRiD is a community in Germany and used as a German example here.''
 
  
;Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)  
+
=== NGI-DE (Germany)  ===
: No process established right now; publications are collected by the communities themselves
+
 
; What information is collected  
+
''bwGRiD is a community in Germany and used as a German example here.''
: bwGRiD-Community: standard bibtex-information is collected, academic discipline is missing at the moment (and difficult to add, maybe "main academic discipline" and "more academic disciplines" would help)
+
 
; Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)  
+
;Scope
: No.
+
:identify the scope of collecting publications (infrastructure, geographical boundaries, communities)  
; How do researchers know that they need to provide the information?  
+
:No process established right now; publications are collected by the communities themselves  
: (bwGRiD) researchers are told, when they get the account
+
;What information is collected  
; How do researchers provide the information? (the tool they use)  
+
:bwGRiD-Community: standard bibtex-information is collected, academic discipline is missing at the moment (and difficult to add, maybe "main academic discipline" and "more academic disciplines" would help)  
: (bwGRiD) they send an email with a bibtex-attachment
+
;Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)  
; How do researchers describe the use of the infrastructures in the publication? (provide guidelines)  
:(bwGRiD) they can use a techreport-reference in the publication. No guideline at the moment
+
:No.  
; What are the obligations for researchers? Where are they defined? How do the researchers accept them?  
+
;How do researchers know that they need to provide the information?  
: not known.
+
:(bwGRiD) researchers are told, when they get the account  
; When do researchers provide the data?
+
;How do researchers provide the information? (the tool they use)  
: not known.
+
:(bwGRiD) they send an email with a bibtex-attachment  
; Limitations/difficulties of the current process  
+
;How do researchers describe the use of the infrastructures in the publication? (provide guidelines)  
: No automatic process, the slots for references in journals are limited
+
:(bwGRiD) they can use a techreport-reference in the publication. No guideline at the moment  
; Success stories of the current process  
+
;What are the obligations for researchers? Where are they defined? How do the researchers accept them?  
: (bwGRiD) over 220 publications are collected since 2008 http://www.bw-grid.de/publikationen/  
+
:not known.  
; How many publications are already collected and since when  
+
;When do researchers provide the data?  
: All communities produced over 600 publications collected on community-pages since 2005 http://dgi-2.d-grid.de/publikationen.php  
+
:not known.  
;How the information is used
+
;Limitations/difficulties of the current process  
: Information is used to show the need for and benefit of the German grid infrastructure
+
:No automatic process, the slots for references in journals are limited  
 +
;Success stories of the current process  
 +
:(bwGRiD) over 220 publications are collected since 2008 http://www.bw-grid.de/publikationen/  
 +
;How many publications are already collected and since when  
 +
:All communities produced over 600 publications collected on community-pages since 2005 http://dgi-2.d-grid.de/publikationen.php  
 +
;How the information is used  
 +
:Information is used to show the need for and benefit of the German grid infrastructure
 +
 
 +
=== EGI-InSPIRE  ===
 +
 
 +
*'''Referencing EGI-InSPIRE''' can be found on https://www.egi.eu/about/egi-inspire/templates/
 +
 
 +
=== XSEDE  ===
 +
 
 +
;Scope
 +
:identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
 +
:XSEDE attempts to collect publications from all Principal Investigators (i.e., project leads) who have allocations on the various resources. XSEDE resources are managed by approximately six Service Providers across the U.S. Any researcher at most U.S.-based research institutions can request an allocation. Research from any scientific and engineering domain is eligible for allocations; the results must be publishable in the open literature (i.e., no proprietary commercial work or confidential government research).
 +
;What information is collected
 +
:At this time, we do not have a formal database for storing publication data, though such a system has been proposed in the past and may happen in the not-too-distant future. The current storage "schema" is a list of references (associated with the allocated project number) kept in a Microsoft Word document that is attached as an appendix to the XSEDE quarterly report.
 +
;Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
 +
:We do not typically distinguish between different types of publications in our current collection process, though all types of publications are generally included. Most of the publication collection effort focuses on the publications by the scientific user community. XSEDE *does* however separately collect publications written by XSEDE-supported staff; these staff publications are typically focused on engineering of the infrastructure.
 +
;Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
 +
:No.
 +
;How do researchers know that they need to provide the information?
 +
:As part of the user responsibilities agreement, which all users must acknowledge and accept in the XSEDE portal, users are instructed to acknowledge XSEDE in relevant publications and report those publications to XSEDE. The most common time for a researcher to report such publications is during the allocation process, when they request a renewal of an allocation. The review panel uses publications (submitted, accepted, in progress, as well as published) as part of their evaluation. The allocation policies also state that project leads should submit a final report at the conclusion of each project (though this is not strictly enforced).
 +
;How do researchers provide the information? (the tool they use)
 +
:The primary tool is submission of a PDF file included as part of their allocation request. Allocation requests are accepted via XSEDE's "POPS" system.
 +
;How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
 +
:That is up to the researcher. XSEDE does provide sample language that can be used to acknowledge XSEDE support at https://www.xsede.org/how-to-acknowledge-xsede
 +
;What are the obligations for researchers? Where are they defined? How do the researchers accept them?
 +
:As noted, all users must accept a user responsibilities agreement (a) when they first get their XSEDE user portal account, and (b) once per year thereafter. The user responsibilities are at: https://www.xsede.org/usage-policies
 +
:In addition, the policies for allocations further describe obligations associated with use of the resources. The policies are described at: https://www.xsede.org/web/guest/allocation-policies
 +
;When do researchers provide the data?
 +
:As noted, they typically provide this when they request renewal allocations. Allocations are usually made for 12-month periods, therefore, they provide this information annually. However, different projects renew at different times throughout the year; XSEDE makes large-scale allocations four times per year, so our primary collection frequency is quarterly.
 +
;Limitations/difficulties of the current process
 +
:The limitations of the current process include the following. The lack of a formal mechanism for capturing publications at the end of a project (i.e., formally collecting a "Final Report") means that we miss many publications from short-term projects, which include many smaller projects. In addition, the lack of a robust database system for storing publication data limits the "cleanliness" of the data set (duplications inevitably exist) as well as the amount of reporting and analysis that we can do. A more challenging issue is verifying whether all of the reported publications do, in fact, acknowledge XSEDE or the source of their computational support.
 +
;Success stories of the current process
 +
:XSEDE's long history (dating back to the early days of the NSF Supercomputer Centers program) of collecting this data for renewal allocation requests means that users and reviewers recognize the importance of submitting the information. For continuing projects, we (probably) have a very high compliance rate, though this is hard to verify.
 +
;How many publications are already collected and since when
 +
:XSEDE has access to the lists of publications associated with its reports and TeraGrid reports dating back to 2005. The total number of publications in these documents is approximately 11,000. However, duplicates and other incorrect entries probably reduce that number by a small percentage.
 +
;How the information is used
 +
:The two current uses are (a) for review of allocation requests by the review panel and (b) for XSEDE's reporting purposes, notably to its funding agency, the National Science Foundation.
  
=== EGI-InSPIRE ===
+
<br>
  
* '''Referencing EGI-InSPIRE''' can be found on https://www.egi.eu/about/egi-inspire/templates/
+
=== NCAR/CISL  ===
  
=== XSEDE ===
+
;Scope
;Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)  
+
:identify the scope of collecting publications (infrastructure, geographical boundaries, communities)  
: XSEDE attempts to collect publications from all Principal Investigators (i.e., project leads) who have allocations on the various resources. XSEDE resources are managed by approximately six Service Providers across the U.S. Any researcher at most U.S.-based research institutions can request an allocation. Research from any scientific and engineering domain is eligible for allocations; the results must be publishable in the open literature (i.e., no proprietary commercial work or confidential government research).
+
:NCAR, specifically the Computational and Information Systems Laboratory (CISL) at NCAR, collects publications from users of its production HPC environment. HPC users are primarily U.S.-based university researchers in the atmospheric and related sciences.  
; What information is collected  
+
;What information is collected, Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
: At this time, we do not have a formal database for storing publication data, though such a system has been proposed in the past and may happen in the not-too-distant future. The current storage "schema" is a list of references (associated with the allocated project number) kept in a Microsoft Word document that is attached as an appendix to the XSEDE quarterly report.
+
:We simply collect bibliographic references to the publications. We ask users to separate their publications into three categories: dissertations/theses, peer-reviewed publications, and other publications (including posters and presentations).
; Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
+
:We are currently exploring using a system from the NCAR Library to store these collected publications. For now, the publications will be entered into this system manually by NCAR/CISL staff.
: We do not typically distinguish between different types of publications in our current collection process, though all types of publications are generally included. Most of the publication collection effort focuses on the publications by the scientific user community. XSEDE *does* however separately collect publications written by XSEDE-supported staff; these staff publications are typically focused on engineering of the infrastructure.
+
;Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)  
; Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)  
: No.
+
: No. However, we hope to leverage the NCAR Library's system to collect impact factor and other data in the future.
; How do researchers know that they need to provide the information?  
+
;How do researchers know that they need to provide the information?  
: As part of the user responsibilities agreement, which all users must acknowledge and accept in the XSEDE portal, users are instructed to acknowledge XSEDE in relevant publications and report those publications to XSEDE. The most common time for a researcher to report such publications is during the allocation process, when they request a renewal of an allocation. The review panel uses publications (submitted, accepted, in progress, as well as published) as part of their evaluation. The allocation policies also state that project leads should submit a final report at the conclusion of each project (though this is not strictly enforced).
+
: We currently conduct an annual survey of current and recent users of our production HPC environment. We are currently tightening our process for returning users and asking them for recently completed publications at the time they apply for a new allocation.
; How do researchers provide the information? (the tool they use)
+
;How do researchers provide the information? (the tool they use)  
: The primary tool is submission of a PDF file included as part of their allocation request. Allocation requests are accepted via XSEDE's "POPS" system.
+
:We use a web-based survey tool, Opinio, hosted at NCAR.
; How do researchers describe the use of the infrastructures in the publication? (provide guidelines)  
+
;How do researchers describe the use of the infrastructures in the publication? (provide guidelines)  
: That is up to the researcher. XSEDE does provide sample language that can be used to acknowledge XSEDE support at https://www.xsede.org/how-to-acknowledge-xsede
+
:We provide guidance via a web page, https://www2.cisl.ucar.edu/docs/acknowledging-ncarcisl
 +
: The URL for this web page, as well as the requirement to acknowledge NCAR/CISL support, is included in their allocation award letters.
 +
: With our new procurement, we are trying several new approaches to see if it helps improve compliance and/or tracking of publications. First, we are offering users the option of *citing* their resource use, instead of including text in acknowledgments. Second, we have associated an "Archival Resource Key" or ARK with our primary resource and instructed users to include this unique string in their citations or acknowledgments. The citation option may increase compliance, particularly in cases where acknowledgments are not appropriate. Use of the ARK is designed to make web searches easier -- finding a unique string (like a DOI) is more robust than finding the many ways acknowledgments may be phrased.
 +
: Whether the use of the ARK has any benefit or effect at all -- that remains to be seen. Ask me again in a year or so.
 
; What are the obligations for researchers? Where are they defined? How do the researchers accept them?  
 
; What are the obligations for researchers? Where are they defined? How do the researchers accept them?  
: As noted, all users must accept a user responsibilities agreement (a) when they first get their XSEDE user portal account, and (b) once per year thereafter. The user responsibilities are at: https://www.xsede.org/usage-policies
+
:Our user responsibilities are defined at https://www2.cisl.ucar.edu/docs/responsibilities. However, there is no formal acceptance of these obligations.
: In addition, the policies for allocations further describe obligations associated with use of the resources. The policies are described at: https://www.xsede.org/web/guest/allocation-policies
+
;When do researchers provide the data?  
; When do researchers provide the data?
+
:As noted we conduct a survey annually, and are now asking when they return for new project allocations. In the past, our letters asked for users to send articles upon publication, but this happened so infrequently (and we had no way to confirm or enforce this rule) that we stopped doing so.
: As noted, they typically provide this when they request renewal allocations. Allocations are usually made for 12-month periods, therefore, they provide this information annually. However, different projects renew at different times throughout the year; XSEDE makes large-scale allocations four times per year, so our primary collection frequency is quarterly.
+
;Limitations/difficulties of the current process  
; Limitations/difficulties of the current process  
+
:Many. We have no way to verify the completeness of our collection. In addition, we have not typically verified that the publications acknowledge support for NCAR/CISL computing support.  
: The limitations of the current process include: The lack of a formal mechanism for capturing publications at the end of a project (i.e., formally collecting a "Final Report") means that we miss many publications from short-term projects, which includes many smaller projects. In addition, the lack of a robust database system for storing publication data limits the "cleanliness" of the data set (duplications inevitably exist) as well as the amount of reporting an analysis that we can do. A more challenging issue is verifying whether all of the reported publications do, in fact, acknowledge XSEDE or the source of their computational support.
+
;Success stories of the current process  
; Success stories of the current process  
+
:Difficult to say. We have been conducting the survey regularly for a number of years, and I believe (but cannot confirm) that we get a reasonably large fraction of our user's publications.
: XSEDE's long history (dating back to the early days of the NSF Supercomputer Centers program) of collecting this data for renewal allocation requests means that users and reviewers recognize the importance of submitting the information. For continuing projects, we (probably) have a very high compliance rate, though this is hard to verify.
+
;How many publications are already collected and since when  
; How many publications are already collected and since when  
+
:Several hundred per year, at least since FY2008.
: XSEDE has access to the lists of publications associated with its reports and TeraGrid reports dating back to 2005. The total number of publications in these documents is approximately 11,000. However, duplicates and other incorrect entries probably reduce that number by a small percentage.
+
;Information is used to show the need and befit of the German grid infrastructure  
; Information is used to show the need and befit of the German grid infrastructure
+
:Primarily, these publication counts are used as part of the NCAR/CISL annual reporting process and budget review process, both within NCAR and for presentation to the National Science Foundation (NSF), our primary funding sponsor.
: The two current use are (a) for review of allocation requests by the review panel and (b) for XSEDE's reporting purposes, notably to its funding agency, the National Science Foundation.
 
  
 
[[Category:Scientific_Publications_Repository]]
 
[[Category:Scientific_Publications_Repository]]

Revision as of 16:28, 2 August 2012




Collect Best Practices

This is a list of questions for documenting best practices from established processes that collect scientific publications made possible by the use of e-infrastructures.

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
  • What information is collected
    • describe the schema and also collect values for the enumerations (e.g., list of disciplines, type of publications)
    • describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
  • How do researchers know that they need to provide the information?
  • How do researchers provide the information? (the tool they use)
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?
  • When do researchers provide the data? (e.g., yearly basis, as soon as they publish)
  • Limitations/difficulties of the current process
  • Success stories of the current process
  • How many publications are already collected and since when
  • How the information is used


Best Practices

NGI France

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)

All disciplines

At least one author working in a French laboratory or organization

All infrastructures (production or research infrastructures)

  • What information is collected
    • describe the schema and also collect values for the enumerations (e.g., list of disciplines, type of publications)

The elements are: Publication type, Subject, Title, Author(s) (name, first name), Laboratory with team, Address, Abstract, Full-text language, Production date, Conference or book title, Keywords...

    • describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure

The tool asks for the discipline, the type of publication and keywords, and allows searching on them. Menus let users retrieve publications by selection criteria and choose the order in which they are presented. A publication may have several disciplines and keywords.

  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)

No

  • How do researchers know that they need to provide the information?

We advertise through every available channel: the NGI "France Grilles" web site, the "France Grilles" newsletter, letters from the director, e-mails to the mailing lists (lab directors, users, operational and site teams...), presentations at the grid days... and of course reminders.

  • How do researchers provide the information? (the tool they use)
  1. They have to reference (or deposit) their publications in a national tool built for the whole research community in France, covering all organizations and universities. The tool is http://hal.archives-ouvertes.fr/
  2. They have to notify a functional email address or G Romier, giving the HAL identifier or the title of the publication
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

Any form is accepted, but we ask for an acknowledgement, whose wording is advertised on the NGI web site. In the HAL tool, a keyword can also be used to search for the publication.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The NGI imposes no obligations. Organizations and universities impose none either, with a few exceptions.

  • When do researchers provide the data? (e.g., yearly basis, as soon as they publish)

It depends on their organization and context.

  • Limitations/difficulties of the current process

The process relies on voluntary participation.

  • Success stories of the current process

About 700 publications were collected from HAL into the collection in several months. Several researchers systematically referenced their publications so that they could be added to the collection. It is a major success and we hope to convince others soon!

  • How many publications are already collected and since when

As of 9 July 2012: 240 articles, 485 references. However, not all existing publications in HAL are in the collection yet (especially for HEP), so it is not yet representative of the work done on, or with the help of, the grid in France. Grid projects have existed in France since 2001, and the collection begins in 2002.

  • How the information is used

It is used to produce indicators that evaluate the impact of the grid, and to search for information about grid users.

NGI Netherlands

Input collected by SC from a conversation with Peter Michielse, Deputy Director Policies and Strategy, BiGGrid (NL)

  • Scope: identify the scope of collecting publications (infrastructure, geographical boundaries, communities)

Researchers that use computing resources in the Netherlands

  • What information is collected (Describe the schema and collect also values for the enumerations, e.g., list of disciplines, type of publications; describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure)

Papers published by the research teams in the previous year

  • Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)

No

  • How do researchers know that they need to provide the information?
  1. Peter sends a general email (to about 40 researchers) around Christmas time, asking them to provide the information. A reminder is sent in mid-January.
  2. Additionally, all researchers have to report on the outcomes of their research, as per the agreement established when they submit a request to use the computing resources.
  • How do researchers provide the information? (the tool they use)
  1. They reply to the email.
  2. They fill in the outcome report online.
  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

In this case they do not. They have to complete an official resource request form before they use the resources.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The researchers are required to report on the outcomes of their research, as part of their ‘contract’ with NCF (the Computer Science arm of the Dutch funding agency, NWO)

  • When do researchers provide the data? (e.g., yearly basis, as soon as they publish)
  1. Yearly basis
  2. Whenever they report on the outcome of the research
  • Limitations/difficulties of the current process / Success stories of the current process

Peter says this works quite well

  • How many publications are already collected and since when

Difficult to tell - hundreds

  • How the information is used

To demonstrate the usage of grid computing.

NGI Turkey

  • Scope: identify the scope of collecting publications

Collecting publications is a “measurement technique” that yields measurable metrics about the usage and scientific results of the e-Infrastructure. NGI-TR aims to collect all scientific products of its users.

  • What information is collected

NGI-TR uses a two-level mechanism for collecting scientific publications. The first level is based on user declaration through a web interface that any user can fill in.

A user with a scientific publication chooses among different web forms, depending on the type of publication.

Publications are classified as “scientific journal”, “conference proceeding”, “thesis” and “other”. Depending on the publication type, the web forms collect the following information:

Scientific Journal: Name of the Journal, Journal ISSN, Authors, Title, Year, Number, Section, Page

Conference Proceeding: Name of the conference, Date and location, Proceeding ISBN, Authors, Title, Number, Section, Page

Thesis: Title, Author, Class (PhD or MS), University, Faculty, Department, Year, Advisor(s)

Other: Authors, Title, Year, Publication / presentation / magazine information
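The per-type field lists above can be read as a small schema. The following is a minimal sketch of how a submission could be checked against it. The field identifiers, the `REQUIRED_FIELDS` table and the `missing_fields` helper are all hypothetical names introduced for illustration, and treating every listed field as mandatory is an assumption; the NGI-TR web forms may well be more lenient.

```python
# Hypothetical field names derived from the lists above; whether each
# field is actually mandatory in the NGI-TR forms is an assumption.
REQUIRED_FIELDS = {
    "scientific journal": [
        "journal_name", "issn", "authors", "title",
        "year", "number", "section", "page",
    ],
    "conference proceeding": [
        "conference_name", "date_and_location", "isbn", "authors",
        "title", "number", "section", "page",
    ],
    "thesis": [
        "title", "author", "degree_class", "university",
        "faculty", "department", "year", "advisors",
    ],
    "other": ["authors", "title", "year", "publication_info"],
}


def missing_fields(pub_type, record):
    """Return the fields of `pub_type` that are absent or empty in `record`."""
    try:
        required = REQUIRED_FIELDS[pub_type]
    except KeyError:
        raise ValueError(f"unknown publication type: {pub_type!r}")
    return [name for name in required if not record.get(name)]
```

For example, `missing_fields("other", {"authors": "A. Author", "title": "A talk"})` reports the two fields that still need to be filled in.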

In NGI-TR's experience, relying on user declarations alone is not sufficient. Although the NGI web page provides an open and simple web form for publication declaration, and the NGI sends regular e-mails reminding the user community of the importance of declaring publications, little feedback is collected this way. NGI-TR therefore uses the Web of Science repository provided by Web of Knowledge as the second level, both to collect missing publications and to check the publications declared by users.

The “Advanced Search Tool” of the Web of Science returns the list of publications matching the specified keywords and time span.

The following citation databases are checked for the publication search:

  • Science Citation Index Expanded (SCI-EXPANDED)
  • Social Sciences Citation Index (SSCI)
  • Arts & Humanities Citation Index (A&HCI)
  • Conference Proceedings Citation Index - Science (CPCI-S)
  • Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH)

The resulting list is refined by matching the authors of the publications with the e-Infrastructure users. If the matched user has never used the infrastructure, the publication is removed from the list.

If the publication contains an acknowledgement referring directly to the e-Infrastructure or the NGI, it is kept in the list.

If there is no acknowledgement in the publication, the account of the user who authored it is checked. If the account has not been actively used for a long time, the publication is again removed from the list.
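The three filtering rules above amount to a short decision procedure, sketched below. The `Candidate` record, the `keep_publication` function and the one-year `max_inactive` threshold are hypothetical. The text only says "a long time", so the threshold, and the choice to measure inactivity relative to the publication date, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Candidate:
    """A Web of Science hit whose author was matched to a user account."""
    title: str
    published: date
    has_acknowledgement: bool                # mentions the e-Infrastructure or NGI
    last_account_activity: Optional[date]    # None: the account was never used


def keep_publication(c: Candidate,
                     max_inactive: timedelta = timedelta(days=365)) -> bool:
    # Rule 1: the user never used the infrastructure -> remove.
    if c.last_account_activity is None:
        return False
    # Rule 2: an explicit acknowledgement -> keep.
    if c.has_acknowledgement:
        return True
    # Rule 3: no acknowledgement -> keep only if the account was active
    # recently enough before publication (threshold is an assumption).
    return c.published - c.last_account_activity <= max_inactive
```

The ordering matters: an acknowledgement is only consulted once the account is known to have been used at all, mirroring the order in which the rules are stated.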


  • Do you collect the impact factor? If so, how?

No

  • How do researchers know that they need to provide the information?

Under the NGI Turkey infrastructure user policy, all users should refer to the infrastructure in every scientific product produced by means of the infrastructure, and they have to report their scientific publications to the NGI via the publication submission web tool or by e-mail. When a user account is opened, the NGI informs the user about the user policy and the acknowledgement or reference process, together with acknowledgement samples. In addition, the NGI regularly sends e-mails as reminders of the process.

  • How do researchers provide the information? (the tool they use)

Users can fill in the web form or send an e-mail to inform the NGI about the publication.

  • How do researchers describe the use of the infrastructures in the publication? (provide guidelines)

Acknowledgement samples are on the NGI web site.

  • What are the obligations for researchers? Where are they defined? How do the researchers accept them?

The NGI user policy document defines the user responsibilities, and the user has to accept this policy online when opening an account.

  • When do researchers provide the data? (e.g., yearly basis, as soon as they publish)

As soon as they publish

  • Limitations/difficulties of the current process / Success stories of the current process

It is difficult to get feedback from the users.

  • How many publications are already collected and since when

326 since 2004

  • How the information is used

Infrastructure usage is reported to the funding body every six months.


NGI-DE (Germany)

bwGRiD is a community in Germany and used as a German example here.

Scope
identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
No process is established right now; publications are collected by the communities themselves.
What information is collected
bwGRiD community: standard BibTeX information is collected. The academic discipline is missing at the moment (and is difficult to add; separate "main academic discipline" and "more academic disciplines" fields might help).
Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
No.
How do researchers know that they need to provide the information?
(bwGRiD) Researchers are told when they receive their account.
How do researchers provide the information? (the tool they use)
(bwGRiD) They send an email with a BibTeX attachment.
How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
(bwGRiD) They can use a techreport reference in the publication. There is no guideline at the moment.
What are the obligations for researchers? Where are they defined? How do the researchers accept them?
not known.
When do researchers provide the data?
not known.
Limitations/difficulties of the current process
There is no automatic process, and the slots for references in journals are limited.
Success stories of the current process
(bwGRiD) Over 220 publications have been collected since 2008: http://www.bw-grid.de/publikationen/
How many publications are already collected and since when
Across all communities, over 600 publications have been collected on community pages since 2005: http://dgi-2.d-grid.de/publikationen.php
How the information is used
Information is used to show the need for and benefit of the German grid infrastructure

EGI-InSPIRE

XSEDE

Scope
identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
XSEDE attempts to collect publications from all Principal Investigators (i.e., project leads) who have allocations on the various resources. XSEDE resources are managed by approximately six Service Providers across the U.S. Any researcher at most U.S.-based research institutions can request an allocation. Research from any scientific and engineering domain is eligible for allocations; the results must be publishable in the open literature (i.e., no proprietary commercial work or confidential government research).
What information is collected
At this time, we do not have a formal database for storing publication data, though such a system has been proposed in the past and may happen in the not-too-distant future. The current storage "schema" is a list of references (associated with the allocated project number) in a Microsoft Word document, attached as an appendix to the XSEDE quarterly report.
Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
We do not typically distinguish between different types of publications in our current collection process, though all types of publications are generally included. Most of the publication collection effort focuses on the publications by the scientific user community. XSEDE *does* however separately collect publications written by XSEDE-supported staff; these staff publications are typically focused on engineering of the infrastructure.
Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
No.
How do researchers know that they need to provide the information?
As part of the user responsibilities agreement, which all users must acknowledge and accept in the XSEDE portal, users are instructed to acknowledge XSEDE in relevant publications and report those publications to XSEDE. The most common time for a researcher to report such publications is during the allocation process, when they request a renewal of an allocation. The review panel uses publications (submitted, accepted, in progress, as well as published) as part of their evaluation. The allocation policies also state that project leads should submit a final report at the conclusion of each project (though this is not strictly enforced).
How do researchers provide the information? (the tool they use)
The primary tool is submission of a PDF file included as part of their allocation request. Allocation requests are accepted via XSEDE's "POPS" system.
How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
That is up to the researcher. XSEDE does provide sample language that can be used to acknowledge XSEDE support at https://www.xsede.org/how-to-acknowledge-xsede
What are the obligations for researchers? Where are they defined? How do the researchers accept them?
As noted, all users must accept a user responsibilities agreement (a) when they first get their XSEDE user portal account, and (b) once per year thereafter. The user responsibilities are at: https://www.xsede.org/usage-policies
In addition, the policies for allocations further describe obligations associated with use of the resources. The policies are described at: https://www.xsede.org/web/guest/allocation-policies
When do researchers provide the data?
As noted, they typically provide this when they request renewal allocations. Allocations are usually made for 12-month periods, so most projects provide this information annually. However, different projects renew at different times throughout the year; XSEDE makes large-scale allocations four times per year, so our primary collection frequency is quarterly.
Limitations/difficulties of the current process
The limitations of the current process include the following. The lack of a formal mechanism for capturing publications at the end of a project (i.e., formally collecting a "Final Report") means that we miss many publications from short-term projects, which include many smaller projects. In addition, the lack of a robust database system for storing publication data limits the "cleanliness" of the data set (duplications inevitably exist) as well as the amount of reporting and analysis that we can do. A more challenging issue is verifying whether all of the reported publications do, in fact, acknowledge XSEDE or the source of their computational support.
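As an illustration of the duplication problem mentioned above, the following sketch removes reference strings that differ only in case, punctuation, accents or spacing. It is not XSEDE's process; `normalize` and `deduplicate` are hypothetical helpers, and this approach would not catch the same paper cited in two different citation styles.

```python
import re
import unicodedata


def normalize(reference: str) -> str:
    """Reduce a reference string to lowercase ASCII letters and digits."""
    text = unicodedata.normalize("NFKD", reference)
    # Drop combining marks so accented and unaccented spellings compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse every run of punctuation or whitespace into a single space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def deduplicate(references):
    """Keep the first occurrence of each reference, compared after normalizing."""
    seen = set()
    unique = []
    for ref in references:
        key = normalize(ref)
        if key and key not in seen:
            seen.add(key)
            unique.append(ref)
    return unique
```

Keeping the first occurrence preserves the original formatting of whichever entry was reported earliest.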
Success stories of the current process
XSEDE's long history (dating back to the early days of the NSF Supercomputer Centers program) of collecting this data for renewal allocation requests means that users and reviewers recognize the importance of submitting the information. For continuing projects, we (probably) have a very high compliance rate, though this is hard to verify.
How many publications are already collected and since when
XSEDE has access to the lists of publications associated with its reports and TeraGrid reports dating back to 2005. The total number of publications in these documents is approximately 11,000. However, duplicates and other incorrect entries probably reduce that number by a small percentage.
How the information is used
The two current uses are (a) for review of allocation requests by the review panel and (b) for XSEDE's reporting purposes, notably to its funding agency, the National Science Foundation.


NCAR/CISL

Scope
identify the scope of collecting publications (infrastructure, geographical boundaries, communities)
NCAR, specifically the Computational and Information Systems Laboratory (CISL) at NCAR, collects publications from users of its production HPC environment. HPC users are primarily from U.S.-based university researchers in the atmospheric and related sciences.
What information is collected, Describe if you distinguish between conference paper vs. journals or science based on the infrastructure vs. engineering of the infrastructure
We simply collect bibliographic references to the publications. We ask users to separate their publications into three categories: dissertations/theses, peer-reviewed publications, and other publications (including posters and presentations).
We are currently exploring using a system from the NCAR Library to store these collected publications. For now, the publications will be entered into this system manually by NCAR/CISL staff.
Do you collect the impact factor? If so, how? (researcher add it, link to a database of impact factors e.g., http://wokinfo.com/)
No. However, we hope to leverage the NCAR Library's system to collect impact factor and other data in the future.
How do researchers know that they need to provide the information?
We currently conduct an annual survey of current and recent users of our production HPC environment. We are currently tightening our process for returning users and asking them for recently completed publications at the time they apply for a new allocation.
How do researchers provide the information? (the tool they use)
We use a web-based survey tool, Opinio, hosted at NCAR.
How do researchers describe the use of the infrastructures in the publication? (provide guidelines)
We provide guidance via a web page, https://www2.cisl.ucar.edu/docs/acknowledging-ncarcisl
The URL for this web page, as well as the requirement to acknowledge NCAR/CISL support, is included in their allocation award letters.
With our new procurement, we are trying several new approaches to see whether they help improve compliance and/or tracking of publications. First, we are offering users the option of *citing* their resource use, instead of including text in acknowledgments. Second, we have associated an "Archival Resource Key" or ARK with our primary resource and instructed users to include this unique string in their citations or acknowledgments. The citation option may increase compliance, particularly in cases where acknowledgments are not appropriate. Use of the ARK is designed to make web searches easier -- finding a unique string (like a DOI) is more robust than finding the many ways acknowledgments may be phrased.
Whether the use of the ARK has any benefit or effect at all -- that remains to be seen. Ask me again in a year or so.
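To illustrate why a unique string is easier to search for than free-form acknowledgment text, here is a minimal sketch that extracts ARK-like identifiers from a passage. The regular expression is a simplified approximation of ARK syntax, not the full specification, and the identifier in the usage note is a made-up placeholder rather than the actual ARK of any NCAR resource.

```python
import re

# Simplified approximation: "ark:" (optionally followed by "/"), a numeric
# name-assigning authority of five or more digits, then a name made of
# letters, digits and a few punctuation characters. Not the full ARK grammar.
ARK_PATTERN = re.compile(r"ark:/?\d{5,}/[A-Za-z0-9=~*+@_$.\-/]+", re.IGNORECASE)


def find_arks(text):
    """Return ARK-like identifiers found in `text`, minus trailing punctuation."""
    return [m.group(0).rstrip(".,;") for m in ARK_PATTERN.finditer(text)]
```

Calling `find_arks("Computing support (ark:/12345/x0example) is acknowledged.")` picks out the placeholder identifier regardless of how the surrounding sentence is worded.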
What are the obligations for researchers? Where are they defined? How do the researchers accept them?
Our user responsibilities are defined at https://www2.cisl.ucar.edu/docs/responsibilities. However, there is no formal acceptance of these obligations.
When do researchers provide the data?
As noted, we conduct a survey annually, and we now also ask users for publications when they return for new project allocations. In the past, our letters asked users to send articles upon publication, but this happened so infrequently (and we had no way to confirm or enforce this rule) that we stopped doing so.
Limitations/difficulties of the current process
Many. We have no way to verify the completeness of our collection. In addition, we have not typically verified that the publications acknowledge NCAR/CISL computing support.
Success stories of the current process
Difficult to say. We have been conducting the survey regularly for a number of years, and I believe (but cannot confirm) that we get a reasonably large fraction of our users' publications.
How many publications are already collected and since when
Several hundred per year, at least since FY2008.
How the information is used
Primarily, these publication counts are used as part of the NCAR/CISL annual reporting process and budget review process, both within NCAR and for presentation to the National Science Foundation (NSF), our primary funding sponsor.