Public Library of Science

PLOS Open Science Indicators

Version 7 2024-06-27, 15:08
Version 6 2024-03-28, 09:49
Version 5 2023-12-13, 16:43
Version 4 2023-09-29, 14:22
Version 3 2023-06-26, 10:26
Version 2 2023-04-03, 14:19
Version 1 2022-12-12, 12:23
posted on 2024-06-27, 15:08 authored by Public Library of Science

This dataset contains article metadata and information about Open Science Indicators for approximately 112,000 research articles published in PLOS from 1 January 2018 to 31 March 2024 and a set of approximately 23,000 comparator articles published in non-PLOS journals. This is the seventh release of this dataset, which will be updated with new versions as newly published content is analysed.

This version of the Open Science Indicators dataset comprises three components. The first, also included in earlier versions of the dataset, focuses on detection of three Open Science practices by analysing the XML of published research articles:

  • Sharing of research data, in particular data shared in data repositories
  • Sharing of code
  • Posting of preprints

The dataset provides data and code generation and sharing rates, and the location of shared data and code (whether in Supporting Information or in an online repository). It also provides preprint sharing rates as well as details of the shared preprint, such as publication date, URL and preprint server used. Additional data fields are also provided for each article analysed, such as geographic information (‘Country’) and research topics (‘Discipline’).

The second component, first shared in version 4, contains a fourth Open Science Indicator - detection of protocol sharing. This is presented as a preliminary version of the data. The protocols dataset contains information on whether protocol sharing has been detected in the article and the sources of those protocols (i.e. where the protocol was shared).

The third component - and new to this version of the dataset - contains the fifth Open Science Indicator - detection of Study Registration sharing. The Study Registration dataset contains information on whether the article reports a study registration, the registry it was shared in and where in the article the mention of the registration was detected.

Further information on the methods used to collect and analyse the data can be found in the Main Documentation folder for the main OSI dataset, the Preliminary Release for Protocols Indicator folder for protocols or the Preliminary Release for Study Registration Indicator folder for study registrations.

Further information on the principles and requirements for developing Open Science Indicators is available in

Data folders/files

Main Data Files folder

This folder contains the main OSI dataset files PLOS-Dataset_v7_Jun24.csv and Comparator-Dataset_v7_Jun24.csv, which contain:

  • descriptive metadata (e.g. article title, publication data, author countries), taken from the article .xml files
  • additional information around the Open Science Indicators, derived algorithmically using Natural Language Processing

The folder also contains OSI-Summary-statistics_v7_Jun24.xlsx, which holds the summary data for both PLOS-Dataset_v7_Jun24.csv and Comparator-Dataset_v7_Jun24.csv.
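As a rough illustration of how the main CSV files might be summarised, the sketch below computes the share of articles for which each indicator is flagged. The column names (Data_Shared, Code_Shared, Preprint_Match), the TRUE/FALSE encoding and the sample rows are all assumptions made for this example; the actual field names and values are documented in OSI-Column-Descriptions_v3_Dec23.pdf.

```python
import csv
import io

# Hypothetical column names, used only for illustration. The real field
# names are listed in OSI-Column-Descriptions_v3_Dec23.pdf.
INDICATOR_COLUMNS = ("Data_Shared", "Code_Shared", "Preprint_Match")

# Assumed spellings of a positive flag; adjust to the real encoding.
TRUE_VALUES = {"true", "yes", "1"}


def sharing_rates(csv_file, columns=INDICATOR_COLUMNS):
    """Return, per column, the fraction of rows holding a positive flag."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            value = (row.get(name) or "").strip().lower()
            if value in TRUE_VALUES:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in columns}
    return {name: counts[name] / total for name in columns}


# Made-up rows standing in for PLOS-Dataset_v7_Jun24.csv.
SAMPLE = io.StringIO(
    "DOI,Data_Shared,Code_Shared,Preprint_Match\n"
    "10.0000/example.1,TRUE,FALSE,TRUE\n"
    "10.0000/example.2,TRUE,TRUE,FALSE\n"
    "10.0000/example.3,FALSE,FALSE,FALSE\n"
    "10.0000/example.4,TRUE,FALSE,FALSE\n"
)

rates = sharing_rates(SAMPLE)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

To run this against the released file, the in-memory sample could be swapped for a file handle opened with open("PLOS-Dataset_v7_Jun24.csv", newline=""), passing the real column names via the columns argument.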

Main Documentation folder

This folder contains documentation related to the main data files. The file OSI-Methods-Statement_v7_Jun24.pdf describes the methods underlying the data collection and analysis. OSI-Column-Descriptions_v3_Dec23.pdf describes the fields used in PLOS-Dataset_v7_Jun24.csv and Comparator-Dataset_v7_Jun24.csv. OSI-Repository-List_v1_Dec22.xlsx lists the repositories, and their characteristics, used to identify specific repositories in the repository fields of PLOS-Dataset_v7_Jun24.csv and Comparator-Dataset_v7_Jun24.csv.

Preliminary Release for Protocols Indicator folder

This folder contains files related to the new Indicator on protocol sharing. The file Protocols-Dataset_Sep23.csv contains data on protocol sharing pertaining to the PLOS and Comparator corpus of articles. The methods for developing this indicator are described in Protocols-Methods-Statement_Sep23.pdf. The Protocols-Column-Headings_Sep23.pdf file describes the column headings used in Protocols-Dataset_Sep23.csv. A summary of the protocols dataset is given in Protocols-Summary-Statistics_Sep23.xlsx, which is used within the related blog post

Preliminary Release for Study Registration Indicator folder

This folder contains the files related to the fifth indicator on study registration. The file Study-Registration_Dataset_Jun24.csv contains the data on study registrations for both PLOS and Comparator articles. The methods for developing this indicator are described in Study-Registration-Methods-Statement_Jun24.pdf and the fields used in the dataset are described in Registration-Column-Headings_Jun24.pdf. A summary of the results of study registration is given in Study-Registration-Summary-Statistics_Jun24.xlsx.
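Because the protocols and study registration indicators are released as separate files, combining them with the main dataset requires a join on a shared article identifier. The sketch below shows one way this might be done with a left join that keeps every main-dataset row. It assumes both files carry a DOI column and uses hypothetical field names (Registration_Detected, Registry) with made-up rows; the real headings are given in Registration-Column-Headings_Jun24.pdf.

```python
import csv
import io

KEY = "DOI"  # assumed shared identifier; check the column documentation


def index_by_key(csv_file, key=KEY):
    """Read a CSV and index its rows by the join key."""
    return {row[key]: row for row in csv.DictReader(csv_file)}


def attach_indicator(main_rows, indicator_rows, fields):
    """Left join: keep every main row and copy `fields` from the matching
    indicator row, leaving them empty when there is no match."""
    merged = []
    for key, row in main_rows.items():
        extra = indicator_rows.get(key, {})
        combined = dict(row)
        for field in fields:
            combined[field] = extra.get(field, "")
        merged.append(combined)
    return merged


# Made-up rows standing in for the main and study registration files.
MAIN = io.StringIO(
    "DOI,Data_Shared\n"
    "10.0000/example.1,TRUE\n"
    "10.0000/example.2,FALSE\n"
    "10.0000/example.3,TRUE\n"
)
REGISTRATION = io.StringIO(
    "DOI,Registration_Detected,Registry\n"
    "10.0000/example.1,TRUE,ClinicalTrials.gov\n"
    "10.0000/example.3,FALSE,\n"
)

merged = attach_indicator(
    index_by_key(MAIN),
    index_by_key(REGISTRATION),
    fields=("Registration_Detected", "Registry"),
)
for row in merged:
    print(row)
```

A left join is used here so that articles absent from the indicator file remain visible with empty fields rather than silently dropping out of the combined table.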

Contact details for further information:
Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS
Lauren Cadwallader, Open Research Manager, PLOS


Thanks to Allegra Pearce, Tim Vines, Asura Enkhbayar and Scott Kerr of DataSeer for contributing to data acquisition and supporting information.


No external funding was received for this work.