Public Library of Science
8 files

PLOS Open Science Indicators

Version 7 2024-06-27, 15:08
Version 6 2024-03-28, 09:49
Version 5 2023-12-13, 16:43
Version 4 2023-09-29, 14:22
Version 3 2023-06-26, 10:26
Version 2 2023-04-03, 14:19
Version 1 2022-12-12, 12:23
posted on 2023-06-26, 10:26 authored by Public Library of Science

This dataset contains article metadata and information about Open Science Indicators for approximately 74,000 research articles published in PLOS from 1 January 2019 to 31 March 2023 and a set of approximately 8,000 comparator articles published in non-PLOS journals. This is the third release of this dataset, which will be updated with new versions as newly published content is analysed.

This version of the Open Science Indicators dataset focuses on detection of three Open Science practices by analysing the XML of published research articles:

  • Sharing of research data, in particular data shared in data repositories
  • Sharing of code
  • Posting of preprints

The dataset provides data and code generation and sharing rates, as well as the location of shared data and code (whether in Supporting Information or in an online repository). It also provides preprint sharing rates and details of the shared preprint, such as publication date, URL and preprint server used. Additional data fields are provided for each article analysed, such as geographic information (‘Country’) and research topics (‘Discipline’).
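As an illustration of how sharing rates and locations of this kind could be derived from article-level rows, here is a minimal sketch. The column names (data_generated, data_shared, data_location), their values and the sample rows are hypothetical and are not taken from the dataset; OSI-Column-Descriptions_v2_Mar23.pdf documents the real fields. The choice of denominator (articles that generated data) is also an assumption.

```python
import csv
import io

# Made-up sample rows; column names and values are hypothetical,
# not the actual headers of PLOS-Dataset_v3_Jun23.csv.
SAMPLE = """doi,data_generated,data_shared,data_location
10.1371/example.0001,yes,yes,Online
10.1371/example.0002,yes,yes,Supporting Information
10.1371/example.0003,yes,no,
10.1371/example.0004,no,no,
"""


def sharing_summary(csv_text):
    """Summarise data sharing among articles that generated data."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    generated = [r for r in rows if r["data_generated"] == "yes"]
    shared = [r for r in generated if r["data_shared"] == "yes"]
    in_repo = [r for r in shared if r["data_location"] == "Online"]
    total = len(generated)
    return {
        "generated": total,
        # Assumed denominator: articles that generated data.
        "shared_rate": len(shared) / total if total else 0.0,
        "repository_rate": len(in_repo) / total if total else 0.0,
    }


summary = sharing_summary(SAMPLE)
```

The same pattern would apply to code sharing or preprint posting by swapping in the corresponding columns.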

Further information on the methods used to collect and analyse the data can be found in OSI-Methods-Statement_v3_Jun23.pdf with accompanying information in OSI-Column-Descriptions_v2_Mar23.pdf and OSI-Repository-List_v1_Dec22.xlsx. Further information on the principles and requirements for developing Open Science Indicators is available in

The data files PLOS-Dataset_v3_Jun23.csv and Comparator-Dataset_v3_Jun23.csv contain:

  • descriptive metadata, e.g. article title, publication date and author countries, taken from the article .xml files
  • additional information around the Open Science Indicators, derived algorithmically using Natural Language Processing.

The OSI-Summary-statistics_v3_Jun23.xlsx file contains the summary data for both PLOS-Dataset_v3_Jun23.csv and Comparator-Dataset_v3_Jun23.csv used in

The file OSI-Dimensions-FoR-data_v1_Jun23.xlsx contains data obtained on 3 May 2023 from Digital Science’s Dimensions platform. This data has been used to carry out the topic-based analysis presented in OSI-Summary-statistics_v3_Jun23.xlsx. The topics were matched to the articles in PLOS-Dataset_v3_Jun23.csv and Comparator-Dataset_v3_Jun23.csv using the DOI. Topic-based analysis was conducted using the high-level topics, which are denoted by a two-digit prefix. Articles can have more than one topic assigned.
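The DOI-based matching described above can be sketched as follows. This is an illustration under stated assumptions, not the method actually used: the field names (doi, for_code, data_shared), the sample codes and the per-topic sharing rate are all hypothetical. DOIs are compared in lower case because DOIs are case-insensitive.

```python
from collections import defaultdict


def high_level_topic(topic_code):
    """Return the two-digit prefix that denotes the high-level topic."""
    return topic_code.strip()[:2]


def topics_by_doi(topic_rows):
    """Map each DOI to the set of high-level topics assigned to it."""
    topics = defaultdict(set)
    for row in topic_rows:
        topics[row["doi"].lower()].add(high_level_topic(row["for_code"]))
    return topics


def sharing_rate_by_topic(articles, topics):
    """Per-topic share of articles with data_shared set.

    An article with several topics is counted once under each of them.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [shared, total]
    for article in articles:
        for topic in topics.get(article["doi"].lower(), ()):
            counts[topic][1] += 1
            if article["data_shared"]:
                counts[topic][0] += 1
    return {t: shared / total for t, (shared, total) in counts.items()}


# Made-up rows; field names and codes are hypothetical.
articles = [
    {"doi": "10.1371/example.0001", "data_shared": True},
    {"doi": "10.1371/example.0002", "data_shared": False},
    {"doi": "10.1371/example.0003", "data_shared": True},
]
topic_rows = [
    {"doi": "10.1371/example.0001", "for_code": "3101"},
    {"doi": "10.1371/example.0001", "for_code": "42"},
    {"doi": "10.1371/example.0002", "for_code": "3105"},
    {"doi": "10.1371/EXAMPLE.0003", "for_code": "4203"},
]

rates = sharing_rate_by_topic(articles, topics_by_doi(topic_rows))
```

Because an article can carry more than one topic, per-topic counts in a summary of this kind can add up to more than the number of articles.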

Contact details for further information:

Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS

Lauren Cadwallader, Open Research Manager, PLOS


Thanks to Allegra Pearce and Tim Vines of DataSeer for contributing to data acquisition and supporting information.


No external funding was received for this work.