Exploring open access coverage of Wikipedia cited research across the White Rose Universities. Altmetric.com and Unpaywall datasets
Dataset posted on 08.04.2020 by Andrew Tattersall, Kate O'Neill, Christopher Carroll, Nick Sheppard, Thom Blake
These data were collected as part of a project by Andy Tattersall, Kate O'Neill, Chris Carroll (University of Sheffield), Nick Sheppard (University of Leeds) and Thom Blake (University of York), which will be published sometime in 2020. The draft title of the paper is: Exploring open access coverage of Wikipedia cited research across the White Rose Universities.
A data request was submitted to Altmetric.com on 16 April 2019 for entries that included authors from any of the three White Rose universities and that are cited at least once in a Wikipedia entry. Data were tabulated and descriptive statistics were produced. The implications of the data are then discussed. We looked at just the White Rose universities of Leeds, Sheffield and York because they have their own shared Open Access repository, in addition to their long history of collaboration and research focus, and because they are all members of the Russell Group of universities.
The Altmetric.com data were tabulated with discipline data extracted from university systems. Altmetric.com collected Wikipedia page entries and embedded citations using unique identifiers within the research, such as a DOI, PubMed ID or ISBN; this also included the date the research was cited within a Wikipedia entry. They also collected further bibliographic data, including publication title and date. Data collection also included the individual Altmetric.com page corresponding to each Wikipedia citation.
We explored the number of Wikipedia citations by discipline for each of the three institutions. We note that the data that Altmetric.com supplies are only as good as the institutional and bibliometric journal data that it harvests. As a consequence, we found that certain fields were incomplete, and based on a previous study (Tattersall and Carroll 2018) we anticipate that a percentage of the data relating to institutional affiliation and date of publication will be inaccurate. Tattersall and Carroll (2018), looking at a sample of Altmetric.com data on citations within policy documents, found that as much as one third of the data could be erroneous.
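The tabulation by institution and discipline described above can be illustrated with a minimal sketch. The field names (`institution`, `discipline`, `wikipedia_citations`) and the sample records are hypothetical and are not taken from the dataset files; records with a missing discipline are grouped under "Unknown", reflecting the incomplete fields noted above.

```python
from collections import Counter

def citations_by_discipline(records):
    """Sum Wikipedia citations per (institution, discipline) pair.

    Field names are assumptions for illustration only.
    """
    table = Counter()
    for rec in records:
        # Incomplete discipline fields are kept, not dropped.
        discipline = rec.get("discipline") or "Unknown"
        table[(rec["institution"], discipline)] += rec["wikipedia_citations"]
    return table

# Invented example records, not real data from the study.
records = [
    {"institution": "Leeds", "discipline": "Medicine", "wikipedia_citations": 2},
    {"institution": "Leeds", "discipline": "Medicine", "wikipedia_citations": 1},
    {"institution": "York", "discipline": None, "wikipedia_citations": 1},
    {"institution": "Sheffield", "discipline": "Engineering", "wikipedia_citations": 3},
]

table = citations_by_discipline(records)
for (institution, discipline), total in sorted(table.items()):
    print(institution, discipline, total)
```

Keeping an explicit "Unknown" group makes the share of incomplete records visible in the descriptive statistics.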
DOIs of all papers that included a Wikipedia citation were subsequently run against the Unpaywall API. Unpaywall is a not-for-profit service that maintains a database of links to full-text articles harvested from a range of open-access sources. Unpaywall's Simple Query Tool enabled us to submit a large number of DOIs; the results it returned were placed into a spreadsheet comprising information on open access status, including ‘best_oa_url’ and ‘best_oa_licence’. For articles published under the gold model, these will typically be the resolvable DOI under a Creative Commons licence, whereas for accepted manuscripts from institutional repositories they tended to be the repository URL under a more restrictive licence, often with no specific licence. For the purposes of this study, the primary field of interest is ‘is_oa’, which enables us to ascertain the proportion of articles that are available open access (is_oa = TRUE) compared to those that are not (is_oa = FALSE). It is also important to note that any repository record that was under embargo at the time of data collection was returned as is_oa = FALSE. Whether the OA version is gold (under a Creative Commons licence) or green (with a more restrictive or no specified licence) is also significant: Wikipedia citations to gold articles would necessarily be open access with no further intervention, whereas Wikipedia citations to articles made open access from a repository will only be accessible directly from that citation if it includes the appropriate ‘best_oa_url’, which may need to be added manually.
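As an illustration of how the proportion of open access articles could be derived from such a spreadsheet, the following is a minimal sketch rather than the procedure actually used. It assumes each row is a dictionary with the columns ‘is_oa’ and ‘best_oa_licence’ named above, treats any licence beginning with "cc" as Creative Commons (an assumption), and uses invented sample rows with placeholder DOIs.

```python
from collections import Counter

def summarise_oa(rows):
    """Count rows by OA status and return (counts, shares).

    Assumes 'is_oa' holds TRUE/FALSE and 'best_oa_licence' holds a
    licence string or is empty; both are illustrative assumptions.
    """
    counts = Counter()
    for row in rows:
        if str(row["is_oa"]).strip().upper() != "TRUE":
            counts["closed"] += 1  # includes embargoed repository records
        elif (row.get("best_oa_licence") or "").lower().startswith("cc"):
            counts["oa_cc_licence"] += 1
        else:
            counts["oa_other_or_no_licence"] += 1
    total = sum(counts.values())
    shares = {key: n / total for key, n in counts.items()} if total else {}
    return counts, shares

# Invented rows for illustration; the DOIs are placeholders.
sample_rows = [
    {"doi": "10.1000/example.1", "is_oa": "TRUE", "best_oa_licence": "cc-by"},
    {"doi": "10.1000/example.2", "is_oa": "TRUE", "best_oa_licence": ""},
    {"doi": "10.1000/example.3", "is_oa": "FALSE", "best_oa_licence": ""},
    {"doi": "10.1000/example.4", "is_oa": "TRUE", "best_oa_licence": "cc-by-nc"},
]

counts, shares = summarise_oa(sample_rows)
print(dict(counts))
print(shares)
```

Splitting open access rows by licence only approximates the gold and green distinction discussed above, since a repository copy can also carry a Creative Commons licence.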
Tattersall, A., Carroll, C. (2018) What can Altmetric.com tell us about policy citations of research? An analysis of Altmetric.com data for research articles from the University of Sheffield. Frontiers in Research Metrics and Analysis, 2, 9. https://doi.org/10.3389/frma.2017.00009
Ethics
- There is no personal data or any that requires ethical approval
Policy
- The data complies with the institution and funders' policies on access and sharing
Sharing and access restrictions
- The data can be shared openly
- The file formats are open or commonly used
Methodology, headings and units
- Headings and units are explained in the files