Which Politicians Receive Abuse?
Dataset posted on 20.05.2020 by Genevieve Gorrell, Mehmet Bakir, Ian Roberts, Mark Greenwood, Kalina Bontcheva
The spreadsheets contain aggregate statistics for abusive language found in tweets to UK politicians in 2019. An overview spreadsheet is provided for each month from January to November ("per-mp-xxx-2019.csv", where xxx is the abbreviation for the month), with one row per MP. A further spreadsheet gives data per day for the campaign period of the UK 2019 general election, with one row per candidate, starting at the beginning of November and finishing on December 15th, a few days after the election ("campaign-period-per-cand-per-day.csv"). For each individual, these spreadsheets list gender, party, the start and end times of the counts, tweets authored, retweets *by* the individual, replies by the individual, the number of times the individual was retweeted, replies received by the individual ("replyTo"), abusive tweets received in total, and abusive tweets received in each of the categories sexist, racist and political.
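As a rough illustration of how the per-MP counts might be used, the sketch below builds the monthly file names and computes the share of received replies that were abusive, grouped by party. It is a minimal sketch under stated assumptions, not the authors' analysis code: the lowercase three-letter month abbreviations and the column names "party" and "abusive" are hypothetical (only "replyTo" is named in the description above), and the sample rows are synthetic, not taken from the dataset. The readme.txt file supplied with the data gives the actual headings.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: lowercase three-letter abbreviations; check the downloaded file names.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov"]


def monthly_filenames(months=MONTHS):
    """Build the per-month file names following the "per-mp-xxx-2019.csv" pattern."""
    return [f"per-mp-{m}-2019.csv" for m in months]


def abuse_share_by_group(rows, group_col, replies_col, abusive_col):
    """Return abusive replies divided by all replies received, per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [abusive, replies]
    for row in rows:
        totals[row[group_col]][0] += int(row[abusive_col])
        totals[row[group_col]][1] += int(row[replies_col])
    # Guard against groups that received no replies at all.
    return {g: (a / r if r else 0.0) for g, (a, r) in totals.items()}


# SYNTHETIC sample rows for illustration only; "party" and "abusive" are
# hypothetical column names, and the numbers are not from the dataset.
SAMPLE = "party,replyTo,abusive\nA,100,5\nA,300,15\nB,50,1\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
shares = abuse_share_by_group(rows, "party", "replyTo", "abusive")
```

To apply this to the real data, the rows would instead be read from one of the downloaded files with `csv.DictReader`, passing the column names that readme.txt lists for party, replies received and abusive tweets received.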
Two additional spreadsheets focus on topics: "topics-of-cands.csv" and "topics-of-replies.csv". The first covers tweets *by* each candidate, giving counts of tweets mentioning each of a set of topics alongside counts of abusive tweets mentioning each topic. The second covers replies received, giving counts of replies received when a candidate mentioned a topic alongside counts of abusive replies received when they mentioned that topic.
The data complement the forthcoming paper "Which Politicians Receive Abuse? Four Factors Illuminated in the UK General Election 2019", by Genevieve Gorrell, Mehmet E Bakir, Ian Roberts, Mark A Greenwood and Kalina Bontcheva. The way the data were acquired is described more fully in the paper.
Ethics approval was granted to collect the data through application 25371 at the University of Sheffield.
ESRC Grant number ES/T012714/1 "Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online"
Ethics: The project has ethical approval, and the approval number is included in the description field
Policy: The data comply with the institution's and funders' policies on access and sharing
Sharing and access restrictions: The data can be shared openly
- The file formats are open or commonly used
Methodology, headings and units
- There is a readme.txt file describing the methodology, headings and units