Old Bailey Voices 1780-1880 (OBV)
==============

Introduction
-------------

The [Old Bailey Proceedings 1674-1913](http://www.oldbaileyonline.org) represent the largest body of direct recorded speech by non-elite people ever created. The [Old Bailey Corpus project](http://fedora.clarin-d.uni-saarland.de/oldbailey/) headed by Magnus Huber added detailed socio-linguistic tagging to a large sample of the Proceedings data.

***Old Bailey Voices (OBV)*** recombines the linguistic corpus with the Proceedings trials data. The dataset consists of a full text corpus and summary data for just over 21000 trials reported in the Proceedings between 1780 and 1880.

The dataset was created for the [Voices of Authority](https://www.digitalpanopticon.org/Voices_in_the_Courtroom) research theme of the [Digital Panopticon](http://www.digitalpanopticon.org) project to enable researchers to associate individual defendants with their spoken words (or silences) in court and long-term outcomes, and to explore changing speech patterns in the courtroom.

The dataset
--------

**OBV** contains data from all **single defendant trials** (21023 defendants) in 227 sessions of the Old Bailey Proceedings between 1780 and 1880 which have had linguistic markup added by the Old Bailey Corpus project. 15860 of the 21023 trials contain tagged speech.

For this project it was essential to correctly associate defendants with their spoken words (not a concern for the Old Bailey Corpus), as we intend to trace their histories and long-term outcomes using the Digital Panopticon's record linkage. The difficulty of ensuring this was done correctly in trials with multiple defendants led to the decision to restrict the dataset to single-defendant trials.

The data has two components:

* a new version of the tagged speech data from OBC with some additional tagging and OBO defendant IDs.
* summary data for each defendant (obv_defendants_trials.tsv) containing biographical and trial information

This is version 2.0 of the data (OBV2), January 2017.

Summary trial data
------------------

**obv_defendants_trials.tsv**

This contains data for *all* single defendants in OBC-tagged sessions, not just OBC-tagged trials. It includes information about whether there is OBC-tagged speech a) for the trial and b) for the defendant; word counts for the total text of a trial, all OBC-tagged speech, and for the defendant. For all trials it contains defendant name, gender, age and occupation (if tagged in OBO); offence, verdict and sentence; and the OBO defendant ID.

OBO offence, verdict and sentence data is simplified. In an OBO trial there may be multiple offences or outcomes per defendant; in order to flatten this, OBV data represents the most 'serious' offence and outcome (e.g. if sentenced to both imprisonment and a fine, it retains only imprisonment).

### data fields

| Field label | Description |
| ---------- | ------------ |
| o2dtid | unique ID for data table (has no other meaning) |
| obo_trial | OBO trial ID |
| obo_deftid | OBO defendant ID |
| sess_date | OBO session date (yyyymmdd) |
| year | OBO session year |
| trial_tagged | if the trial has tagged speech in OBC (1=yes, 0=no) |
| def_spk | if there is tagged defendant speech (1=yes, 0=no) |
| speech | if there is tagged defendant speech (untagged="no_speech") |
| trial_u_count | Count of OBC "utterances" (tagged trials only) |
| trial_speech_wc | Total wordcount for OBC-tagged speech |
| trial_total_wc | Total wordcount for trial report in OBO (all trials) |
| deft_u_count | Count of OBC utterances by defendant |
| deft_total_wc | Total wordcount for OBC-tagged speech by defendant |
| deft_u_q | Count of questions asked by defendant |
| deft_u_a | Count of answers by defendant |
| deft_u_d | Count of defence statements by defendant |
| deft_u_s | Count of other statements by defendant |
| deft_given | Defendant given name (as tagged in OBO) |
| deft_surname | Defendant surname (OBO) |
| deft_gender | Defendant gender (OBO) |
| deft_age | Defendant age (OBO) (NULL if no tagged age) |
| deft_occupation | Defendant occupation as tagged in OBO |
| deft_offcat | Offence category as tagged in OBO |
| deft_offsub | Offence sub-category as tagged in OBO |
| deft_vercat | Verdict category as tagged in OBO |
| deft_versub | Verdict sub-category as tagged in OBO |
| deft_puncat | Sentence category as tagged in OBO |
| deft_punsub | Sentence sub-category as tagged in OBO |

Words data
--------------

**obv_words_v2_28-01-2017.tsv.zip**

The OBC XML data was converted to tabular format (in a MySQL database) for data preparation. One row of data = one OBC tagged utterance (`<u>` tags). Please note that there are 217,000 rows of data and I cannot guarantee that your favourite spreadsheet software will handle this amount of data well.

For the subset of single-defendant trials:

* the original speaker roles assigned in OBC were checked and corrected/filled in where necessary, particularly focusing on accurately identifying defendants.
* broad speech categories were added to each tagged utterance: question, answer, defendant's defence statement, other statement

### data fields

| Field label | Description |
| ----------------- | ------------------------ |
| obv2wid | dataset unique ID |
| sess_date | OBO session date |
| year | OBO session year |
| obo_trial | OBO trial ID |
| obo_deftid | OBO defendant ID |
| obc_u_no | OBC utterance number in trial |
| obc_event | OBC event ID |
| obc_speaker | OBC speaker ID |
| obc_sex | OBC sex of speaker |
| obc_hiscoLabel | OBC hisco data for speaker |
| obc_hiscoCode | OBC hisco data for speaker |
| obc_class | OBC hisco data for speaker |
| obc_role | OBC speaker role |
| obv_role | OBV speaker role |
| words | text of words |
| obv_words_type | OBV assigned words type |
| words_count | OBC word count |
| defendant | name of defendant in trial |

OBV speaker roles: def=defendant; wv=witness or victim; lj=lawyer or judge; jur=juror

OBV assigned words types: q=question; a=answer; d=prisoner's defence statement; s=other statement

Proceedings creators data
------------------

Data extracted from OBC for scribes, publishers, printers and editors for each session, from the title page or front matter.

***obc2_producers.tsv***

| Field label | Description |
| ---------- | ------------ |
| year | OBO session year |
| sess_date | OBO session date |
| editor | name of editor |
| printer | name of printer |
| publisher | name of publisher |
| scribe | name of scribe |

Offence, verdict and sentence categories and subcategories
--------------------------------------

This list is included for reference. For detailed information about all the categories please see the relevant background pages on crimes, verdicts and punishments at [Old Bailey Online](https://www.oldbaileyonline.org/static/Crime.jsp).

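
To check which of the categories in this section actually occur in the summary data, the category fields can be tallied directly. The following is a minimal sketch only, assuming that `obv_defendants_trials.tsv` is tab-separated with a header row whose column names match the field labels documented for that file. The `count_categories` helper and the sample rows are hypothetical and stand in for the real file.

```python
import csv
import io
from collections import Counter


def count_categories(tsv_file, cat_field="deft_offcat", sub_field="deft_offsub"):
    """Count (category, subcategory) pairs in an open tab-separated file.

    Assumes the first row is a header containing cat_field and sub_field.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter((row[cat_field], row[sub_field]) for row in reader)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "obo_deftid\tdeft_offcat\tdeft_offsub\n"
    "example-1\ttheft\tgrandLarceny\n"
    "example-2\ttheft\tgrandLarceny\n"
    "example-3\tkill\tmurder\n"
)

counts = count_categories(sample)
for (category, subcategory), n in counts.most_common():
    print(category, subcategory, n)
```

To run this against the real file, replace `sample` with a file opened in text mode, for example `open("obv_defendants_trials.tsv", newline="", encoding="utf-8")` (the encoding is an assumption). Passing `deft_vercat`/`deft_versub` or `deft_puncat`/`deft_punsub` as the field names tallies verdicts or sentences instead.
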
### Offences

| offcat | offsubcat |
| ------- | ---------- |
| breakingPeace | assault |
| breakingPeace | barratry |
| breakingPeace | libel |
| breakingPeace | other |
| breakingPeace | riot |
| breakingPeace | threateningBehaviour |
| breakingPeace | vagabond |
| breakingPeace | wounding |
| damage | arson |
| damage | other |
| deception | bankrupcy |
| deception | forgery |
| deception | fraud |
| deception | other |
| deception | perjury |
| kill | infanticide |
| kill | manslaughter |
| kill | murder |
| kill | other |
| kill | pettyTreason |
| miscellaneous | concealingABirth |
| miscellaneous | conspiracy |
| miscellaneous | habitualCriminal |
| miscellaneous | illegalAbortion |
| miscellaneous | kidnapping |
| miscellaneous | other |
| miscellaneous | pervertingJustice |
| miscellaneous | piracy |
| miscellaneous | returnFromTransportation |
| royalOffences | coiningOffences |
| royalOffences | other |
| royalOffences | religiousOffences |
| royalOffences | seditiousLibel |
| royalOffences | seditiousWords |
| royalOffences | seducingAllegiance |
| royalOffences | taxOffences |
| royalOffences | treason |
| sexual | assaultWithIntent |
| sexual | assaultWithSodomiticalIntent |
| sexual | bigamy |
| sexual | indecentAssault |
| sexual | keepingABrothel |
| sexual | other |
| sexual | rape |
| sexual | sodomy |
| theft | animalTheft |
| theft | burglary |
| theft | embezzlement |
| theft | extortion |
| theft | gameLawOffence |
| theft | grandLarceny |
| theft | housebreaking |
| theft | mail |
| theft | other |
| theft | pettyLarceny |
| theft | pocketpicking |
| theft | receiving |
| theft | shoplifting |
| theft | simpleLarceny |
| theft | stealingFromMaster |
| theft | theftFromPlace |
| violentTheft | highwayRobbery |
| violentTheft | other |
| violentTheft | robbery |

### Verdicts

| vercat | versubcat |
| ------- | ----------- |
| guilty | |
| guilty | chanceMedley |
| guilty | insane |
| guilty | lesserOffence |
| guilty | manslaughter |
| guilty | pleadedGuilty |
| guilty | pleadedPartGuilty |
| guilty | theftunder100s |
| guilty | theftunder1s |
| guilty | theftunder40s |
| guilty | theftunder5s |
| guilty | withRecommendation |
| miscVerdict | |
| miscVerdict | noAgreement |
| miscVerdict | postponed |
| miscVerdict | unfitToPlead |
| notGuilty | |
| notGuilty | accidentalDeath |
| notGuilty | directed |
| notGuilty | fault |
| notGuilty | noEvidence |
| notGuilty | nonComposMentis |
| notGuilty | noProsecutor |
| notGuilty | selfDefence |
| notGuilty notGuilty | noEvidence |
| specialVerdict | |

### Sentences

| puncat | punsubcat |
| -------- | ------------ |
| corporal | |
| corporal | pillory |
| corporal | privateWhipping |
| corporal | publicWhipping |
| corporal | whipping |
| death | |
| death | burning |
| death | deathAndDissection |
| death | drawnAndQuartered |
| death | executed |
| death | hangingInChains |
| death | respited |
| death | respitedForPregnancy |
| imprison | |
| imprison | hardLabour |
| imprison | houseOfCorrection |
| imprison | insanity |
| imprison | newgate |
| imprison | otherInstitution |
| imprison | penalServitude |
| imprison | preventiveDetention |
| miscPunish | |
| miscPunish | branding |
| miscPunish | fine |
| miscPunish | forfeiture |
| miscPunish | militaryNavalDuty |
| miscPunish | sureties |
| noPunish | |
| noPunish | pardon |
| noPunish | sentenceRespited |
| transport | |

Licence
-----

This dataset is released under a Creative Commons Attribution-ShareAlike 4.0 International Licence

[![License: CC BY-SA 4.0](https://licensebuttons.net/l/by-sa/4.0/80x15.png)](http://creativecommons.org/licenses/by-sa/4.0/)