SNuC Corpus

The SNuC corpus contains:

raw_recordings/
This folder contains the original raw recordings (44.1 kHz, stereo) as they were recorded. There are 52 folders, one per participant, each containing one .wav file per number read (e.g., 1035/1035_rec149.wav) and a metadata file (metadata.json) that includes the demographic information and file details (prompt, recording time, etc.).

preprocessed_recordings/
The raw stereo files were downsampled to 16 kHz and converted to mono. There are 52 folders, one per participant, each containing one .wav file per number read (e.g., 1035/1035_AI73926.wav).

transcribed_recordings/
This is the core set of recordings that we transcribed. There are 51 folders, one per participant, each containing one .wav file per number read (e.g., 1035/1035_AI73926.wav).

all_recordings.csv
This CSV file contains information about each recording in the corpus:
id, participantId, age, gender, accent_region, number_type, prompt, raw_audio, preprocessed_audio, transcribed_audio

transcribed_recordings.csv
This CSV file contains information about each transcribed recording in the corpus:
id, participantId, age, gender, accent_region, number_type, prompt, transcribed_audio, transcription

If you use the SNuC corpus in your work, please cite our LREC 2022 paper:

Emma Barker, Jon Barker, Robert Gaizauskas, Ning Ma, Monica Lestari Paramita (2022). SNuC: The Sheffield Numbers Spoken Language Corpus. Proceedings of the 13th Edition of the Language Resources and Evaluation Conference (LREC2022). Marseille.
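
Example: reading the CSV files

The following is a minimal Python sketch of how the CSV files described above might be read, here grouping the rows of transcribed_recordings.csv by participantId. It is an illustration and not part of the corpus tooling. It assumes the file is comma-delimited and starts with a header row using the column names listed above. The function name group_by_participant and all sample rows are hypothetical, made up for the example.

```python
import csv
import io
from collections import defaultdict


def group_by_participant(csv_file):
    """Group recording rows by participantId.

    csv_file is any open text file (or file-like object). Its first row
    is ASSUMED to be a header containing the documented column names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["participantId"]].append(row)
    return dict(groups)


# Hypothetical sample rows for illustration only; NOT real corpus data.
SAMPLE = io.StringIO(
    "id,participantId,age,gender,accent_region,number_type,prompt,"
    "transcribed_audio,transcription\n"
    "r1,p01,x,x,x,x,42,p01/p01_a.wav,forty two\n"
    "r2,p01,x,x,x,x,1999,p01/p01_b.wav,nineteen ninety nine\n"
    "r3,p02,x,x,x,x,7,p02/p02_a.wav,seven\n"
)

if __name__ == "__main__":
    by_participant = group_by_participant(SAMPLE)
    for pid, rows in by_participant.items():
        # Print each participant with their number of recordings.
        print(pid, len(rows))
```

To run this against the corpus itself, the in-memory sample could be replaced by the real file, for instance with open("transcribed_recordings.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.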