File(s) not publicly available
ShefCE: A Cantonese-English bilingual speech corpus
ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited to record the 25-hour corpus (12 hours in Cantonese and 13 hours in English). Details can be found in [1].
The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at http://mini.dcs.shef.ac.uk/resources/shefce, and cite [1] when using the data.
[1] Raymond W. M. Ng, Alvin C. M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
Funding
IIKE Fund@Sheffield, Google
Ethics
- The project has ethical approval, and the approval number is included in the description field
Policy
- The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
- The data requires access restrictions, which are explained in the description field; files are not attached
Data description
- The file formats are open or commonly used
Methodology, headings and units
- Headings and units are explained in the files