File(s) not publicly available
ShefCE: A Cantonese-English bilingual speech corpus
ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited to record a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Details can be found in Ng et al. (2017).
The corpus is available free of charge for academic research, teaching and non-commercial use. To use the data, a data request form must be signed and submitted to the University of Sheffield. The details and the data request form are available at http://mini.dcs.shef.ac.uk/resources/shefce. Please cite Ng et al. (2017) when using the data.
Raymond W. M. Ng, Alvin C. M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
Funding: IIKE Fund@Sheffield, Google
Ethics
- The project has ethical approval, and the approval number is included in the description field
Policy
- The data complies with the institution's and funders' policies on access and sharing
Sharing and access restrictions
- The data requires access restrictions, which are explained in the description field; files are not attached
- The file formats are open or commonly used
Methodology, headings and units
- Headings and units are explained in the files