File(s) not publicly available

ShefCE: A Cantonese-English bilingual speech corpus

online resource
posted on 10.03.2017, 14:06 by Wai Man Ng, Alvin C.M. Kwan, Tan Lee, Thomas Hain

ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited to record the 25-hour corpus (12 hours in Cantonese and 13 hours in English). Details can be found in [1].

The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at, and cite [1] when using the data.

[1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.


IIKE Fund@Sheffield, Google



The project has ethical approval and the approval number is included in the description field


The data complies with the institution's and funders' policies on access and sharing

Sharing and access restrictions

The data requires access restrictions, which are explained in the description field; files are not attached

Data description

  • The file formats are open or commonly used

Methodology, headings and units

  • Headings and units are explained in the files