File(s) not publicly available

Reason: The data are open for academic research, teaching and non-commercial use. Users need to register and sign a license agreement with the University of Sheffield.

ShefCE: A Cantonese-English bilingual speech corpus

online resource
posted on 10.03.2017 by Wai Man Ng, Alvin C.M. Kwan, Tan Lee, Thomas Hain

ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20-30, were recruited to record the 25-hour corpus (12 hours in Cantonese and 13 hours in English). Details can be found in [1].

The corpus is available free of charge for academic research, teaching and non-commercial use. A data request form has to be signed and submitted to the University of Sheffield to use the data. Please find the details and the data request form at, and cite [1] when using the data.

[1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.


IIKE Fund@Sheffield, Google