%0 Generic
%A Ng, Wai Man
%A Kwan, Alvin C.M.
%A Lee, Tan
%A Hain, Thomas
%D 2017
%T ShefCE: A Cantonese-English bilingual speech corpus
%U https://orda.shef.ac.uk/articles/dataset/ShefCE_A_Cantonese-English_bilingual_speech_corpus/4522907
%R 10.15131/shef.data.4522907.v1
%K Cantonese
%K bilingualism
%K English data sets
%K Pronunciation changes
%K Language learning
%K Natural Language Processing
%K Chinese Languages
%K English Language
%K English as a Second Language
%X

ShefCE is a Cantonese-English bilingual parallel speech corpus recorded by L2 English learners in Hong Kong. Thirty-one students in Hong Kong, aged 20-30 and ranging from undergraduate to postgraduate level, were recruited and recorded 25 hours of speech (12 hours in Cantonese and 13 hours in English). Details can be found in [1].

The corpus is available free of charge for academic research, teaching and non-commercial use. To use the data, a signed data request form must be submitted to the University of Sheffield. The details and the data request form are available at http://mini.dcs.shef.ac.uk/resources/shefce. Please cite [1] when using the data.

[1] Raymond W. M. Ng, Alvin C.M. Kwan, Tan Lee and Thomas Hain, "ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment", in Proc. the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.


%I The University of Sheffield