The Romanian speech synthesis (RSS) corpus was recorded in a hemianechoic chamber (anechoic walls and ceiling; floor partially anechoic) at the University of Edinburgh. We used three high-quality studio microphones: a Neumann U89i (large diaphragm condenser), a Sennheiser MKH 800 (small diaphragm condenser with very wide bandwidth) and a DPA 4035 (headset-mounted condenser). The current release includes only the speech data recorded with the Sennheiser MKH 800; we may release data recorded with the other microphones in the future. All recordings were made at a 96 kHz sampling frequency with 24 bits per sample, then downsampled to 48 kHz. For recording, downsampling and bit rate conversion, we used ProTools HD hardware and software. We conducted 8 sessions over the course of a month, recording about 500 sentences in each session. At the start of each session, the speaker listened to a previously recorded sample in order to attain a similar voice quality and intonation.
Thank you for accessing our database! To get a copy of the DB, please download the .tgz files linked below.
Please note that, because the wav files are high quality, the download may take some time (RomanianDB.tgz is about 1.2 GB).
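Once an archive has been downloaded, it can be useful to confirm that the wav files really are at the 48 kHz sampling frequency described above. The following is a minimal sketch using only the Python standard library. The function name summarize_wavs is hypothetical, the layout of files inside the archive is not assumed, and the sketch relies on the wav files being plain PCM that the wave module can read.

```python
import tarfile
import wave


def summarize_wavs(archive_path):
    """List (name, sample_rate_hz, bits_per_sample, seconds) for each WAV in a .tgz.

    Hypothetical helper: only the WAV headers are read, nothing is written to disk.
    """
    rows = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # Skip directories and anything that is not a .wav file.
            if not (member.isfile() and member.name.lower().endswith(".wav")):
                continue
            with archive.extractfile(member) as raw, wave.open(raw, "rb") as wav:
                rate = wav.getframerate()
                bits = 8 * wav.getsampwidth()
                seconds = wav.getnframes() / rate
                rows.append((member.name, rate, bits, seconds))
    return rows
```

For example, calling summarize_wavs("RomanianDB.tgz") in the download directory returns one tuple per wav file; any entry whose sampling rate is not 48000 would be worth a closer look.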
Elena and Georgiana are two new additions to the database. Elena has approximately 1500 random newspaper utterances, which make up the rnd1, rnd2 and rnd3 subsets of RSS. Georgiana contains the first 205 utterances of the rnd1 subset of RSS.
If you are using RSS in your work, please cite the following paper:
Adriana Stan, Junichi Yamagishi, Simon King, Matthew Aylett, "The Romanian Speech Synthesis (RSS) corpus: building a high quality HMM-based speech synthesis system using a high sampling rate", Speech Communication, vol. 53, pp. 442-450, 2011, doi: 10.1016/j.specom.2010.12.002