RWCP-SSD-Onomatopoeia
What is RWCP-SSD-Onomatopoeia?
RWCP-SSD-Onomatopoeia is an onomatopoeic word dataset containing 155,568 onomatopoeic words for 105 kinds of environmental sounds (e.g., shaver sound, whistle sound) included in RWCP-SSD (Real World Computing Partnership-Sound Scene Database) [1]. RWCP-SSD-Onomatopoeia also contains self-reported confidence scores and others-reported acceptance scores for the onomatopoeic words, which can be used to evaluate the appropriateness of onomatopoeic words. This dataset was designed and collected for research on environmental sound synthesis from onomatopoeic words and on the conversion of environmental sounds into onomatopoeic words. For more information, refer to the paper [2] and the GitHub page.
Contents
The RWCP-SSD-Onomatopoeia dataset consists of the following contents:
・Onomatopoeic words for environmental sounds
We collected a total of 155,568 onomatopoeic words. Each onomatopoeic word was collected from Japanese speakers in katakana, a Japanese syllabary, and was then converted to a phoneme representation.
・Self-reported confidence scores
We collected 155,568 confidence scores for the onomatopoeic words that workers themselves transcribed. The self-reported confidence score enables us to evaluate the appropriateness of an onomatopoeic word on the basis of the judgement of the person who gave it.
・Others-reported acceptance scores
We collected 548,367 acceptance scores for onomatopoeic words transcribed by others. The others-reported acceptance scores were collected from more than five workers for onomatopoeic words with a confidence score of 4 or higher. The others-reported acceptance score enables us to evaluate the appropriateness of an onomatopoeic word on the basis of the judgement of others.
・WorkerID
The dataset includes anonymized IDs of workers who gave onomatopoeic words, confidence scores, and acceptance scores.
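As an illustration of how the two kinds of scores might be combined, the following is a minimal Python sketch that keeps only onomatopoeic words with a self-reported confidence score of 4 or higher and a sufficiently high mean others-reported acceptance score. The record layout, field names, the `select_words` helper, and the acceptance threshold are all assumptions made for this example; they do not describe the actual file format of the dataset, and the values shown are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Onomatopoeia:
    # Hypothetical record layout; the released files may be organized differently.
    word_id: str      # illustrative identifier for one transcription
    sound_class: str  # kind of environmental sound, e.g. "whistle"
    phonemes: str     # phoneme representation of the katakana word
    worker_id: str    # anonymized ID of the worker who gave the word
    confidence: int   # self-reported confidence score


def select_words(words, acceptance, min_confidence=4, min_mean_acceptance=3.0):
    """Return (record, mean acceptance) pairs that pass both thresholds.

    `acceptance` is an iterable of (word_id, score) pairs given by other workers.
    The default acceptance threshold is an arbitrary value for illustration.
    """
    scores = defaultdict(list)
    for word_id, score in acceptance:
        scores[word_id].append(score)

    selected = []
    for w in words:
        # Skip low-confidence words and words without any acceptance scores.
        if w.confidence < min_confidence or w.word_id not in scores:
            continue
        avg = mean(scores[w.word_id])
        if avg >= min_mean_acceptance:
            selected.append((w, avg))
    return selected


# Made-up toy values, not taken from the dataset.
words = [
    Onomatopoeia("w1", "whistle", "p i: q", "worker_a", 5),
    Onomatopoeia("w2", "whistle", "b o N", "worker_b", 2),
    Onomatopoeia("w3", "shaver", "j i: i:", "worker_c", 4),
]
acceptance = [
    ("w1", 4), ("w1", 5), ("w1", 3),
    ("w3", 2), ("w3", 1), ("w3", 3),
]

selected = select_words(words, acceptance)
for record, avg in selected:
    print(record.sound_class, record.phonemes, avg)
```

In this toy data only the first word passes both filters: the second has a low confidence score, and the third has a low mean acceptance score. Whether a mean is the right way to aggregate acceptance scores depends on the application.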
Terms of use
The RWCP-SSD-Onomatopoeia dataset may be used for:
・Research by academic institutions
・Non-commercial research, including research conducted within commercial organisations
・Personal use
If you want to use the dataset for commercial purposes, please contact us. Redistribution is not permitted, but you may use a small part of this dataset (e.g., around 10 onomatopoeic words) on your website or in a blog post. Please cite this paper when you use this dataset in a research paper, blog post, or preprint.
Download
The dataset can be downloaded from GitHub.
Note that RWCP-SSD-Onomatopoeia does not contain sound files; these can be obtained from the NII Speech Resources Consortium (NII-SRC). If you need any assistance, please do not hesitate to contact us.
Contact
Keisuke Imoto (Doshisha University)
E-mail: keisuke.imoto (at) ieee.org