What is RWCP-SSD-Onomatopoeia?
RWCP-SSD-Onomatopoeia is a dataset of 155,568 onomatopoeic words for 105 kinds of environmental sounds (e.g., shaver sound, whistle sound) included in RWCP-SSD (Real World Computing Partnership-Sound Scene Database). It also contains self-reported confidence scores and others-reported acceptance scores for each onomatopoeic word, which can be used to evaluate how appropriate the word is for the sound. The dataset was designed and collected for research on environmental sound synthesis from onomatopoeic words and on the conversion of environmental sounds into onomatopoeic words. For more information, refer to the paper and the GitHub page.
The RWCP-SSD-Onomatopoeia dataset consists of the following contents:
・Onomatopoeic words for environmental sounds
We collected a total of 155,568 onomatopoeic words. Each word was collected from Japanese speakers in katakana, a Japanese syllabary, and then converted to a phoneme representation.
・Self-reported confidence scores
We collected 155,568 confidence scores for the onomatopoeic words that the workers themselves transcribed. The self-reported confidence score enables us to evaluate the appropriateness of an onomatopoeic word on the basis of the judgement of the person who gave it.
・Others-reported acceptance scores
We collected 548,367 acceptance scores for onomatopoeic words transcribed by others. Acceptance scores were collected from more than five workers for onomatopoeic words with a confidence score of 4 or higher. The others-reported acceptance score enables us to evaluate the appropriateness of an onomatopoeic word on the basis of the judgement of others.
The dataset includes anonymized IDs of workers who gave onomatopoeic words, confidence scores, and acceptance scores.
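As a rough illustration of how the two kinds of scores might be combined, the following Python sketch keeps only words whose self-reported confidence and mean others-reported acceptance both reach a threshold. The record layout (`OnomatopoeiaRecord` and its field names), the helper `select_reliable`, the acceptance threshold, and the sample values are all hypothetical and are not taken from the dataset or its actual file format; only the confidence cutoff of 4 mirrors the description above.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class OnomatopoeiaRecord:
    # Hypothetical in-memory layout; the real files may be organized differently.
    sound_class: str          # e.g. "whistle"
    katakana: str             # word as written by the worker
    phonemes: str             # phoneme representation (format assumed)
    worker_id: str            # anonymized worker ID
    confidence: int           # self-reported confidence score
    acceptance: List[int] = field(default_factory=list)  # scores given by others


def select_reliable(records, min_confidence=4, min_mean_acceptance=3.0):
    """Keep words that the author and other workers both rate highly.

    min_mean_acceptance is an arbitrary illustrative threshold.
    """
    selected = []
    for r in records:
        # Words below the confidence cutoff, or without any acceptance
        # scores, are skipped.
        if r.confidence < min_confidence or not r.acceptance:
            continue
        if mean(r.acceptance) >= min_mean_acceptance:
            selected.append(r)
    return selected


# Made-up sample values, for illustration only.
records = [
    OnomatopoeiaRecord("whistle", "ピーッ", "p i: q", "w001", 5, [5, 4, 4, 5, 4]),
    OnomatopoeiaRecord("whistle", "ヒュー", "hy u:", "w002", 4, [2, 3, 2, 2, 3]),
    OnomatopoeiaRecord("shaver", "ブーン", "b u: N", "w003", 2, []),
]

reliable = select_reliable(records)
for r in reliable:
    print(r.sound_class, r.katakana, round(mean(r.acceptance), 2))
```

Averaging the acceptance scores is only one possible aggregation; a median or a minimum number of raters could be substituted depending on the application.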
The RWCP-SSD-Onomatopoeia may be used for:
・Research by academic institutions
・Non-commercial research, including research conducted within commercial organisations
If you want to use the dataset for commercial purposes, please contact us. Re-distribution is not permitted, but you may use a part of this dataset (e.g., 〜10 onomatopoeic words) on your website or in a blog post. Please cite this paper when you use this dataset in a research paper, blog post, or preprint.
Keisuke Imoto (Doshisha University)
E-mail: keisuke.imoto (at) ieee.org