Sound event synthesis using WaveNet
This is a demonstration of sound event synthesis (SES) based on a conditional WaveNet [1] that is conditioned on sound event labels. As the dataset, we used 10 different sound events (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, bell ringing) contained in the RWCP-SSD (Real World Computing Partnership Sound Scene Database) [2].
You can download a zip file of original and synthesized sounds from here.
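For reference, below is a minimal sketch (in PyTorch, and not the implementation used for this demo) of the core building block behind label-conditioned WaveNet synthesis: a dilated residual block whose gated activations are globally conditioned on a one-hot sound event label. The class name, layer sizes, and the 10-class label encoding are illustrative assumptions.

# Minimal sketch of a WaveNet-style dilated residual block with global
# conditioning on a one-hot sound event label (illustrative, not the
# authors' implementation; layer sizes and names are assumptions).
import torch
import torch.nn as nn

class ConditionedResidualBlock(nn.Module):
    def __init__(self, channels: int, dilation: int, n_events: int):
        super().__init__()
        # Dilated convolution split into filter and gate branches.
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2,
                                     dilation=dilation, padding=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2,
                                   dilation=dilation, padding=dilation)
        # Global conditioning: project the event label into each branch.
        self.filter_cond = nn.Linear(n_events, channels)
        self.gate_cond = nn.Linear(n_events, channels)
        self.residual = nn.Conv1d(channels, channels, kernel_size=1)
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, event_onehot):
        # x: (batch, channels, time), event_onehot: (batch, n_events)
        t = x.size(-1)
        f = self.filter_conv(x)[..., :t]   # trim right padding -> causal
        g = self.gate_conv(x)[..., :t]
        # Broadcast the label projection over all time steps.
        f = f + self.filter_cond(event_onehot).unsqueeze(-1)
        g = g + self.gate_cond(event_onehot).unsqueeze(-1)
        z = torch.tanh(f) * torch.sigmoid(g)       # gated activation unit
        return x + self.residual(z), self.skip(z)  # residual and skip paths

if __name__ == "__main__":
    block = ConditionedResidualBlock(channels=64, dilation=2, n_events=10)
    audio = torch.randn(1, 64, 16000)              # embedded audio frames
    label = torch.nn.functional.one_hot(torch.tensor([3]), num_classes=10).float()
    out, skip = block(audio, label)
    print(out.shape, skip.shape)                   # both (1, 64, 16000)

In the full model, many such blocks with increasing dilations are stacked, the skip outputs are summed, and the network predicts a distribution over quantized sample values one step at a time, with the same event label supplied at every layer.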
・Manual coffee grinder
coffee_grinder_original.wav
Original sound
coffee_grinder_generated.wav
Synthesized sound
・Cup
cup_original.wav
Original sound
cup_generated.wav
Synthesized sound
・Clock
clock_original.wav
Original sound
clock_generated.wav
Synthesized sound
・Whistle
whistle_original.wav
Original sound
whistle_generated.wav
Synthesized sound
・Maracas
maracas_original.wav
Original sound
maracas_generated.wav
Synthesized sound
・Drum
drum_original.wav
Original sound
drum_generated.wav
Synthesized sound
・Shaver
shaver_original.wav
Original sound
shaver_generated.wav
Synthesized sound
・Trash box
trashbox_original.wav
Original sound
trashbox_generated.wav
Synthesized sound
・Tearing paper
tearing_original.wav
Original sound
tearing_generated.wav
Synthesized sound
・Bell
bell_original.wav
Original sound
bell_generated.wav
Synthesized sound
[1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, 2016.
[2] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, “Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-free Speech Recognition,” Proc. Language Resources and Evaluation Conference (LREC), pp. 965–968, 2000.