Sound event synthesis using WaveNet
This page demonstrates sound event synthesis (SES) conditioned on event labels, using the conditional WaveNet [1]. The dataset consists of 10 sound events (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, bell ringing) taken from the RWCP-SSD (Real World Computing Partnership-Sound Scene Database) [2].
You can download a zip file of original and synthesized sounds from here.
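As a rough illustration of what label conditioning means here, the sketch below implements one gated activation unit with global conditioning as described in the WaveNet paper [1]: z = tanh(W_f * x + V_f^T h) ⊙ σ(W_g * x + V_g^T h), where h is a fixed vector for the whole clip. It assumes that h is a one-hot event label over the 10 classes above; the demo itself may encode labels differently. The names EVENTS, one_hot, init_params, causal_dilated_conv, and gated_unit are hypothetical and chosen for this example only, and the weights are random rather than trained.

```python
import numpy as np

# The 10 event classes from the text; names and ordering are arbitrary
# choices for this sketch, not taken from the demo.
EVENTS = ["coffee_grinder", "cup", "clock", "whistle", "maracas",
          "drum", "shaver", "trash_box", "tearing_paper", "bell"]


def one_hot(event):
    """Encode an event label as a one-hot vector h (assumed encoding)."""
    h = np.zeros(len(EVENTS))
    h[EVENTS.index(event)] = 1.0
    return h


def init_params(rng, channels, kernel=2):
    """Random, untrained weights for one gated unit (illustration only)."""
    n = len(EVENTS)
    return {
        "W_f": 0.1 * rng.standard_normal((kernel, channels, channels)),
        "W_g": 0.1 * rng.standard_normal((kernel, channels, channels)),
        "V_f": 0.1 * rng.standard_normal((n, channels)),
        "V_g": 0.1 * rng.standard_normal((n, channels)),
    }


def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution.

    x: (T, C_in) input sequence, w: (K, C_in, C_out) filter taps.
    Output at time t depends only on x[t], x[t - d], x[t - 2d], ...
    """
    taps, _, c_out = w.shape
    steps = x.shape[0]
    pad = (taps - 1) * dilation
    # Left-pad with zeros so no output sees future samples.
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((steps, c_out))
    for k in range(taps):
        # Tap k reads (taps - 1 - k) * dilation steps into the past.
        out += xp[k * dilation:k * dilation + steps] @ w[k]
    return out


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gated_unit(x, h, params, dilation):
    """Gated activation with global conditioning on the label vector h.

    The label term h @ V is the same at every time step, so it is
    broadcast across the time axis.
    """
    f = causal_dilated_conv(x, params["W_f"], dilation) + h @ params["V_f"]
    g = causal_dilated_conv(x, params["W_g"], dilation) + h @ params["V_g"]
    return np.tanh(f) * sigmoid(g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, channels=4)
    x = rng.standard_normal((16, 4))  # stand-in for hidden features
    z = gated_unit(x, one_hot("bell"), params, dilation=2)
    print(z.shape)
```

Because the label term is added inside both the filter and the gate, switching h from one event to another shifts the activations at every time step, which is how a single network can be steered toward different sound events. A full model would stack many such units with increasing dilation and add residual and skip connections.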
・Manual coffee grinder

Original sound

Synthesized sound
・Cup

Original sound

Synthesized sound
・Clock

Original sound

Synthesized sound
・Whistle

Original sound

Synthesized sound
・Maracas

Original sound

Synthesized sound
・Drum

Original sound

Synthesized sound
・Shaver

Original sound

Synthesized sound
・Trash box

Original sound

Synthesized sound
・Tearing paper

Original sound

Synthesized sound
・Bell

Original sound

Synthesized sound
[1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint, arXiv:1609.03499, 2016.
[2] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, “Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-free Speech Recognition,” Proc. Language Resources and Evaluation Conference (LREC), pp. 965–968, 2000.