Sound event synthesis using WaveNet
This is a demonstration of sound event synthesis (SES) based on a conditional WaveNet [1] that is conditioned on sound event labels. As the dataset, we used 10 different sound events (manual coffee grinder, cup clinking, alarm clock ringing, whistle, maracas, drum, electric shaver, trash box banging, tearing paper, bell ringing) contained in the RWCP-SSD (Real World Computing Partnership Sound Scene Database) [2].
You can download a zip file of original and synthesized sounds from here.
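For reference, below is a minimal sketch (in PyTorch, and not the implementation used for this demo) of the core building block behind label-conditioned WaveNet synthesis: a dilated residual block whose gated activations are globally conditioned on a one-hot sound event label. The class name, layer sizes, and the 10-class label encoding are illustrative assumptions.

# Minimal sketch of a WaveNet-style dilated residual block with global
# conditioning on a one-hot sound event label (illustrative, not the
# authors' implementation; layer sizes and names are assumptions).
import torch
import torch.nn as nn

class ConditionedResidualBlock(nn.Module):
    def __init__(self, channels: int, dilation: int, n_events: int):
        super().__init__()
        # Dilated convolution split into filter and gate branches.
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2,
                                     dilation=dilation, padding=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2,
                                   dilation=dilation, padding=dilation)
        # Global conditioning: project the event label into each branch.
        self.filter_cond = nn.Linear(n_events, channels)
        self.gate_cond = nn.Linear(n_events, channels)
        self.residual = nn.Conv1d(channels, channels, kernel_size=1)
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, event_onehot):
        # x: (batch, channels, time), event_onehot: (batch, n_events)
        t = x.size(-1)
        f = self.filter_conv(x)[..., :t]   # trim right padding -> causal
        g = self.gate_conv(x)[..., :t]
        # Broadcast the label projection over all time steps.
        f = f + self.filter_cond(event_onehot).unsqueeze(-1)
        g = g + self.gate_cond(event_onehot).unsqueeze(-1)
        z = torch.tanh(f) * torch.sigmoid(g)       # gated activation unit
        return x + self.residual(z), self.skip(z)  # residual and skip paths

if __name__ == "__main__":
    block = ConditionedResidualBlock(channels=64, dilation=2, n_events=10)
    audio = torch.randn(1, 64, 16000)              # embedded audio frames
    label = torch.nn.functional.one_hot(torch.tensor([3]), num_classes=10).float()
    out, skip = block(audio, label)
    print(out.shape, skip.shape)                   # both (1, 64, 16000)

In the full model, many such blocks with increasing dilations are stacked, the skip outputs are summed, and the network predicts a distribution over quantized sample values one step at a time, with the same event label supplied at every layer.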
・Manual coffee grinder
coffee_grinder_original.wav
Original sound
coffee_grinder_generated.wav
Synthesized sound
・Cup
cup_original.wav
Original sound
cup_generated.wav
Synthesized sound
・Clock
clock_original.wav
Original sound
clock_generated.wav
Synthesized sound
・Whistle
whistle_original.wav
Original sound
whistle_generated.wav
Synthesized sound
・Maracas
maracas_original.wav
Original sound
maracas_generated.wav
Synthesized sound
・Drum
drum_original.wav
Original sound
drum_generated.wav
Synthesized sound
・Shaver
shaver_original.wav
Original sound
shaver_generated.wav
Synthesized sound
・Trash box
trashbox_original.wav
Original sound
trashbox_generated.wav
Synthesized sound
・Tearing paper
tearing_original.wav
Original sound
tearing_generated.wav
Synthesized sound
・Bell
bell_original.wav
Original sound
bell_generated.wav
Synthesized sound
[1] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, 2016.
[2] S. Nakamura, K. Hiyane, F. Asano, and T. Endo, “Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-free Speech Recognition,” Proc. Language Resources and Evaluation Conference (LREC), pp. 965–968, 2000.