Additional Sound Event Labels of TUT Acoustic Scenes 2016 & 2017

What are TUT Acoustic Scenes 2016 & 2017 datasets?

TUT Acoustic Scenes 2016 & 2017 are environmental sound datasets for acoustic scene classification (ASC), originally recorded by the Audio Research Group at Tampere University of Technology. Some of the sound clips in these datasets have strong event labels (e.g., in TUT Sound Events 2016 and TUT Sound Events 2017); however, many clips have no strong labels. We have annotated those clips using the same protocol as in [1] and [2].

File format

The file format of the annotation files is the same as that of TUT Sound Events 2016 and TUT Sound Events 2017.
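
For illustration, here is a minimal Python sketch for reading one such annotation file. It assumes the TUT Sound Events style of plain-text, tab-separated rows of event onset (seconds), event offset, and event label; the file name and the column order are assumptions, so check them against the actual files.

import csv

def load_annotations(ann_path):
    """Parse one annotation file into a list of (onset, offset, label) tuples.

    Assumes tab-separated rows of event onset (s), event offset (s), and
    event label, as in the TUT Sound Events annotation files. Adjust the
    column indices if your copy carries extra leading columns (e.g. audio
    file name or scene label).
    """
    events = []
    with open(ann_path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:   # skip blank or malformed lines
                continue
            events.append((float(row[0]), float(row[1]), row[2]))
    return events

# Example usage (hypothetical file name): total labeled event time in one clip
events = load_annotations("annotation/a001.ann")
print(sum(off - on for on, off, _ in events), "seconds of labeled events")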

Dataset for joint analysis of sound events and acoustic scenes

We have also constructed a dataset for joint analysis of sound events and acoustic scenes, which combines TUT Sound Events 2016/2017 with TUT Acoustic Scenes 2016/2017. This dataset contains 266 min. of sounds (192 min. for training, 74 min. for evaluation) covering 4 acoustic scenes (City center, Home, Office, Residential area) and 25 sound event classes. It is used in [3], [4], and [5].
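
As a quick way to inspect such metadata, the sketch below tallies labeled event time per acoustic scene. It assumes (hypothetically) a tab-separated layout of audio file, scene label, event onset, event offset, and event label per row, mirroring the TUT Sound Events meta files; the file name is a placeholder, so verify the layout of the downloaded metadata before relying on this.

import csv
from collections import defaultdict

def summarize_metadata(meta_path):
    """Tally labeled event minutes per scene and collect event labels.

    Assumed (unverified) row layout:
    [audio file] [scene label] [event onset] [event offset] [event label]
    """
    minutes = defaultdict(float)
    labels = set()
    with open(meta_path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 5:   # skip blank or malformed lines
                continue
            _fname, scene, onset, offset, label = row[:5]
            minutes[scene] += (float(offset) - float(onset)) / 60.0
            labels.add(label)
    return dict(minutes), labels

# Example usage (hypothetical file name)
scene_minutes, event_labels = summarize_metadata("meta_development.txt")
print(scene_minutes)                          # labeled minutes per scene
print(len(event_labels), "distinct event labels")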

Download metadata

Metadata of the dataset for joint analysis of sound events and acoustic scenes (development and evaluation sets)

Note that the sound files of TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016/2017 are not included in this metadata; please download them from the DCASE Challenge web page (or directly from Zenodo).

More detailed information can be found here.

[1] A. Mesaros, T. Heittola, and T. Virtanen, "TUT Database for Acoustic Scene Classification and Sound Event Detection," Proc. European Signal Processing Conference (EUSIPCO), pp. 1128-1132, 2016.

[2] A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, B. Raj, and T. Virtanen, "DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System," Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), pp. 85-92, 2017.

[3] N. Tonami, K. Imoto, M. Niitsuma, R. Yamanishi, and Y. Yamashita, "Joint Analysis of Acoustic Events and Scenes Based on Multitask Learning," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 333-337, 2019.

[4] K. Imoto, N. Tonami, Y. Koizumi, M. Yasuda, R. Yamanishi, and Y. Yamashita, "Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 621-625, 2020.

[5] K. Imoto, S. Mishima, Y. Arai, and R. Kondo, "Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. xxxx-xxxx, 2021.