Resources

Datasets

Vocal Imitation Set: Thousands of vocal imitations of hundreds of sounds from the AudioSet ontology

The Vocal Imitation Set is a collection of crowd-sourced vocal imitations of a large, diverse set of sounds collected from Freesound (https://freesound.org/) and curated based on Google’s AudioSet ontology (https://research.google.com/audioset/). We expect this dataset to help research communities better understand human vocal imitation and build machines that understand imitations as humans do.

[Download link]

To cite this dataset, please use the following reference:

Bongjun Kim, Madhav Ghei, Bryan Pardo, and Zhiyao Duan, “Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology,” Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Nov. 2018.

Fine-grained Vocal Imitation Set

This dataset includes 763 crowd-sourced vocal imitations of 108 sound events. The sound event recordings were taken from a subset of the Vocal Imitation Set.

While the Vocal Imitation Set contains vocal imitations of only a single reference recording per class, this dataset contains vocal imitations of multiple reference recordings per class.

[Download link]

Code

I-SED: Interactive Sound Event Detector

A human-in-the-loop interface for sound event annotation.
https://github.com/bongjun/ised

Audio embedding model

https://github.com/bongjun/M-VGGish

DCASE Challenge 2019 submission

Task 5: Urban Sound Tagging
https://github.com/bongjun/dcase2019-task5