Deep Unsupervised Drum Transcription

2020-12-09

Deep learning-based models have improved automatic music transcription, the task of estimating a score from input audio. However, most current systems rely on supervised learning and therefore require a large annotated dataset.

A recent paper on arXiv.org proposes an unsupervised drum transcription system. It can test its own estimate, measure the error, and correct itself, much as musicians do when learning to transcribe.

Drums. Image credit: floriansteffen via pxhere.com, CC0 Public Domain

In the experiments, the system compared favorably with current supervised and unsupervised approaches. It generalizes to different datasets while maintaining high performance, provided the distribution of tracks by style is maintained. It can therefore be used for real-life drum transcription tasks. The system could also be extended to other instruments and combined with instrument recognition.

We introduce DrummerNet, a drum transcription system that is trained in an unsupervised manner. DrummerNet does not require any ground-truth transcription and, with the data-scalability of deep neural networks, learns from a large unlabeled dataset. In DrummerNet, the target drum signal is first passed to a (trainable) transcriber, then reconstructed in a (fixed) synthesizer according to the transcription estimate. By training the system to minimize the distance between the input and the output audio signals, the transcriber learns to transcribe without ground truth transcription. Our experiment shows that DrummerNet performs favorably compared to many other recent drum transcription systems, both supervised and unsupervised.
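
The transcribe-then-resynthesize loop described in the abstract can be illustrated with a toy sketch. This is not DrummerNet's implementation: it assumes a single hypothetical drum template, a one-dimensional signal, and a squared-error distance, and it replaces the trainable neural transcriber with direct gradient descent on the activations. All function names and values are made up for illustration. The point is only that a fixed synthesizer plus a reconstruction loss can recover onsets without any ground-truth transcription.

```python
# Toy analysis-by-synthesis loop (illustrative only, not DrummerNet itself).
# Assumptions: one drum template, a 1-D signal, squared-error distance, and
# activations optimized directly instead of a trained neural transcriber.


def synthesize(activations, template):
    """Fixed synthesizer: add a scaled copy of the template at every frame."""
    out = [0.0] * len(activations)
    for t, a in enumerate(activations):
        for k, h in enumerate(template):
            if t + k < len(out):
                out[t + k] += a * h
    return out


def reconstruction_loss(target, estimate):
    """Squared distance between the input signal and the resynthesized one."""
    return sum((x - y) ** 2 for x, y in zip(target, estimate))


def fit_activations(target, template, steps=500, lr=0.1):
    """Minimize the reconstruction loss by projected gradient descent.

    The activations are clipped at zero after each step, since a drum hit
    cannot have negative strength.
    """
    acts = [0.0] * len(target)
    for _ in range(steps):
        estimate = synthesize(acts, template)
        residual = [y - x for x, y in zip(target, estimate)]
        for t in range(len(acts)):
            # d(loss)/d(acts[t]) = 2 * sum_k residual[t + k] * template[k]
            grad = 2.0 * sum(
                residual[t + k] * h
                for k, h in enumerate(template)
                if t + k < len(residual)
            )
            acts[t] = max(0.0, acts[t] - lr * grad)
    return acts


# Hypothetical "recording": two hits of the same drum, at frames 2 and 9.
template = [1.0, 0.5, 0.25]
true_activations = [0.0] * 16
true_activations[2] = 1.0
true_activations[9] = 0.8
target = synthesize(true_activations, template)

# Only `target` and `template` are used for fitting; no transcription is given.
estimated = fit_activations(target, template)
onsets = [t for t, a in enumerate(estimated) if a > 0.5]
```

In this sketch `onsets` is derived only from the audio and the fixed template. In the system described above, the same reconstruction error would instead be used to update the weights of the transcriber, so that it can transcribe new recordings in a single forward pass.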

