Deep Learning Approaches for Automatic Drum Transcription
Abstract
Drum transcription is the task of converting drum audio into drum notation. Such notation serves as instruction for drummers learning to play and can also help students study drum music theory. Unfortunately, transcribing music is not an easy task: a good transcription can usually be produced only by an experienced musician, yet musical notation benefits amateurs as well as professionals. This study develops an Automatic Drum Transcription (ADT) application using the segment-and-classify method with deep learning as the classification method. The method consists of two steps. First, the segmentation step achieved a macro F1 score of 76.14% after a grid search to tune its parameters. Second, spectrogram features are extracted at the detected onsets and used as input to the classification models. The models are evaluated using a multi-objective optimization (MOO) of the macro F1 score and prediction time. The results show that the LSTM model outperformed the other models, with MOO scores of 77.42%, 86.97%, and 82.87% on the MDB Drums, IDMT-SMT Drums, and combined datasets, respectively. This model is then used in the ADT application, which is built with the FastAPI framework and delivers the transcription result as a drum tab.
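The segmentation step described above detects onsets (drum hits) before any classification happens. As a minimal illustration of the idea, the sketch below implements spectral-flux onset detection in plain NumPy; the frame size, hop size, and threshold are illustrative assumptions, not the tuned parameters from the paper's grid search.

```python
import numpy as np

def spectral_flux_onsets(y, sr, frame=1024, hop=512, delta=0.1):
    """Return onset times (seconds) found by simple spectral-flux peak picking."""
    n_frames = 1 + (len(y) - frame) // hop
    window = np.hanning(frame)
    # short-time magnitude spectra
    mags = np.array([
        np.abs(np.fft.rfft(window * y[i * hop:i * hop + frame]))
        for i in range(n_frames)
    ])
    # spectral flux: sum of positive magnitude increases between frames
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-9
    # keep local maxima above the threshold
    onsets = [i + 1 for i in range(1, len(flux) - 1)
              if flux[i] > flux[i - 1] and flux[i] > flux[i + 1] and flux[i] > delta]
    return np.array(onsets) * hop / sr

# synthetic "drum hits": short noise bursts at 0.5 s and 1.0 s
np.random.seed(0)
sr = 22050
y = np.zeros(int(1.5 * sr))
for t in (0.5, 1.0):
    idx = int(t * sr)
    y[idx:idx + 200] = np.random.randn(200) * 0.9

times = spectral_flux_onsets(y, sr)
print(times)
```

In the paper's pipeline, a spectrogram excerpt around each detected onset would then be cut out and passed to the classifier (e.g., the LSTM model) to label which drum was hit.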
Copyright (c) 2023 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright to this article is transferred to Politeknik Elektronika Negeri Surabaya (PENS) if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper, including without limitation all copyrights, to PENS. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment. The copyright transfer form can be downloaded here.
The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors, who has obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the listed authors will not be accepted.
Retained Rights/Terms and Conditions
- Authors retain all proprietary rights in any process, procedure, or article of manufacture described in the Work.
- Authors may reproduce or authorize others to reproduce the work or derivative works for the author’s personal use or company use, provided that the source and the copyright notice of the publisher, Politeknik Elektronika Negeri Surabaya (PENS), are indicated.
- Authors are allowed to use and reuse their articles under the same CC-BY-NC-SA license as third parties.
- Third parties are allowed to share and adapt the published work for all non-commercial purposes; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.
Plagiarism Check
To prevent plagiarism, each manuscript will be checked twice by the Editorial Board of the EMITTER International Journal of Engineering Technology (EMITTER Journal) using the iThenticate Plagiarism Checker and the CrossCheck plagiarism screening service. The similarity score of a manuscript must be less than 25%. Manuscripts that plagiarize another author’s work, or the author’s own, will be rejected by the EMITTER Journal.
Authors are expected to comply with the EMITTER Journal's plagiarism rules by downloading and signing the plagiarism declaration form here and submitting it, along with the copyright transfer form, via online submission.