RESEARCH OF THE PROCESS OF VISUAL ART TRANSMISSION IN MUSIC AND THE CREATION OF COLLECTIONS FOR PEOPLE WITH VISUAL IMPAIRMENTS
DOI: https://doi.org/10.33042/2522-1809-2023-6-180-2-6

Keywords: recurrent neural network, light music theory, spectrogram, generation of compositions

Abstract
This article explores the creation of music through the automated generation of sound from images. The proposed method of automatic image sonification combines neural networks with light-music theory. Translating visual art into music with machine-learning models can make extensive museum collections accessible to visually impaired people by moving artworks from an inaccessible sensory modality (sight) to an accessible one (hearing). A review of related audio-visual models showed that previous research has focused both on improving model performance with multimodal information and on making visual information accessible through audio presentation; accordingly, the proposed work process consists of two parts. The first part of the algorithm determines the tonality of a piece: it produces a graphic annotation of the transformation of the image into a musical series using all of its colour characteristics, and this annotation is fed to the input of the neural network. While researching sound-synthesis methods, we considered and analysed the most popular ones: additive synthesis, FM synthesis, phase modulation, sampling, wavetable synthesis, linear-arithmetic synthesis, subtractive synthesis, and vector synthesis. Sampling was chosen for the implementation because it gives the most realistic sound of instruments, which is an important characteristic. The second task, generating music from an image, is performed by a recurrent neural network: a two-layer stacked LSTM with 512 hidden units in each LSTM cell, which assembles spectrograms from the input rows of the image and converts them into an audio clip. Twenty-nine modern musical compositions were used to train the network. To test the network, we compiled a set of ten test images of different types (abstract images, landscapes, cities, and people), from which original musical compositions were generated and stored.
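The colour-to-music step can be illustrated with a minimal sketch in the spirit of light-music theory. This is not the authors' exact algorithm: the sector boundaries, octave rule, and function name below are illustrative assumptions. The 360° hue wheel is divided into 12 sectors, one per chromatic pitch class, while brightness selects the octave.

```python
# Hypothetical sketch of a light-music colour-to-note mapping:
# hue sector -> pitch class, brightness (HSV value) -> octave.
import colorsys

CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pixel_to_note(r, g, b):
    """Map an RGB pixel (0-255 channels) to a (note name, octave) pair."""
    h, _s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    pitch_class = CHROMATIC[int(h * 12) % 12]  # 30-degree hue sector per note
    octave = 3 + int(v * 3)                    # darker pixels -> lower octaves
    return pitch_class, octave

# Pure red (hue 0) falls in the first sector, full brightness -> high octave:
print(pixel_to_note(255, 0, 0))  # -> ('C', 6)
```

Scanning an image row pixel by pixel through such a mapping yields the kind of musical series that the article describes passing to the neural network.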
In conclusion, it should be noted that compositions generated from abstract images are more pleasant to the ear than those generated from landscapes. Overall, the impression of the generated compositions is positive.
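The recurrent unit behind the generation step can be sketched as follows. This is a toy, scalar-sized LSTM cell with fixed illustrative weights, not the trained two-layer 512-unit network described above; it only demonstrates the gating equations that such a network applies, step by step, to image-derived input values.

```python
# Toy LSTM cell (hidden size 1, hand-set weights) illustrating the recurrence
# that, stacked in two layers of 512 units, maps image rows to spectrogram frames.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM time step for scalar input and scalar hidden state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate state
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state (the cell's output)
    return h, c

# Fixed toy weights; a real network learns these during training.
weights = {k: 0.5 for k in
           ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}

h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.9]:  # e.g. brightness values read from one image row
    h, c = lstm_cell_step(x, h, c, weights)
print(round(h, 3))
```

Because the hidden state is carried across steps, each output depends on the whole prefix of the input row, which is what lets the network build temporally coherent spectrogram sequences rather than mapping each pixel independently.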
License
The authors who publish in this collection agree to the following terms:
• The authors retain the copyright of their work and grant the journal the right of first publication under the CC BY-NC-ND 4.0 license (Attribution – NonCommercial – NoDerivatives 4.0 International), which allows others to freely distribute the published work with a mandatory reference to the authors of the original work and its first publication in this journal.
• Authors may enter into separate, additional non-exclusive agreements for distribution of the work in the form in which it was published by this journal (for example, posting the work in an institutional electronic repository or publishing it as part of a monograph), provided that the link to the first publication of the work in this journal is maintained.
• The journal's policy allows and encourages authors to post manuscripts on the Internet (for example, in institutional repositories or on personal websites), both before submission and during editorial processing, as this contributes to productive scientific discussion and positively affects the efficiency and dynamics of citation of the published work (see The Effect of Open Access).