Real-Time Sound Visualization
Abstract
Sound plays a vital role in every aspect of human life, as it is one of the primary forms of sensory information that our auditory system collects to help us perceive the world. The human nervous system is remarkably adept at hearing the differences between sounds. However, clustering and visualizing sound remains a significant challenge, because it requires understanding how to represent complex, high-dimensional audio data. Sound clustering and visualization is the process of collecting and analyzing audio samples; it is a prerequisite of sound classification, which is at the core of automatic speech recognition, virtual assistants, and text-to-speech applications. In this thesis, I propose a web-based platform to visualize and cluster similar sound samples, such as musical notes and human speech, in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds produced by the human vocal tract, are first extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. Finally, the classification task is performed by feeding the resulting features to a pre-built neural network that serves as a classifier. The proposed platform also handles the difficult task of clustering similar audio samples played by musical instruments in the same family, such as the oboe and saxophone. The clustering results exhibit good accuracy, supporting the feasibility of the proposed approach.
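A minimal sketch of the MFCC-plus-t-SNE pipeline described above, assuming Python with the librosa and scikit-learn libraries (the abstract does not name the tools actually used, and the file names below are hypothetical placeholders):

```python
import librosa
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical audio clips; the thesis's actual dataset is not specified here.
files = ["oboe_c4.wav", "sax_c4.wav", "speech_sample.wav"]

features = []
for path in files:
    y, sr = librosa.load(path, sr=22050)                # load audio at a fixed sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs per frame
    features.append(mfcc.mean(axis=1))                  # average over time: one vector per clip

X = np.stack(features)

# t-SNE projects the high-dimensional MFCC vectors to 2-D for visualization.
# perplexity must be smaller than the number of samples and is tuned per dataset.
embedding = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(X)
print(embedding)  # (n_clips, 2) coordinates, ready for plotting or clustering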