AES Show 2024 NY has ended
Wednesday October 9, 2024 2:00pm - 4:00pm EDT
Binaural sound source localization is the task of finding the location of a sound source from binaural audio, as shaped by the head-related transfer functions (HRTFs) of a binaural array. The most common approach is to train a convolutional neural network (CNN) directly on the magnitude and phase of the binaural audio. Recurrent layers can also be introduced so that the network takes the temporal context of the binaural data into account, creating a convolutional recurrent neural network (CRNN).
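As a rough illustration of the "magnitude and phase" input mentioned above, the following minimal sketch computes a short-time Fourier transform of each ear signal and stacks the results into a four-channel array. The function names, the Hann window, and the `n_fft` and `hop` values are assumptions chosen for illustration; the abstract does not state the feature settings used in this work.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Short-time Fourier transform of a 1-D signal using a Hann window.

    Assumes len(x) >= n_fft; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # Transpose so the result has shape (n_fft // 2 + 1, n_frames)
    return np.fft.rfft(frames, axis=1).T

def binaural_features(left, right, n_fft=512, hop=256):
    """Stack magnitude and phase of both ears into a (4, freq, time) array.

    Channel order (an arbitrary choice): left magnitude, right magnitude,
    left phase, right phase.
    """
    specs = [stft(left, n_fft, hop), stft(right, n_fft, hop)]
    mags = [np.abs(s) for s in specs]
    phases = [np.angle(s) for s in specs]
    return np.stack(mags + phases)
```

An array of this shape could then be fed to a convolutional front end, with the time axis passed on to the recurrent layers.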
This work compares the relative performance of this approach for speech localization on the horizontal plane using four CRNN models based on different types of recurrent layers: Conv-GRU, Conv-BiGRU, Conv-LSTM, and Conv-BiLSTM, together with a baseline system, a more conventional CNN with no recurrent layers. These systems were trained and tested on datasets of binaural audio created by convolving speech samples with binaural room impulse responses (BRIRs) of 120 rooms, for 50 azimuthal directions. Additive noise created from additional sound sources on the horizontal plane was also added to the signal.
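The data generation described above can be sketched in a few lines: a mono signal is convolved with the left and right channels of a BRIR, and a separately spatialized noise source is scaled to a chosen signal-to-noise ratio before being added. This is a minimal sketch under stated assumptions, not the procedure used in this work: `spatialize` and `mix_at_snr` are hypothetical helper names, the BRIR is assumed to be an array of shape `(2, taps)`, and the SNR is assumed to be defined on average power across both ears.

```python
import numpy as np

def spatialize(mono, brir):
    """Convolve a mono signal with a two-channel BRIR of shape (2, taps).

    Returns a binaural signal of shape (2, len(mono) + taps - 1).
    """
    return np.stack([np.convolve(mono, brir[ch]) for ch in range(2)])

def mix_at_snr(target, noise, snr_db):
    """Scale the noise so the target-to-noise power ratio equals snr_db, then add.

    Assumes the noise is at least as long as the target; extra samples are cut.
    """
    noise = noise[:, : target.shape[1]]
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    return target + gain * noise
```

Repeating this for each speech sample, room, and azimuth would produce a dataset of the kind described, with the azimuth of the target BRIR serving as the training label.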
Results show a clear preference for the CRNN over the CNN: overall localization error and front-back confusion are both reduced, and the CRNN systems are less affected by increasing reverberation time and decreasing signal-to-noise ratio. Comparing the recurrent layers also reveals that LSTM-based layers give the best overall localization performance, while bidirectional layers are more robust, leading to an overall preference for Conv-BiLSTM for the task.
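The two metrics named above can be illustrated with a short sketch. The abstract does not give the exact definitions used, so the ones here are assumptions: azimuth is taken as 0° at the front and 180° at the back, localization error is the wrapped absolute azimuth difference, and a prediction counts as front-back confused when mirroring it about the interaural axis (azimuth to 180° minus azimuth) brings it closer to the true direction.

```python
import numpy as np

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = (np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def front_back_confused(pred_deg, true_deg):
    """True where mirroring the prediction about the interaural axis reduces the error.

    Assumed convention: 0 degrees is straight ahead, 180 degrees is directly behind.
    """
    mirrored = 180.0 - np.asarray(pred_deg)
    return angular_error(mirrored, true_deg) < angular_error(pred_deg, true_deg)
```

Averaging `angular_error` over a test set gives a mean localization error, and averaging `front_back_confused` gives a confusion rate, which could then be compared across models, reverberation times, and noise levels.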
Speakers
Jago T. Reed-Jones

Research & Development Engineer, Audioscenic
I am a Research & Development Engineer at Audioscenic, where we are bringing spatial audio to people's homes using binaural audio over loudspeakers. In addition, I am finishing a PhD at Liverpool John Moores University on the use of neural networks to achieve binaural sound source...
Authors
Jago T. Reed-Jones

Poster
