Exhibits+ badges provide access to the ADAM Audio Immersive Room, the Genelec Immersive Room, Tech Tours, and the presentations on the Main Stage.

All Access badges provide access to all content in the Program (Tech Tours still require registration).

Monday, October 7
 

10:30am EDT

AI tools for sound design: emotional music and sound effects in visual media
Monday October 7, 2024 10:30am - 12:00pm EDT
Join us for an engaging workshop exploring deep learning tools for sound design. You'll receive two 30-second video clips, each intended to evoke different emotional responses, and be tasked with creating a musical accompaniment (including sound design) using the AI tools introduced. Gain insights from professionals who offer diverse perspectives on these technologies and learn how to integrate commercial and open-source AI effectively into your creative process. We’ll discuss the future of AI in sound design, focusing on its potential benefits, challenges, and areas for improvement. Come explore how AI can support rather than replace audio professionals.
Monday October 7, 2024 10:30am - 12:00pm EDT
1E03

12:15pm EDT

Dolby Atmos Music Essentials: From Stereo to Atmos
Monday October 7, 2024 12:15pm - 1:45pm EDT
In this workshop you’ll gain insights into Dolby Atmos audio, with a particular emphasis on using your original stereo mix as the starting point for an Atmos mix. You will learn how to set up your stereo mix session in a way that makes the transition into Atmos easy, as well as how to create an Atmos mix using stems.
Speakers
Oscar Zambrano
Chief Engineer / Partner, Zampol Productions
Oscar Zambrano was born and raised in Mexico City. After graduating from Berklee College of Music in 2003, he moved to New York where he started working at Sound on Sound. In 2004 he and Jorge Castellanos founded Zampol Productions, a recording studio that focuses on mastering, mixing...
Monday October 7, 2024 12:15pm - 1:45pm EDT
1E03

2:00pm EDT

Creative sample-based music making in Ableton Live 12
Monday October 7, 2024 2:00pm - 3:30pm EDT
Join us for an in-depth workshop where you will explore the advanced sampling capabilities of Ableton Live 12 with expert presenter Ben Casey. This session will dive into the enhanced features and tools that make Live 12 a powerhouse for creative sample-based music production. This workshop will equip you with practical skills and techniques to elevate your music-making process. Learn how to discover, organize, and manipulate samples, generate new ideas, and integrate them seamlessly into your projects, all within the intuitive workflow of Ableton Live 12.
Speakers
Ben Casey
Certified Trainer, Ableton
Ben Casey is a Brooklyn-based electronic musician, Ableton Certified Trainer, and overall music tech nerd. When he’s not surrounded by wires and drum machines or tinkering with Max for Live, Ben teaches Ableton Live to musicians across all genres, from avant-garde to zydeco.
Monday October 7, 2024 2:00pm - 3:30pm EDT
1E03

3:45pm EDT

A practical introduction to Remote ADR and Music Overdub workflows
Monday October 7, 2024 3:45pm - 5:15pm EDT
Outcome:
A deeper understanding of the voice ADR and music overdub process between the engineer, talent and client, advanced remote collaboration techniques for audio engineers, and general knowledge of the challenges of remote recording and how to overcome them to achieve flawless remote sessions.

Summary:
In this workshop you will learn in-depth the critical concepts you need to know for stress-free Remote ADR and Music Overdub sessions, when the talent (e.g. voice actor or music recording artist), audio engineer and perhaps client are all remote from each other. The primary challenge in recording remotely is that the talent will not have the engineer alongside them, and this requires additional technical setup on the talent’s side. Guiding the talent through this process is as important as knowing how to configure your own setup. In this workshop you will learn ADR and Overdub terms and concepts, how to set up a DAW for remote recording, using remote timeline synchronization techniques, and how to manage remote talent and client sessions using professional remote collaboration tools.
Bio:
Robert Marshall is a musician, engineer, and producer; he is the visionary behind the idea of leveraging Internet connectivity to make post-production faster and more productive. He is a crucial force behind the solutions that have become synonymous with Source Elements, such as Remote Transport Sync and Auto-Restore & Replace. Vincent dePierro is a sound engineer and musician. As Head of Support for Source Elements, and a core member of their Innovations and Solutions team, Vincent has in-depth knowledge of how engineers and talent work together online.

Prior experience:
Assumes you already have knowledge of basic acoustics and digital audio workstations; existing knowledge of ADR workflows is not required.
Monday October 7, 2024 3:45pm - 5:15pm EDT
1E03
 
Tuesday, October 8
 

9:30am EDT

An Industry Focused Investigation into Immersive Commercial Melodic Rap Production - Part Two
Tuesday October 8, 2024 9:30am - 9:50am EDT
In part one of this study, five professional mixing engineers were asked to create a Dolby Atmos 7.1.4 mix of the same melodic rap song adhering to the following commercial music industry specifications: follow the framework of the stereo reference, implement binaural distance settings, and conform to -18 LKFS, -1 dBTP loudness levels. An analysis of the mix sessions and post-mix interviews with the engineers revealed that they felt creatively limited in their approaches due to the imposed industry specifications. The restricted approaches were evident through the minimal applications of mix processing, automation, and traditional positioning of key elements in the completed mixes.
In part two of this study, the same mix engineers were asked to complete a second mix of the same song without any imposed limitations and were encouraged to approach the mix creatively. Intra-subject comparisons between the restricted and unrestricted mixes were explored to identify differences in element positioning, mix processing techniques, panning automation, loudness levels, and binaural distance settings. Analysis of the mix sessions and interviews showed that when no restrictions were imposed on their work, the mix engineers emphasized the musical narrative through more diverse element positioning, increased use of automation, and applications of additional reverb with characteristics that differed from the reverb in the source material.
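For readers who want to check a render against the delivery numbers quoted above (-18 LKFS integrated, -1 dBTP), a minimal Python sketch follows. It assumes the pyloudnorm, soundfile and scipy packages and a hypothetical file name; true peak is only approximated here by 4x oversampling rather than a certified meter.

```python
# Sketch: verify a mix against a -18 LKFS / -1 dBTP delivery spec (illustrative only).
import numpy as np
import soundfile as sf             # assumed available
import pyloudnorm as pyln          # BS.1770 loudness meter (assumed available)
from scipy.signal import resample_poly

data, rate = sf.read("mix_render.wav")   # hypothetical file name

# Integrated loudness per ITU-R BS.1770 (reported in LUFS, equivalent to LKFS).
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Rough true-peak estimate: 4x oversample, then take the largest absolute sample.
oversampled = resample_poly(data, 4, 1, axis=0)
true_peak_db = 20 * np.log10(np.max(np.abs(oversampled)))

print(f"Integrated loudness: {loudness:.1f} LUFS (target -18)")
print(f"Approx. true peak:  {true_peak_db:.1f} dBTP (ceiling -1)")
```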
Moderators
Agnieszka Roginska
Professor, New York University
Agnieszka Roginska is a Professor of Music Technology at New York University. She conducts research in the simulation and applications of immersive and 3D audio including the capture, analysis and synthesis of auditory environments, auditory displays and applications in augmented...
Speakers
Christal Jerez
Engineer, Christal's Sonic Lab
Christal Jerez is an audio engineer with experience recording, mixing and mastering music. After studying audio production at American University for her B.A. in Audio Production and at New York University for her Masters degree in Music Technology, she started working professionally...
Authors
Christal Jerez
Engineer, Christal's Sonic Lab
Andrew Scheps
Owner, Tonequake Records
Andrew Scheps has worked with some of the biggest bands in the world: Green Day, Red Hot Chili Peppers, Weezer, Audioslave, Black Sabbath, Metallica, Linkin Park, Hozier, Kaleo and U2. He’s worked with legends such as Johnny Cash, Neil Diamond and Iggy Pop, as well as indie artists...
Hyunkook Lee
Professor, Applied Psychoacoustics Lab, University of Huddersfield
Tuesday October 8, 2024 9:30am - 9:50am EDT
1E03

9:50am EDT

Investigation of spatial resolution of first and high order ambisonics microphones as capturing tool for auralization of real spaces in recording studios equipped with virtual acoustics systems
Tuesday October 8, 2024 9:50am - 10:10am EDT
This paper proposes a methodology for studying the spatial resolution of a collection of first-order and high-order ambisonic microphones when employed as a capturing tool of Spatial Room Impulse Responses (SRIRs) for virtual acoustics applications. In this study, the spatial resolution is defined as the maximum number of mono, statistically independent impulse responses that can be extracted through beamforming techniques and used in multichannel convolution reverbs. The correlation of the responses is assessed as a function of the beam angle and frequency bands, adapted to the frequency response of the loudspeakers in use, with the aim of being used in recording studios equipped with virtual acoustics systems that recreate the spatial impression of the reverberation of real environments. The study examines the differences introduced by the physical characteristics of the microphones, the normalization methodologies of the spherical harmonics, and the number of spherical harmonics introduced in the encoding (ambisonic order). Preliminary results show that the correlation is inversely proportional to frequency as a function of wavelength.
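The correlation analysis described above can be illustrated with a minimal sketch: steer two first-order virtual cardioids from a horizontal B-format SRIR and report their correlation coefficient. The FuMa-style channel weighting, the file name, and the choice of beam angles are assumptions for illustration, not the paper's actual processing chain.

```python
# Sketch: correlation between two beamformed impulse responses from a first-order SRIR.
import numpy as np
import soundfile as sf  # assumed available

def cardioid_beam(w, x, y, azimuth_rad):
    """Virtual cardioid steered in the horizontal plane (FuMa-style W weighting assumed)."""
    return 0.5 * (w * np.sqrt(2.0) + x * np.cos(azimuth_rad) + y * np.sin(azimuth_rad))

# Hypothetical 4-channel B-format room impulse response (W, X, Y, Z).
srir, fs = sf.read("bformat_srir.wav")
w, x, y = srir[:, 0], srir[:, 1], srir[:, 2]

beam_a = cardioid_beam(w, x, y, np.deg2rad(0.0))    # front
beam_b = cardioid_beam(w, x, y, np.deg2rad(90.0))   # left

# Pearson correlation between the two extracted responses:
# low values suggest the beams can serve as statistically independent reverb channels.
corr = np.corrcoef(beam_a, beam_b)[0, 1]
print(f"Correlation between 0 deg and 90 deg beams: {corr:.3f}")
```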
Moderators
Agnieszka Roginska
Professor, New York University
Speakers
Gianluca Grazioli
McGill University, Montreal, Canada
Authors
Tuesday October 8, 2024 9:50am - 10:10am EDT
1E03

10:10am EDT

A comparative study of volumetric microphone techniques and methods in a classical recording context
Tuesday October 8, 2024 10:10am - 10:30am EDT
This paper studies volumetric microphone techniques (i.e. using configurations of multiple Ambisonic microphones) in a classical recording context. A pilot study with expert opinions was designed to show its feasibility. Based on the findings from the pilot study, a trio recording of piano, violin, and cello was conducted in which six Ambisonic microphones formed a hexagon. Such a volumetric approach is believed to improve the sound characteristics; the recordings were processed with the SoundField by RØDE Ambisonic decoder and produced for a 7.0.4 loudspeaker system. A blinded subjective experiment was designed in which participants were asked to evaluate the volumetric hexagonal configuration, comparing it to a more traditional 5.0 immersive configuration and a single Ambisonic microphone, all of which were mixed with spot microphones. The results were quantitatively analyzed and revealed that the volumetric configuration is the most localized of the three, but less immersive than the single Ambisonic microphone. No significant difference was found in focus, naturalness, or preference. The analyses generalize across listeners, as the participants' demographic backgrounds had no effect on the rated sound characteristics.
Moderators
Agnieszka Roginska
Professor, New York University
Speakers / Authors
Parichat Songmuang
Studio Manager/PhD Student, New York University
Parichat Songmuang graduated from New York University with her Master of Music degree in Music Technology and Advanced Certificate in Tonmeister Studies. As an undergraduate, she studied for her Bachelor of Science in Electronics Media and Film with a concentration...
Paul Geluso
Director of the Music Technology program, New York University
Tuesday October 8, 2024 10:10am - 10:30am EDT
1E03

10:30am EDT

Bestiari: a hypnagogic experience created by combining complementary state-of-the-art spatial sound technologies, Catalan Pavilion, Venice Art Biennale 2024
Tuesday October 8, 2024 10:30am - 10:50am EDT
Bestiari, by artist Carlos Casas, is a spatial audio installation created as the Catalan pavilion for the 2024 Venice Art Biennale. The installation was designed for ambulant visitors and the use of informal seating arrangements distributed throughout the reproduction space, so the technical installation design did not focus on listeners’ presence in a single “sweet-spot”. While high-quality conventional spatial loudspeaker arrays typically provide excellent surround-sound experiences, the particular challenge of this installation was to reach into the proximate space of individual, dispersed and mobile listeners, rather than providing an experience that was only peripherally enveloping. To that end, novel spatial audio workflows and combinations of reproduction technologies were employed, including: High-order Ambisonic (HoA), Wavefield Synthesis (WFS), beamforming icosahedral (IKO), directional/parametric ultrasound, and infrasound. The work features sound recordings made for each reproduction technology, e.g., ambient Ambisonic soundfields recorded in Catalan national parks combined with mono and stereo recordings of specific insects in that habitat simultaneously projected via the WFS system. In-situ production provided an opportunity to explore the differing attributes of the reproduction devices and their interactions with the acoustical characteristics of the space – a concrete and brick structure with a trussed wooden roof, built in the late 1800s for the Venetian shipping industry. The practitioners’ reflections on this exploration, including their perception of the capabilities of this unusual combination of spatial technologies, are presented. Design, workflows and implementation are detailed.
Moderators
Agnieszka Roginska
Professor, New York University
Speakers
Craig Cieciura
Research Fellow, University of Surrey
Craig graduated from the Music and Sound Recording (Tonmeister) course at The University of Surrey in 2016. He then completed his PhD at the same institution in 2022. His PhD topic concerned reproduction of object-based audio in the domestic environment using combinations of installed...
Authors
Craig Cieciura
Research Fellow, University of Surrey
Tuesday October 8, 2024 10:30am - 10:50am EDT
1E03

10:50am EDT

Influence of Dolby Atmos versus Stereo Formats on Narrative Engagement: A Comparative Study Using Physiological and Self-Report Measures
Tuesday October 8, 2024 10:50am - 11:10am EDT
As spatial audio technology rapidly evolves, the conversation around immersion becomes ever more relevant, particularly in how these advancements enhance the creation of compelling sonic experiences. However, immersion is a complex, multidimensional construct, making it challenging to study in its entirety. This paper narrows the focus to one particular dimension—narrative engagement—to explore how it shapes the immersive experience. Specifically, we investigate whether the multichannel audio format, here 7.1.4, enhances narrative engagement compared to traditional stereo storytelling. Participants were exposed to two storytelling examples: one in an immersive format and another in a stereo fold-down. Physiological responses were recorded during listening sessions, followed by a self-report survey adapted from the Narrative Engagement Scale. The lack of significant differences between the two formats in both subjective and objective measures is discussed in the context of existing studies.
Moderators
Agnieszka Roginska
Professor, New York University
Speakers
Hyunkook Lee
Professor, Applied Psychoacoustics Lab, University of Huddersfield
Authors
Hyunkook Lee
Professor, Applied Psychoacoustics Lab, University of Huddersfield
Tuesday October 8, 2024 10:50am - 11:10am EDT
1E03

11:10am EDT

Creation of representative head-related impulse responses for binaural rendering of moving audio objects
Tuesday October 8, 2024 11:10am - 11:30am EDT
To achieve highly realistic 3D audio reproduction in virtual reality (VR) or augmented reality (AR) through binaural rendering, we must address the considerable computational complexity involved in convolving head-related impulse responses (HRIRs). To reduce this complexity, an algorithm is proposed where audio signals are distributed to pre-defined representative directions through panning. Only the distributed signals are then convolved with the corresponding HRIRs. In this study, we explored a method for generating representative HRIRs through learning, utilizing a full-sphere HRIR set. This approach takes into account smooth transitions and minimal degradation introduced during rendering, for both moving and static audio objects. Compared with conventional panning, the proposed method reduces average distortion by approximately 47% while maintaining the runtime complexity of the rendering.
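A minimal sketch of the general idea, not the authors' learned representative HRIRs: pan the object signal to a few fixed directions and convolve only those channels. The direction set, the simple linear panning law, and the placeholder HRIRs below are all assumptions for illustration.

```python
# Sketch: binaural rendering via panning to fixed "representative" directions.
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
rep_azimuths = np.array([-90.0, -30.0, 0.0, 30.0, 90.0])   # representative directions (assumed)
# Placeholder HRIRs: shape (num_directions, 2 ears, taps). Real data would come from a measured set.
hrirs = np.random.randn(len(rep_azimuths), 2, 256) * 0.01

def pan_gains(azimuth):
    """Linear panning between the two nearest representative directions."""
    gains = np.zeros(len(rep_azimuths))
    idx = np.searchsorted(rep_azimuths, azimuth)
    if idx == 0 or idx == len(rep_azimuths):
        gains[np.clip(idx, 0, len(rep_azimuths) - 1)] = 1.0
        return gains
    lo, hi = rep_azimuths[idx - 1], rep_azimuths[idx]
    frac = (azimuth - lo) / (hi - lo)
    gains[idx - 1], gains[idx] = 1.0 - frac, frac
    return gains

signal = np.random.randn(fs)             # one second of a placeholder object signal
gains = pan_gains(20.0)                  # static object at 20 degrees, for simplicity

# Distribute the object to the fixed channels, then convolve each active channel once per ear.
binaural = np.zeros((2, len(signal) + hrirs.shape[2] - 1))
for d, g in enumerate(gains):
    if g > 0:
        for ear in range(2):
            binaural[ear] += g * fftconvolve(signal, hrirs[d, ear])
print(binaural.shape)
```

Because the HRIR convolutions are tied to the fixed directions, their cost stays constant no matter how many objects move through the scene; only the cheap panning gains change per object.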
Moderators
Agnieszka Roginska
Professor, New York University
Speakers / Authors
Masayuki Nishiguchi
Professor, Akita Prefectural University
Masayuki Nishiguchi received his B.E., M.S., and Ph.D. degrees from Tokyo Institute of Technology, University of California Santa Barbara, and Tokyo Institute of Technology, in 1981, 1989, and 2006, respectively. He was with Sony Corporation from 1981 to 2015, where he was involved...
Tuesday October 8, 2024 11:10am - 11:30am EDT
1E03

11:30am EDT

Quantifying the Impact of Head-Tracked Spatial Audio on Common User Auditory Experiences using Facial Microexpressions
Tuesday October 8, 2024 11:30am - 11:50am EDT
The study aims to enhance the understanding of how Head Tracked Spatial Audio technology influences both emotional responses and immersion levels among listeners. By employing micro facial gesture recognition technology, it quantifies the depth of immersion and the intensity of emotional responses elicited by various types of binaural content, measuring categories such as Neutral, Happy, Sad, Angry, Surprised, Scared, Disgusted, Contempt, Valence, and Arousal. Subjects were presented with a randomized set of audio stimuli consisting of stereo music, stereo speech, and 5.1 movie content. Each audio piece lasted 15 seconds, and the Spatial Audio processing was switched on or off at random throughout the experiment. The FaceReader software continuously detected the subjects' facial microexpressions. Statistical analysis was conducted using R software, applying Granger causality tests in time series, t-tests, and the p-value criterion for hypothesis validation. After consolidating the records of 78 participants, the final database consisted of 212,862 unique data points. With 95% confidence, it was determined that the average level of "Arousal" is significantly higher when Head Tracked Spatial Audio is activated than when it is deactivated, suggesting that HT technology increases the emotional arousal of audio listeners. Regarding the happiness reaction, the highest levels were recorded in mode 5 (HT on and Voice), with an average of 0.038, while the lowest levels were detected in mode 6 (HT off and Voice). Preliminary conclusions indicate that surprise effectively causes a decrease in neutrality, supporting the dynamic interaction between these emotional variables.
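The headline comparison (arousal with head tracking on vs. off) reduces to a two-sample test; the study itself used R, but an equivalent sketch in Python/scipy, with a hypothetical CSV layout and column names, looks like this.

```python
# Sketch: two-sample comparison of arousal with head tracking on vs. off (illustrative only).
import pandas as pd
from scipy import stats

# Hypothetical long-format export: one row per FaceReader sample,
# with columns "ht_enabled" (bool) and "arousal" (0..1).
df = pd.read_csv("facereader_samples.csv")

on = df.loc[df["ht_enabled"], "arousal"]
off = df.loc[~df["ht_enabled"], "arousal"]

# Welch's t-test (unequal variances), alpha = 0.05 to match a 95% confidence level.
t_stat, p_value = stats.ttest_ind(on, off, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Arousal differs significantly" if p_value < 0.05 else "No significant difference")
```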
Moderators
Agnieszka Roginska
Professor, New York University
Speakers / Authors
Tuesday October 8, 2024 11:30am - 11:50am EDT
1E03

11:50am EDT

Investigating the Role of Customized Interaural Time Differences on First-Person Shooter Gaming Performance
Tuesday October 8, 2024 11:50am - 12:10pm EDT
Binaural listening with personalized Head-Related Transfer Functions (HRTFs) is known to enhance a listener's auditory localization in virtual environments, including gaming. However, the methods for achieving personalized HRTFs are often inaccessible for average game players due to measurement complexity and cost. This study explores a simplified approach to improving game performance, particularly in First-Person Shooter (FPS) games, by optimizing Interaural Time Difference (ITD). Recognizing that horizontal localization is particularly important for identifying opponent positions in FPS games, this study hypothesizes that optimizing ITD alone may be sufficient for better game performance, potentially alleviating the need for full HRTF personalization. To test this hypothesis, a simplified FPS game environment was developed in Unity. Participants performed tasks to detect sound positions under three HRTF conditions: MIT-KEMAR, Steam Audio’s default HRTF, and the proposed ITD optimization method. The results indicated that our proposed method significantly reduced players' response times compared to the other HRTF conditions. These findings suggest that players can improve their performance in FPS games through simplified HRTF optimization, broadening access to optimized HRTFs for a wider range of game users.
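For context, a common closed-form ITD model is the Woodworth spherical-head approximation, and the resulting delay can be applied as a simple inter-channel offset. The head radius, sampling rate, and the use of this particular formula are illustrative assumptions; the paper's actual ITD optimization is not reproduced here.

```python
# Sketch: Woodworth spherical-head ITD, applied as a simple sample delay between ears.
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """ITD in seconds for a far-field source; azimuth 0 = front, 90 = fully to one side."""
    theta = np.deg2rad(azimuth_deg)
    return (head_radius_m / c) * (theta + np.sin(theta))

fs = 48000
itd = woodworth_itd(45.0)                  # ~0.38 ms for a source 45 degrees to the right
delay_samples = int(round(itd * fs))

mono = np.random.randn(fs)                 # placeholder source signal
left = np.concatenate([np.zeros(delay_samples), mono])   # far ear arrives later
right = np.concatenate([mono, np.zeros(delay_samples)])  # near ear leads
print(f"ITD = {itd * 1e3:.2f} ms -> {delay_samples} samples at {fs} Hz")
```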
Moderators
Agnieszka Roginska
Professor, New York University
Speakers
Sungjoon Kim
Research Intern, Korea Advanced Institute of Science and Technology
Authors
Sungjoon Kim
Research Intern, Korea Advanced Institute of Science and Technology
Rai Sato
Ph.D. Student, Korea Advanced Institute of Science and Technology
Rai Sato (佐藤 来) is currently pursuing a PhD at the Graduate School of Culture Technology at the Korea Advanced Institute of Science and Technology. He holds a Bachelor of Music from Tokyo University of the Arts, where he specialized in immersive audio recording and psychoacoustics... 
Tuesday October 8, 2024 11:50am - 12:10pm EDT
1E03

2:00pm EDT

Towards prediction of high-fidelity earplug subjective ratings using acoustic metrics
Tuesday October 8, 2024 2:00pm - 2:30pm EDT
High-fidelity earplugs are used by musicians and live sound engineers to prevent hearing damage while allowing musical sounds to reach the eardrum without distortion. To determine objective methods for judging earplug fidelity in a similar way to headphones or loudspeakers, a small sample of trained listeners were asked to judge the attenuation level and clarity of music through seven commercially available passive earplugs. These scores were then compared to acoustic/musical metrics measured in a laboratory. It was found that NRR is strongly predictive of both attenuation and clarity scores, and that insertion loss flatness provides no advantage over NRR. A different metric measuring spectral flatness distortion seems to predict clarity independently from attenuation and will be subject to further study.
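Spectral flatness, the basis of the second metric mentioned, is conventionally the ratio of the geometric to the arithmetic mean of the power spectrum; a small sketch follows. The paper's exact "spectral flatness distortion" measure is not specified here, so this is only the textbook quantity.

```python
# Sketch: spectral flatness of a signal (geometric mean / arithmetic mean of the power spectrum).
import numpy as np

def spectral_flatness(x, eps=1e-12):
    power = np.abs(np.fft.rfft(x)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean   # ~1.0 for white noise, toward 0 for pure tones

fs = 48000
t = np.arange(fs) / fs
print(spectral_flatness(np.random.randn(fs)))           # close to 1
print(spectral_flatness(np.sin(2 * np.pi * 1000 * t)))  # close to 0
```

Comparing the flatness of the same music measured with and without an earplug in place would give one crude figure for how much the plug colors the spectrum.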
Moderators / Speakers
David Anderson
Assistant Professor, University of Minnesota Duluth
Authors
David Anderson
Assistant Professor, University of Minnesota Duluth
Tuesday October 8, 2024 2:00pm - 2:30pm EDT
1E03

2:30pm EDT

Decoding Emotions: Lexical and Acoustical Cues in Vocal Affects
Tuesday October 8, 2024 2:30pm - 3:00pm EDT
This study investigates listeners’ ability to detect emotion from a diverse set of speech samples, including both spontaneous conversations and actor-posed speech. It explores the contributions of lexical content and acoustic properties when native listeners rate seven pairs of affective attributes. Two experimental conditions were employed: a text condition, where participants evaluated emotional attributes from written transcripts without vocal information, and a voice condition, where participants listened to audio recordings to assess emotions. Results showed that the importance of lexical and vocal cues varies across 14 affective states for posed and spontaneous speech. Vocal cues enhanced the expression of sadness and anger in posed speech, while they had less impact on conveying happiness. Notably, vocal cues tended to mitigate negative emotions conveyed by the lexical content in spontaneous speech. Further analysis on correlations between emotion ratings in text and voice conditions indicated that lexical meanings suggesting anger or hostility could be interpreted as positive affective states like intimacy or confidence. Linear regression analyses indicated that emotional ratings by native listeners could be predicted up to 59% by lexical content and up to 26% by vocal cues. Listeners relied more on vocal cues to perceive emotional tone when the lexical content was ambiguous in terms of feeling and attitude. Finally, the analysis identified statistically significant basic acoustical parameters and other non/para-linguistic information, after controlling for the effect of lexical content.
Moderators / Speakers
Eunmi Oh
Research Professor, Yonsei University
Authors
Eunmi Oh
Research Professor, Yonsei University
Tuesday October 8, 2024 2:30pm - 3:00pm EDT
1E03

3:00pm EDT

A comparison of in-ear headphone target curves for the Brüel & Kjær Head & Torso Simulator Type 5128
Tuesday October 8, 2024 3:00pm - 3:30pm EDT
Controlled listening tests were conducted on five different in-ear (IE) headphone target curves measured on the latest ITU-T Type 4.3 ear simulator (e.g. Bruel & Kjaer Head & Torso Simulator Type 5128). A total of 32 listeners rated each target on a 100-point scale based on preference for three different music programs with two observations each. When averaged across all listeners, two target curves were found to be equally preferred over the other choices. Agglomerative hierarchical clustering analysis further revealed two classes of listeners based on dissimilarities in their preferred target curves. Class 1 (72% of listeners) preferred the top two rated targets. Class 2 (28% of listeners) preferred targets with 2 dB less bass and 2 dB more treble than the target curves preferred by Class 1. Among the demographic factors examined, age was the best predictor of membership in each class.
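A minimal sketch of the clustering step described (Ward-linkage agglomerative clustering on per-listener preference ratings, cut into two classes); the CSV layout and column names are hypothetical.

```python
# Sketch: cluster listeners by their preference ratings for the five target curves.
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical table: one row per listener, one column of mean preference per target curve.
ratings = pd.read_csv("target_curve_preferences.csv", index_col="listener")

Z = linkage(ratings.values, method="ward")        # agglomerative hierarchical clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into two classes

for cls in (1, 2):
    share = (labels == cls).mean() * 100
    print(f"Class {cls}: {share:.0f}% of listeners")
```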
Moderators / Speakers / Authors
Tuesday October 8, 2024 3:00pm - 3:30pm EDT
1E03

3:30pm EDT

A cepstrum analysis approach to perceptual modelling of the precedence effect
Tuesday October 8, 2024 3:30pm - 4:00pm EDT
The precedence effect describes our ability to perceive the spatial characteristics of lead and lag sound signals. When the time delay between the lead and lag is sufficiently small we will cease to hear two distinct sounds, instead perceiving the lead and lag as a single fused sound with its own spatial characteristics. Historically, precedence effect models have had difficulty differentiating between lead/lag signals and their fusions. The likelihood of fusion occurring is increased when the signal contains periodicity, such as in the case of music. In this work we present a cepstral analysis based perceptual model of the precedence effect, CEPBIMO, which is more resilient to the presence of fusions than its predecessors. To evaluate our model we employ four datasets of various signal types, each containing 10,000 synthetically generated room impulse responses. The results of the CEPBIMO model are then compared against results of the BICAM. Our results show that the CEPBIMO model is more resilient to the presence of fusions and signal periodicity than previous precedence effect models.
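The property the model builds on is that an echo shows up in the real cepstrum as a peak at its lag quefrency; a small synthetic demonstration follows (this is not the paper's CEPBIMO implementation or datasets).

```python
# Sketch: real cepstrum of a lead + delayed lag signal; the echo appears as a peak at its lag.
import numpy as np

fs = 48000
lag_ms = 5.0
lag_samples = int(fs * lag_ms / 1000)

lead = np.random.randn(fs // 2)                      # synthetic broadband lead signal
mix = lead.copy()
mix[lag_samples:] += 0.6 * lead[:-lag_samples]       # add an attenuated lag (echo)

# Real cepstrum: inverse FFT of the log magnitude spectrum.
spectrum = np.fft.rfft(mix)
cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))

# Look for the strongest peak in a plausible echo range (1-50 ms).
lo, hi = int(0.001 * fs), int(0.05 * fs)
detected = lo + np.argmax(cepstrum[lo:hi])
print(f"Detected lag: {detected / fs * 1000:.2f} ms (true {lag_ms} ms)")
```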
Moderators / Speakers
Jeramey Tyler
Samtec
Jeramey is in the 3rd person. So it goes.
Authors
Jeramey Tyler
Samtec
Tuesday October 8, 2024 3:30pm - 4:00pm EDT
1E03

4:00pm EDT

Categorical Perception of Neutral Thirds Within the Musical Context
Tuesday October 8, 2024 4:00pm - 4:30pm EDT
This paper investigates the contextual recognition of neutral thirds in music by integrating real-world musical context into the study of categorical perception. Traditionally, categorical perception has been studied using isolated auditory stimuli in controlled laboratory settings. However, music is typically experienced within a circumstantial framework, significantly influencing its reception. Our study involved musicians from various specializations who listened to precomposed musical fragments, each concluding with a 350-cent interval preceded by different harmonic contexts. The fragments included a monophonic synthesizer and orchestral mockups, with contexts such as major chords, minor chords, a single pitch, neutral thirds, and natural fifths. The results indicate that musical context remarkably affects the recognition of pseudotonal chords. Participants' accuracy in judging interval size varied based on the preceding harmonic context. A statistical analysis was conducted to determine if there were significant differences in the neutral third perception across the different harmonic contexts. The test led to the rejection of the null hypothesis: the findings underscore the need to consider real-world listening experiences in research on auditory processing and cognition.
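For reference, interval size in cents is 1200·log2(f2/f1), so the 350-cent stimulus sits exactly between a minor third (300 cents) and a major third (400 cents); a small sketch, using A4 = 440 Hz as an arbitrary root.

```python
# Sketch: cents arithmetic for the 350-cent neutral third used in the stimuli.
import math

def cents(f1, f2):
    return 1200.0 * math.log2(f2 / f1)

def transpose(f, interval_cents):
    return f * 2.0 ** (interval_cents / 1200.0)

root = 440.0                                 # A4 as an example root
neutral_third = transpose(root, 350.0)       # ~538.6 Hz
print(f"Neutral third above A4: {neutral_third:.1f} Hz")
print(f"Minor third: {transpose(root, 300):.1f} Hz, major third: {transpose(root, 400):.1f} Hz")
print(f"Check: {cents(root, neutral_third):.1f} cents")
```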
Moderators / Speakers / Authors
Tuesday October 8, 2024 4:00pm - 4:30pm EDT
1E03
 
Wednesday, October 9
 

9:30am EDT

Acoustic modeling and designing of emergency sound systems in road tunnels
Wednesday October 9, 2024 9:30am - 9:50am EDT
Road tunnels are subject to strong regulations and high life-safety standards. Requirements importantly include the speech intelligibility of the emergency sound system. However, a road tunnel is an acoustically challenging environment due to extreme reverberation, high noise floors, and its tube-like geometry. Designing an adequate sound system is a challenge and requires extensive acoustic simulation. This article summarizes recent design work on several major road tunnel projects and gives practical guidelines for the successful completion of similar projects. The project includes several tunnels, each of several kilometers’ length with one to five lanes, transitions and sections, having a total length of 33 km. For each tunnel, first a working acoustic model had to be developed before the sound system itself could be designed and optimized. On-site measurements were conducted to establish data for background noise including jet fans and various traffic situations. Critical environmental parameters were measured and reverberation times were recorded using large balloon bursts. Sprayed concrete, road surface, as well as other finishes were modeled or estimated based on publicly available data for absorption and scattering characteristics. To establish the geometrical model, each tunnel was subdivided into manageable segments of roughly 1-2 km in length based on theoretical considerations. After calibrating the model, the sound system was designed as a large number of loudspeaker lines evenly distributed along the tunnel. Level and delay alignments as well as filter adjustments were applied to achieve the required average STI of 0.48. Validation by measurement showed good correlation with modeling results.
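One common reading of the delay-alignment step mentioned above is delaying successive loudspeaker lines to match the travel time of sound down the tunnel; a toy sketch with an assumed 30 m spacing (the actual project's zoning and values are not given here).

```python
# Toy sketch: delay alignment for loudspeaker lines spaced along a tunnel (assumed 30 m spacing).
SPEED_OF_SOUND = 343.0   # m/s at roughly 20 degrees C

def line_delays_ms(num_lines, spacing_m):
    """Delay of each loudspeaker line relative to the first, following the sound's travel time."""
    return [1000.0 * i * spacing_m / SPEED_OF_SOUND for i in range(num_lines)]

for i, d in enumerate(line_delays_ms(num_lines=5, spacing_m=30.0)):
    print(f"Line {i}: {d:.1f} ms")
```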
Moderators
Ying-Ying Zhang
PhD Candidate, McGill University
Ying-Ying Zhang is a music technology researcher and sound engineer. She is currently a PhD candidate at McGill University in the Sound Recording program where her research focuses on musician-centered virtual acoustic applications in recording environments. She received her Masters...
Speakers
Stefan Feistel
Managing Director/Partner/Co-Founder, AFMG
Stefan Feistel is Managing Director/Partner/Co-Founder of AFMG, Berlin, Germany. He is an expert in acoustical simulation and calculation techniques and applications.
Authors
Stefan Feistel
Managing Director/Partner/Co-Founder, AFMG
Tim Kuschel
Acoustic Consultant, GUZ BOX design + audio
An experienced acoustic consultant with a demonstrated history of working in the architecture and planning industry. Skilled in architectural documentation, audio system design and acoustics, with extensive experience using AFMG's acoustic modelling software EASE, Tim provides professional...
Wednesday October 9, 2024 9:30am - 9:50am EDT
1E03

9:50am EDT

Sound immission modeling of open-air sound systems
Wednesday October 9, 2024 9:50am - 10:10am EDT
Noise emission into the neighborhood is often a major concern when designing the configuration of an open-air sound system. In order for events to be approved, advance studies have to show that expected immission levels comply with given regulations and requirements of local authorities. For this purpose, certified engineering offices use dedicated software tools for modeling environmental noise propagation. However, predicting the radiation of modern sound systems is different from classical noise sources, such as trains or industrial plants. Sound systems and their directional patterns are modeled in electro-acoustic simulation software that can be fairly precise but that typically does not address environmental issues.
This paper proposes to use a simple data exchange format that can act as an open interface between sound system modeling tools and noise immission software. It is shown that most immission studies are conducted at points in the far field of the sound system. Far-field directivity data for the sound system is therefore a suitable solution if it is accompanied by a corresponding absolute level reference. The proposed approach has not only the advantage of being accurate for the given application but also involves low computational costs and is fully compliant with the existing framework of outdoor noise modeling standards. Concerns related to documentation and to the protection of proprietary signal processing settings are resolved as well. The proposed approach was validated by measurements at a number of outdoor concerts. Results are shown to be practically accurate within the given limits of uncertainty.
Moderators
Ying-Ying Zhang
PhD Candidate, McGill University
Speakers
Stefan Feistel
Managing Director/Partner/Co-Founder, AFMG
Authors
Stefan Feistel
Managing Director/Partner/Co-Founder, AFMG
Wednesday October 9, 2024 9:50am - 10:10am EDT
1E03

10:10am EDT

Virtual Acoustics Technology in the Recording Studio-A System Update
Wednesday October 9, 2024 10:10am - 10:30am EDT
This paper describes ongoing efforts toward optimizing the Virtual Acoustics Technology (VAT) system installed in the Immersive Media Lab at McGill University. Following the integration of the CAVIAR cancelling auralizer for feedback suppression, this current iteration of the active acoustics system is able to flexibly support the creation of virtual environments via the convolution of Spatial Room Impulse Responses (SRIRs) with real-time microphone signals. While the system has been successfully used for both recordings and live performances, we have nevertheless been looking to improve upon its “stability from feedback” and “natural sound quality,” two significant attributes of active acoustics systems [1]. We have implemented new software controls and microphone input methods to increase our ratio of gain before feedback, while additionally repositioning and adding loudspeakers to the system to generate a more even room coverage. Following these additions, we continue to evaluate the space through objective measurements and feedback from musicians and listeners.
[1] M. A. Poletti, “Active Acoustic Systems for the Control of Room Acoustics,” in Proceedings of the International Symposium on Room Acoustics, Melbourne, Australia, Aug. 2010.
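At its core, the auralization described convolves live microphone signals with SRIRs block by block; an offline, single-channel overlap-add sketch of that operation follows (placeholder impulse response and signal; the actual VAT/CAVIAR processing chain is considerably more involved).

```python
# Sketch: block-wise overlap-add convolution of an input stream with a room impulse response.
import numpy as np

def overlap_add_convolve(blocks, ir):
    """Convolve a sequence of equal-length blocks with `ir`, as a real-time engine would."""
    block_len = len(blocks[0])
    tail = np.zeros(len(ir) - 1)              # carries the reverb tail across block boundaries
    out = []
    for block in blocks:
        y = np.convolve(block, ir)            # per-block convolution (FFT-based in practice)
        y[: len(tail)] += tail
        out.append(y[:block_len])
        tail = y[block_len:]
    return np.concatenate(out)

fs = 48000
ir = np.exp(-np.linspace(0, 6, fs // 4)) * np.random.randn(fs // 4)  # placeholder 0.25 s response
dry = np.random.randn(4 * 1024)
blocks = dry.reshape(-1, 1024)
wet = overlap_add_convolve(list(blocks), ir)
assert np.allclose(wet, np.convolve(dry, ir)[: len(dry)])            # matches one-shot convolution
print(wet.shape)
```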
Moderators
Ying-Ying Zhang
PhD Candidate, McGill University
Speakers
Kathleen Ying-Ying Zhang
PhD Candidate, McGill University
Authors
Kathleen Ying-Ying Zhang
PhD Candidate, McGill University
Mihai-Vlad Baran
McGill University
Richard King
Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School...
Wednesday October 9, 2024 10:10am - 10:30am EDT
1E03

10:30am EDT

A General Overview of Methods for Generating Room Impulse Responses
Wednesday October 9, 2024 10:30am - 10:50am EDT
The utilization of room impulse responses has proven valuable for both the acoustic assessment of indoor environments and music production. Various techniques have been devised over time to capture these responses. Although algorithmic solutions for generating synthetic reverberation in real time have existed since the 1960s, they remain computationally demanding and generally lack accuracy in comparison to measured, authentic Room Impulse Responses (RIRs). In recent times, machine learning has found application in diverse fields, including acoustics, leading to the development of techniques for generating RIRs. This paper provides a general overview of approaches and methods for generating RIRs, categorized into algorithmic and machine learning techniques, with a particular emphasis on the latter. Discussion covers the acoustical attributes of rooms relevant to perceptual testing and methodologies for comparing RIRs. An examination of disparities between captured and generated RIRs is included to better delineate the key acoustic properties characterizing a room. The paper is designed to offer a general overview for those interested in RIR generation for music production purposes, with future work considerations also explored.
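As a baseline for the kinds of comparisons discussed, the crudest algorithmic model is exponentially decaying noise shaped to a target RT60; a short illustrative sketch with assumed parameters, not one of the surveyed methods.

```python
# Sketch: synthesize a crude room impulse response as exponentially decaying white noise.
import numpy as np

def synthetic_rir(rt60_s, fs=48000, direct_delay_ms=5.0):
    n = int(rt60_s * fs)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60_s)         # amplitude reaches -60 dB after rt60_s seconds
    rir = np.random.randn(n) * envelope
    onset = int(direct_delay_ms * 1e-3 * fs)
    rir[:onset] = 0.0                              # crude propagation delay before any energy
    rir[onset] = 1.0                               # direct sound
    return rir / np.max(np.abs(rir))

rir = synthetic_rir(rt60_s=1.2)
print(len(rir), rir.max())
```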
Moderators
Ying-Ying Zhang
PhD Candidate, McGill University
Speakers
Mihai-Vlad Baran
McGill University
Authors
Mihai-Vlad Baran
McGill University
Richard King
Professor, McGill University
Wednesday October 9, 2024 10:30am - 10:50am EDT
1E03

11:00am EDT

Reimagining Delay Effects: Integrating Generative AI for Creative Control
Wednesday October 9, 2024 11:00am - 11:20am EDT
This paper presents a novel generative delay effect that utilizes generative AI to create unique variations of a melody with each new echo. Unlike traditional delay effects, where repetitions are identical to the original input, this effect generates variations in pitch and rhythm, enhancing creative possibilities for artists. The significance of this innovation lies in addressing artists' concerns about generative AI potentially replacing their roles. By integrating generative AI into the creative process, artists retain control and collaborate with the technology, rather than being supplanted by it. The paper outlines the processing methodology, which involves training a Long Short-Term Memory (LSTM) neural network on a dataset of publicly available music. The network generates output melodies based on input characteristics, employing a specialized notation language for music. Additionally, the implementation of this machine learning model within a delay plugin's architecture is discussed, focusing on parameters such as buffer length and tail length. The integration of the model into the broader plugin framework highlights the practical aspects of utilizing generative AI in audio effects. The paper also explores the feasibility of deploying this technology on microcontrollers for use in instruments and effects pedals. By leveraging low-power AI libraries, this advanced functionality can be achieved with minimal storage requirements, demonstrating the efficiency and versatility of the approach. Finally, a demonstration of an early version of the generative delay effect will be presented.
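A minimal PyTorch sketch of the kind of LSTM next-token generator described, together with temperature-based sampling of a varied "echo". The vocabulary size, network dimensions, and seed phrase are placeholders; the paper's notation language, training data, and plugin integration are not reproduced here.

```python
# Sketch: LSTM that predicts the next token of a symbolic melody notation (PyTorch).
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)                 # (batch, time, embed_dim)
        out, state = self.lstm(x, state)
        return self.head(out), state           # logits over the next token

model = MelodyLSTM()                           # untrained; real use would train on a melody corpus
seed = torch.randint(0, 128, (1, 16))          # placeholder seed phrase of 16 tokens

# Autoregressive generation of a varied echo: sample one token at a time.
tokens, state = seed, None
generated = []
for _ in range(32):
    logits, state = model(tokens, state)
    probs = torch.softmax(logits[:, -1] / 1.0, dim=-1)   # temperature 1.0
    nxt = torch.multinomial(probs, 1)
    generated.append(int(nxt))
    tokens = nxt                                          # feed the sampled token back in
print(generated)
```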
Moderators
Marina Bosi
Stanford University
Marina Bosi, AES Past President, is a founding Director of the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) and the Chair of the Context-based Audio Enhancement (MPAI-CAE) Development Group and IEEE SA CAE WG. Dr. Bosi has served the Society as President...
Speakers / Authors
Wednesday October 9, 2024 11:00am - 11:20am EDT
1E03

11:20am EDT

Acoustic Characteristics of Parasaurolophus Crest: Experimental Results from a simplified anatomical model
Wednesday October 9, 2024 11:20am - 11:40am EDT
This study presents a revised acoustic model of the Parasaurolophus crest, incorporating both the main airway and lateral diverticulum, based on previous anatomical models and recent findings. A physical device, as a simplified model of the crest, was constructed using a coupled piping system, and frequency sweeps were conducted to investigate its resonance behavior. Data were collected using a minimally invasive microphone, with a control group consisting of a simple open pipe for comparison. The results show that the frequency response of the experimental model aligns with that of the control pipe at many frequencies, but notable shifts and peak-splitting behavior were observed, suggesting a more active role of the lateral diverticulum in shaping the acoustic response than previously thought. These findings challenge earlier closed-pipe approaches, indicating that complex interactions between the main airway and lateral diverticulum generate additional resonant frequencies absent in the control pipe. The study provides empirical data that offer new insights into the resonance characteristics of the Parasaurolophus crest and contribute to understanding its auditory range, particularly for low-frequency sounds.
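For context, the control pipe's expected resonances follow the textbook open-pipe relation f_n = n·c/(2L), or the odd harmonics of c/(4L) for a closed-open pipe; a small sketch with an assumed length, not the experimental model's actual dimensions.

```python
# Sketch: resonant frequencies of idealized open-open and closed-open pipes.
SPEED_OF_SOUND = 343.0  # m/s

def open_pipe_modes(length_m, n_modes=4):
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, n_modes + 1)]

def closed_open_pipe_modes(length_m, n_modes=4):
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, n_modes + 1)]

L = 2.0  # assumed airway length in metres, for illustration only
print("open-open:   ", [f"{f:.0f} Hz" for f in open_pipe_modes(L)])
print("closed-open: ", [f"{f:.0f} Hz" for f in closed_open_pipe_modes(L)])
```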
Moderators
Marina Bosi
Stanford University
Speakers / Authors
Wednesday October 9, 2024 11:20am - 11:40am EDT
1E03

11:40am EDT

Interpreting user-generated audio from war zones
Wednesday October 9, 2024 11:40am - 12:00pm EDT
Increasingly, civilian inhabitants and combatants in conflict areas use their mobile phones to record video and audio of armed attacks. These user-generated recordings (UGRs) often provide the only source of immediate information about armed conflicts because access by professional journalists is highly restricted. Audio forensic analysis of these UGRs can help document the circumstances and aftermath of war zone incidents, but consumer off-the-shelf recording devices are not designed for battlefield circumstances and sound levels, nor do the battlefield circumstances provide clear, noise-free audio. Moreover, as with any user-generated material that generally does not have a documented chain-of-custody, there are forensic concerns about authenticity, misinformation, and propaganda that must be considered. In this paper we present several case studies of UGRs from armed conflict areas and describe several methods to assess the quality and integrity of the recorded audio. We also include several recommendations for amateurs who make UGRs so that the recorded material is more easily authenticated and corroborated. Audio and video examples are presented.
Moderators
Marina Bosi
Stanford University
Speakers
Rob Maher
Professor, Montana State University
Audio digital signal processing, audio forensics, music analysis and synthesis.
Authors
Rob Maher
Professor, Montana State University
Wednesday October 9, 2024 11:40am - 12:00pm EDT
1E03

12:00pm EDT

Experimental analysis of a car loudspeaker model based on imposed vibration velocity: effect of membrane discretization
Wednesday October 9, 2024 12:00pm - 12:20pm EDT
Improving the interior sound quality of road vehicles is an active area of research. The cabin is an acoustically challenging environment due to its complex geometry, the differing acoustic properties of the materials of cabin components, and the presence of audio systems based on multiple loudspeaker units. This paper presents a simplified modelling approach designed to introduce the boundary condition imposed by a loudspeaker on the cabin system in the context of virtual acoustic analysis. The proposed model is discussed and compared with experimental measurements obtained from a test-case loudspeaker.
Moderators
Marina Bosi
Stanford University
Speakers / Authors
Wednesday October 9, 2024 12:00pm - 12:20pm EDT
1E03

12:20pm EDT

A novel derivative-based approach for the automatic detection of time-reversed audio in the MPAI/IEEE-CAE ARP international standard
Wednesday October 9, 2024 12:20pm - 12:50pm EDT
The Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Context-based Audio Enhancement (CAE) Audio Recording Preservation (ARP) standard provides the technical specifications for a comprehensive framework for digitizing and preserving analog audio, specifically focusing on documents recorded on open-reel tapes. This paper introduces a novel, envelope derivative-based method incorporated within the ARP standard to detect reverse audio sections during the digitization process. The primary objective of this method is to automatically identify segments of audio recorded in reverse. Leveraging advanced derivative-based signal processing algorithms, the system enhances its capability to detect and reverse such sections, thereby reducing errors during analog-to-digital (A/D) conversion. This feature not only aids in identifying and correcting digitization errors but also improves the efficiency of large-scale audio document digitization projects. The system's performance has been evaluated using a diverse dataset encompassing various musical genres and digitized tapes, demonstrating its effectiveness across different types of audio content.
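A toy illustration of the envelope-derivative intuition, not the ARP standard's actual algorithm: musical notes tend to have fast attacks and slow decays, so the skew of the smoothed envelope's derivative flips sign when a segment is reversed. The synthetic test signal and smoothing window below are assumptions.

```python
# Toy sketch: flag likely time-reversed audio from the asymmetry of the envelope derivative.
import numpy as np
from scipy.signal import hilbert, savgol_filter

def envelope_skew(x, fs):
    env = np.abs(hilbert(x))                         # amplitude envelope
    win = max(3, int(0.04 * fs) | 1)                 # ~40 ms odd-length smoothing window
    env = savgol_filter(env, window_length=win, polyorder=2)
    d = np.diff(env)
    # Forward audio: few steep rises (attacks), many gentle falls (decays),
    # so the derivative distribution is skewed; reversal flips the sign of that skew.
    return np.mean(d ** 3) / (np.std(d) ** 3 + 1e-12)

fs = 48000
t = np.arange(fs) / fs
note = np.exp(-6 * t) * np.sin(2 * np.pi * 220 * t)  # synthetic plucked-string-like note
forward = np.tile(note, 3)

print("forward skew :", envelope_skew(forward, fs))
print("reversed skew:", envelope_skew(forward[::-1], fs))
```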
Moderators
Marina Bosi
Stanford University
Authors
Wednesday October 9, 2024 12:20pm - 12:50pm EDT
1E03

2:00pm EDT

Auditory Envelopment and Affective Touch Hypothesis
Wednesday October 9, 2024 2:00pm - 2:30pm EDT
Anticipation and pleasure in response to music listening can lead to dopamine release in the human striatal system, a neural midbrain reward and motivation cluster. The sensation of auditory envelopment, however, may also in itself have a stimulating effect. Theoretical reasons and circumstantial evidence are given for why this could be the case, thereby possibly constituting an auditory complement to a newly discovered and studied percept, affective touch, which originates from C-tactile fibres in the skin when stimulated in certain ways. In a pilot test, abstract sounds were used to determine the audibility of low-frequency inter-aural fluctuation. Naïve subjects aged between 6 and 96 years were all sensitive to the conditions tested and were asked to characterize the stimuli in their own words. Based on these results, controlling low-frequency inter-aural fluctuation in listeners should be a priority when recording, mixing, distributing and reproducing audio.
Moderators / Speakers
Thomas Lund
Senior Technologist, Genelec Oy
Thomas Lund has authored papers on human perception, spatialisation, loudness, sound exposure and true-peak level. He is a researcher at Genelec, and convenor of a working group on hearing health under the European Commission. Out of a medical background, Thomas previously served in...
Authors
Thomas Lund
Senior Technologist, Genelec Oy
Wednesday October 9, 2024 2:00pm - 2:30pm EDT
1E03

2:30pm EDT

A framework for high spatial-density auditory data displays using Matlab and Reaper
Wednesday October 9, 2024 2:30pm - 3:00pm EDT
This research aimed to develop a software framework to study and optimize mapping strategies for complex data presented with auditory displays with high spatial resolution, such as wave-field synthesis and higher-order ambisonics systems. Our wave field synthesis system, the Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVE-Lab), has a 128 full-range loudspeaker system along the circumference of the lab. We decided to use available software music synthesizers because they are built for excellent sound quality, and much knowledge exists on how to program analog synthesizers and other common methods for a desired sound output. At the scale of 128 channels, feeding 128 synthesizers with complex data was not practical for us for initial explorations because of computational resources and the complexity of data flow infrastructure. The proposed framework was programmed in Matlab, using Weather data from the NOAA database for initial exploration. Data is processed from 128 weather stations from East to West in the US spatially aligned with latitude. A MIDI script, in sequential order for all 128 channels, is compiled from converted weather parameters like temperature, precipitation amount, humidity, and wind speed. The MIDI file is then imported into Reaper to render a single sound file using software synthesizers that are operated with the MIDI file control data instructions. The rendered file is automatically cut into the 128 channels in Matlab and reimported into Reaper for audio playback.
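A condensed Python analogue of the data-to-MIDI step described (the framework itself uses Matlab and NOAA station data; the mido library, the scaling range, and the example values below are placeholders).

```python
# Sketch: map a data series (e.g. station temperatures) to MIDI notes for later rendering in Reaper.
from mido import Message, MidiFile, MidiTrack

def value_to_note(value, lo, hi, note_lo=36, note_hi=96):
    """Linearly map a data value onto a MIDI note number range."""
    frac = (value - lo) / (hi - lo)
    return int(round(note_lo + frac * (note_hi - note_lo)))

temperatures_c = [12.5, 14.0, 9.8, 21.3, 18.7]   # placeholder values for a few stations

mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)

for temp in temperatures_c:
    note = value_to_note(temp, lo=-10.0, hi=40.0)
    track.append(Message("note_on", note=note, velocity=80, time=0))
    track.append(Message("note_off", note=note, velocity=0, time=480))  # one beat later

mid.save("sonification.mid")
```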
Wednesday October 9, 2024 2:30pm - 3:00pm EDT
1E03

3:00pm EDT

Immersive Voice and Audio Services (IVAS) Codec – The New 3GPP Standard for Immersive Communication
Wednesday October 9, 2024 3:00pm - 3:30pm EDT
The recently standardized 3GPP codec for Immersive Voice and Audio Services (IVAS) is the first fully immersive communication codec designed for 5G mobile systems. The IVAS codec is an extension of the mono 3GPP EVS codec and offers additional support for coding and rendering of stereo, multi-channel, scene-based audio (Ambisonics), objects and metadata-assisted spatial audio. The IVAS codec enables completely new service scenarios with interactive stereo and immersive audio in communication, content sharing and distribution. This paper provides an overview of the underlying architecture and new audio coding and rendering technologies. Listening test results show the performance of the new codec in terms of compression efficiency and audio quality.
Moderators Speakers Authors
Adriana Vasilache
Nokia Technologies
Andrea Genovese
Research, Qualcomm
Andrea Genovese is a Senior Research Engineer at Qualcomm Technologies Inc. working in Multimedia R&D. Andrea specializes in spatial audio and psychoacoustics, acoustic simulations, networked immersive distributed audio, and signal processing for environmental awareness. In 2023... Read More →
Wednesday October 9, 2024 3:00pm - 3:30pm EDT
1E03

3:30pm EDT

Enhancing Spatial Post-Filters through Non-Linear Combinations
Wednesday October 9, 2024 3:30pm - 4:00pm EDT
This paper introduces a method to enhance the spatial selectivity of spatial post-filters estimated with first-order directional signals. The approach involves applying non-linear transformations on two different spatial post-filters and combining them with weights found by convex optimization of the resulting directivity patterns. The estimation of the post-filters is carried out similarly to the Cross Pattern Coherence (CroPaC) algorithm. The performance of the proposed method is evaluated in a two- and three-speaker scenario with different reverberation times and angular distances of the interfering speaker. The signal-to-interference, signal-to-distortion, and signal-to-artifact ratios are used for evaluation. The results show that the proposed method can improve the spatial selectivity of the post-filter estimated with first-order beampatterns. Using first-order patterns only, it even achieves better spatial separation than the original CroPaC post-filter estimated using first- and second-order signals.
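The core idea can be sketched as follows; the exponents and the fixed blending weight are placeholders, whereas the paper derives the weights by convex optimization of the resulting directivity pattern.

```python
# Sketch: blend two spatial post-filter gain patterns after non-linear transformation.
# Exponents p1, p2 and weight w are placeholders, not values from the paper.
import numpy as np

def combined_postfilter(g1, g2, p1=2.0, p2=0.5, w=0.6):
    """g1, g2: per time-frequency-bin gains in [0, 1] from two first-order post-filters."""
    g1 = np.clip(np.asarray(g1, float), 0.0, 1.0)
    g2 = np.clip(np.asarray(g2, float), 0.0, 1.0)
    return np.clip(w * g1 ** p1 + (1.0 - w) * g2 ** p2, 0.0, 1.0)
```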
Moderators Speakers
Stefan Wirler
Aalto University
I am a PhD student at the Aalto Acoustics Labs, in the group of Ville Pulkki. I started my PhD studies in 2020. I received my Master's degree in Electrical Engineering from the Friedrich-Alexander University Erlangen-Nuremberg (FAU). My Master's thesis was conducted at the AudioLabs in... Read More →
Authors
Stefan Wirler
Aalto University
Wednesday October 9, 2024 3:30pm - 4:00pm EDT
1E03

4:00pm EDT

The Impact of Height Microphone Layer Position on Perceived Realism of Organ Recording Reproduction
Wednesday October 9, 2024 4:00pm - 4:30pm EDT
For on-site immersive recordings, height microphones are often placed carefully to avoid a distorted or unrealistic image, with many established immersive microphone arrays placing the height microphones 1.5 m or less above the horizontal layer. However, for an instrument as acoustically symbiotic with its space as the pipe organ, the impact of non-coincident height microphone placement has not previously been explored in depth. The pipe organ's radiation characteristics may nevertheless benefit from non-coincident height microphone placement, providing subjectively improved tone color without sacrificing perceived realism. Subjective listening tests were conducted comparing a pipe organ recording with coincident and non-coincident height microphone positions. The findings of this case study conclude that non-coincident height microphone placement does not significantly impact the perceived realism of the immersive organ recording.
Moderators Speakers
Jessica Luo
Graduate Student, New York University
Authors
Jessica Luo
Graduate Student, New York University
Garrett Treanor
New York University
Wednesday October 9, 2024 4:00pm - 4:30pm EDT
1E03

4:30pm EDT

Spatial Matrix Synthesis
Wednesday October 9, 2024 4:30pm - 5:00pm EDT
Spatial Matrix synthesis is presented in this paper. This modulation synthesis technique creates acoustic velocity fields from acoustic pressure signals by using spatial transformation matrices, thus generating complete sound fields for spatial audio. The analysis presented here focuses on orthogonal rotation matrices in both two and three dimensions and compares the results in each scenario with other sound modulation synthesis methods, including amplitude and frequency modulation. As an alternative method for spatial sound synthesis that exclusively modifies the acoustic velocity vector through effects comparable to those created by both amplitude and frequency modulations, Spatial Matrix synthesis is argued to generate inherently spatial sounds, giving this method the potential to become a new musical instrument for spatial music.
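A very small sketch of the basic operation follows, assuming a single pressure signal and a time-varying 2-D rotation; the rotation rate and the B-format-like output arrangement are illustrative choices, not the paper's exact formulation.

```python
# Sketch: derive a 2-D "velocity" pair from a pressure signal with a time-varying
# rotation matrix, returned as a first-order (W, X, Y)-like set. Rotation rate is arbitrary.
import numpy as np

def spatial_matrix_synthesis(pressure, fs, rotation_hz=2.0):
    p = np.asarray(pressure, float)
    theta = 2 * np.pi * rotation_hz * np.arange(len(p)) / fs    # rotation angle over time
    return np.stack([p, np.cos(theta) * p, np.sin(theta) * p])  # (W, X, Y)
```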
Moderators Speakers
Timothy Schmele
Researcher, Eurecat
Researcher in audio, audio technology, immersive audio, sonification and composition practices. Composer of electroacoustic music.
Authors
Timothy Schmele
Researcher, Eurecat
Wednesday October 9, 2024 4:30pm - 5:00pm EDT
1E03
 
Thursday, October 10
 

10:00am EDT

Delay detection in hearing with moving audio objects at various azimuths and bandwidths
Thursday October 10, 2024 10:00am - 10:20am EDT
In order to design efficient binaural rendering systems for 3D audio, it is important to understand how delays in updating the relative directions of sound sources, which compensate for the listener's head movements, affect the sense of realism. However, this problem has not yet been studied sufficiently. We therefore investigated the delay detection capability (threshold) of hearing during localization of audio objects. We used moving sound sources emitted from loudspeakers to emulate both smooth and delayed updates of head-related transfer functions (HRTFs), and investigated the delay detection threshold for different bandwidths, directions, and speeds of the sound source signals. The delay detection thresholds in this experiment were found to be approximately 100 ms to 500 ms, and the thresholds varied with the bandwidth and direction of the sound source. On the other hand, no significant variation in detection thresholds was observed with the speed of sound source movement.
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers
Masayuki Nishiguchi
Professor, Akita Prefectural University
Masayuki Nishiguchi received his B.E., M.S., and Ph.D. degrees from Tokyo Institute of Technology, University of California Santa Barbara, and Tokyo Institute of Technology, in 1981, 1989, and 2006 respectively. He was with Sony Corporation from 1981 to 2015, where he was involved... Read More →
Authors
Masayuki Nishiguchi
Professor, Akita Prefectural University
Thursday October 10, 2024 10:00am - 10:20am EDT
1E03

10:20am EDT

Expanding and Analyzing ODAQ - The Open Dataset of Audio Quality
Thursday October 10, 2024 10:20am - 10:40am EDT
Datasets of processed audio signals along with subjective quality scores are instrumental for research into perception-based audio processing algorithms and objective audio quality metrics. However, openly available datasets are scarce, due to the effort of listening tests and copyright concerns limiting the distribution of audio material in existing datasets. To address this problem, the Open Dataset of Audio Quality (ODAQ) was introduced, containing audio material along with extensive subjective test results under permissive licenses. The dataset comprises processed audio material with six different classes of signal impairments at multiple levels of processing strength, covering a wide range of quality levels. The subjective quality evaluation has recently been extended and now comprises results from three international laboratories, providing a total of 42 listeners and 10,080 subjective scores. Furthermore, ODAQ was recently expanded with a performance evaluation of common objective metrics for perceptual quality, assessing their ability to predict the subjective scores. The wide variety of audio material and test subjects provides insight into influences and biases in subjective evaluation, which we investigated by statistical analysis, finding listener-based, training-based and lab-based influences. We also demonstrate the methodology for contributing to ODAQ and invite additional contributors. In conclusion, the diversity of the processing methods and quality levels, along with a large pool of international listeners and permissive licenses, makes ODAQ particularly suited for further research into subjective and objective audio quality.
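The kind of by-lab and by-listener breakdown mentioned above could look like the following sketch; the column names are assumptions for illustration, not ODAQ's actual schema.

```python
# Sketch: summarize subjective scores by lab and by listener to expose lab- and
# listener-based influences. Column names ('lab', 'listener', 'score') are assumed.
import pandas as pd

def lab_and_listener_means(scores: pd.DataFrame):
    by_lab = scores.groupby("lab")["score"].agg(["mean", "std", "count"])
    by_listener = scores.groupby(["lab", "listener"])["score"].mean()
    return by_lab, by_listener
```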
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Authors
Christoph Thompson
Director of Music Media Production, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio... Read More →
Pablo Delgado
Fraunhofer IIS
Pablo Delgado is part of the scientific staff of the Advanced Audio Research Group at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany. He specializes in psychoacoustics applied to audio and speech coding, as well as machine learning applications in audio... Read More →
Thursday October 10, 2024 10:20am - 10:40am EDT
1E03

10:40am EDT

Perceptual Evaluation of Hybrid Immersive Audio Systems in Orchestral Settings
Thursday October 10, 2024 10:40am - 11:00am EDT
This study investigates the perceptual strengths and weaknesses of various immersive audio capture techniques within an orchestral setting, employing channel-based, object-based, and scene-based methodologies concurrently. Conducted at McGill University’s Pollack Hall in Montreal, Canada, the research featured orchestral works by Boulanger, Prokofiev, and Schubert, performed by the McGill Symphony Orchestra in April 2024.
The innovative aspect of this study lies in the simultaneous use of multiple recording techniques, employing traditional microphone setups such as a Decca tree with outriggers, alongside an experimental pyramidal immersive capture system and a 6th order Ambisonic em64 “Eigenmike.” These diverse methodologies were selected to capture the performance with high fidelity and spatial accuracy, detailing both the performance's nuances and the sonic characteristics imparted by the room. The capture of this interplay is the focus of this study.
The project aimed to document the hall's sound quality in its last orchestral performance before closing for 2 years for renovations, providing the methodology and documentation needed for future comparative recordings of the acoustics before and after. The pyramidal system, designed with exaggerated spacing, improves decorrelation at low frequencies, allowing for the impression of a large room within a smaller listening space. Meanwhile, Ambisonic recordings provided insights into single-point versus spaced multi-viewpoint capture.
Preliminary results from informal subjective listening sessions suggest that combining different systems offers potential advantages over any single method alone, supporting exploration of hybrid solutions as a promising area of study for audio recording, enhancing the realism and spatial immersion of orchestral music recordings.
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers Authors
Kathleen Ying-Ying Zhang
PhD Candidate, McGill University
Ying-Ying Zhang is a music technology researcher and sound engineer. She is currently a PhD candidate at McGill University in the Sound Recording program, where her research focuses on musician-centered virtual acoustic applications in recording environments. She received her Master's... Read More →
Richard King
Professor, McGill University
Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School... Read More →
Thursday October 10, 2024 10:40am - 11:00am EDT
1E03

11:00am EDT

Perception of the missing fundamental in vibrational complex tones
Thursday October 10, 2024 11:00am - 11:20am EDT
We present a study on the perception of the missing fundamental in vibrational complex tones. When asked to match an audible frequency to the frequency of a vibrational tone with a missing fundamental, participants in two experiments associated the audible frequency with lower frequencies than those present in the vibration, often corresponding to the missing fundamental of the vibrational tone. This association was found regardless of whether the vibration was presented on the back (first experiment) or the feet (second experiment). One possible application of this finding is the reinforcement of low frequencies via vibration motors, even when such motors have high resonance frequencies.
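A minimal sketch of a stimulus of the kind described follows: a complex tone whose fundamental is absent. The fundamental frequency and harmonic numbers are illustrative, not the stimuli used in the experiments.

```python
# Sketch: complex tone containing only harmonics 2..5 of f0, so f0 itself is "missing"
# yet is typically the pitch that is matched. Parameter values are illustrative.
import numpy as np

def missing_fundamental_tone(f0=50.0, fs=48000, dur=2.0, harmonics=(2, 3, 4, 5)):
    t = np.arange(int(dur * fs)) / fs
    tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in harmonics)
    return tone / np.max(np.abs(tone))   # normalize to full scale
```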
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers Authors
Thursday October 10, 2024 11:00am - 11:20am EDT
1E03

11:20am EDT

Perceptual loudness compensation for evaluation of personalized earbud equalization
Thursday October 10, 2024 11:20am - 11:40am EDT
The ear canal geometries of individuals differ widely, resulting in significant variations in SPL at the drum reference point (DRP). Knowledge of the personalized transfer function from a near-field microphone (NFM) in the earbud speaker tip to the DRP allows personalized equalization (P.EQ). A method has been developed to compensate for loudness perception within the evaluation of different personalized equalization filters for earbuds. The method includes measurements at the NFM to estimate the personalized transfer function from the NFM point to the DRP, calibration of the NFM microphone, acquisition of the transfer function from the earbuds' speaker terminals to the NFM, and estimation of perceptual loudness in phons at the DRP when applying the different equalization filters. The loudness estimation was computed using the Moore-Glasberg method as implemented in ISO 532-2. The difference to the DRP estimate was recursively adjusted using pink noise until the delta was within 0.1 dB, and the corresponding gains were applied to the different conditions to be evaluated. A listening test was performed to evaluate three conditions using the described method for loudness compensation.
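The recursive gain-matching step might look like the sketch below; `loudness_proxy` is a crude RMS stand-in for the ISO 532-2 (Moore-Glasberg) model used in the paper, but the loop structure would be the same with a real loudness model.

```python
# Sketch of recursive loudness matching: adjust a broadband gain on the test signal
# until the loudness estimate matches the reference within the tolerance.
# `loudness_proxy` is a crude RMS-level stand-in for an ISO 532-2 implementation.
import numpy as np

def loudness_proxy(signal):
    return 20 * np.log10(np.sqrt(np.mean(np.asarray(signal, float) ** 2)) + 1e-12)

def match_loudness_gain(test, reference, tol=0.1, max_iter=50):
    """Return the broadband gain in dB that matches `test` to `reference` within `tol`."""
    gain_db, target = 0.0, loudness_proxy(reference)
    for _ in range(max_iter):
        delta = target - loudness_proxy(np.asarray(test, float) * 10 ** (gain_db / 20))
        if abs(delta) < tol:
            break
        gain_db += delta   # step toward the target
    return gain_db
```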
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers
Adrian Celestinos
Samsung Research America
Authors
Thursday October 10, 2024 11:20am - 11:40am EDT
1E03

11:40am EDT

The audibility of true peak distortion (0 dBFS+)
Thursday October 10, 2024 11:40am - 12:00pm EDT
In a recent study, the authors interviewed five professional mastering engineers on the topic of contemporary loudness practices in music. Among the findings, all five mastering engineers targeted peak levels very close to 0 dBFS and seemed relatively unconcerned about true peak distortion arising in the transcoding process, not adhering to the current recommendation of not exceeding -1 dB true peak. Furthermore, true peak measurements of releases from the last four decades show that quite a few even exceed 0 dBFS true peak in full quality. The aim of this study is to investigate the audibility of such overshoots by conducting a tailored listening test. The results indicate that even experienced and trained listeners may not be very sensitive to true peak distortion.
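For context, a basic true-peak estimate in the spirit of ITU-R BS.1770 metering can be sketched as below: upsample and take the largest absolute value, so inter-sample peaks above 0 dBFS ("0 dBFS+") become visible. This is a generic illustration, not the measurement tool used in the study.

```python
# Sketch: estimate true peak level (dBTP) by 4x oversampling, in the spirit of
# ITU-R BS.1770 true-peak metering. Values above 0 indicate inter-sample overs.
import numpy as np
from scipy.signal import resample_poly

def true_peak_dbfs(samples, oversample=4):
    up = resample_poly(np.asarray(samples, dtype=float), oversample, 1)
    return 20 * np.log10(np.max(np.abs(up)) + 1e-12)   # relative to full scale (1.0)
```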
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers
Pål Erik Jensen
University College Teacher, Høyskolen Kristiania
Teaching, audio production, music studio production, Pro Tools, guitar, bass
Authors
Pål Erik Jensen
University College Teacher, Høyskolen Kristiania
Tore Teigland
Professor, Kristiania University College
Thursday October 10, 2024 11:40am - 12:00pm EDT
1E03

12:00pm EDT

Evaluation of sound colour in headphones used for monitoring
Thursday October 10, 2024 12:00pm - 12:20pm EDT
Extensive studies have been made into achieving a generally enjoyable sound colour in headphone listening, but few publications have focused on the demanding requirements of a single audio professional and what they actually hear. The present paper describes a structured and practical method, based on in-room monitoring, for getting to know yourself as a headphone listener, and the particular model and pair you are using. Headphones provide fundamentally different listening results compared to in-room monitoring adhering to professional standards, considering imaging, auditory envelopment, localization, haptic cues, etc. Moreover, in headphone listening there may be no direct connection between the frequency response measured with a generic manikin and what a given user hears. Finding out just how a pair of headphones deviates from neutral sound colour must therefore be done personally. An evaluation scheme based on an ultra-nearfield reference system is described, augmented by a defined test setup and procedure.
Moderators
Sascha Dick
Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →
Speakers
Thomas Lund
Senior Technologist, Genelec Oy
Thomas Lund has authored papers on human perception, spatialisation, loudness, sound exposure and true-peak level. He is a researcher at Genelec, and convenor of a working group on hearing health under the European Commission. Out of a medical background, Thomas previously served in... Read More →
Authors
Thomas Lund
Senior Technologist, Genelec Oy
Thursday October 10, 2024 12:00pm - 12:20pm EDT
1E03

2:20pm EDT

2nd order Boom Microphone
Thursday October 10, 2024 2:20pm - 2:40pm EDT
Boom microphones are mostly used in noisy environments and equipped with directional microphones to enhance speech pickup from talkers and suppress noise from the surroundings. Theory suggests that using a 2nd order directional microphone array, that is, simply a pair of directional microphones, would greatly increase far-field noise rejection with negligible degradation of the near-field pickup. We conducted a series of laboratory measurements to validate the theory and assess the feasibility of 2nd order boom microphone applications. The measurements used the Knowles Electronics Manikin for Acoustic Research (KEMAR) with two loudspeakers playing cockpit noise, one placed on axis in front of KEMAR and one to the side. A mouth simulator inside KEMAR played an artificial voice. The measurements were made at two distances, 6 and 25 mm, from the KEMAR mouth with a 1st order boom microphone (a single directional boom microphone) and a 2nd order boom microphone. SPICE simulations were also conducted to support the experimental findings. Both the measurements and the SPICE simulations confirmed the theory until the 2nd order boom microphone was placed near KEMAR; there, reflections off the KEMAR head degraded the performance of the 2nd order microphone while improving that of the 1st order microphone. The net result shows that the 2nd order microphone is not superior to the 1st order microphone. This article describes details of the theory, our experimental measurements, and the findings.
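The delay-and-subtract idea behind raising the array order can be sketched as follows; the capsule spacing, the integer-sample delay and the sample-rate handling are simplifications for illustration, not the construction measured in the paper.

```python
# Sketch: a differential stage that subtracts a delayed rear-capsule signal from the
# front capsule, raising the directional order by (approximately) one.
# Spacing and the integer-sample delay are illustrative simplifications.
import numpy as np

def differential_stage(front, rear, fs, spacing_m=0.01, c=343.0):
    delay = int(round(fs * spacing_m / c))     # acoustic travel time across the spacing
    rear = np.asarray(rear, float)
    rear_delayed = np.concatenate([np.zeros(delay), rear[:len(rear) - delay]])
    return np.asarray(front, float) - rear_delayed
```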
Moderators
Paul Geluso
Director of the Music Technology Program, New York University
Speakers
Jeong Nyeon Kim
Senior Electro-Acoustic Application Engineer, Knowles Electronics
Authors
Jeong Nyeon Kim
Senior Electro-Acoustic Application Engineer, Knowles Electronics
Thursday October 10, 2024 2:20pm - 2:40pm EDT
1E03

2:40pm EDT

Improved Analogue-to-Digital Converter for High-quality Audio
Thursday October 10, 2024 2:40pm - 3:00pm EDT
A highly oversampled, low-bit modulator typical of modern audio ADCs needs a downsampler to provide PCM audio at sampling rates ranging from 44.1 kHz to 768 kHz. Traditionally, a multistage downsampler requantizes at each stage, raising questions about audio transparency. We present a decimator design in which there is no requantization other than a single dithered quantization when necessary to produce a final audio output of finite precision, such as 24 bits. All processing is minimum-phase and concordant with the principles introduced in [1], which optimize for a specific compact impulse response and minimal (zero) modulation noise [2].
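The single final quantization mentioned above can be illustrated with a generic TPDF-dithered rounding to a 24-bit word; this is a textbook sketch, not the decimator design presented in the paper, and the minimum-phase decimation filter itself is not shown.

```python
# Sketch: one dithered quantization to a finite word length using TPDF dither of
# +/- 1 LSB. Generic illustration only; the paper's decimation filter is not shown.
import numpy as np

def dithered_quantize(x, bits=24, seed=0):
    rng = np.random.default_rng(seed)
    q = 2.0 ** (1 - bits)                                  # LSB size for signals in [-1, 1)
    tpdf = (rng.random(len(x)) - rng.random(len(x))) * q   # triangular-PDF dither
    return np.round((np.asarray(x, float) + tpdf) / q) * q
```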
Moderators
Paul Geluso
Director of the Music Technology Program, New York University
Speakers Authors
Thursday October 10, 2024 2:40pm - 3:00pm EDT
1E03

3:00pm EDT

The sound of storytelling: the role of sound design and music in the ’drama’ genre of film
Thursday October 10, 2024 3:00pm - 3:20pm EDT
The integration of sound effects and music plays a central role in shaping the audience's emotional engagement and narrative comprehension in film. The 'drama' genre is primarily concerned with depicting human emotions and narrative-based storytelling. Ten scenes were analysed, with participants in three groups exposed to six combinations of audio and visual stimuli. Participants reported salient sounds and their interpretations, focusing on context and emotional responses. One hypothesis is that effective sound design blurs the line between music and sound effects; another is that music conveys more emotion while sound effects enhance immersion. The results showed that 63% of participants found the score more relevant to the context. The evaluation highlights that music alone emphasizes certain emotions more, while sound effects alone create moderate variability between emotion and sound identification.
Moderators
Paul Geluso
Director of the Music Technology Program, New York University
Speakers Authors
Thursday October 10, 2024 3:00pm - 3:20pm EDT
1E03
 