AES Show 2024 NY has ended
Exhibits+ badges provide access to the ADAM Audio Immersive Room, the Genelec Immersive Room, Tech Tours, and the presentations on the Main Stage.

All Access badges provide access to all content in the Program (Tech Tours still require registration).

1E03
Tuesday, October 8
 

9:30am EDT

An Industry Focused Investigation into Immersive Commercial Melodic Rap Production - Part Two
Tuesday October 8, 2024 9:30am - 9:50am EDT
In part one of this study, five professional mixing engineers were asked to create a Dolby Atmos 7.1.4 mix of the same melodic rap song while adhering to the following commercial music industry specifications: follow the framework of the stereo reference, implement binaural distance settings, and conform to -18 LKFS integrated loudness and -1 dBTP true-peak levels. An analysis of the mix sessions and post-mix interviews with the engineers revealed that they felt creatively limited by the imposed industry specifications. The restricted approaches were evident in the minimal application of mix processing and automation and in the traditional positioning of key elements in the completed mixes.
In part two of this study, the same mix engineers were asked to complete a second mix of the same song without any imposed limitations and were encouraged to approach the mix creatively. Intra-subject comparisons between the restricted and unrestricted mixes were explored to identify differences in element positioning, mix processing techniques, panning automation, loudness levels, and binaural distance settings. Analysis of the mix sessions and interviews showed that when no restrictions were imposed on their work, the mix engineers emphasized the musical narrative through more diverse element positioning, increased use of automation, and applications of additional reverb with characteristics that differed from the reverb in the source material.
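As context for the loudness specification cited in part one (-18 LKFS integrated loudness, -1 dBTP true peak), the minimal sketch below shows how such a target could be checked on a stereo or binaural render. It is not part of the study; the file name and the 1 LU tolerance are assumptions, and a plain sample-peak reading stands in for a true-peak meter, which would oversample per ITU-R BS.1770.

import numpy as np
import soundfile as sf
import pyloudnorm as pyln

def check_mix(path, target_lkfs=-18.0, max_dbtp=-1.0):
    data, rate = sf.read(path)                   # (samples, channels)
    meter = pyln.Meter(rate)                     # BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)   # integrated loudness in LUFS/LKFS
    peak_db = 20 * np.log10(np.max(np.abs(data)) + 1e-12)  # sample peak, dBFS
    return {
        "integrated_lkfs": loudness,
        "sample_peak_dbfs": peak_db,
        "meets_loudness": abs(loudness - target_lkfs) <= 1.0,  # assumed tolerance
        "meets_peak": peak_db <= max_dbtp,
    }

print(check_mix("binaural_render.wav"))          # hypothetical file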
Moderators
Agnieszka Roginska
Professor, New York University
Agnieszka Roginska is a Professor of Music Technology at New York University. She conducts research in the simulation and applications of immersive and 3D audio including the capture, analysis and synthesis of auditory environments, auditory displays and applications in augmented…
Speakers
Christal Jerez
Engineer, Christal's Sonic Lab
Christal Jerez is an audio engineer with experience recording, mixing and mastering music. After studying audio production at American University for her B.A. in Audio Production and at New York University for her Master's degree in Music Technology, she started working professionally…
Authors
Christal Jerez, Engineer, Christal's Sonic Lab
Andrew Scheps
Owner, Tonequake Records
Andrew Scheps has worked with some of the biggest bands in the world: Green Day, Red Hot Chili Peppers, Weezer, Audioslave, Black Sabbath, Metallica, Linkin Park, Hozier, Kaleo and U2. He's worked with legends such as Johnny Cash, Neil Diamond and Iggy Pop, as well as indie artists…
Hyunkook Lee
Professor, Applied Psychoacoustics Lab, University of Huddersfield

9:50am EDT

Investigation of spatial resolution of first and high order ambisonics microphones as capturing tool for auralization of real spaces in recording studios equipped with virtual acoustics systems
Tuesday October 8, 2024 9:50am - 10:10am EDT
This paper proposes a methodology for studying the spatial resolution of a collection of first-order and higher-order Ambisonic microphones when employed as a capturing tool for Spatial Room Impulse Responses (SRIRs) in virtual acoustics applications. In this study, spatial resolution is defined as the maximum number of statistically independent mono impulse responses that can be extracted through beamforming and used in multichannel convolution reverbs. The correlation of the responses is assessed as a function of beam angle and frequency band, adapted to the frequency response of the loudspeakers in use, with the aim of application in recording studios equipped with virtual acoustics systems that recreate the spatial impression of the reverberation of real environments. The study examines the differences introduced by the physical characteristics of the microphones, the normalization methodologies of the spherical harmonics, and the number of spherical harmonics included in the encoding (Ambisonic order). Preliminary results show that the correlation is inversely proportional to frequency, consistent with the corresponding wavelengths.
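As background for the kind of analysis described above, the sketch below (not the authors' code) estimates the pairwise correlation between mono impulse responses obtained by beamforming an Ambisonic SRIR in several directions, evaluated per octave band.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_band_correlation(beam_irs, fs, centers=(125, 250, 500, 1000, 2000, 4000)):
    """beam_irs: (n_beams, n_samples) array; returns {center_freq: (n_beams, n_beams) correlation matrix}."""
    out = {}
    for fc in centers:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)                  # octave-band edges
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        banded = sosfiltfilt(sos, beam_irs, axis=-1)               # zero-phase band-pass
        out[fc] = np.corrcoef(banded)                              # Pearson correlation between beams
    return out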
Moderators
Agnieszka Roginska, Professor, New York University
Speakers
Gianluca Grazioli
McGill University, Montreal, Canada

10:10am EDT

A comparative study of volumetric microphone techniques and methods in a classical recording context
Tuesday October 8, 2024 10:10am - 10:30am EDT
This paper studies volumetric microphone techniques (i.e., configurations of multiple Ambisonic microphones) in a classical recording context. A pilot study drawing on expert opinions was designed to establish feasibility. Based on its findings, a trio recording of piano, violin, and cello was made in which six Ambisonic microphones were arranged as a hexagon. This volumetric approach is expected to improve the sound characteristics; the recordings were processed with the SoundField by RØDE Ambisonic decoder and rendered to a 7.0.4 loudspeaker system. A blind listening experiment was designed in which participants evaluated the volumetric hexagonal configuration against a more traditional 5.0 immersive configuration and a single Ambisonic microphone, all mixed with spot microphones. Quantitative analysis revealed that the volumetric configuration was the most localized of the three but less immersive than the single Ambisonic microphone. No significant differences were found in focus, naturalness, or preference. The findings generalize across the participant pool, as the participants' demographic backgrounds had no effect on the ratings of the sound characteristics.
Moderators
Agnieszka Roginska, Professor, New York University
Speakers / Authors
Parichat Songmuang
Studio Manager/PhD Student, New York University
Parichat Songmuang graduated from New York University with her Master of Music degree in Music Technology and an Advanced Certificate in Tonmeister Studies. As an undergraduate, she studied for her Bachelor of Science in Electronic Media and Film with a concentration…
Paul Geluso
Director of the Music Technology Program, New York University

10:30am EDT

Bestiari: a hypnagogic experience created by combining complementary state-of-the-art spatial sound technologies, Catalan Pavilion, Venice Art Biennale 2024
Tuesday October 8, 2024 10:30am - 10:50am EDT
Bestiari, by artist Carlos Casas, is a spatial audio installation created as the Catalan pavilion for the 2024 Venice Art Biennale. The installation was designed for ambulant visitors and informal seating distributed throughout the reproduction space, so the technical design did not focus on listeners' presence in a single "sweet spot". While high-quality conventional spatial loudspeaker arrays typically provide excellent surround-sound experiences, the particular challenge of this installation was to reach into the proximate space of individual, dispersed and mobile listeners, rather than providing an experience that was only peripherally enveloping. To that end, novel spatial audio workflows and combinations of reproduction technologies were employed, including Higher-Order Ambisonics (HOA), Wave Field Synthesis (WFS), an icosahedral beamforming loudspeaker (IKO), directional/parametric ultrasound, and infrasound. The work features sound recordings made for each reproduction technology, e.g., ambient Ambisonic soundfields recorded in Catalan national parks combined with mono and stereo recordings of specific insects in that habitat, projected simultaneously via the WFS system. In-situ production provided an opportunity to explore the differing attributes of the reproduction devices and their interactions with the acoustical characteristics of the space, a concrete and brick structure with a trussed wooden roof built in the late 1800s for the Venetian shipping industry. The practitioners' reflections on this exploration, including their perception of the capabilities of this unusual combination of spatial technologies, are presented. Design, workflows and implementation are detailed.
Moderators
Agnieszka Roginska, Professor, New York University
Speakers
Craig Cieciura
Research Fellow, University of Surrey
Craig graduated from the Music and Sound Recording (Tonmeister) course at the University of Surrey in 2016. He then completed his PhD at the same institution in 2022. His PhD topic concerned reproduction of object-based audio in the domestic environment using combinations of installed…
Authors
Craig Cieciura, Research Fellow, University of Surrey

10:50am EDT

Influence of Dolby Atmos versus Stereo Formats on Narrative Engagement: A Comparative Study Using Physiological and Self-Report Measures
Tuesday October 8, 2024 10:50am - 11:10am EDT
As spatial audio technology rapidly evolves, the conversation around immersion becomes ever more relevant, particularly in how these advancements enhance the creation of compelling sonic experiences. However, immersion is a complex, multidimensional construct, making it challenging to study in its entirety. This paper narrows the focus to one particular dimension, narrative engagement, to explore how it shapes the immersive experience. Specifically, we investigate whether a multichannel audio format, here 7.1.4, enhances narrative engagement compared to traditional stereo storytelling. Participants were exposed to two storytelling examples: one in an immersive format and another in a stereo fold-down. Physiological responses were recorded during listening sessions, followed by a self-report survey adapted from the Narrative Engagement Scale. The lack of significant differences between the two formats in both subjective and objective measures is discussed in the context of existing studies.
Moderators
Agnieszka Roginska, Professor, New York University
Speakers
Hyunkook Lee
Professor, Applied Psychoacoustics Lab, University of Huddersfield
Authors
Hyunkook Lee, Professor, Applied Psychoacoustics Lab, University of Huddersfield

11:10am EDT

Creation of representative head-related impulse responses for binaural rendering of moving audio objects
Tuesday October 8, 2024 11:10am - 11:30am EDT
To achieve highly realistic 3D audio reproduction in virtual reality (VR) or augmented reality (AR) through binaural rendering, we must address the considerable computational complexity involved in convolving head-related impulse responses (HRIRs). To reduce this complexity, an algorithm is proposed in which audio signals are distributed to pre-defined representative directions through panning. Only the distributed signals are then convolved with the corresponding HRIRs. In this study, we explored a method for generating representative HRIRs through learning, utilizing a full-sphere HRIR set. This approach takes into account smooth transitions and minimal degradation introduced during rendering, for both moving and static audio objects. Compared with conventional panning, the proposed method reduces average distortion by approximately 47% while maintaining the runtime complexity of the rendering.
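The sketch below illustrates the general idea described in the abstract, though it is not the authors' algorithm: each object is distributed to a small set of representative directions with panning gains, and only those direction buses are convolved with HRIRs, so the number of convolutions stays fixed regardless of the number of objects.

import numpy as np
from scipy.signal import fftconvolve

def render_objects(obj_signals, obj_dirs, rep_dirs, hrirs):
    """
    obj_signals: (n_obj, n_samples) mono object signals
    obj_dirs:    (n_obj, 3) unit direction vectors (static here for simplicity)
    rep_dirs:    (n_rep, 3) unit vectors of the representative directions
    hrirs:       (n_rep, 2, hrir_len) left/right HRIRs of the representative directions
    """
    n_rep, n_samples = rep_dirs.shape[0], obj_signals.shape[1]
    buses = np.zeros((n_rep, n_samples))
    for sig, d in zip(obj_signals, obj_dirs):
        w = np.maximum(rep_dirs @ d, 0.0)        # crude cosine-weighted panning gains
        buses += np.outer(w / (w.sum() + 1e-12), sig)
    # convolve only the n_rep buses, not every object
    left = sum(fftconvolve(buses[i], hrirs[i, 0]) for i in range(n_rep))
    right = sum(fftconvolve(buses[i], hrirs[i, 1]) for i in range(n_rep))
    return np.stack([left, right])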
Moderators
Agnieszka Roginska, Professor, New York University
Speakers / Authors
Masayuki Nishiguchi
Professor, Akita Prefectural University
Masayuki Nishiguchi received his B.E., M.S., and Ph.D. degrees from Tokyo Institute of Technology, the University of California Santa Barbara, and Tokyo Institute of Technology, in 1981, 1989, and 2006, respectively. He was with Sony Corporation from 1981 to 2015, where he was involved…

11:30am EDT

Quantifying the Impact of Head-Tracked Spatial Audio on Common User Auditory Experiences using Facial Microexpressions
Tuesday October 8, 2024 11:30am - 11:50am EDT
The study aims to improve understanding of how head-tracked spatial audio technology influences both emotional responses and immersion levels among listeners. Using facial microexpression recognition technology, it quantifies the depth of immersion and the intensity of emotional responses elicited by various types of binaural content, measuring categories such as Neutral, Happy, Sad, Angry, Surprised, Scared, Disgusted, Contempt, Valence, and Arousal. Subjects were presented with a randomized set of audio stimuli consisting of stereo music, stereo speech, and 5.1 movie content. Each audio excerpt lasted 15 seconds, and spatial audio processing was switched on or off at random throughout the experiment. FaceReader software continuously detected the subjects' facial microexpressions. Statistical analysis was conducted in R, applying Granger causality tests on the time series, t-tests, and the p-value criterion for hypothesis validation. After consolidating the records of 78 participants, the final database consisted of 212,862 unique data points. With 95% confidence, it was determined that the average level of "Arousal" is significantly higher when head-tracked spatial audio is activated than when it is deactivated, suggesting that head tracking increases listeners' emotional arousal. Regarding the happiness reaction, the highest levels were recorded in mode 5 (HT on, voice) with an average of 0.038, while the lowest levels were detected in mode 6 (HT off, voice). Preliminary conclusions indicate that surprise effectively causes a decrease in neutrality, supporting the dynamic interaction between these emotional variables.
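As an illustration of one of the reported comparisons, the sketch below runs a Welch t-test on arousal with head tracking on versus off. The study itself used R; the CSV file and column names here are assumptions for illustration only.

import pandas as pd
from scipy import stats

df = pd.read_csv("facereader_log.csv")                     # hypothetical per-frame export
on = df.loc[df["head_tracking"] == "on", "arousal"]
off = df.loc[df["head_tracking"] == "off", "arousal"]
t, p = stats.ttest_ind(on, off, equal_var=False)           # Welch's two-sample t-test
print(f"t = {t:.3f}, p = {p:.4f}")                         # p < 0.05 suggests arousal differs between modes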
Moderators
Agnieszka Roginska, Professor, New York University

11:50am EDT

Investigating the Role of Customized Interaural Time Differences on First-Person Shooter Gaming Performance
Tuesday October 8, 2024 11:50am - 12:10pm EDT
Binaural listening with personalized Head-Related Transfer Functions (HRTFs) is known to enhance a listener's auditory localization in virtual environments, including gaming. However, the methods for achieving personalized HRTFs are often inaccessible to average game players due to measurement complexity and cost. This study explores a simplified approach to improving game performance, particularly in First-Person Shooter (FPS) games, by optimizing the Interaural Time Difference (ITD). Recognizing that horizontal localization is particularly important for identifying opponent positions in FPS games, this study hypothesizes that optimizing ITD alone may be sufficient for better game performance, potentially removing the need for full HRTF personalization. To test this hypothesis, a simplified FPS game environment was developed in Unity. Participants performed tasks to detect sound positions under three HRTF conditions: MIT-KEMAR, Steam Audio's default HRTF, and the proposed ITD optimization method. The results indicated that the proposed method significantly reduced players' response times compared to the other HRTF conditions. These findings suggest that players can improve their FPS gaming performance through simplified HRTF optimization, broadening access to optimized HRTFs for a wider range of game users.
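For context, a listener-specific ITD can be approximated from the head radius alone, for example with the Woodworth model sketched below; this is one plausible simplification and not necessarily the optimization method proposed in the paper.

import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a far-field source at the given azimuth (0 deg = front, 90 deg = side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

print(f"{woodworth_itd(90.0) * 1e6:.0f} microseconds")     # about 0.66 ms for an average head at 90 deg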
Moderators
Agnieszka Roginska, Professor, New York University
Speakers
Sungjoon Kim
Research Intern, Korea Advanced Institute of Science and Technology
Authors
Sungjoon Kim, Research Intern, Korea Advanced Institute of Science and Technology
Rai Sato
Ph.D. Student, Korea Advanced Institute of Science and Technology
Rai Sato (佐藤 来) is currently pursuing a PhD at the Graduate School of Culture Technology at the Korea Advanced Institute of Science and Technology. He holds a Bachelor of Music from Tokyo University of the Arts, where he specialized in immersive audio recording and psychoacoustics…

2:00pm EDT

Towards prediction of high-fidelity earplug subjective ratings using acoustic metrics
Tuesday October 8, 2024 2:00pm - 2:30pm EDT
High-fidelity earplugs are used by musicians and live sound engineers to prevent hearing damage while allowing musical sounds to reach the eardrum without distortion. To develop objective methods for judging earplug fidelity, similar to those used for headphones or loudspeakers, a small sample of trained listeners was asked to judge the attenuation level and clarity of music heard through seven commercially available passive earplugs. These scores were then compared to acoustic/musical metrics measured in a laboratory. It was found that the Noise Reduction Rating (NRR) is strongly predictive of both attenuation and clarity scores, and that insertion-loss flatness provides no advantage over NRR. A different metric measuring spectral flatness distortion appears to predict clarity independently of attenuation and will be the subject of further study.
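For reference, spectral flatness is conventionally computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum, as sketched below; the paper's "spectral flatness distortion" metric is its own definition and may differ.

import numpy as np

def spectral_flatness(signal):
    power = np.abs(np.fft.rfft(signal)) ** 2
    power = power[power > 0]                         # avoid log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean          # 1.0 for a perfectly flat (white) spectrum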
Speakers
David Anderson
Assistant Professor, University of Minnesota Duluth
Authors
David Anderson, Assistant Professor, University of Minnesota Duluth

2:30pm EDT

Decoding Emotions: Lexical and Acoustical Cues in Vocal Affects
Tuesday October 8, 2024 2:30pm - 3:00pm EDT
This study investigates listeners' ability to detect emotion from a diverse set of speech samples, including both spontaneous conversations and actor-posed speech. It explores the contributions of lexical content and acoustic properties when native listeners rate seven pairs of affective attributes. Two experimental conditions were employed: a text condition, where participants evaluated emotional attributes from written transcripts without vocal information, and a voice condition, where participants listened to audio recordings to assess emotions. Results showed that the importance of lexical and vocal cues varies across the 14 affective states for posed and spontaneous speech. Vocal cues enhanced the expression of sadness and anger in posed speech, while they had less impact on conveying happiness. Notably, vocal cues tended to mitigate negative emotions conveyed by the lexical content in spontaneous speech. Further analysis of correlations between emotion ratings in the text and voice conditions indicated that lexical meanings suggesting anger or hostility could be interpreted as positive affective states such as intimacy or confidence. Linear regression analyses indicated that emotional ratings by native listeners could be predicted up to 59% from lexical content and up to 26% from vocal cues. Listeners relied more on vocal cues to perceive emotional tone when the lexical content was ambiguous in terms of feeling and attitude. Finally, the analysis identified statistically significant basic acoustical parameters and other non-/para-linguistic information, after controlling for the effect of lexical content.
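The regression comparison reported above can be illustrated with a minimal sketch: fit separate linear models on lexical and vocal feature sets and compare the explained variance (R^2). The feature matrices below are random placeholders, not the study's data, which reported up to 59% and 26% respectively.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ratings = rng.normal(size=200)               # stand-in for listener emotion ratings
X_lexical = rng.normal(size=(200, 10))       # stand-in lexical-content features
X_vocal = rng.normal(size=(200, 10))         # stand-in acoustic/vocal features

r2_lexical = LinearRegression().fit(X_lexical, ratings).score(X_lexical, ratings)
r2_vocal = LinearRegression().fit(X_vocal, ratings).score(X_vocal, ratings)
print(r2_lexical, r2_vocal)                  # fraction of variance explained by each cue type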
Speakers
Eunmi Oh
Research Professor, Yonsei University
Authors
Eunmi Oh, Research Professor, Yonsei University

3:00pm EDT

A comparison of in-ear headphone target curves for the Brüel & Kjær Head & Torso Simulator Type 5128
Tuesday October 8, 2024 3:00pm - 3:30pm EDT
Controlled listening tests were conducted on five different in-ear (IE) headphone target curves measured on the latest ITU-T Type 4.3 ear simulator (e.g., the Brüel & Kjær Head & Torso Simulator Type 5128). A total of 32 listeners rated each target on a 100-point scale based on preference for three different music programs, with two observations each. When averaged across all listeners, two target curves were found to be equally preferred over the other choices. Agglomerative hierarchical clustering analysis further revealed two classes of listeners based on dissimilarities in their preferred target curves. Class 1 (72% of listeners) preferred the top two rated targets. Class 2 (28% of listeners) preferred targets with 2 dB less bass and 2 dB more treble than those preferred by Class 1. Among the demographic factors examined, age was the best predictor of membership in each class.
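A minimal sketch of the clustering step described above, assuming a matrix of per-listener mean preference ratings; the Ward linkage and the file name are assumptions, not details taken from the paper.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

ratings = np.loadtxt("listener_ratings.csv", delimiter=",")   # (n_listeners, n_targets), hypothetical file
Z = linkage(ratings, method="ward")                # agglomerative hierarchical clustering of listeners
classes = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into two listener classes
print(np.bincount(classes)[1:])                    # class sizes, cf. the 72% / 28% split reported above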

3:30pm EDT

A cepstrum analysis approach to perceptual modelling of the precedence effect
Tuesday October 8, 2024 3:30pm - 4:00pm EDT
The precedence effect describes our ability to perceive the spatial characteristics of lead and lag sound signals. When the time delay between the lead and lag is sufficiently small, we cease to hear two distinct sounds, instead perceiving the lead and lag as a single fused sound with its own spatial characteristics. Historically, precedence effect models have had difficulty differentiating between lead/lag signals and their fusions. The likelihood of fusion occurring increases when the signal contains periodicity, as in the case of music. In this work we present a cepstral-analysis-based perceptual model of the precedence effect, CEPBIMO, which is more resilient to the presence of fusions than its predecessors. To evaluate our model we employ four datasets of various signal types, each containing 10,000 synthetically generated room impulse responses. The results of the CEPBIMO model are then compared against results of the BICAM model. Our results show that the CEPBIMO model is more resilient to the presence of fusions and signal periodicity than previous precedence effect models.
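As background for the cepstral approach, the sketch below computes a real cepstrum (the inverse FFT of the log magnitude spectrum) and shows how a delayed lag copy of a signal appears as a peak at the echo lag; the CEPBIMO model itself is considerably more involved than this.

import numpy as np

def real_cepstrum(x):
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

fs = 48000
rng = np.random.default_rng(1)
lead = rng.standard_normal(2048)
delay = 240                                    # 5 ms lag at 48 kHz
mix = lead.copy()
mix[delay:] += 0.7 * lead[:-delay]             # lead plus attenuated lag
ceps = real_cepstrum(mix)
print(np.argmax(ceps[50:1024]) + 50)           # expected to peak at or near the 240-sample echo lag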
Speakers
Jeramey Tyler
Samtec
Jeramey is in the 3rd person. So it goes.
Authors
Jeramey Tyler, Samtec

4:00pm EDT

Categorical Perception of Neutral Thirds Within the Musical Context
Tuesday October 8, 2024 4:00pm - 4:30pm EDT
This paper investigates the contextual recognition of neutral thirds in music by integrating real-world musical context into the study of categorical perception. Traditionally, categorical perception has been studied using isolated auditory stimuli in controlled laboratory settings. However, music is typically experienced within a circumstantial framework, which significantly influences its reception. Our study involved musicians from various specializations who listened to precomposed musical fragments, each concluding with a 350-cent interval preceded by different harmonic contexts. The fragments included a monophonic synthesizer and orchestral mockups, with contexts such as major chords, minor chords, a single pitch, neutral thirds, and natural fifths. The results indicate that musical context markedly affects the recognition of pseudotonal chords. Participants' accuracy in judging interval size varied based on the preceding harmonic context. A statistical analysis was conducted to determine whether there were significant differences in neutral-third perception across the different harmonic contexts. The test led to the rejection of the null hypothesis: the findings underscore the need to consider real-world listening experiences in research on auditory processing and cognition.
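For reference, the 350-cent target interval sits midway between the minor third (300 cents) and the major third (400 cents); its frequency ratio follows directly from the definition of the cent.

ratio = 2 ** (350 / 1200)        # a cent is 1/1200 of an octave
print(round(ratio, 4))           # ~1.2241, e.g. about 538.6 Hz for a note 350 cents above A4 = 440 Hz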
 