AES Show 2024 NY: Full Schedule

Exhibits+ badges provide access to the ADAM Audio Immersive Room , the Genelec Immersive Room, Tech Tours, and the presentations on the Main Stage .

All Access badges provide access to all content in the Program (Tech Tours still require registration)

View the Exhibit Floor Plan.

2:00pm EDT

Towards prediction of high-fidelity earplug subjective ratings using acoustic metrics

Tuesday October 8, 2024 2:00pm - 2:30pm EDT

1E03

High-fidelity earplugs are used by musicians and live sound engineers to prevent hearing damage while allowing musical sounds to reach the eardrum without distortion. To determine objective methods for judging earplug fidelity in a similar way to headphones or loudspeakers, a small sample of trained listeners were asked to judge the attenuation level and clarity of music through seven commercially available passive earplugs. These scores were then compared to acoustic/musical metrics measured in a laboratory. It was found that NRR is strongly predictive of both attenuation and clarity scores, and that insertion loss flatness provides no advantage over NRR. A different metric measuring spectral flatness distortion seems to predict clarity independently from attenuation and will be subject to further study.

Moderators

Dave Anderson

Speakers

David Anderson

Assistant Professor, University of Minnesota Duluth

Authors

David Anderson

Assistant Professor, University of Minnesota Duluth

Stephen Roessner

Tuesday October 8, 2024 2:00pm - 2:30pm EDT
1E03

Perception, Paper Lecture

2:30pm EDT

Decoding Emotions: Lexical and Acoustical Cues in Vocal Affects

Tuesday October 8, 2024 2:30pm - 3:00pm EDT

1E03

This study investigates listeners’ ability to detect emotion from a diverse set of speech samples, including both spontaneous conversations and actor-posed speech. It explores the contributions of lexical content and acoustic properties when native listeners rate seven pairs of affective attributes. Two experimental conditions were employed: a text condition, where participants evaluated emotional attributes from written transcripts without vocal information, and a voice condition, where participants listened to audio recordings to assess emotions. Results showed that the importance of lexical and vocal cues varies across 14 affective states for posed and spontaneous speech. Vocal cues enhanced the expression of sadness and anger in posed speech, while they had less impact on conveying happiness. Notably, vocal cues tended to mitigate negative emotions conveyed by the lexical content in spontaneous speech. Further analysis on correlations between emotion ratings in text and voice conditions indicated that lexical meanings suggesting anger or hostility could be interpreted as positive affective states like intimacy or confidence. Linear regression analyses indicated that emotional ratings by native listeners could be predicted up to 59% by lexical content and up to 26% by vocal cues. Listeners relied more on vocal cues to perceive emotional tone when the lexical content was ambiguous in terms of feeling and attitude. Finally, the analysis identified statistically significant basic acoustical parameters and other non/para-linguistic information, after controlling for the effect of lexical content.

Moderators

Dave Anderson

Speakers

Eunmi Oh

Research Professor, Yonsei University

Authors

Eunmi Oh

Research Professor, Yonsei University

Jinsun Suhr

Tuesday October 8, 2024 2:30pm - 3:00pm EDT
1E03

Perception, Paper Lecture

3:00pm EDT

A comparison of in-ear headphone target curves for the Brüel & Kjær Head & Torso Simulator Type 5128

Tuesday October 8, 2024 3:00pm - 3:30pm EDT

1E03

Controlled listening tests were conducted on five different in-ear (IE) headphone target curves measured on the latest ITU-T Type 4.3 ear simulator (e.g. Bruel & Kjaer Head & Torso Simulator Type 5128). A total of 32 listeners rated each target on a 100-point scale based on preference for three different music programs with two observations each. When averaged across all listeners, two target curves were found to be equally preferred over the other choices. Agglomerative hierarchical clustering analysis further revealed two classes of listeners based on dissimilarities in their preferred target curves. Class One (72% of listeners) preferred the top two rated targets. Class two (28% of listeners) preferred targets with 2 dB less bass and 2 dB more treble than the target curves preferred by Class 1. Among the demographic factors examined, age was the best predictor of membership in each class.

Moderators

Dave Anderson

Speakers

Sean Olive

Authors

Sean Olive

Tuesday October 8, 2024 3:00pm - 3:30pm EDT
1E03

Perception, Paper Lecture

3:30pm EDT

A cepstrum analysis approach to perceptual modelling of the precedence effect

Tuesday October 8, 2024 3:30pm - 4:00pm EDT

1E03

The precedence effect describes our ability to perceive the spatial characteristics of lead and lag sound signals. When the time delay between the lead and lag is sufficiently small we will cease to hear two distinct sounds, instead perceiving the lead and lag as a single fused sound with its own spatial characteristics. Historically, precedence effect models have had difficulty differentiating between lead/lag signals and their fusions. The likelihood of fusion occurring is increased when the signal contains periodicity, such as in the case of music. In this work we present a cepstral analysis based perceptual model of the precedence effect, CEPBIMO, which is more resilient to the presence of fusions than its predecessors. To evaluate our model we employ four datasets of various signal types, each containing 10,000 synthetically generated room impulse responses. The results of the CEPBIMO model are then compared against results of the BICAM. Our results show that the CEPBIMO model is more resilient to the presence of fusions and signal periodicity than previous precedence effect models.

Moderators

Dave Anderson

Speakers

Jeramey Tyler

Samtec

Jeramey is in the 3rd person. So it goes.

Authors

Jeramey Tyler

Samtec

Jeramey is in the 3rd person. So it goes.

Jonas Braasch

Mei Si

Tuesday October 8, 2024 3:30pm - 4:00pm EDT
1E03

Perception, Paper Lecture

4:00pm EDT

Categorical Perception of Neutral Thirds Within the Musical Context

Tuesday October 8, 2024 4:00pm - 4:30pm EDT

1E03

This paper investigates the contextual recognition of neutral thirds in music by integrating real-world musical context into the study of categorical perception. Traditionally, categorical perception has been studied using isolated auditory stimuli in controlled laboratory settings. However, music is typically experienced within a circumstantial framework, significantly influencing its reception. Our study involved musicians from various specializations who listened to precomposed musical fragments, each concluding with a 350-cent interval preceded by different harmonic contexts. The fragments included a monophonic synthesizer and orchestral mockups, with contexts such as major chords, minor chords, a single pitch, neutral thirds, and natural fifths. The results indicate that musical context remarkably affects the recognition of pseudotonal chords. Participants' accuracy in judging interval size varied based on the preceding harmonic context. A statistical analysis was conducted to determine if there were significant differences in the neutral third perception across the different harmonic contexts. The test led to the rejection of the null hypothesis: the findings underscore the need to consider real-world listening experiences in research on auditory processing and cognition.

Moderators

Dave Anderson

Speakers

Krzysztof Kicior

Authors

Krzysztof Kicior

Tuesday October 8, 2024 4:00pm - 4:30pm EDT
1E03

Perception, Paper Lecture

10:00am EDT

Delay detection in hearing with moving audio objects at various azimuths and bandwidths

Thursday October 10, 2024 10:00am - 10:20am EDT

1E03

In order to design efficient binaural rendering systems for 3D audio, it’s important to understand how delays in updating the relative directions of sound sources from a listener to compensate for listener’s head movements affect the sense of realism. However, sufficient study has not yet been done with this problem. We therefore investigated the delay detection capability (threshold) of hearing during localization of audio objects. We used moving sound sources emitted from loudspeakers emulating smooth update of head related transfer functions (HRTFs) and delayed update of HRTFs. We investigated the delay detection threshold with different bandwidth, direction, and speed of sound source signals. As a result, the delay detection thresholds in this experiment were found to be approximately 100ms to 500ms, and it was observed that the detection thresholds vary depending on the bandwidth and direction of the sound source. On the other hand, no significant variation in detection thresholds was observed based on the speed of sound source movement.

Moderators

Sascha Dick

Sascha Dick received his Dipl.-Ing. degree in Information and Communication Technologies from the Friedrich Alexander University (FAU) of Erlangen-Nuremberg, Germany in 2011 with a thesis on an improved psychoacoustic model for spatial audio coding, and joined the Fraunhofer Institute... Read More →

Speakers

Masayuki Nishiguchi

Professor, Akita Prefectural University

Masayuki Nishiguchi received his B.E., M.S., and Ph.D. degrees from Tokyo Institute of Technology, University of California Santa Barbara, and Tokyo Institute of Technology, in 1981, 1989, and 2006 respectively. He was with Sony corporation from 1981 to 2015, where he was involved... Read More →

Authors

Masayuki Nishiguchi

Professor, Akita Prefectural University

Kanji Watanabe

Koji Abe

Yuuki Saito

Thursday October 10, 2024 10:00am - 10:20am EDT
1E03

Perception, Paper Lecture

10:20am EDT

Expanding and Analyzing ODAQ - The Open Dataset of Audio Quality

Thursday October 10, 2024 10:20am - 10:40am EDT

1E03

Datasets of processed audio signals along with subjective quality scores are instrumental for research into perception-based audio processing algorithms and objective audio quality metrics. However, openly available datasets are scarce due to listening test effort and copyright concerns limiting the distribution of audio material in existing datasets. To address this problem, Open Dataset of Audio Quality (ODAQ) was introduced, containing audio material along with extensive subjective test results with permissive licenses. The dataset comprises processed audio material with six different classes of signal impairments in multiple levels of processing strength covering a wide range of quality levels. The subjective quality evaluation has recently been extended and now comprises results from three international laboratories providing a total 42 listeners and 10080 subjective scores overall. Furthermore, ODAQ was recently expanded by a performance evaluation of common objective metrics for perceptual quality evaluation in their ability to predict subjective scores. The wide variety of audio material and test subjects in the test provides insight into influences and biases in subjective evaluation, which we investigated by statistical analysis, finding listener-based, training-based and lab-based influences. We also demonstrate the methodology for contributing to ODAQ, and make a request for additional contributors. In conclusion, the diversity of the processing methods and quality levels, along with a large pool of international listeners and permissive licenses make ODAQ particularly suited for further research into subjective and objective audio quality.

Moderators

Sascha Dick

Authors

Chih-Wei Wu

Christoph Thompson

Director of Music Media Production, Ball State University

Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio... Read More →

Emanuël Habets

Matteo Torcoli

Pablo Delgado

Fraunhofer IIS

Pablo Delgado is part of the scientific staff of the Advanced Audio Research Group at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany. He specializes in psychoacoustics applied to audio and speech coding, as well as machine learning applications in audio... Read More →

Phillip A. Williams

Thursday October 10, 2024 10:20am - 10:40am EDT
1E03

Perception, Paper Lecture

10:40am EDT

Perceptual Evaluation of Hybrid Immersive Audio Systems in Orchestral Settings

Thursday October 10, 2024 10:40am - 11:00am EDT

1E03

This study investigates the perceptual strengths and weaknesses of various immersive audio capture techniques within an orchestral setting, employing channel-based, object-based, and scene-based methodologies concurrently. Conducted at McGill University’s Pollack Hall in Montreal, Canada, the research featured orchestral works by Boulanger, Prokofiev, and Schubert, performed by the McGill Symphony Orchestra in April 2024.
The innovative aspect of this study lies in the simultaneous use of multiple recording techniques, employing traditional microphone setups such as a Decca tree with outriggers, alongside an experimental pyramidal immersive capture system and a 6th order Ambisonic em64 “Eigenmike.” These diverse methodologies were selected to capture the performance with high fidelity and spatial accuracy, detailing both the performance's nuances and the sonic characteristics imparted by the room. The capture of this interplay is the focus of this study.
The project aimed to document the hall's sound quality in its last orchestral performance before closing for 2 years for renovations, providing the methodology and documentation needed for future comparative recordings of the acoustics before and after. The pyramidal system, designed with exaggerated spacing, improves decorrelation at low frequencies, allowing for the impression of a large room within a smaller listening space. Meanwhile, Ambisonic recordings provided insights into single-point versus spaced multi-viewpoint capture.
Preliminary results from informal subjective listening sessions suggest that combining different systems offers potential advantages over any single method alone, supporting exploration of hybrid solutions as a promising area of study for audio recording, enhancing the realism and spatial immersion of orchestral music recordings.

Moderators

Sascha Dick

Speakers

Santiago Garcia Cuevas

Authors

Santiago Garcia Cuevas

James Clemens-Seely

Kathleen Ying-Ying Zhang

PhD Candidate, McGill University

YIng-Ying Zhang is a music technology researcher and sound engineer. She is currently a PhD candidate at McGill University in the Sound Recording program where her research focuses on musician-centered virtual acoustic applications in recording environments. She received her Masters... Read More →

Michail Oikonomidis

Richard King

Professor, McGill University

Richard King is an Educator, Researcher, and a Grammy Award winning recording engineer. Richard has garnered Grammy Awards in various fields including Best Engineered Album in both the Classical and Non-Classical categories. Richard is an Associate Professor at the Schulich School... Read More →

Wieslaw Woszczyk

Thursday October 10, 2024 10:40am - 11:00am EDT
1E03

Perception, Paper Lecture

11:00am EDT

Perception of the missing fundamental in vibrational complex tones

Thursday October 10, 2024 11:00am - 11:20am EDT

1E03

We present a study on the perception of the missing fundamental in vibrational complex tones. When asked to match an audible frequency to the frequency of a vibrational tone with a missing fundamental frequency, participants in two experiments associated the audible frequency with lower frequencies than those present in the vibration, often corresponding to the missing fundamental of the vibrational tone. This association was found regardless of whether the vibration was presented on the back (first experiment) or the feet (second experiment). One possible application of this finding could be the reinforcement of low frequencies via vibrational motors even when such motors have high resonance frequencies

Moderators

Sascha Dick

Speakers

Julián Villegas

Authors

Julián Villegas

Yusuke Sato

Thursday October 10, 2024 11:00am - 11:20am EDT
1E03

Perception, Paper Lecture

11:20am EDT

Perceptual loudness compensation for evaluation of personalized earbud equalization

Thursday October 10, 2024 11:20am - 11:40am EDT

1E03

The ear canal geometries of individuals are very different, resulting in significant variations in SPL at the drum reference point (DRP). Knowledge of the personalized transfer function from a near-field microphone (NFM) in the earbud speaker tip to DRP allows personalized equalization (P.EQ). A method to perceptually compensate for loudness perception within the evaluation of different personalized equalization filters for earbuds has been developed. The method includes: measurements at NFM to estimate the personalized transfer function from the NFM point to DRP, calibration of the NFM microphone, the acquisition of the transfer function from the earbuds' speaker terminals to NFM, and the estimation of perceptual loudness in phones at DRP, when applying the different equalization filters. The loudness estimation was computed using the Moore-Glasberg method as implemented in ISO 532-2. The difference to DRP estimation was recursively adjusted using pink noise until the delta was within 0.1 dB. The corresponding gains were applied to the different conditions to be evaluated. A listening test was performed to evaluate three conditions using the described method for loudness compensation.

Moderators

Sascha Dick

Speakers

Adrian Celestinos

Samsung Research America

Authors

Adrian Celestinos

Samsung Research America

Andri Bezzola

Ema Souza

Yuan Li

Thursday October 10, 2024 11:20am - 11:40am EDT
1E03

Perception, Paper Lecture

11:40am EDT

The audibility of true peak distortion (0 dBFS+)

Thursday October 10, 2024 11:40am - 12:00pm EDT

1E03

In a recent study, the authors interviewed five professional mastering engineers on the topic of contemporary loudness practices in music. Among the findings, all five mastering engineers targeted their peak levels very close to 0 dBFS and seemed somewhat unconcerned regarding true peak distortion emerging in the transcoding process, not adapting to the current recommendations of not exceeding -1 dB true peak. Furthermore, true peak measurements over the last four decades show that quite a few releases even measure true peaks above 0 dBFS in full quality. The aim of this study is to investigate the audibility of such overshoots by conducting a tailored listening test. The results indicate that even experienced and trained listeners may not be very sensitive to true peak distortion.

Moderators

Sascha Dick

Speakers

Pål Erik Jensen

University College Teacher, Høyskolen Kristiania

TeachingAudio productionMusic studio productionPro ToolsGuitarBass

Authors

Pål Erik Jensen

University College Teacher, Høyskolen Kristiania

TeachingAudio productionMusic studio productionPro ToolsGuitarBass

Claus Sohn Andersen

Tore Teigland

Professor, Kristiania University College

Thursday October 10, 2024 11:40am - 12:00pm EDT
1E03

Perception, Paper Lecture

12:00pm EDT

Evaluation of sound colour in headphones used for monitoring

Thursday October 10, 2024 12:00pm - 12:20pm EDT

1E03

Extensive studies have been made into achieving generally enjoyable sound colour in headphone listening, but few publications have been written focusing on the demanding requirements of a single audio professional, and what they actually hear. The present paper describes a structured and practical method, based on in-room monitoring, for getting to know yourself as a headphone listener, and the particular model and pair you are using. Headphones provide fundamentally different listening results compared to in-room monitoring adhering to professional standards; considering imaging, auditory envelopment, localization, haptic cues etc. Moreover, in headphone listening there may be no direct connection between the frequency response measured with a generic manikin and what a given user hears. Finding out just how a pair of headphones deviates from neutral sound colour must therefore be achieved personally. An evaluation scheme based on an ultra-nearfield reference system is described, augmented by a defined test setup and procedure.

Moderators

Sascha Dick

Speakers

Thomas Lund

Senior Technologist, Genelec Oy

Thomas Lund has authored papers on human perception, spatialisation, loudness, sound exposure and true-peak level. He is researcher at Genelec, and convenor of a working group on hearing health under the European Commission. Out of a medical background, Thomas previously served in... Read More →

Authors

Thomas Lund

Senior Technologist, Genelec Oy

Tore Stegenborg-Andersen

Thursday October 10, 2024 12:00pm - 12:20pm EDT
1E03

Perception, Paper Lecture