Spatial Sound Processing
ASIP-NET seminar on spatial sound processing
Registration
To register for this event you must be logged in as a member of ASIP-NET.
Time and place
April 17, 2008, 12.30-17.00
Oticon Research Center Eriksholm, Kongevejen 243, 3070 Snekkersten
Welcome by Søren Riis, Oticon
Spatial sound processing for head-geometry microphone arrays - blind and non-blind approaches
The sound field captures spatial characteristics of acoustic sources and thereby conveys important cues for the analysis and interpretation of acoustic scenes. Sound diffraction at the human body - the head and outer ears (pinnae) in particular - adds further complexity to sound propagation and to the signals received at our two ears. The human auditory system exploits these head-related diffraction effects remarkably well to disambiguate its interpretation of the acoustic scene. Signal processing algorithms may similarly exploit the binaural signals available in binaurally linked microphone arrays. While the basic physical effects to be expected in binaural signals can be calculated analytically, realistic processing must be based on real measurements of head-related impulse responses (HRIRs).
We review simplified models of sound diffraction by the head and describe a data-collection effort that captures real HRIRs for several directions and distances under anechoic and reverberant conditions. On the non-blind side, head-geometry beamformer results are presented and evaluated with the database of real HRIRs. Blind source separation constitutes a popular "blind" alternative to beamforming algorithms. We compare several frequency-domain approaches to blind source separation on the same HRIR database. It is shown how the choice of scaling in different frequency bands, corresponding to a (partial) convolution or deconvolution of the signals, affects overall separation performance in various acoustic situations. A variant of the standard complex-valued infomax algorithm is presented that incorporates improved scaling.
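The per-band scaling ambiguity mentioned above can be illustrated with the widely used minimal-distortion principle, in which each frequency bin's demixing matrix is rescaled so that every separated source keeps the scale it has at a reference microphone. This is a generic sketch of that scaling step, not the improved scaling presented in the talk:

```python
import numpy as np

def minimal_distortion_scale(W):
    """Rescale a per-bin complex demixing matrix W (sources x mics)
    by the minimal-distortion principle: W <- diag(inv(W)) @ W, so each
    separated source is expressed at the scale of its own mixing column."""
    A = np.linalg.inv(W)               # estimated mixing matrix for this bin
    return np.diag(np.diag(A)) @ W     # undo the arbitrary per-source scaling

# Toy 2x2 complex demixing matrix for one frequency bin
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
W_scaled = minimal_distortion_scale(W)

# After scaling, the estimated mixing matrix has a unit diagonal,
# i.e. each source passes through to its reference channel unchanged.
print(np.allclose(np.diag(np.linalg.inv(W_scaled)), 1.0))  # True
```

The same rescaling is applied independently in every frequency bin before the separated spectra are transformed back to the time domain.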
Localization Performance of Real and Virtual Sound Sources
The presentation describes how a 3D-audio system for use in fighter aircraft was evaluated in an experiment comparing localization performance between real and virtual sound sources. Virtual sound sources from 58 selected directions were evaluated, and 16 of these directions were also evaluated using real sound sources, i.e. loudspeakers. 13 pilots from the Royal Danish Air Force and 13 civilians took part in the test. The localization performance was split into a constant and a stochastic difference between the perceived direction and the desired direction (stimulus). The constant difference is a localization offset, and the stochastic difference is a measure of the localization uncertainty. Stimulus lengths of both 250 ms and 2 s enabled investigation of the importance of head movements, i.e. the use of head tracking. Real and virtual sound sources could be localized with an uncertainty of 10 and 14 degrees in azimuth, respectively, while the uncertainty in elevation was 12 and 24 degrees (real and virtual sound sources). No significant localization offset was found for azimuth, while an average offset in elevation of 3-6 degrees was found using the long stimuli. A significant difference between the localization offsets obtained in different directions was found, especially for elevation, where the offset was strongly correlated with the stimulus elevation.
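The split of the localization error into a constant and a stochastic difference corresponds to taking the mean and the standard deviation of the signed angular error across trials. A minimal sketch, using made-up angles rather than data from the experiment:

```python
import numpy as np

# Hypothetical perceived-vs-stimulus elevation angles (degrees) for one
# direction; these values are illustrative only, not experimental data.
stimulus  = np.array([30.0, 30.0, 30.0, 30.0, 30.0])
perceived = np.array([33.0, 28.0, 35.0, 31.0, 34.0])

error = perceived - stimulus           # signed angular error per trial
offset = error.mean()                  # constant difference: localization offset
uncertainty = error.std(ddof=1)        # stochastic difference: localization uncertainty

print(f"offset = {offset:.1f} deg, uncertainty = {uncertainty:.1f} deg")
# offset = 2.2 deg, uncertainty = 2.8 deg
```

In the actual study this decomposition was computed per direction and per stimulus length, which is what makes the direction-dependent elevation offsets visible.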
Coffee break and networking
A loudspeaker-based virtual auditory environment (VAE) has been developed to provide a realistic, versatile research environment for investigating auditory signal processing in real environments, i.e., considering multiple sound sources and room reverberation. The VAE allows full control of the acoustic scenario in order to systematically study the auditory processing of reverberant sounds. It is based on the ODEON software, a state-of-the-art tool for room-acoustic simulation developed at Acoustic Technology, DTU. First, a MATLAB interface to the ODEON software was developed. Then, different multi-channel playback algorithms were implemented in MATLAB so that successions of sounds can be presented through the VAE quickly and efficiently. A set of both objective (physical) and subjective (perception-based) measures has been selected to validate the environment and assess its quality.
The head-related transfer function, or HRTF, describes the direction-dependent transformation of sound from the free field to the ears. Thus, the HRTF contains all the acoustical cues that are thought to determine the apparent position of a sound. A significant application of the HRTF is as a filter for the generation of three-dimensional sound. In this context, the resolution at which HRTFs represent auditory space is an important aspect. If the spatial resolution is finer than our perception can resolve, so that differences between adjacent HRTFs cannot be heard, the effort of producing such high resolution is wasted. On the other hand, a resolution that is too low will degrade our auditory spatial perception. Therefore, a resolution equal to, or just above, the minimum audible difference is desired. In addition, three-dimensional sound applications may require a dynamic scenario.
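Using the HRTF as a filter for three-dimensional sound amounts to convolving a mono signal with the left- and right-ear impulse responses measured for the target direction. A minimal sketch with placeholder impulse responses (a real application would use measured HRIRs):

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Place a mono signal at the direction encoded by a pair of
    head-related impulse responses by convolving with each ear's HRIR."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)   # 2 x N binaural signal

# Placeholder HRIRs encoding only an interaural time and level difference;
# measured HRIRs would additionally carry the pinna's spectral cues.
hrir_left = np.zeros(32)
hrir_left[0] = 1.0          # near ear: early, louder
hrir_right = np.zeros(32)
hrir_right[8] = 0.6         # far ear: delayed, attenuated

mono = np.random.default_rng(1).standard_normal(1000)
binaural = binaural_render(mono, hrir_left, hrir_right)
print(binaural.shape)  # (2, 1031) -- input length + HRIR length - 1
```

The spatial-resolution question above then becomes: how far apart can two such HRIR pairs be before the rendered signals become audibly different?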
Auralization of a sound field using spherical-harmonics beamforming
The binaural auralization of a 3D sound field using spherical-harmonics beamforming (SHB) techniques was investigated and compared with the traditional method using a dummy head. A theoretical scaling of SHB for estimating the binaural signals was derived and verified by comparing simulated frequency response functions with directly measured ones. The results show good agreement in the frequency range of interest. Two listening experiments were conducted to evaluate the auralization method subjectively. The auralization of target sound sources in background noise was investigated in the first experiment, and psychoacoustic attributes of multi-channel reproduced sounds were measured in the second. The results of the first experiment indicate that SHB almost entirely restores the loudness (or annoyance) of the target sounds to unmasked levels, even when presented with background noise. In the second experiment, subjective ratings of the width, spaciousness and preference of different audio reproduction modes auralized with SHB were not significantly different from those obtained for dummy-head measurements. Thus, binaural synthesis using SHB may be a useful tool to psychoacoustically analyze composite sources and to reproduce a 3D sound field binaurally while saving considerably on measurement time, because head rotation can be simulated from a single recording.
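Spherical-harmonics beamforming rests on expanding the sound field in spherical harmonics; for a plane wave, the addition theorem collapses this to an expansion in Legendre polynomials and spherical Bessel functions. The following sketch verifies that truncated expansion numerically; it is a generic illustration of the underlying machinery, not the scaling derived in the talk:

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def plane_wave_expansion(kr, cos_gamma, order):
    """Truncated spherical-harmonics expansion of a plane wave:
    exp(i*kr*cos_gamma) ~= sum_n (2n+1) * i^n * j_n(kr) * P_n(cos_gamma),
    where gamma is the angle between the propagation and observation
    directions. This identity underlies spherical-harmonics beamforming."""
    n = np.arange(order + 1)
    return np.sum((2 * n + 1) * (1j ** n)
                  * spherical_jn(n, kr)
                  * eval_legendre(n, cos_gamma))

kr, cos_gamma = 2.0, 0.5
approx = plane_wave_expansion(kr, cos_gamma, order=20)
exact = np.exp(1j * kr * cos_gamma)
print(abs(approx - exact) < 1e-10)  # True: order 20 suffices for kr = 2
```

Because j_n(kr) decays rapidly for orders above kr, a truncation order slightly larger than kr captures the field; this is also why head rotation can be simulated by rotating the expansion coefficients of a single recording.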
The seminar is supported by the Signal Processing Chapter of the IEEE Denmark Section.