Jérôme Daniel's PhD thesis on Spatial Sound: Abstract (long version)

Abstract

This thesis deals with acoustic field representation for spatial reproduction over loudspeakers or headphones, as applied to the broad multimedia domain, including new applications for browsing composite virtual 3D scenes on the Internet. This domain combines the spatial reproduction of pre-existing complex sound fields (e.g. in the "5.1" multi-channel format) with constructive spatialisation tasks (3D pan-pot and room effects). Such applications are increasingly characterised by the variability of a number of parameters: transmission bit-rate, user resources (CPU, reproduction layout), listening conditions (individual or collective), diversity of the sound or audio-visual material handled, and view-point and object positions in the virtual environment (interactivity). The problem of the representation, seen as a set of signals either diffused directly or decoded first, concerns transmission (where conciseness is the aim) as well as the intermediate spatialisation step (a global encoding that factorises the subsequent processing).

We have chosen to study the ambisonic approach in depth. It is based on a spherical harmonic decomposition of the acoustic field, centred on the listener's view-point. It has long been known in a restricted first-order form, which provides a minimal directional encoding of the sound field through four components (B-format): W (pressure) and X, Y, Z (pressure gradient), and which allows easy sound field manipulations such as rotations. With a decoder optimised for the listening conditions (ideal/centred or collective/off-centred), a coherent and homogeneous rendering of the sound space can be obtained over various panoramic (2D) or periphonic (3D) loudspeaker rigs. This "variable geometry" rendering extends to headphones or a pair of loudspeakers via binaural techniques (virtual loudspeakers). Taking higher-order components into account, a topic that has only just begun to be studied, introduces the concept of a "variable resolution" representation (scalability), whose resolution is chosen according to the number of loudspeakers and/or the transmission capacity.
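As an illustration of the four B-format components and of a rotation, here is a minimal Python sketch. It assumes the traditional B-format weighting (W scaled by 1/√2, X, Y, Z as direction cosines) and a plane-wave source; the thesis itself may use a different normalisation, and the function names `encode_b_format` and `rotate_about_vertical` are hypothetical.

```python
import math

def encode_b_format(s, azimuth, elevation):
    """Encode one sample s of a plane wave arriving from (azimuth, elevation).

    Angles are in radians. The 1/sqrt(2) weight on W is an assumed
    (traditional) convention, not taken from the abstract.
    """
    w = s / math.sqrt(2.0)                            # pressure component
    x = s * math.cos(azimuth) * math.cos(elevation)   # front-back gradient
    y = s * math.sin(azimuth) * math.cos(elevation)   # left-right gradient
    z = s * math.sin(elevation)                       # up-down gradient
    return (w, x, y, z)

def rotate_about_vertical(b, angle):
    """Rotate the encoded sound field by `angle` radians about the vertical axis.

    Only X and Y mix; W and Z are unchanged.
    """
    w, x, y, z = b
    c, s = math.cos(angle), math.sin(angle)
    return (w, c * x - s * y, s * x + c * y, z)

# A source encoded at 30 degrees and rotated by 60 degrees should match
# a source encoded directly at 90 degrees.
rotated = rotate_about_vertical(
    encode_b_format(1.0, math.radians(30.0), 0.0), math.radians(60.0))
direct = encode_b_format(1.0, math.radians(90.0), 0.0)
```

The rotation acts on the encoded signals only, so it costs the same however many sources were mixed into the field.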

We present the acoustic and psychoacoustic foundations and a critical review of spatialisation strategies (stereo, surround, binaural, transaural and new derived forms). We explain the intrinsic link between the ambisonic representation and the local (velocity vector V) and global (energy vector E) propagation characteristics of the reproduced sound field (also taking the loudspeaker geometry into account), and the laws by which these characteristics predict the localisation effect under head movements. The localisation theories underlying ambisonic decoding (Gerzon) are thus thoroughly justified. Extended to the other reviewed approaches, this kind of analysis highlights the advantages of Ambisonic.
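The velocity and energy vectors can be computed from the loudspeaker gains and directions. The sketch below assumes the commonly cited Gerzon definitions, V = Σ gᵢuᵢ / Σ gᵢ and E = Σ gᵢ²uᵢ / Σ gᵢ², restricted to a horizontal (2D) rig with real gains; the function names and the stereo example values are illustrative assumptions, not taken from the thesis.

```python
import math

def localisation_vectors(gains, azimuths):
    """Return (V, E) as 2D tuples for real loudspeaker gains and azimuths (radians).

    V is the gain-weighted mean of the loudspeaker unit vectors,
    E is the same mean weighted by squared gains (assumed definitions).
    """
    sum_g = sum(gains)
    sum_g2 = sum(g * g for g in gains)
    v = (sum(g * math.cos(a) for g, a in zip(gains, azimuths)) / sum_g,
         sum(g * math.sin(a) for g, a in zip(gains, azimuths)) / sum_g)
    e = (sum(g * g * math.cos(a) for g, a in zip(gains, azimuths)) / sum_g2,
         sum(g * g * math.sin(a) for g, a in zip(gains, azimuths)) / sum_g2)
    return v, e

def polar(vec):
    """Return (norm, direction in degrees) of a 2D vector."""
    return math.hypot(vec[0], vec[1]), math.degrees(math.atan2(vec[1], vec[0]))

# Hypothetical example: a stereo pair at +30 and -30 degrees,
# with the left loudspeaker twice as loud as the right one.
azimuths = [math.radians(30.0), math.radians(-30.0)]
v, e = localisation_vectors([1.0, 0.5], azimuths)
r_v, angle_v = polar(v)
r_e, angle_e = polar(e)
print(f"V: norm {r_v:.3f}, direction {angle_v:.1f} deg")
print(f"E: norm {r_e:.3f}, direction {angle_e:.1f} deg")
```

In this example the two vectors point in different directions, with E pulled further towards the louder loudspeaker than V. Both norms stay below 1, the value of a single real source.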

The generalisation of Ambisonic to all higher orders involves every aspect presented for first-order systems, especially the encoding formalism and the decoding principles (2D and 3D). We develop the notion of directional sampling of the spherical harmonic basis (which also applies to the sound field pick-up problem); the three original decoding forms (Gerzon, Malham) are then generalised into three families of solutions, to be applied according to the listening conditions. Objective evaluations, supported by informal listening experiments, confirm the contribution of the higher orders and of the optimised solutions. The improvement appears in the radial expansion of the acoustic field reconstruction and in the global propagation (E) or, in perceptual terms, in the precision and robustness of the sound image (even in non-ideal conditions) and in the preservation of spatial impressions (lateral separation, which is somewhat weak at first order).
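The effect of the order on the energy vector can be sketched for the horizontal (2D) case. The code below is an illustration under stated assumptions, not the thesis's own formulation: circular harmonics with a √2 weight on the cosine and sine terms, a regular ring of eight loudspeakers, and a plain "sampling" decoder that projects the encoded field onto the harmonics evaluated at each loudspeaker direction, with no optimisation of the kind the thesis develops. All function names are hypothetical.

```python
import math

def encode_2d(azimuth, order):
    """Circular-harmonic encoding of a unit plane wave up to `order`.

    Component layout [1, c1, s1, c2, s2, ...] with an assumed sqrt(2) weight.
    """
    comps = [1.0]
    for m in range(1, order + 1):
        comps.append(math.sqrt(2.0) * math.cos(m * azimuth))
        comps.append(math.sqrt(2.0) * math.sin(m * azimuth))
    return comps

def sampling_decoder_gains(b, speaker_azimuths, order):
    """Gain of each loudspeaker: projection of b on the harmonics at its direction."""
    n = len(speaker_azimuths)
    return [sum(bi * yi for bi, yi in zip(b, encode_2d(phi, order))) / n
            for phi in speaker_azimuths]

def energy_vector_norm(gains, azimuths):
    """Norm of E = sum(g^2 u) / sum(g^2) for a horizontal rig."""
    total = sum(g * g for g in gains)
    ex = sum(g * g * math.cos(a) for g, a in zip(gains, azimuths)) / total
    ey = sum(g * g * math.sin(a) for g, a in zip(gains, azimuths)) / total
    return math.hypot(ex, ey)

# Regular ring of 8 loudspeakers, source at 20 degrees (illustrative values).
speakers = [2.0 * math.pi * k / 8 for k in range(8)]
source = math.radians(20.0)
r_e = {}
for order in (1, 2, 3):
    gains = sampling_decoder_gains(encode_2d(source, order), speakers, order)
    r_e[order] = energy_vector_norm(gains, speakers)
    print(f"order {order}: |E| = {r_e[order]:.4f}")
```

For this regular layout and unoptimised decoder, the norm of E works out to 2M/(2M+1) at order M, so it moves towards 1 as the order increases.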

In addition to Ambisonic (tested up to the second order), other spatialisation techniques (pan-pot, binaural, transaural, plus artificial reverberation) have been implemented, incorporated in an interface on a PC, and then tested. In this way, Ambisonic has been successfully applied to real-time source manipulation and mixing (with mono, multi-channel and B-format input sources), and also compared or combined with other techniques, over headphones (binaural mode) as well as over loudspeakers. This tool could be used for a complete subjective validation of the ambisonic approach and the underlying strategies.

The ambisonic approach gives a very satisfactory, global solution to the initial issues, although other strategies are better suited to specific problems (surround matrix systems, efficient binaural synthesis of complex scenes). Its extension to higher orders concerns many application fields and should develop in the near future thanks to current studies and projects.

Keywords

Spatialisation - 3D-Sound - Surround - Multimedia - 3D-browsing - Ambisonic(s) - B-format - Localisation theory - Velocity vector - Energy vector - Psychoacoustic decoding - Sound field representation - Spherical harmonic decomposition - Directional sampling - Scalability
 
