Jérôme Daniel's PhD Thesis and Ppt Presentation: Download page

Download :
Jérôme Daniel's PhD Thesis (PostScript cut into zipped pieces, or now available as a PDF file) and PowerPoint Presentation (with many pictures and movies).

For the detailed description in French: first download the first piece (which includes the Table of Contents and the Introduction).

Page created: 10/17/2000. First published: 10/24/2000. Last update: ...

Back to the Research Page - Back to GyronymO's Home Page

PhD Thesis entitled: "Acoustic field representation, application to the transmission and the reproduction of complex sound environments in a multimedia context" (English translation of the title).

This document was written in French, produced as a PostScript file, cut into pieces, then zipped (see the following paragraphs).
You can also download it in one piece as a zipped PDF file (about 3 MB).

Front page, Abstracts, Table of Contents, and Introduction (pp 1-12) [46 KB].

Part I: Theoretical Foundations and Preliminary Analysis.
Chapter 1 (pp 13-76) [967 KB]: Acoustics, Spatial Perception, and a thorough proof/interpretation of the localisation theories (Makita, Gerzon) based on the velocity V and energy E vectors. Explicit prediction laws are presented that relate the propagation characterisation (by V and E) to the localisation cues (especially ITD). (See also the PowerPoint presentation.)
Chapter 2 (pp 77-144) [914 KB]: Spatial Reproduction Principles and associated Sound Field Representation Strategies. Topics discussed: two-channel and multichannel stereophony; traditional (1st order) ambisonic systems; binaural and transaural, including new strategies (binaural B-format, double-transaural). More than a classical review, this chapter analyses the rendering properties on the basis of the acoustic characterisation of the synthesised field and the behaviour of the localisation cues (especially ITD) as the head rotates or moves. Three main classes of virtual sound imaging over loudspeakers are defined. Each of them is characterised by its compromise between the accuracy/strength of the reproduced binaural cues and their robustness and naturalness as the head moves.
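To give a concrete idea of the velocity and energy vectors mentioned under Chapter 1, here is a minimal Python sketch. It is not taken from the thesis. It assumes the usual Gerzon-style definitions for a horizontal loudspeaker layout fed with real, in-phase gains: V is the gain-weighted mean of the loudspeaker unit vectors, and E is the same mean weighted by squared gains. The function names and the example gains are hypothetical.

```python
import math

def velocity_energy_vectors(gains, azimuths_deg):
    """Return (V, E) as (x, y) tuples for loudspeakers fed with real gains.

    V is the gain-weighted mean of the loudspeaker unit vectors,
    E is the same mean weighted by squared gains (energies).
    Assumes the gains do not sum to zero.
    """
    units = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
             for a in azimuths_deg]
    sum_g = sum(gains)
    sum_g2 = sum(g * g for g in gains)
    v = tuple(sum(g * u[k] for g, u in zip(gains, units)) / sum_g
              for k in range(2))
    e = tuple(sum(g * g * u[k] for g, u in zip(gains, units)) / sum_g2
              for k in range(2))
    return v, e

def norm_and_angle(vec):
    """Norm and direction (degrees, counter-clockwise from the front) of a 2D vector."""
    return math.hypot(vec[0], vec[1]), math.degrees(math.atan2(vec[1], vec[0]))

# Stereo pair at +/-30 degrees, left channel about 6 dB louder (hypothetical values).
v, e = velocity_energy_vectors([1.0, 0.5], [30.0, -30.0])
print("V: norm %.3f, angle %.1f deg" % norm_and_angle(v))
print("E: norm %.3f, angle %.1f deg" % norm_and_angle(e))
```

In this example the direction of E is pulled further towards the louder loudspeaker than the direction of V, which is the kind of discrepancy between the two vectors that the localisation theories interpret.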

Part II: Higher Order Ambisonics.
Chapter 3 (pp 145-204) [702 KB]: Extension of almost all ambisonic aspects to all higher orders (2D/3D), from encoding to decoding. The most important chapter for those who are interested in high order systems. The extended encoding formalism is presented, as well as conversion formulae between the existing encoding conventions. Some extended transformation formulae are also presented. The optimised decoding solutions which existed only for first order systems (Gerzon's optimisation for a centred listener, and Malham's solution for large areas) are generalised into three solution families (called basic, max rE and in-phase solutions). Underlying concepts and properties are made explicit: the fundamental properties attached to the truncation of the spherical harmonic decomposition; the notion of directional sampling of the spherical harmonic basis and its regularity property (which is involved in the decoding problem and in the definition of ambisonic microphones). The PowerPoint presentation I used to defend my thesis should be a welcome help in understanding these points.
Chapter 4 (pp 205-242) [1150 KB]: Evaluation of the rendering for various listening conditions (ideal, i.e. an individual centred listening position, or collective, i.e. off-centred listening positions). The objective analyses are corroborated by informal listening tests. They highlight the contribution of higher orders and optimised solutions. The removal of some accommodating hypotheses (plane wave hypothesis, etc.) is also thoroughly discussed, especially regarding: the near field effect of the loudspeakers (finite distance), its compensation, and the convergence towards a holophonic system; the effect of a near field encoded source (spherical wave) when a true acoustic reconstruction cannot be achieved even for low frequencies (large listening area); the effect of the reproduction room.
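The encoding and decoding steps described under Chapter 3 can be sketched in Python for the horizontal (2D) case. This is an illustrative sketch under stated assumptions, not the thesis's own formulation: it assumes a plane-wave source, the component convention [1, cos mθ, sin mθ], and a regular circular loudspeaker array. The per-order weights for the max rE and in-phase families are the commonly quoted 2D expressions, recalled here as an assumption that should be checked against Chapter 3. All function names are hypothetical.

```python
import math

def encode_2d(azimuth, order):
    """Circular-harmonic components [1, cos(a), sin(a), cos(2a), sin(2a), ...]
    of a unit plane wave arriving from `azimuth` (radians)."""
    comps = [1.0]
    for m in range(1, order + 1):
        comps += [math.cos(m * azimuth), math.sin(m * azimuth)]
    return comps

def order_weights(order, kind="basic"):
    """Per-order decoding weights g_0..g_M (assumed 2D expressions)."""
    if kind == "basic":
        return [1.0] * (order + 1)
    if kind == "max_rE":
        return [math.cos(m * math.pi / (2 * order + 2)) for m in range(order + 1)]
    if kind == "in_phase":
        f = math.factorial
        return [f(order) ** 2 / (f(order + m) * f(order - m))
                for m in range(order + 1)]
    raise ValueError("unknown decoding family: %r" % kind)

def decode_2d(components, speaker_azimuths, order, kind="basic"):
    """Loudspeaker gains for a regular circular array (azimuths in radians)."""
    g = order_weights(order, kind)
    n = len(speaker_azimuths)
    gains = []
    for phi in speaker_azimuths:
        s = g[0] * components[0]
        for m in range(1, order + 1):
            c, d = components[2 * m - 1], components[2 * m]
            s += 2.0 * g[m] * (c * math.cos(m * phi) + d * math.sin(m * phi))
        gains.append(s / n)
    return gains

# Second-order example: source at 20 degrees, six regularly spaced loudspeakers.
speakers = [2.0 * math.pi * i / 6 for i in range(6)]
b = encode_2d(math.radians(20.0), 2)
for kind in ("basic", "max_rE", "in_phase"):
    print(kind, [round(x, 3) for x in decode_2d(b, speakers, 2, kind)])
```

With these assumptions, the basic family reconstructs a velocity vector of unit norm pointing at the source, while the in-phase family produces no negative loudspeaker gains; the max rE family sits between the two.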

Part III [493 KB]: Implementation, Tests and Applications.
Chapter 5 (pp 243-274): Some implementation aspects of the applications I developed to experiment with and demonstrate various spatialisation techniques: binaural, transaural, VBAP, Ambisonics (up to the second order, at the time), room effect synthesis, and some details about the interfaces (2D/3D). Suggestions are made for using these applications as evaluation tools.
Chapter 6 (pp 275-292): Applications and perspectives. First, I report an experiment with one possible use of traditional Ambisonics: encoding and transmitting multichannel sound (originally 5 channels). Then I discuss potential applications and perspectives related to high order ambisonics: mixed decoding (1st+2nd order); 3D impulse responses for room effect synthesis or for diffuse field analysis, etc.
General Conclusion (pp 293-294)

Bibliography (pp 295-300) and Appendices (pp 301-320) [84 KB]. Appendix A deals with questions related to the spherical/cylindrical harmonic decomposition(s): mathematical background and quick reference; the problem of diffraction by a rigid sphere (used for simulations and illustrations); details on the optimisation of high order ambisonic decoding.

Corrected versions of AES papers [380 KB] (to be included in appendices B and C).

A much more animated and attractive presentation...

This PowerPoint presentation was used for the thesis defence (09/19/00).
It contains many pictures and movies which should make many points of my thesis more understandable for everyone (French people and others!). The movies (.AVI) can be downloaded in groups (zip files) and can be played outside the PowerPoint presentation.
I intend to adapt it as an HTML document (as soon as possible), with more comments, in French and in English.

Caution:
All figures, pictures and movies were made entirely by me. This material is intended for educational purposes. Please do not modify this document or any part of it without my permission. Please do not use it for a public presentation without my permission or without an appropriate acknowledgement.

The zipped presentation [1419 KB].
First group of movies (21 movies) [1685 KB].
Second group (2 movies) [1191 KB].
Third group (2 movies) [966 KB].
Fourth group (2 movies) [1355 KB].

Comments:
This material covers the major points of the PostScript document with roughly the same structure.
Though I had only 45 minutes to defend my thesis, (much) more time would be required to go through the whole presentation properly. Depending on their interests, readers can of course skip some slides and focus on others.
For instance, the first substantial page appears after a few introductory slides. This is a very didactic page which introduces and describes the main mechanisms of auditory localisation, focusing on static and dynamic lateralisation mechanisms. It highlights the requirements for effective 3D sound reproduction, regarding not only localisation performance and naturalness, but also the preservation of spatial qualities and impressions, such as envelopment.
The following slides may also be found to be of great interest...

Advice for use:

This presentation works with Microsoft PowerPoint (Office 97 or later). The Equation Editor must be installed in order to avoid anomalous symbols (instead of arrows on vectors, for instance). Minor changes in appearance can also occur, depending on the version of MS Office used and the environment settings.
The movie groups must be unzipped into the same subdirectory named "movies", preserving the sub-subdirectories where they exist ("movies\perception3D", "movies\stereo2HP", "movies\multiHP_E", "movies\decoding"). The subdirectory "movies" and the file "SoutenanceJD.pps" must be located in the same directory.
You are free to play the presentation as slowly or as quickly as you want (manual mode): use the keyboard (arrows, space bar...) or the mouse.
During the presentation, the movies are called through hypertext links, which are most of the time associated with a picture. Thus, each time a picture appears, you can move the mouse over it: if there is a link, the mouse pointer changes (the arrow becomes a hand), and you can click to play the movie.
It is highly recommended to configure your player (Windows Media Player2 for instance) to play the movies in a loop, since most of them describe periodic or circular phenomena. In some cases, a "pause" function and a slider (time index) can be useful to get a closer look at the illustrated phenomenon.
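The directory layout described above can be checked with a small script. The following Python sketch is only a convenience suggestion: it assumes the subdirectory is named "movies" and that all four listed sub-subdirectories are expected once every movie group has been unzipped. The function name is hypothetical.

```python
from pathlib import Path

# Sub-subdirectories named in the instructions above.
EXPECTED_SUBDIRS = ["perception3D", "stereo2HP", "multiHP_E", "decoding"]

def check_layout(root):
    """Return a list of problems found in the presentation directory `root`."""
    root = Path(root)
    problems = []
    if not (root / "SoutenanceJD.pps").is_file():
        problems.append("missing file: SoutenanceJD.pps")
    movies = root / "movies"
    if not movies.is_dir():
        problems.append("missing directory: movies")
    else:
        for name in EXPECTED_SUBDIRS:
            if not (movies / name).is_dir():
                problems.append("missing directory: movies/" + name)
    return problems

if __name__ == "__main__":
    # Check the current directory and print anything that looks wrong.
    for problem in check_layout("."):
        print(problem)
```

An empty result means the presentation file and the movie directories are where the presentation expects them; a missing sub-subdirectory may simply mean that the corresponding movie group has not been unzipped yet.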

Last note:
This presentation is not accompanied by detailed comments yet (I should have already written them!). But the different points and concepts are meant to appear in a very progressive and logical way. So don't hesitate to spend a little time observing the figures and the movies, and relating them to the equations or the words that have just appeared or that appear just after...

Coming next(?)... (sorry for not yet having kept the ambitious promise listed first)

An adaptation of the content of my thesis, inspired by my PowerPoint presentation, in HTML and in English. This should have a thematic structure (encoding formulae, decoder definition, etc.), with different viewpoints (a practical approach with directly usable formulae and instructions; a more theoretical approach with the underlying concepts and properties).

More illustrations and movies: 3D rendering, off-centre localisation...

"The experimenter's corner" (with sounds!): binaural simulations of centred and off-centred listening experiences with various ambisonic renderings... an audible proof of the contribution of high order and optimised solutions!
(Just give me the time to convert some .wav files to MP3.)
