

The following pages in this section describe, in brief, the motivations for what we’ll be doing, the techniques we’ll make use of, and some of the general technical challenges.

If you’re impatient or just don’t like clicking, a short set of PDF slides describing the aims of the project is available here. Also, feel free to have a look through a series of short videos covering various topics.

Digital Sound Synthesis—A Little Background

[Video: Additive Synthesis]

Digital sound synthesis and audio processing came into being in the late 1950s, as an outgrowth of concurrent work in speech synthesis. Many of the early techniques have become cornerstones of modern synthesis [1]: additive synthesis, based on sums of sine tones or oscillators; FM synthesis, which employs chains of such oscillators as modulators; and wavetable synthesis, which reads through stored tables of data at variable rates.

Sounds produced in this way dominate today’s soundscape—they are familiar to anyone who possesses a personal computer or mobile phone. At the same time, they are undeniably heuristic approaches to synthesis, motivated by perceptual and efficiency concerns—there is no strong underlying physical interpretation for such algorithms.
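
To make the flavour of these techniques concrete, here is a minimal additive synthesis sketch in Python (a hypothetical illustration, not project code; the partial frequencies, amplitudes, and decay envelope are arbitrary choices):

```python
import numpy as np

SR = 44100                          # audio sample rate (Hz)
dur = 2.0                           # duration (s)
t = np.arange(int(SR * dur)) / SR   # time axis

freqs = [220.0, 440.0, 662.0]       # partial frequencies (Hz); the third
                                    # is slightly inharmonic, for interest
amps = [1.0, 0.5, 0.3]              # partial amplitudes

out = np.zeros_like(t)
for f, a in zip(freqs, amps):       # sum of sine tones: additive synthesis
    out += a * np.sin(2 * np.pi * f * t)

out *= np.exp(-3.0 * t)             # simple global decay envelope
out /= np.max(np.abs(out))          # normalize to [-1, 1]
```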


The benefits of such approaches to synthesis are obvious: conceptual simplicity and efficiency. These benefits underlie the continued preeminence of such methods in today’s synthesis software packages. Though powerful, these methods possess two major weaknesses:

sound quality: output is invariably synthetic, lacking the warmth, variability, and interesting unpredictability of acoustically produced sound.

user control: the necessity of specifying a set of input data which may be very large, and/or of obscure perceptual significance.

[Video: Wavetable Synthesis]

Both these difficulties are at odds with the fundamental goals of many artists and musicians: to have, at one’s fingertips, a flexible sound generation system which is simple to use, intuitive, and which generates sound rooted in one’s experiences of the acoustic world. One response to the first difficulty has been the incorporation of recorded audio material, or sampling [2]. Sampling, while very successful at emulating certain instruments (such as the piano), introduces a whole new set of problems: the memory required to capture the full expressive range of an instrument explodes, and it is difficult to escape from the character of the recorded fragments; the ear tires quickly of repetition. The second difficulty is much harder to address (and subjective!): the user may be faced with setting hundreds or thousands of parameters, of obscure perceptual significance. Faced with such difficulties, the musician or sound designer may be forced to retreat to “preset” configurations, greatly limiting the potential of these methods.
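
As an aside, the wavetable mechanism mentioned earlier is easy to make concrete. Here is a minimal, hypothetical sketch in Python (illustrative table size and contents only, not any particular implementation): a single stored cycle is read through at a variable rate, with linear interpolation, to produce different pitches.

```python
import numpy as np

SR = 44100
TABLE_LEN = 2048
# one stored cycle of a waveform (here, just a sine)
table = np.sin(2 * np.pi * np.arange(TABLE_LEN) / TABLE_LEN)

def wavetable_tone(freq, dur):
    """Read through the table at a rate set by the desired frequency."""
    n = int(SR * dur)
    phase = (np.arange(n) * freq * TABLE_LEN / SR) % TABLE_LEN
    idx = phase.astype(int)            # integer part of the read index
    frac = phase - idx                 # fractional part
    nxt = (idx + 1) % TABLE_LEN        # next sample, wrapping at the end
    return (1 - frac) * table[idx] + frac * table[nxt]  # linear interp.

out = wavetable_tone(261.6, 1.0)       # one second at roughly middle C
```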

The point of this project is to explore other techniques—techniques which neatly address the two issues above, while introducing new problems, not least of which is computational complexity!


Physical Modeling Synthesis

[Video: Mass-spring Networks]

A wholly different approach to synthesis is afforded by the application of physical models [3, 4] of musical components, such as strings, bars, plates, membranes, acoustic tubes, enclosed air cavities and rooms, and interactions with excitation mechanisms such as reeds, or bows.

Physical modelling synthesis, after a (long!) period of incubation, emerged in the 1980s, and has since dominated the sound synthesis research environment—and, as mentioned previously, directly addresses issues of control and sound quality. Physical modelling sound output has a natural character, and at its best, exhibits all the subtlety of acoustically-produced sound, with immense potential to go beyond what is possible with existing instruments—the musician is limited only by imagination, and of course, computational resources (read on!). The control aspect is also very neatly dealt with: instruments are defined by geometrical and material parameters, few in number, and are played by sending in physically meaningful signals such as striking locations and forces, or blowing pressures, etc. This is not to say that the user control of a physical model is easy—but it can be learned, much in the same way that one learns to play an acoustic instrument. In contrast, learning how to set the amplitudes, frequencies and phases of a thousand oscillators in order to produce a desired sound is probably beyond the capability of even the most astute and dedicated musician!

[Video: Digital waveguide synthesis]

The very earliest instances of physical modeling synthesis date back to the 1960s. In 1962, Kelly and Lochbaum [5] developed a model of the vocal tract, based on concatenated acoustic tubes, in order to perform vocal synthesis. Ruiz, and later Hiller and Ruiz [6], employed a finite difference model of a vibrating string to generate plucked and struck string tones as far back as 1969. In the late 1970s and early 1980s, the first complete environment, CORDIS, based on networks of masses and springs, was developed, primarily by Cadoz and his associates [7], and it continues to be developed today. All of these are essentially direct numerical solvers for differential equations. The video here shows a vibrating collection of masses and springs, where sound output is drawn from the motion of one of the constituent masses.
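
As a rough sketch of this kind of direct solver (hypothetical code with arbitrary physical values, not the CORDIS system itself), consider a line of masses joined by identical springs, advanced by explicit time stepping, with output drawn from the motion of one mass:

```python
import numpy as np

SR = 44100                  # sample rate (Hz)
k = 1.0 / SR                # time step (s)
N = 20                      # number of masses
M = 0.001                   # mass (kg)
K = 1.0e5                   # spring stiffness (N/m); stable here, since
                            # k < sqrt(M / K), the explicit-scheme limit
u = np.zeros(N)             # displacements at the current time step
u[N // 2] = 0.001           # initial "pluck" at the centre
u1 = u.copy()               # displacements one time step earlier
out = np.zeros(SR)          # one second of output

for n in range(SR):
    # net spring force on each mass from its two neighbours
    f = K * (np.roll(u, -1) - 2 * u + np.roll(u, 1))
    f[0] = f[-1] = 0.0      # pin the two end masses (fixed boundaries)
    u, u1 = 2 * u - u1 + (k * k / M) * f, u   # explicit (leapfrog) update
    out[n] = u[N // 4]      # listen to the motion of one mass
```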

In the 1980s, various distinct frameworks emerged, among the most important being digital waveguides and modal synthesis. With the advent of greater computer power came the possibility of performing synthesis for relatively complex systems in real time or near real time.

[Video: Modal synthesis]

Digital waveguides [8], developed at CCRMA at Stanford University, make use of simple and efficient delay-line structures to model wave propagation in objects such as strings and acoustic tubes. Waveguides were subsequently patented and commercialized by the Yamaha Corporation, and constitute the most successful application to date of physical modelling synthesis methods. The video here shows the decomposition of the vibration of a string into traveling wave components, and the resulting delay line structures.
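
A minimal waveguide string sketch, under the simplest possible assumptions (lossless string, fixed ends, arbitrary delay-line length and initial shape; a hypothetical illustration, not any commercial implementation), might look like this:

```python
import numpy as np

SR = 44100
L = 100                        # delay-line length in samples; the period
                               # is 2 * L samples, so pitch = SR / (2 * L)
# a triangular "pluck" shape, split equally between the two directions
shape = 0.5 * np.interp(np.arange(L), [0, L // 3, L - 1], [0.0, 1.0, 0.0])
right = shape.copy()           # right-going traveling wave
left = shape.copy()            # left-going traveling wave

out = np.zeros(SR)
for n in range(SR):
    out[n] = right[20] + left[20]       # string displacement at one point
    r_end, l_end = right[-1], left[0]   # samples arriving at the two ends
    right = np.concatenate(([-l_end], right[:-1]))  # shift one sample and
    left = np.concatenate((left[1:], [-r_end]))     # reflect, inverting
```

Here the fundamental frequency is SR / (2 * L), roughly 220 Hz: per time step, each delay line is shifted by one sample and the arriving samples are reflected, with inversion, at the fixed ends.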

Modal methods [9], which have been researched extensively at IRCAM in Paris (giving rise to the Modalys software environment), rely on the decomposition of the dynamics of a system into modes, each of which oscillates at a given natural frequency. The video here shows the decomposition of the vibration of a string into modes, which may then be added together in order to reconstruct the entire motion of the string.
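
A minimal modal synthesis sketch (hypothetical and heavily simplified: ideal-string modal frequencies, with arbitrary amplitude and decay choices) is simply a sum of damped sinusoids:

```python
import numpy as np

SR = 44100
t = np.arange(SR) / SR         # one second of output
f0 = 110.0                     # fundamental frequency (Hz)

out = np.zeros_like(t)
for m in range(1, 20):                   # the first 19 modes
    amp = 1.0 / m                        # plucked-string-like rolloff
    decay = 1.0 + 0.5 * m                # higher modes decay faster
    out += amp * np.exp(-decay * t) * np.sin(2 * np.pi * m * f0 * t)
```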

A great reference for physical modeling synthesis (and many other topics in digital audio!) is Julius Smith’s Global Index.


Finite Difference Time Domain Methods

While waveguide and modal techniques were being applied, with great success, to physical modeling synthesis, the techniques used in musical acoustics investigations (i.e., not directly for synthesis) developed in different directions. In such investigations, efficient performance was not crucial, so there was less incentive to seek model simplifications (based on, say, traveling wave or modal decompositions).

As a result, some researchers began to look on physical modeling as a particular application of mainstream simulation techniques, and in particular time stepping methods operating over a grid. Important steps were taken by Chaigne and his group at ENSTA in the 1990s in applying finite difference methods to a variety of musical systems [10]. Interestingly, such work picked up the thread of work going back to much earlier days (such as that of Hiller and Ruiz [6]). Such methods, as they rely on virtually no simplifying hypotheses, are of great generality; at the same time, they are not nearly as efficient as, say, digital waveguide methods. Evaluating the strengths and weaknesses of the various physical modeling synthesis methods is a complex and multifaceted undertaking; see [3] for some general comments on the topic.

[Figure: Finite Difference Scheme Update]

The basic idea behind such grid-based approaches to simulation is a long-established one. Some methods involve the use of grids covering the domain of interest, containing values representing various physical variables (such as displacement, pressure, or velocity). This finite set of values is then advanced, according to the dynamics of the problem under consideration, by time stepping: recalculating the values recursively in a loop, operating, in this case, at an audio sample rate (typically high!). Such methods are usually locally defined: updates at a given grid point are calculated using neighbouring values. In other methods, the “state” of the simulation may not be locally defined values over a grid, but rather the coefficients of various types of global function expansions; the modal methods mentioned in the previous section are one example, but there are many others, with the family of spectral methods being the most widely known [11]. See the accompanying figure, illustrating a grid-based time-stepping method applied to a model of a string.
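
As a concrete illustration (a hypothetical sketch with arbitrary physical values, not NESS project code), here is such a scheme for the 1D wave equation, a simple model of an ideal string with fixed ends; the update at each grid point uses only neighbouring values, exactly as described above:

```python
import numpy as np

SR = 44100
k = 1.0 / SR                   # time step (s)
c = 200.0                      # wave speed (m/s)
length = 1.0                   # string length (m)
N = int(length / (c * k))      # number of grid intervals
h = length / N                 # grid spacing
lam = c * k / h                # Courant number; lam <= 1 for stability

u = np.zeros(N + 1)            # displacement at time step n
u[N // 3] = 0.001              # pointwise initial "strike"
u1 = u.copy()                  # displacement at time step n - 1

out = np.zeros(SR)
for n in range(SR):
    unew = u.copy()            # endpoints stay zero: fixed boundaries
    unew[1:-1] = (2 * u[1:-1] - u1[1:-1]
                  + lam**2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u1, u = u, unew            # advance the time series by one step
    out[n] = u[N // 4]         # read output at one grid point
```

The condition lam <= 1 is the classical Courant-Friedrichs-Lewy stability condition for this scheme; violating it leads to explosive numerical instability.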

It is possible to approach many systems which cannot be dealt with using waveguides or modal methods, and the resulting improvement in sound quality can be striking (as we hope to show throughout the course of this project!). But only very recently has readily available computational power grown to the point that synthesis in a reasonable amount of time is possible; the exploration of specialized parallel hardware forms another part of the NESS project.

There are many such time stepping methods (finite element methods being perhaps the most widely known), but here we’ll be looking mainly at finite difference (FD) methods, which operate over regular grids (but also a little at the closely related finite volume methods [12], which may operate over unstructured grids). Finite difference methods are perhaps the oldest method of performing a simulation, dating back at least a century, and currently see quite a lot of use in electromagnetic simulation (where they are often referred to as finite difference time domain, or FDTD, methods [13]) and, in conjunction with finite volume methods, in fluids applications. They are much less used in solid vibration problems, where finite element methods are dominant.

Why have we chosen to work with such methods? It’s a delicate question, always, as for a given application there are many (often conflicting) design constraints and concerns. Here, though, in the context of audio, and especially in parallel environments such as GPUs, there are various advantages: ease of programming, as the schemes are often defined over regular grids; and ease of analysis, which is especially important if one is trying to get a grip on potentially audible artefacts such as numerical dispersion, and in the case of strong, perceptually important nonlinearities. There is a price to be paid, though, for working with such methods, the main one being that complex geometries are less easy to work with!

Some excellent references on finite difference methods are given below [13,14,15]. I have my own text on applications in physical modelling synthesis [4], which you might also want to have a look at.


Large Scale Synthesis in Parallel Hardware

[Image: The Nvidia Tesla]

The problems we have looked at over the course of the NESS project vary in terms of computational requirements. Some lead to code which will not run anywhere near real time on a typical single-core microprocessor; others are easily real time even on relatively modest hardware!

For this reason, we’ve explored parallel architectures, such as graphics processing units (GPUs), which can allow much greater computation rates than typical microprocessors, provided that one’s algorithm is readily parallelizable! From their origins in graphics rendering and games, GPUs are rapidly taking on the role that supercomputers used to play, across a wide range of heavy computational problems. But there are other types of parallelism that we’ve made use of (sometimes in conjunction with GPUs), including work in multicore programming and low-level vector intrinsics. For much more on this, see the dedicated Acceleration page.

There has recently been some exploration of GPUs in 3D room acoustics, and a large part of the NESS project is both to extend this work and to look at GPU use for more general systems in acoustics and audio. We want to be able to listen to the sounds we produce in a reasonable amount of time (if not real time!). We’ve had some success with this; see the Virtual Room Acoustics page. We’re happy to announce that this work will continue under an ERC Proof of Concept grant, starting in December 2016 (as I type this!).


Project Structure

[Image: The NESS and WRAM Multichannel Research Space]

The algorithm team, made up of the project PI, five PhD students, and a postdoctoral RA in the Acoustics and Audio Group at the University of Edinburgh, is responsible for numerical algorithm design for a variety of sound-producing acoustic systems. They work from first principles to design simulation methods, always in the Matlab prototyping language. The major concerns here are numerical stability, especially under strongly nonlinear conditions; modularity of design (so that various components may be combined in a natural and, above all, algorithmically simple manner); and making sure that numerical artifacts do not degrade sound quality, which is especially important, as we aim to work always at a reasonable audio sample rate! Members of the algorithm team have collaborated with like-minded researchers at the École Nationale Supérieure de Techniques Avancées, in Palaiseau, France, the Aalto University of Technology, in Espoo, Finland, the Centre National de la Recherche Scientifique, in Marseille, France, and the Université de Paris VI, in Paris, France.

The HPC team, based at the Edinburgh Parallel Computing Centre, attacks the delicate problem of porting Matlab code to C, and then accelerating it by using various techniques, including multicore and CUDA.

The creative team is a set of visiting composers, who will experiment with the finished implementations produced by the HPC team, generate musical compositions, and provide feedback to the algorithm team. Composers will take part in workshops with the PI at Edinburgh; some have already been chosen, but if you are a composer with an interest in this kind of synthesis, and you (a) are interested in multichannel electronic composition, (b) don’t mind working out of real time, (c) are fairly adept with computers, and (d) don’t mind truly rotten weather, then by all means contact us; we’d love to hear from you! NB: As I write this in December 2016, I can happily say that (b) can be amended to “a little bit out of real time.”


References

[1] C. Roads. The Computer Music Tutorial, MIT Press, Cambridge, Massachusetts, 1996.

[2] T. Tolonen, V. Välimäki and M. Karjalainen. Evaluation of Modern Sound Synthesis Methods, Technical Report 48, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, March 1998.

[3] V. Välimäki, J. Pakarinen, C. Erkut and M. Karjalainen. Discrete-Time Modelling of Musical Instruments, Reports on Progress in Physics, 69(1):1–78, 2006.

[4] S. Bilbao. Numerical Sound Synthesis: Finite Difference Schemes and Simulation in Musical Acoustics, John Wiley and Sons, Chichester, UK, 2009.

[5] J. Kelly and C. Lochbaum. Speech Synthesis, International Congress on Acoustics, Copenhagen, Denmark, 1–4, Paper G42, 1962.

[6] L. Hiller and P. Ruiz. Synthesizing Musical Sounds by Solving the Wave Equation for Vibrating Objects: Part I, Journal of the Audio Engineering Society, 19(6):462–470, 1971.

[7] C. Cadoz, A. Luciani and J.-L. Florens. CORDIS-ANIMA: A Modeling and Simulation System for Sound and Image Synthesis, Computer Music Journal, 17(1):19–29, 1993.

[8] J. O. Smith III. Acoustic Modeling Using Digital Waveguides, in C. Roads, S. Pope, A. Piccialli and G. De Poli (Eds.), Musical Signal Processing, 221–263, Swets and Zeitlinger, Lisse, The Netherlands, 1997.

[9] D. Morrison and J.-M. Adrien. MOSAIC: A Framework for Modal Synthesis, Computer Music Journal, 17(1):45–56, 1993.

[10] A. Chaigne and A. Askenfelt. Numerical Simulations of Struck Strings. I. A Physical Model for a Struck String Using Finite Difference Methods, Journal of the Acoustical Society of America, 95(2):1112–1118, 1994.

[11] L. Trefethen. Spectral Methods in Matlab, SIAM, Philadelphia, 2000.

[12] R. LeVeque. Finite Volume Methods for Hyperbolic Problems, Cambridge University Press, Cambridge, 2002.

[13] A. Taflove. Computational Electrodynamics: The Finite-Difference Time-Domain Method, Artech House, Boston, Massachusetts, 1995.

[14] B. Gustafsson, H.-O. Kreiss and J. Oliger. Time Dependent Problems and Difference Methods, John Wiley and Sons, New York, 1995.

[15] J. Strikwerda. Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, California, 1989.
