## Convolution reverb and audio editing

Audio is not really one of my key interests, but I have it on good authority that in sound editing and studio recording, reverb is an extremely useful effect to apply to a particular line of a track.  (By line, I mean a single audio input; multiple inputs would come from the different instruments involved.)  Sound engineers care about other effects too (here is a list of the most commonly used), but reverb is the most important, since it gives the impression that the instruments are playing in an actual space.

Reverb is useful for completely digital music, music recorded in a studio (where the room is designed so that there are no reflections), or some mix of the two.

It is also a fairly lucrative business to get into.  ValhallaRoom, one of the leading amateur plugins, has sold hundreds of thousands of copies; and other commercial software of this flavour can cost thousands of dollars per license.

So what is reverb, and how does it work?

Most generally, a room or space has a characteristic wave equation associated with it: $(\Delta - \partial^{2}_{t})f = 0$, with boundary conditions determined by the geometry.  If we add a forcing term to this equation, $(\Delta - \partial^{2}_{t})f(\vec{x},t) = h(\vec{x},t)$, then we essentially simulate the addition of sound sources to the room.  This is the general problem we wish to solve: given an input signal $h$, what is the audio response $f$?

HERE THERE BE GREEN’S FUNCTIONS

It turns out that this problem is soluble if one introduces the concept of a Green’s function, which in signal processing lingo (for this particular PDE) is also known as the impulse response.

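We solve the equation $Lf := (\Delta - \partial^{2}_{t})f(\vec{x},t) = \delta(\vec{x} - \vec{y},t - \tau)$ instead; its solution is the response of the system to a point source of unit strength at location $(\vec{y}, \tau)$.  Various techniques can then be used; separation of variables is probably the best starting point, once one observes that $\delta(\vec{x} - \vec{y},t - \tau) = \delta(\vec{x} - \vec{y})\delta(t - \tau)$.  With a bit of work, it is then possible to determine a solution $G(\vec{x} - \vec{y},t - \tau)$ to the above equation: the Green’s function.  (Strictly, writing $G$ as a function of the differences $\vec{x} - \vec{y}$ and $t - \tau$ is only valid when the problem is translation-invariant; in a room with walls, $G$ depends on $\vec{x}$ and $\vec{y}$ separately, but the argument below goes through unchanged.)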

Next, observe that

$h(\vec{x},t) = \int_{\vec{y},\tau}\delta(\vec{x} - \vec{y})\delta(t - \tau)h(\vec{y}, \tau)d\tau d\vec{y}$

Since $h = Lf$ and $\delta = LG$, this suggests that our solution is

$f(\vec{x},t) = \int_{\vec{y},\tau}G(\vec{x} - \vec{y}, t - \tau)h(\vec{y},\tau)d\tau d\vec{y}$

which, with a bit of careful reasoning, can be proved to be the case: $L$ acts only on $(\vec{x},t)$, so (formally) it can be taken inside the integral, where $LG = \delta$ turns the right-hand side into the expression for $h$ above.  So our solution is the convolution of the impulse response, $G$, with our input signal, $h$.
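To make the convolution concrete, here is a minimal sketch in Python with NumPy.  It assumes the source and listener positions are fixed, so that only the time integral survives, and it uses a made-up impulse response (decaying white noise) rather than one measured or computed for a real room.  The function names `synthetic_impulse_response` and `fft_convolve` are my own inventions for illustration.

```python
import numpy as np

def synthetic_impulse_response(sample_rate, decay_seconds, seed=0):
    """Toy stand-in for G: white noise under an exponential decay envelope."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * decay_seconds)
    t = np.arange(n) / sample_rate
    envelope = np.exp(-6.9 * t / decay_seconds)  # roughly 60 dB of decay over the tail
    ir = rng.standard_normal(n) * envelope
    ir[0] = 1.0  # the direct sound arrives first, at full strength
    return ir

def fft_convolve(signal, impulse_response):
    """Linear convolution via the FFT; output has len(signal) + len(ir) - 1 samples."""
    n = len(signal) + len(impulse_response) - 1
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(impulse_response, n)
    return np.fft.irfft(spectrum, n)

sample_rate = 8000
t = np.arange(sample_rate // 4) / sample_rate
dry = np.sin(2 * np.pi * 440.0 * t)   # quarter-second 440 Hz tone: the input h
ir = synthetic_impulse_response(sample_rate, decay_seconds=0.5)
wet = fft_convolve(dry, ir)           # the response f, i.e. G convolved with h
```

The output `wet` is longer than `dry` by the length of the reverb tail.  Going through the FFT rather than summing directly matters in practice, since real impulse responses can run to several seconds of samples.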

PRACTICALITY AND IMPLEMENTATION

Without loss of generality, we can absorb the boundary and shape information into the metric of the Laplacian, and assume we are within a cubical domain.  As a further simplification, and the more practical scenario, we can simply consider the ODE (strictly, a delay differential equation, because of the shifted arguments)

$f''(t) + Af'(t - c) + Bf(t - d) = \delta(t - \tau)$ and compute a Green’s function for a ‘room’ with shape parameters $A$, $B$, $c$, and $d$.  To return our signal to the listener, we then merely convolve this Green’s function with the original input.  For more in this vein, there is an impressive series of lecture notes at mi.eng.cam.ac.uk/~rwp/Maths/vid09, in particular l2notes.pdf.
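As a sanity check on the idea, consider the special case with no delays, $c = d = 0$ (an assumption made purely to get a closed form).  The equation is then a damped oscillator, and for $B > A^{2}/4$ its causal Green’s function is $G(t - \tau) = e^{-A(t - \tau)/2}\sin(\omega(t - \tau))/\omega$ for $t > \tau$ and zero before, with $\omega = \sqrt{B - A^{2}/4}$.  The sketch below evaluates this and checks numerically that it does what a Green’s function should; the parameter values are arbitrary and `oscillator_green` is a name I made up.

```python
import numpy as np

def oscillator_green(t, A, B, tau=0.0):
    """Causal Green's function of f'' + A f' + B f = delta(t - tau).

    Only the underdamped case B > A^2 / 4 is handled in this sketch.
    """
    omega_sq = B - 0.25 * A * A
    if omega_sq <= 0:
        raise ValueError("sketch only covers the underdamped case B > A^2 / 4")
    omega = np.sqrt(omega_sq)
    s = np.maximum(np.asarray(t, dtype=float) - tau, 0.0)  # no response before the impulse
    return np.exp(-0.5 * A * s) * np.sin(omega * s) / omega

A, B = 3.0, 400.0   # illustrative 'room' parameters, not fitted to anything
dt = 1e-4
t = np.arange(0.0, 2.0, dt)
G = oscillator_green(t, A, B)

# Away from the impulse, G should satisfy the homogeneous equation.
dG = np.gradient(G, dt)
ddG = np.gradient(dG, dt)
residual = ddG + A * dG + B * G
max_residual = np.max(np.abs(residual[5:-5]))  # skip the less accurate edge samples

# The unit impulse shows up as a unit jump in the slope at t = tau.
initial_slope = (G[1] - G[0]) / dt
```

Here `max_residual` should be small compared with the individual terms, and `initial_slope` should be close to one.  With nonzero delays there is no such tidy formula, and one would have to step the equation forward numerically instead.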

But can we do more than this?  Perhaps, if we allow for nonlinear terms, or higher-order ‘soliton’ terms, although I’m not sure that this would be helpful.  Whatever the outcome, at the end of the day we get a system that has tweakable parameters, like ValhallaRoom, above.

So that’s all well and good, but what about the general case?  What happens if one considers an environment with a very complex spatial geometry?  Would it be possible to, say, create a game environment and experience sound sources from different locations in the scene?  This would be an excellent (if extremely difficult!) project.

Well, it turns out that this work has already been done.  Some researchers have managed to implement the full solution of the general wave equation in a general game environment, like Unity.  They precompute the impulse response (requiring anywhere between hours and many hundreds of hours of computer time) and ‘bake’ it in, so that it can be convolved with sound sources as the player walks around the scene.  Note: ‘baking’ is also used for other things that would be computationally expensive to do on the fly in-game, such as lighting.  Beyond having a working solution, these folks also seem to have got around a number of the roadblocks that have hampered the practicality of this approach in the past.  They compress the data file needed to store the impulse response information for an outdoor scene from about 100 gigabytes down to a few megabytes (maybe using techniques such as fast Fourier transforms, as well as simplifying assumptions?), and they use various tricks to make shifting geometry and movement for audio effects computable on the fly.
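To illustrate just the ‘bake, then look up and convolve’ idea, here is a toy sketch in Python.  It is emphatically not the researchers’ method: the class `BakedReverb`, the nearest-grid-cell lookup, and the tiny hand-written impulse responses are all assumptions of mine, and a real system would need compression and interpolation between positions.

```python
import numpy as np

class BakedReverb:
    """Hypothetical store of precomputed impulse responses on a listener grid."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.responses = {}  # maps an integer grid cell to its impulse response

    def _cell(self, position):
        # Snap a continuous position to the grid cell that contains it.
        return tuple(int(np.floor(p / self.cell_size)) for p in position)

    def bake(self, position, impulse_response):
        # In a real pipeline this is the expensive offline wave simulation.
        self.responses[self._cell(position)] = np.asarray(impulse_response, dtype=float)

    def render(self, dry, listener_position):
        # At run time: look up the baked response and convolve it with the dry sound.
        ir = self.responses[self._cell(listener_position)]
        return np.convolve(dry, ir)

reverb = BakedReverb(cell_size=2.0)
reverb.bake((0.5, 0.5), [1.0])                  # open field: direct sound only
reverb.bake((2.5, 0.5), [1.0, 0.0, 0.0, 0.5])   # near a wall: one echo at half strength

dry = np.array([1.0, 2.0])
in_field = reverb.render(dry, (1.0, 1.9))       # same cell as (0.5, 0.5)
near_wall = reverb.render(dry, (3.9, 0.1))      # same cell as (2.5, 0.5)
```

The same dry sound comes out unchanged in the first cell and with a delayed, quieter copy of itself in the second, which is the whole point: all the heavy lifting happens at bake time, and the per-frame cost is a lookup plus a convolution.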

In the demonstration video, some really cool effects are demonstrated: diffraction of sound around objects, the characteristic ‘double-ring’ of a bell in a bell tower caused by sound reflections, muffling or muting of sound caused by obstacles being in the way, and more!  It is very, very cool.  If such a technology were available for Unity as a plugin/asset/resource from the Unity Asset Store for, say, an affordable price (like ValhallaRoom), I’d snap it up in an instant.

The paper itself, which sketches the algorithm and method used in the video above, is located here.  The video itself is downloadable from this location.

PLANS / PROJECTS

Given this summary of the basics and of the state of the art, the question might be raised: where does this leave me?  Well, as mentioned before, I’d love to use the technology demonstrated above in a Unity game.  As a secondary consideration, I’d be interested in having a play with a simple reverb signal modifier, and seeing if I can hack one together as described above.

Regardless, I’ll be sure to write if I make any progress along such lines.