
Introduction
Composition and diffusion can be understood as two complementary
and related processes: bringing sounds together, and
spreading them out again in an organized fashion. In
the Western tradition, these two processes are frequently
carried out by different people at different times,
each drawing on specialized knowledge. The electroacoustic
tradition, though much briefer, offers the possibility
of the composer designing and implementing both aspects
of the music, and interrelating them in highly specific
ways. Computer control offers the greatest precision
in dealing with the complexities of these processes,
even though, at present, separate programs are usually
required.
I am mainly referring to the practice of timbral composition,
which may be thought of as shaping the space within
the sound, that is, its perceived volume [Truax, 1992].
By this term I mean not merely the loudness of the sound,
but rather its spectral and temporal shape, both of
which contribute to its perceived magnitude and form.
Diffusion, as the performance mode for these sounds,
refers to the distribution of the (usually stereo) sound
in a space through the use of a mixer and multiple loudspeakers.
However, we can also understand the success of such
a performance as a matching of the space within the
sound with the space into which it is projected. This
can be done even more effectively with multiple channel
inputs where each soundtrack can be kept discrete and
projected independently of all others.
At Simon Fraser University (SFU) we have been developing
specific digital signal processing (DSP) techniques
for each of these operations. The main techniques used
for timbral composition are digital resonators, using
variable length delay lines with controllable feedback,
and granulation of sampled sound used for time stretching,
both of which allow the composer to shape the volume
of the sound [Truax, 1994]. Recently both of these processes
have been integrated into the same program (GSAMX).
The diffusion project is based on a custom-designed multiple-DSP
box, the DM-8, designed by Harmonic Functions in
collaboration with SFU, at the centre of which is a
computer-controlled 8 by 8 matrix with which eight input
streams may be simultaneously routed to any of eight
output channels, either in fixed or dynamic trajectory
patterns. A commercially available 16 by 16 matrix is
also being developed.
Shaping the space inside the sound
The volume, or perceived magnitude, of a sound depends
on its spectral richness, duration, and the presence
of unsynchronized temporal components, such as those
produced by the acoustic choral effect and reverberation.
Electroacoustic techniques expand the range of methods
by which the volume of a sound may be shaped. Granular
time-stretching is perhaps the single most effective
approach, as it contributes to all three of the variables
just described. It prolongs the sound in time and overlays
several unsynchronized streams of simultaneous grains
derived from the source such that prominent spectral
components are enhanced. In addition, my GSAMX software
allows each grain stream to have its own pitch transposition,
either downwards or upwards, according to a scheme in which
the untransposed pitch is treated as the fourth harmonic in the
scale of transpositions. That is, three downward harmonic
pitches are available, plus four or more harmonics in
each octave above the original pitch. However, processing
the material through one or more resonators (using a
waveguide or delay line) prior to granulation will also
shape the spectrum of the sound quite strongly and bring
out particular harmonic or formant regions.
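To make the transposition scheme concrete, here is a minimal sketch in Python (an illustration only, not the GSAMX source; the function name is hypothetical):

    # The untransposed pitch is treated as harmonic 4, so harmonics 1-3
    # transpose downward and harmonics 5 and above transpose upward.
    def transposition_ratio(harmonic):
        """Playback-rate ratio for a grain stream relative to the source."""
        return harmonic / 4.0   # harmonic 4 gives ratio 1.0 (untransposed)

    # Harmonics 1-3: downward transpositions (1/4, 1/2, 3/4 of the pitch);
    # harmonics 5-8: the four upward steps within the first octave.
    print([transposition_ratio(h) for h in range(1, 9)])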
The Karplus-Strong model of a recursive waveguide with filter
has long been regarded as an efficient synthesis technique
for plucked string sounds [Karplus & Strong, 1983].
The basic model for the waveguide uses a delay line
of p samples which determines the resonant frequency
of the string, a low-pass filter which simulates the
energy loss caused by the reflection of the wave, and
the feedback of the sample back into the delay line.
The initial energy input is simulated by initializing
the delay line with random values, that is, introducing
a noise burst whose spectrum decays to a sine wave at
a rate proportional to the length of the delay line.
The model applies equally to a string fixed at both
ends or a tube open at both ends, at least in terms
of the resonant frequencies all being harmonics of the
fundamental. If the sample is negated before being fed
back into the delay line, the resulting change of phase
models a tube closed at one end, which results in only
the odd harmonics being resonant, and lowers the fundamental
frequency by an octave, since the negation effectively
doubles the length of the delay line. For the basic
model, the resonant frequency equals SR/(p + 1/2), where SR is
the sampling rate and p is the length of the delay line in
samples.
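The basic model is compact enough to sketch directly (a minimal Python illustration under the assumptions just described, not a production implementation):

    import numpy as np

    def karplus_strong(freq, dur, sr=44100):
        """Plucked string: a noise burst circulates in a delay line of p
        samples through a two-point averaging (low-pass) filter."""
        p = int(round(sr / freq - 0.5))          # from f = SR / (p + 1/2)
        delay = np.random.uniform(-1.0, 1.0, p)  # initial noise burst
        out = np.empty(int(dur * sr))
        for n in range(len(out)):
            i = n % p
            out[n] = delay[i]
            # averaging adjacent samples models the reflection loss
            delay[i] = 0.5 * (delay[i] + delay[(i + 1) % p])
        return out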
However, since the technique models a resonating tube as well
as a fixed string, it is equally suited for processing
sampled sound. Because an ongoing signal activates the
resonator, rather than an initial noise burst, a feedback
gain factor must be used to prevent amplitude overflow
and to control the amount of resonance in the resulting
sound. The current real-time implementation offers a
choice of delay line configurations (single, in parallel
or series), plus the options of adding a comb filter
(to add or subtract a delayed signal) and signal negation
(which lowers the fundamental frequency by an octave
and produces odd harmonics). Particularly interesting
effects occur when the length of the Karplus-Strong
delay and the comb filter delay are related by simple
ratios. Each delay line has real-time control over its
length, and hence its tuning, up to a maximum of 511
samples. The user also controls the feedback level which
can be finely adjusted to ride just below saturation,
in combination with the input amplitude which can be
lowered to facilitate higher feedback levels. The use
of sample negation also makes it easier to control high
feedback levels since the length of the feedback loop
is essentially doubled.
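In outline, driving such a resonator with sampled sound rather than a noise burst might look as follows (a hedged sketch, not the SFU implementation; the negation and feedback options behave as described above):

    import numpy as np

    def resonate(x, p, feedback=0.98, negate=False):
        """Delay-line resonator driven by an ongoing input signal x.
        The feedback gain (below 1.0) prevents amplitude overflow."""
        delay = np.zeros(p)   # p <= 511 in the hardware described above
        y = np.empty(len(x))
        i = 0
        for n in range(len(x)):
            fb = 0.5 * (delay[i] + delay[(i + 1) % p])   # low-pass loss
            if negate:
                fb = -fb   # odd harmonics; fundamental an octave lower
            y[n] = x[n] + feedback * fb
            delay[i] = y[n]
            i = (i + 1) % p
        return y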
The complex behaviour of these resonators, particularly
when driven to their maximum feedback level (termed
"hyper-resonance"), cannot be tracked by the ear at normal
speed; when such sounds are time-stretched, however,
their internal variations become much more evident. In
practice, the sound may be resonated first, using a
chain of up to two or three resonators, then resampled
and granulated; or else, one can introduce a single
resonator directly into the processing chain during
granulation, using a specific option in the GSAMX program.
Such processing lengthens the decay of the resonance
to an arbitrary duration, hence suggesting a very large
space, while keeping the resonant frequencies intact.
That is, resonant frequencies associated with relatively
short tubes appear to emanate from spaces with much
larger volumes, as in my work Basilica (1992). Vocal
sounds subjected to this processing resemble overtone
singing in a reverberant cathedral, because the
resonant frequencies are strong enough to be heard as
pitches. The addition of simple harmonization at the
granulation stage, such as an octave lower, enriches
the sound further and gives the impression of a choir.
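The granulation stage itself can be sketched as follows (again a minimal illustration, not the GSAMX code; the parameter names are hypothetical), with the two-stage chain shown at the end:

    import numpy as np

    def granular_stretch(x, stretch=8.0, grain=2048, voices=8):
        """Overlay unsynchronized grain streams that read through the
        source at 1/stretch speed, prolonging it without pitch change."""
        out = np.zeros(int(len(x) * stretch) + grain)
        env = np.hanning(grain)
        hop = grain // 2
        rng = np.random.default_rng(1)
        for v in range(voices):
            pos = 0.0
            t = int(rng.integers(0, hop))   # desynchronize the streams
            while t + grain < len(out) and int(pos) + grain < len(x):
                out[t:t + grain] += x[int(pos):int(pos) + grain] * env / voices
                pos += hop / stretch        # advance the source slowly
                t += hop
        return out

    # Two-stage chain: resonate first, then time-stretch the result, e.g.
    # y = granular_stretch(resonate(x, p=367, feedback=0.995), stretch=10.0)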
The two-stage version of this processing (resonance, then
time-stretching with or without harmonization) was used
in my electroacoustic music theatre work Powers of Two:
The Artist (1995) [Truax, 1996], which is the second
act of the opera Powers of Two. The sounds employed
in subsequent acts have been created using the integrated
approach where the resonance is added during the time-stretching
process. In one particularly striking example, found
in Powers of Two: The Sibyl (1997), natural sounds such
as a recording of rain and thunder, and another of ocean
waves, are hyper-resonated to the point where the original
sounds are engulfed by a low resonant mass of sound
pitched at 60 Hz (the North American electrical frequency).
Then, as the scene progresses, the amount of feedback
added to the process is gradually reduced until the
original sound is once again audible. This effect underlines
the tension in each scene between a character associated
with the modern, technological world and one associated
with traditional visionary insight.
Shaping the sound inside a space
Although conventional diffusion is remarkably effective
with a stereo source, the two-channel bottleneck, the
limitations of manual control, and too little rehearsal
time are currently the weak links in the performance of
electroacoustic music. Having eight discrete sources
available, all independently controllable, is not only
acoustically richer for tape music (since detail is not
lost through stereo mixing) but also compositionally
challenging, since a spatial conception must be integrated
into the work. However, the same system can be used for
live performance, or for mixed live and tape performance,
since nothing is assumed about the relation of the eight
input signals.
The DM-8 system is essentially an 8 by 8 matrix which routes
eight channels of input (for us, the Tascam DA-88) to
eight channels of output, presumably going through a
conventional amp and speaker configuration. The hardware
is a custom designed box, external to the host Macintosh,
equipped with four Motorola 56001 chips and a 68000
controller, communicating via MIDI system exclusive
messages to the graphic front end. The software for
user control is a Max application, written by Chris
Rolfe, which can be used either in a live performance
mode with mouse triggered events, or else as a pre-programmed
score synched with the MIDI timecode on the tape. Presets
and an editable mixing score allow each of the input
tracks to have its amplitude controlled. These mixing
levels can be entered graphically, or tracked from the
user's control of virtual potentiometers in real time.
The recorded levels are analyzed and compressed by the
program into an optimum data representation, and can
later be edited by the user.
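Conceptually, the routing core reduces to a gain matrix (a schematic sketch only; the actual DM-8 runs on its DSP chips, not in Python):

    import numpy as np

    # Entry [i, j] is the gain from input channel i to output channel j.
    inputs = np.zeros((8, 512))     # one block of 8-channel audio
    matrix = np.zeros((8, 8))
    matrix[0, 0] = 1.0              # route input 1 to output 1 ...
    matrix[0, 3] = 0.5              # ... and, attenuated, to output 4
    outputs = matrix.T @ inputs     # 8 output channels for this block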
A 20-page documentation of the software is available,
but here are a few highlights. The 8 by 8 matrix allows
manual input/output connections to be made (i.e., speakers
turned on and off), preset patterns of which can be
stored and implemented with variable fade times. The
cross-fade from one configuration (say, a stereo reduction)
to another (for example, a multi-channel distribution)
over 5-10 seconds is a typical operation that would be
difficult to achieve manually but is aurally very
attractive.
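Such a preset cross-fade amounts to interpolating every matrix gain over the fade time, roughly as follows (hypothetical names; the block size is an assumption):

    import numpy as np

    def fade_matrices(a, b, fade_secs, block_secs=0.01):
        """Yield one interpolated gain matrix per audio block, moving
        linearly from preset a to preset b over fade_secs."""
        steps = int(fade_secs / block_secs)
        for k in range(steps + 1):
            t = k / steps
            yield (1 - t) * a + t * b

    stereo = np.zeros((8, 8)); stereo[:, :2] = 0.5   # example presets
    circle = np.eye(8)
    for m in fade_matrices(stereo, circle, fade_secs=7.5):
        pass   # apply m to the current audio block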
A set of "players" extends the matrix control to either
static speaker lists or dynamic trajectories. Unlike the
matrix operation, they automate both the turning off of
outgoing channels and the turning on of new channels.
The dynamic assignments generate a series of cross-fades,
moving an input from speaker to speaker, in what we call
a "trajectory," at a specific rate with adjustable fade
patterns. Pre-defined
speaker patterns can be looped, cycled (forward and
reverse), or randomly assigned. Since eight such patterns
can be simultaneously running, very complex movements
can be easily generated. All of the player parameters
transfer directly to the score method of control, hence
a particular trajectory configuration can be tested
in real time, then copied into the score with its precise
point of implementation.
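A rough sketch of the player logic (illustrative only; the list contents and names are invented) shows how the looped, cycled, and random modes differ:

    import itertools, random

    SPEAKER_LISTS = {"circle": [1, 3, 5, 7, 8, 6, 4, 2],
                     "left":   [1, 2, 4]}

    def trajectory(name, mode="cycle"):
        """Yield an endless sequence of target speakers for one input."""
        lst = SPEAKER_LISTS[name]
        if mode == "loop":      # run through the list, then start again
            return itertools.cycle(lst)
        if mode == "cycle":     # forward, then back through the interior
            return itertools.cycle(lst + lst[-2:0:-1])
        if mode == "random":
            return iter(lambda: random.choice(lst), None)

    # Eight such players, one per input, can run simultaneously; each
    # step cross-fades the input from the current speaker to the next.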
Of
interest to electroacoustic composers is the ease with
which a given set of speakers can be substituted for
another when a new performance configuration is encountered,
or when a mixdown is needed. A speaker list is defined
once and labelled (e.g. "left", "circle", etc.), with
nothing assumed about where those speakers are located. To change
to a different speaker configuration, only the list
needs to be edited, not each instance of its use. The
label also assists the composer in dealing with particular
spatial configurations independently of the often confusing
lists of speaker numbers.
The nature of cross-fades between speakers is a particularly
tricky subject, and the software assists the user with
both graphic displays of the levels involved and real-time
aural tests of the effect. Cross-fade percentage is
a key variable, allowing a continuum of effects from
jumping between channels to completely smooth transitions.
A "sustain delay" parameter delays the fade-out of the
previous speakers in a dynamic sequence to create a more
polyphonic effect (analogous to the vapour trail behind
a jet). Finally, the "fade increment" is a simple method
of generating the cascaded entry of a speaker list, similar
to the way one might bring in a set of speakers incrementally
in conventional manual diffusion to create another polyphonic
effect.
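The cross-fade percentage can be pictured as the width of an equal-power overlap between the outgoing and incoming speakers (a sketch under that assumption; the DM-8's actual fade curves may differ):

    import math

    def crossfade_gains(t, percent=100):
        """t runs from 0 to 1 over one transition; returns the gains of
        the outgoing and incoming speakers as equal-power curves."""
        overlap = percent / 100.0
        if overlap == 0 or t >= overlap:    # outgoing speaker already off
            return 0.0, 1.0
        u = t / overlap                     # position within the overlap
        return math.cos(u * math.pi / 2), math.sin(u * math.pi / 2)

    # percent=100 gives a completely smooth transition; percent=0 jumps
    # between channels; intermediate values span the continuum described.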
Although the system is designed for controlling eight source
channels, other uses are possible. For instance, a stereo
source could be duplicated up to four times at the input
of the matrix, and four pairs of distinct trajectories
or speaker assignments defined. The composer could then
use the mixing score or manually controlled input levels
to cross-fade between the different spatial treatments.
Alternatively, the entire matrix could be considered
to be an effects send and return system for studio work
with, for instance, two "dry" channels and
six channels of processing being mixed together.
The DM-8 has been used in performance at the 1995 International
Computer Music Conference in Banff and in 1996 at various
Vancouver New Music electroacoustic concerts, and is
currently available for use in the Sonic Research Studio
at SFU. Although an extended (16 x 16) commercial version
has been developed, the existing hardware and software
configuration is already extremely useful for electroacoustic
diffusion. The software could also be extended by programmers
wishing to add new features or more complex lower level
control patterns.
Recent compositional applications
As mentioned above, the 8-channel tape component of
my electroacoustic opera Powers of Two was realized
using the techniques described here, both for the
design of the component sounds and their static and
dynamic distribution in eight channels. Two other recent
compositions for solo performer and stereo tape also
illustrate the timbral work, namely Wings of Fire (1996)
for female cellist and tape, based on a poem by B.C.
poet Joy Kirstin, and Androgyne, Mon Amour (1997) for
male double bassist and tape, based on poems by Tennessee
Williams. In both works, the source material is derived
from a reading of the poems as well as sounds recorded
from the live instrument processed with granulation
and/or the use of resonators simulating the open strings
of the instrument. When the voice is processed in this
way on tape, it is given some of the character of the
instrument, and in each piece the love poetry appears
to be addressed to the instrument as the lover. In other
words, the spoken voice on tape appears to be resonated
through the instrument being played, hence symbolizing
their union as lovers.
Other sounds recorded from the cello and bass are also used
to excite the resonators. These include bowing on the
bridge, natural and artificial harmonics, col legno
attacks, snap pizzicato and various kinds of body percussion
sounds. By raising the feedback level of the resonators
(tuned to the open strings), a noisy sound such as bowing
on the bridge slowly changes from resembling breathing
to regular bowing on the strings, once again highlighting
the intimate relation between the performer and the
instrument. Interestingly enough, when the length of
the delay line is shortened to produce a very high pitch,
the noise component once again becomes dominant, as
at the end of the opening section of Wings of Fire.
In Androgyne, Mon Amour the tuning of the resonators
(independently controlled on each channel) changes more
frequently during the reading of the text, suggesting
a kind of harmonic accompaniment performed by the instrument.
The live instrument, which is frequently played in a
number of unconventional postures, sometimes mimics
this accompaniment, or creates a counterpoint to it.
At present, the process of shaping the volume of the sound,
its internal space, and that of distributing the sound
via multiple loudspeakers into the external performance
space occur in two different design stages, much as
traditional studio composition and live diffusion have
been carried out. The compositional challenge is to
create significant relationships between the two processes.
However, if we continue to use similar DSP technology
for both, it may well become feasible in future to integrate
them into a single algorithm in which the individual
components that create the volume of the sound are given
spatial placement and definition within the performance
environment.
Sound and space would become inextricably linked, and composition
then could truly be regarded as the acoustic design
of space.
This article was first published in Actes III 1997 by l'Académie
Internationale de Musique Electroacoustique, Bourges,
under the title "Composition-Diffusion".
For further information, please contact: Institut International
de Musique Electroacoustique de Bourges, Place André
Malraux, BP 39, 18001 Bourges Cedex, FRANCE.
References
Truax, B. (1992). "Musical Creativity and Complexity at the Threshold of the 21st Century," Interface, 21(1).
Truax, B. (1994). "Discovering Inner Complexity: Time-Shifting and Transposition with a Real-time Granulation Technique," Computer Music Journal, 18(2).
Karplus, K. & Strong, A. (1983). "Digital Synthesis of Plucked String and Drum Timbres," Computer Music Journal, 7(2).
Truax, B. (1996). "Sounds and Sources in Powers of Two: Towards a Contemporary Myth," Organised Sound, 1(1).
Barry Truax is a professor in both the School of Communication
and the School for the Contemporary Arts at Simon Fraser
University, Canada, where he teaches courses in acoustic
communication and electroacoustic music.
No part of the article may be reproduced or transmitted
in any form or by any means without prior permission
of the individual authors.