Vocoders and Vocoder-Derivatives with SuperCollider

Gary Morrison, 23 July 2000

Background

If I remember my technology history correctly, Homer Dudley of Bell Labs invented the first vocoders in the 1930s as a means of sending an intelligible facsimile of a voice over telephone systems. The telephone channels of those days could carry only a limited amount of information, so the vocoder's job was to reduce the information content of a voice to what was needed to keep it intelligible.

More recently, in the early 1970s, Wendy Carlos and Rachel Elkind used a real-time analog vocoder, which Bob Moog built to their specifications, to reproduce vocal sounds in their rendition of Beethoven's Ninth Symphony for Stanley Kubrick's film A Clockwork Orange. This eventually led to a revival of interest in vocoding, including many improved vocoders and derivatives of the essential idea that produce interesting sonic and musical effects.

Basic vocoders produce deliciously crusty, robotic-sounding voices, which is sometimes a nifty effect. On the other hand, really sophisticated vocoders synthesize voices that are sometimes difficult to tell apart from the original voices themselves. I haven't attempted a really accurate vocoder yet, but I have one that's pretty good. Derivatives of the basic vocoding idea can produce sounds that seem tantalizingly vocal-like, but not clearly like a voice. You'll hear some such sounds in the demos below.

What is SuperCollider?

These demonstrations were generated by programs I wrote in James McCartney's SuperCollider language and recorded on my G4 Power Macintosh. SuperCollider is a Smalltalk-like programming language for the Apple Macintosh, designed especially for creating and manipulating sounds. It uses object-oriented programming concepts to build a "UGen graph," which you can think of as something like a synthesizer circuit whose components are the programming language's objects themselves. For more information, see http://www.audiosynth.com.

How do Vocoders Work?

Vocoders have two stages: an encoding stage and a decoding stage. Most of us have probably seen something like the encoding, or analysis, stage: some stereo amplifiers have a "bar-LED" display that gives a real-time, continuous, thermometer-like picture of how much sound there is in each of several frequency (pitch) bands of the spectrum. That's precisely what the encoding stage of a traditional vocoder does: it divides the audio-frequency spectrum into bands and measures how loud the sound is in each band.
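The analysis stage can be sketched as a bank of band-pass filters, each followed by an envelope follower that rectifies and smooths the filtered signal. This is a minimal illustrative sketch in Python, not my SuperCollider code: the filter design (band-pass biquads from the RBJ Audio EQ Cookbook), the Q of 5, the 30Hz smoothing and the function names `bandpass`, `envelope` and `analyze` are all assumptions chosen for illustration.

```python
import math

def bandpass(signal, center_hz, q, sample_rate):
    """One band-pass biquad (RBJ cookbook, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, smoothing_hz=30.0):
    """Envelope follower: full-wave rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * smoothing_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def analyze(signal, band_centers, sample_rate, q=5.0):
    """Encoding stage: one loudness envelope per band, like a bar-LED spectrum display."""
    return [envelope(bandpass(signal, f, q, sample_rate), sample_rate)
            for f in band_centers]
```

Feeding a 1000Hz test tone through bands centered at 250, 500, 1000 and 2000Hz leaves the 1000Hz band with by far the largest envelope, just as the matching bar on a spectrum display would light up highest.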

The decoding, or resynthesis, stage more or less reverses that process: it produces a sound with the same loudness in each frequency band. More specifically, it impresses the loudness of the input sound in each band upon a sinewave at the center frequency of that band.
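A matching sketch of the resynthesis stage, again in Python rather than SuperCollider, with `resynthesize` as a hypothetical name: given one list of envelope values per band (however they were measured) and each band's center frequency, it drives one sinewave oscillator per band with that band's envelope and sums the results.

```python
import math

def resynthesize(envelopes, band_centers, sample_rate):
    """Decoding stage: one sinewave per band, its amplitude following that band's envelope."""
    n = len(envelopes[0])
    out = [0.0] * n
    for env, freq in zip(envelopes, band_centers):
        step = 2.0 * math.pi * freq / sample_rate   # phase increment per sample
        for i in range(n):
            out[i] += env[i] * math.sin(step * i)
    return out
```

Because the output only knows each band's loudness and center frequency, everything else about the original waveform is thrown away, which is where the crusty, robotic character comes from.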

Click here for an example of a 20-band vocoding of my voice.

Improving Vocoding

Increasing the frequency resolution (the number of bands, or slices of the audio spectrum, that you analyze and resynthesize) can sometimes improve the quality of the result. Click here for an example of a 63-band vocoding of my voice. This version is also more realistic because I chose band frequencies better suited to reproducing my voice.
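I haven't described exactly how the 20 or 63 band frequencies were chosen, so treat this as a hypothetical starting point rather than my actual recipe: space the band centers geometrically, so each band spans the same musical interval, and set each filter's Q so that neighbouring bands meet at their geometric midpoints. The helper names `band_centers` and `q_for_ratio` are illustrative.

```python
import math

def band_centers(num_bands, low_hz, high_hz):
    """Center frequencies spaced geometrically from low_hz to high_hz (inclusive)."""
    if num_bands == 1:
        return [low_hz]
    ratio = (high_hz / low_hz) ** (1.0 / (num_bands - 1))
    return [low_hz * ratio ** k for k in range(num_bands)]

def q_for_ratio(ratio):
    """Q that puts a band's edges at the geometric midpoints to its neighbours.

    Edges at f / sqrt(r) and f * sqrt(r) give a bandwidth of f * (r - 1) / sqrt(r),
    so Q = f / bandwidth = sqrt(r) / (r - 1).
    """
    return math.sqrt(ratio) / (ratio - 1.0)
```

For example, 20 bands from a hypothetical 100Hz to 8kHz are about 1.26 apart (roughly a major third), which calls for a Q of about 4.3; more bands means narrower spacing and a higher Q.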

Click here to hear a much bigger improvement: pitch following. The pitch at which the voice is resynthesized tracks the pitch of my voice itself, using SuperCollider's autocorrelating pitch follower.
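The idea behind an autocorrelating pitch follower can be sketched as follows. This is a plain Python illustration of the technique, not SuperCollider's implementation: it compares a frame of samples with delayed copies of itself and picks the delay (the period) at which they line up best. Using the tracked pitch to place the resynthesis bands at its harmonics is one plausible way to make the resynthesis follow the voice, offered as an assumption; the names and the 50Hz to 1000Hz search range are hypothetical.

```python
def autocorrelation_pitch(frame, sample_rate, min_hz=50.0, max_hz=1000.0):
    """Estimate the fundamental frequency of one frame of samples.

    Tries every lag (candidate period, in samples) between sample_rate / max_hz
    and sample_rate / min_hz and returns the frequency of the lag whose raw
    autocorrelation is largest.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def harmonic_band_frequencies(f0, num_bands):
    """Resynthesis frequencies that move with the tracked pitch: f0, 2*f0, 3*f0, ..."""
    return [f0 * (k + 1) for k in range(num_bands)]
```

The raw (unnormalized) sum has fewer terms at longer lags, which slightly favors the shortest matching period and helps avoid reporting a pitch an octave too low.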

Another way to reduce the robotic character of a vocoded voice is to make the resynthesis frequencies somewhat nonharmonic, that is, not whole-number multiples of the lowest vocoding frequency. This makes the voice a little fuzzier (and, with fewer bands, more bell-like), but it removes the buzziness of "s" sounds and of the voice in general. Click here to listen to this sort of vocoding of my voice.
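I haven't said how the nonharmonic frequencies were picked. One simple, hypothetical recipe is to "stretch" a harmonic series with an exponent: with an exponent of exactly 1 the bands are whole-number multiples of the lowest frequency, and any other value pulls them off that grid. The name `stretched_partials` and the default stretch of 1.1 are illustrative.

```python
def stretched_partials(f0, num_bands, stretch=1.1):
    """Band k (0-based) sits at f0 * (k + 1) ** stretch; stretch == 1.0 gives exact harmonics."""
    return [f0 * (k + 1) ** stretch for k in range(num_bands)]
```

With a stretch of 1.1 the second band sits about 2.14 times the first rather than exactly twice it, so the bands no longer reinforce one common buzzing period.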

Spreading or Skooshing the Spectrum

As the first special-effects vocoding derivative you'll hear, click here to listen to what happens if you widen the spectrum when you resynthesize a voice. For example, if you filter at 50Hz, 500Hz, and 5000Hz, then rather than impressing the volumes at those frequencies on 50Hz, 500Hz, and 5000Hz sinewaves, you can impress them upon 25Hz, 500Hz, and 10kHz sinewaves. Click here to hear what happens if you spread the spectrum even more.
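One way to formalize spreading, offered as an assumption rather than my actual formula, is a power-law warp in log frequency around a fixed pivot. With a 500Hz pivot and an exponent of log10(20), about 1.30, it maps 50Hz, 500Hz and 5000Hz to 25Hz, 500Hz and 10kHz, matching the example above. The function name `warp_frequency` is hypothetical.

```python
import math

def warp_frequency(freq_hz, pivot_hz=500.0, exponent=math.log10(20.0)):
    """Map an analysis frequency to a resynthesis frequency in log-frequency space.

    exponent > 1 spreads the spectrum away from pivot_hz, 0 < exponent < 1
    skooshes it toward pivot_hz, and exponent == 1 leaves it unchanged.
    """
    return pivot_hz * (freq_hz / pivot_hz) ** exponent
```

The same function covers skooshing: an exponent of 0.5, for instance, moves 50Hz up to about 158Hz and 5000Hz down to about 1581Hz, and as the exponent approaches 0 every band collapses onto the pivot.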

You can do the opposite, too: skoosh the spectrum in, synthesizing the lower band at a higher frequency and the higher band at a lower frequency. Click here to listen to a moderately skooshed vocoding of my voice. If you skoosh the spectrum in too much, the voice becomes almost impossible to recognize, as you can hear here.

One really fun approach is to interactively expand, skoosh, or transpose your voice. Click here to hear what happened when I tied the upper- and lower-frequency multipliers to the position of the mouse on my screen.
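Here is a sketch of how mouse control might drive the spectrum, with hypothetical names and ranges. Each mouse coordinate, normalized to 0..1, is mapped exponentially to a multiplier between 0.25 and 4; one multiplier applies to the lowest band, the other to the highest, and the bands in between get multipliers interpolated in log frequency. Which axis controls which multiplier, and the ranges, are assumptions. In SuperCollider itself, mouse-tracking UGens (MouseX and MouseY in current versions) would supply the positions; here they are simply passed in as numbers.

```python
import math

def exp_map(position, low, high):
    """Map a 0..1 control position exponentially onto the range low..high."""
    return low * (high / low) ** position

def band_multipliers(band_centers, lower_mult, upper_mult):
    """Interpolate multipliers in log frequency from the lowest band to the highest.

    Assumes band_centers is sorted and its first and last entries differ.
    """
    lo, hi = band_centers[0], band_centers[-1]
    mults = []
    for f in band_centers:
        t = math.log(f / lo) / math.log(hi / lo)   # 0 at the lowest band, 1 at the highest
        mults.append(lower_mult * (upper_mult / lower_mult) ** t)
    return mults

def resynthesis_frequencies(band_centers, mouse_x, mouse_y, low=0.25, high=4.0):
    """Mouse x scales the lowest band and mouse y the highest (hypothetical assignment)."""
    lower_mult = exp_map(mouse_x, low, high)
    upper_mult = exp_map(mouse_y, low, high)
    mults = band_multipliers(band_centers, lower_mult, upper_mult)
    return [f * m for f, m in zip(band_centers, mults)]
```

With multipliers of 0.5 and 2, bands at 50Hz, 500Hz and 5000Hz land at 25Hz, 500Hz and 10kHz, the spread example from earlier; putting both mouse coordinates at the center (multiplier 1) leaves the spectrum untouched, and moving both together transposes it.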

Varying-Frequency Vocoding

Heading further in the direction of special effects, there's no reason the vocoding frequencies need to be stationary. Click here to listen to what happens when you decode and resynthesize with 8 continuously rising vocoding frequencies. More precisely, the vocoding frequencies follow a set of overlapping, very low-frequency sawtooth waves. The words here are a little hard to understand, which makes this good for producing that haunting sensation of hearing words without quite making them out.
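A sketch of the rising-frequency control, with its assumptions spelled out: each of the 8 voices follows its own sawtooth, offset by an eighth of a cycle from its neighbour, and the sawtooth is mapped exponentially (so each voice rises at a steady number of octaves per second) from 100Hz to 6.4kHz over a 10-second sweep before jumping back down. The range, sweep time, exponential mapping and the name `rising_frequencies` are hypothetical; the description above only says the frequencies follow overlapping, very low-frequency sawtooth waves.

```python
def rising_frequencies(t, num_voices=8, low_hz=100.0, high_hz=6400.0, sweep_seconds=10.0):
    """Vocoding frequencies of num_voices bands at time t (in seconds)."""
    freqs = []
    for i in range(num_voices):
        phase = (t / sweep_seconds + i / num_voices) % 1.0   # sawtooth in [0, 1), staggered per voice
        freqs.append(low_hz * (high_hz / low_hz) ** phase)  # exponential glide from low_hz toward high_hz
    return freqs
```

Passing `num_voices=16` or `num_voices=4` gives the denser and sparser versions; because the voices are evenly staggered, at any instant the bands are spread evenly (in octaves) across the range.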

It is more clearly intelligible with 16 such rising frequencies; click here to listen to that case. Conversely, it's much more difficult to understand with only 4 such rising vocoding frequencies.

Another curious sensation comes from vocoding frequencies that fluctuate randomly over the entire audio frequency range. As you'd probably expect, the most intelligible results come from having lots of these randomly varying frequencies. Click here to listen to 25 such frequencies vocoding my voice. With fewer random frequencies the voice becomes less clearly intelligible, and thus more mysterious. Click here to listen to the same thing with only 10 randomly fluctuating frequencies, then click here to listen to just five.
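The random-frequency variant can be sketched as below, under stated assumptions: each vocoding frequency glides in straight lines (in log frequency) between random targets chosen every half second, drawn log-uniformly between 40Hz and 12kHz. That is roughly what an interpolating low-frequency noise source would produce, but the rate, the range, the fixed seed and the helper names are all hypothetical; the description above only says the frequencies fluctuate randomly over the audio range.

```python
import math
import random

def random_frequency_track(rng, num_steps, dt, low_hz=40.0, high_hz=12000.0, hold_seconds=0.5):
    """One frequency value per dt seconds, gliding between random log-uniform targets."""
    def pick():
        return math.log(low_hz) + rng.random() * math.log(high_hz / low_hz)
    current, target = pick(), pick()
    elapsed = 0.0
    track = []
    for _ in range(num_steps):
        frac = elapsed / hold_seconds                      # 0..1 progress toward the target
        track.append(math.exp(current + (target - current) * frac))
        elapsed += dt
        while elapsed >= hold_seconds:                     # time for a new random target
            current, target = target, pick()
            elapsed -= hold_seconds
    return track

def random_vocoding_tracks(num_voices, num_steps, dt, seed=1):
    """Independent random tracks for num_voices bands (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [random_frequency_track(rng, num_steps, dt) for _ in range(num_voices)]
```

Calling `random_vocoding_tracks(25, ...)`, `(10, ...)` or `(5, ...)` corresponds to the three demos; with fewer tracks, large stretches of the spectrum go unrepresented at any moment, which is why intelligibility drops.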

Impressing Vocal Qualities upon Other Sounds

If you resynthesize a voice not by changing the volume of a sinewave but by changing the volume of a filtered version of another sound, the result impresses the vocal qualities of your voice upon that other sound. For example, click here to listen to my voice impressed upon an extended-just-intonation chord with frequency ratios 5:7:9:13:17:23.
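This kind of cross-synthesis can be sketched by running both the voice and the other sound (the "carrier") through the same filter bank, then scaling each carrier band by the voice's loudness in that band. This is an illustrative Python sketch, not my SuperCollider patch: the band-pass biquads, Q, smoothing rate and the names `impress` and `just_chord` are assumptions, and the chord's base frequency isn't given above, so it's left as a parameter.

```python
import math

def bandpass(signal, center_hz, q, sample_rate):
    """One band-pass biquad (RBJ cookbook, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, smoothing_hz=30.0):
    """Full-wave rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * smoothing_hz / sample_rate)
    level, out = 0.0, []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def impress(voice, carrier, band_centers, sample_rate, q=5.0):
    """Scale each filtered band of the carrier by the voice's loudness in that band."""
    n = min(len(voice), len(carrier))
    out = [0.0] * n
    for f in band_centers:
        voice_env = envelope(bandpass(voice[:n], f, q, sample_rate), sample_rate)
        carrier_band = bandpass(carrier[:n], f, q, sample_rate)
        for i in range(n):
            out[i] += voice_env[i] * carrier_band[i]   # multiplicative, not additive, combination
    return out

def just_chord(base_hz, ratios=(5, 7, 9, 13, 17, 23)):
    """Chord-tone frequencies for the given frequency ratios over a base frequency."""
    return [base_hz * r for r in ratios]
```

A silent voice gives a silent result no matter what the carrier is doing, which is the essential difference from simply mixing the two signals together.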

You can even impress your voice upon an entire symphony orchestra! Click here to listen to what is admittedly not a very good use of this idea. A better use would be, for example, to impress vocal qualities upon a piano accompaniment. That would be somewhat like multiplicatively combining a solo voice with the accompaniment, rather than additively mixing the two as we usually do.