These demonstrations all use the same 9 second test file of me calling CQ, recorded at 8000 samples per second with 16 bits per sample. All audio processing was in 16-bit linear format before conversion to 8-bit mu-law for this web page.
For reference, here is the original recording converted directly to
mu-law without any other processing.
64kb/s mu-law PCM, no added noise
Now let's run this signal through a simulated SSB transmitter.
First we simulate a typical bandpass filter. Here I've used a 256-point
finite impulse response (FIR) digital filter with a nominal passband
from 300 Hz to 2.7 kHz and a passband gain of 0 dB.
Bandpass filtered voice, no noise
Now let's add some channel noise. This was done by generating
Gaussian-distributed random numbers, running them through the same
bandpass filter, and adding the filtered noise to the already filtered
voice. The average signal-to-noise ratio is 10.27 dB, as
determined by separately summing the squares of the filtered signal
and filtered noise samples over the entire 9 second period and
computing their ratio.
Bandpass filtered simulated SSB signal, S/N=10.27dB
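The S/N measurement described above comes down to a ratio of two sums of squares. Here is a minimal sketch of that calculation, not the code actually used for this page; the function name `snr_db` and the toy sample values are made up for illustration and stand in for the filtered voice and filtered noise.

```python
import math

def snr_db(signal, noise):
    """Ratio of total signal energy to total noise energy, in dB.

    Both arguments are sequences of samples covering the same time span,
    e.g. the filtered voice and the filtered noise before they are added.
    """
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy data only (not the real recording): the noise carries 1/100 of the
# signal energy, so the ratio works out to about 20 dB.
toy_signal = [3.0, -4.0, 3.0, -4.0]
toy_noise = [0.3, 0.4, -0.3, -0.4]
print(f"S/N = {snr_db(toy_signal, toy_noise):.2f} dB")
```

Because both sums run over the same number of samples, the ratio of energies equals the ratio of average powers, which is why no division by the sample count is needed.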
Now the question arises: how fast could we send digital data (e.g., digitized voice) using the same average transmitter power, without worrying about bandwidth? Well, to do that we first need to know the ratio of the average signal power S to the noise spectral density, N0. This is 10.27dB (the S/N in the filter bandwidth) plus the filter bandwidth expressed in dB relative to 1Hz.
We could assume the filter bandwidth is just 2700 - 300 = 2400 Hz, or 33.8 dBHz, and not be far off. But just to be sure I measured the noise bandwidth of the actual filter by running Gaussian noise through it, measuring the ratio of output to input power, and multiplying by one half the sampling rate. This gave a figure of 2441.5 Hz (33.87 dBHz), which is pretty close to 2400 Hz; the slight difference is due to the filter not having "brick wall" skirts (no real filter does).
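For a filter with unity passband gain, the output-to-input power ratio for white noise equals the sum of the squared filter taps, so the same noise bandwidth can also be computed without generating any noise. The sketch below does this for a stand-in filter: a 256-tap Hamming-windowed sinc design, which is an assumption and not the actual filter used here. Its result will therefore differ somewhat from the measured 2441.5 Hz. The names `bandpass_fir` and `noise_bandwidth_hz` are hypothetical.

```python
import math

def bandpass_fir(num_taps, f_lo, f_hi, fs):
    """Hamming-windowed-sinc bandpass FIR with unity nominal passband gain.

    This is an assumed stand-in design, not the filter used on this page.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs
        else:
            # Ideal bandpass = difference of two ideal lowpass responses.
            ideal = (math.sin(2 * math.pi * f_hi * t / fs)
                     - math.sin(2 * math.pi * f_lo * t / fs)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def noise_bandwidth_hz(taps, fs):
    """White-noise output/input power ratio times half the sampling rate."""
    power_ratio = sum(h * h for h in taps)
    return power_ratio * fs / 2.0

FS = 8000.0
taps = bandpass_fir(256, 300.0, 2700.0, FS)
bw = noise_bandwidth_hz(taps, FS)
print(f"noise bandwidth = {bw:.1f} Hz = {10 * math.log10(bw):.2f} dBHz")
```

An ideal brick-wall filter would give exactly 2400 Hz by this method; any practical design lands near that figure, with the exact value depending on the shape of its skirts.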
So now we can compute the average S/N0 ratio for the noisy SSB signal: 10.27dB (in 2441 Hz BW) + 33.87 dBHz BW = 44.14 dBHz. In other words, with the same total energy we used in our 9 second speech sample we could have sent for 9 seconds a carrier with a 44.14 dB S/N ratio as measured in a 1 Hz receiver bandwidth.
Now let's assume we have a digital modem and FEC technique that needs an Eb/N0 (energy per bit to noise spectral density ratio) of 3dB. This can be achieved with ideal BPSK or QPSK modulation and rate 1/2 constraint length 32 convolutional encoding with sequential decoding. The data rate we can achieve with this scheme is therefore 44.14 dBHz - 3dB = 41.14 dBbps, or 13 kb/s.
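The arithmetic in the last two paragraphs can be checked with a few lines. This is only a sketch of the dB bookkeeping, using the figures quoted above; the variable and function names are made up for illustration.

```python
import math

def to_db(ratio):
    return 10.0 * math.log10(ratio)

def from_db(value_db):
    return 10.0 ** (value_db / 10.0)

snr_in_filter_db = 10.27   # measured S/N in the filter bandwidth
noise_bw_hz = 2441.5       # measured noise bandwidth of the filter
required_ebn0_db = 3.0     # Eb/N0 assumed for the modem and FEC scheme

# S/N0 = S/N in the filter bandwidth + bandwidth in dB relative to 1 Hz
s_over_n0_dbhz = snr_in_filter_db + to_db(noise_bw_hz)

# Bit rate in dB(bit/s) = S/N0 - Eb/N0, then back to a linear rate
bit_rate_bps = from_db(s_over_n0_dbhz - required_ebn0_db)

print(f"S/N0 = {s_over_n0_dbhz:.2f} dBHz")
print(f"bit rate = {bit_rate_bps / 1000:.1f} kb/s")
```

Carrying full precision through the logarithm gives a value within about 0.01 dB of the 44.14 dBHz quoted above, and a bit rate of roughly 13 kb/s.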
Vocoders generally work by modeling the human vocal tract as an excitation source (the larynx or "vocal cords") followed by a series of acoustic filters formed by the vocal tract (throat, mouth, sinuses, etc). Some of these filters vary slowly with time as the tongue, teeth, lips, etc move. The muscles that shape speech move much more slowly than the speech waveform itself changes; that's the key to the vocoder's ability to reduce the required data rate.
The voice decoder (decompressor) models the human vocal tract as a set of digital filters whose parameters are encoded in the compressed speech. These filters are driven ("excited") by a signal that represents the vibration of the original speaker's vocal cords. The big differences among vocoders in data rate, CPU requirements and voice quality generally come from different approaches to encoding the excitation, not from the filters that follow.
Due largely to the bandwidth required to encode the residual signal, the GSM vocoder requires 13kb/s for its encoded data stream. This is exactly the capacity we computed earlier for our simulated SSB signal. (This is not a coincidence -- I picked the SSB S/N to achieve this result).
So here is the original audio signal, encoded and decoded using
the GSM vocoder operating at a constant 13kb/s data rate and without
any data errors. I've also included another link to the simulated SSB
signal you already heard to make A/B comparison a little easier.
GSM vocoder at 13kb/s Power-equivalent SSB (S/N=10.27dB)
The "cost" in total transponder energy to send this digital signal is exactly the same as for the analog (SSB) case. Some vocoder artifacts are noticeable, but the overall voice quality is clearly much better than the SSB signal.
Here is our test file encoded in FED-STD-1016
Codebook Excited Linear Prediction (CELP), decoded and converted
to mu-law. The encoded data rate is 4800 bps.
FED-STD-1016 CELP at 4.8kb/s Power-equivalent SSB (S/N=5.94dB)
Not bad, eh? The vocoder artifacts are again noticeable, but remember that 4800 bps is 4.33 dB down from 13kb/s. So the "power equivalent" SSB signal has an average S/N of 10.27 - 4.33 = 5.94 dB.
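The "power equivalent" S/N scales directly with the data rate, since the required energy per bit stays fixed. A quick sketch of that conversion, using the 13 kb/s at 10.27 dB reference point from earlier; the function name is hypothetical.

```python
import math

def power_equivalent_snr_db(ref_snr_db, ref_rate_bps, new_rate_bps):
    """S/N of an SSB signal using the same power as a lower-rate digital one.

    Assumes the same Eb/N0 requirement at both data rates, so transmitter
    power is simply proportional to the bit rate.
    """
    rate_reduction_db = 10.0 * math.log10(ref_rate_bps / new_rate_bps)
    return ref_snr_db - rate_reduction_db

celp_snr_db = power_equivalent_snr_db(10.27, 13000.0, 4800.0)
print(f"power-equivalent S/N for 4800 bps = {celp_snr_db:.2f} dB")
```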
Unfortunately, due to the way it repeatedly searches its codebooks when encoding, CELP is much more computationally intensive than GSM 06.10. Using the default options, this test file took 96.9 seconds to encode and decode on a 486DX4-100. That's more than ten times slower than real time for a 9 second recording, although admittedly I have not tried to optimize this code in any way. CELP probably needs a fairly hefty DSP chip to run in real time.
The benefits of a variable rate vocoder are substantial in a full duplex cellular system since a typical speaker talks only about 40% of the time. The benefits in the half-duplex push-to-talk environment typical in ham radio are less clear.
Here is our test file encoded and decoded in IS-96A QCELP. The
encoder produced a total of 450 frames: 410 at full rate, 15 at half
rate, 10 at quarter rate, and 15 at eighth rate, for an average data
rate of about 7.5 kb/s, or 93.75% of full rate.
IS-96A vocoder, average 7.5kb/s Power-equivalent SSB (S/N=7.88dB)
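The average rate follows from weighting each frame by its rate fraction. The sketch below redoes that sum and the resulting power-equivalent S/N. It takes the half, quarter and eighth rates at face value as exact fractions, and it assumes a full rate of 8 kb/s, which is the value implied by 7.5 kb/s being 93.75% of full rate; all names are made up for illustration.

```python
import math

# Frame counts from the encoder run, keyed by fraction of full rate.
frame_counts = {1.0: 410, 0.5: 15, 0.25: 10, 0.125: 15}

FULL_RATE_BPS = 8000.0   # assumption: implied by 7.5 kb/s = 93.75% of full rate
REF_RATE_BPS = 13000.0   # reference digital rate from the earlier calculation
REF_SNR_DB = 10.27       # SSB S/N that matches the reference rate in power

total_frames = sum(frame_counts.values())
average_fraction = sum(frac * count
                       for frac, count in frame_counts.items()) / total_frames
average_rate_bps = average_fraction * FULL_RATE_BPS

# Same energy per bit, so average power scales with average bit rate.
equivalent_snr_db = REF_SNR_DB - 10.0 * math.log10(REF_RATE_BPS / average_rate_bps)

print(f"{total_frames} frames, {100 * average_fraction:.2f}% of full rate")
print(f"average rate = {average_rate_bps / 1000:.2f} kb/s")
print(f"power-equivalent S/N = {equivalent_snr_db:.2f} dB")
```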
What makes the QCELP coders particularly interesting for our purposes is that maximizing the capacity of a spread spectrum cellular telephone system involves minimizing the average data rate. The peak data rate is much less important, especially since you have many users sharing the channel at once; the "law of large numbers" comes to our aid.
The exact same thing is true for amateur satellite use if you
consider spacecraft energy and not peak transponder output power to be
the critical resource. Lots of users share the transponder at the same
time, and it's the sum total power that matters. So vocoders that work
well for CDMA cellular ought to work equally well on a linear amateur
satellite transponder even when spread spectrum is not used.
13kb/s QCELP vocoder, average 7.5kb/s Power-equivalent SSB (S/N=7.88dB)
The quality of LPC is significantly lower than CELP because it does not attempt to encode the "excitation" or residual with high accuracy. The encoder simply passes pitch information along to the decoder, which regenerates the excitation from a pulse stream. The effect is much like a speech synthesizer (which essentially is an LPC decoder) or a person using an artificial larynx. Nevertheless, the speech is generally intelligible, and it does run at a pretty low data rate. And unlike CELP, it runs faster than real time on a 486, taking only 50% of a 486DX4-100.
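To make the pulse-stream idea concrete, here is a toy sketch of the decoder side: a train of pulses at the pitch period driving an all-pole filter. It is not taken from any real LPC codec. The function names, the 100 Hz pitch, and the single 500 Hz resonance standing in for the vocal tract are all assumptions chosen for illustration; a real decoder uses around ten coefficients that change every frame.

```python
import math

def pulse_train(num_samples, pitch_period):
    """Excitation for voiced speech: one unit pulse per pitch period."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def all_pole_filter(excitation, a):
    """Synthesis filter: y[n] = x[n] - sum over k of a[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                y -= coeff * out[n - 1 - k]
        out.append(y)
    return out

FS = 8000                      # samples per second, as in the test file
pitch_period = FS // 100       # assumed 100 Hz pitch -> 80 samples
r, f0 = 0.95, 500.0            # assumed pole radius and resonant frequency
theta = 2 * math.pi * f0 / FS
a = [-2 * r * math.cos(theta), r * r]   # two-pole resonator coefficients

# One 20 ms frame (160 samples) of synthetic "voiced" output.
frame = all_pole_filter(pulse_train(160, pitch_period), a)
print(len(frame), max(abs(v) for v in frame))
```

Only the pitch period and the filter coefficients need to be sent to produce this frame, which is why the data rate is so low and also why the result sounds synthetic.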
On the other hand, increasing the power of a digital signal above that required to demodulate without errors yields no additional improvement in speech quality. This avoids the incentive inherent in SSB to run excessive power to make one's signal sound better. Furthermore, transmitter power could be continuously and automatically adjusted to exactly that required, eliminating the "laziness" factor as well.
Back to Phil Karn's Amateur Digital Communications Page
Last updated: 8/11/95