Training embedded apps to process speech may
be as easy as finding the right 8-bit micro. Don't let what Rodger has to
say about using an ADPCM algorithm and PWM output to generate speech go
in one ear and out the other.

The
ultimate form of feedback from a product is through speech. A product that
reacts to stimuli with a verbal response is more likely to grab your attention
than one without the capability.

In most cases, adding speech recording and playback requires extra processor
bandwidth or an additional device such as a DSP or specialized audio processor.
The cost, complexity, or lack of additional bandwidth, however, can prevent
the speech features from being integrated into the product.

Now, if the words "8-bit microcontroller" were mentioned with respect to
speech, some might chuckle to themselves, others might break into a fit of
uncontrollable laughter, but certainly all would read on. Yes, it's
true: A simplified Adaptive Differential Pulse Code Modulation (ADPCM) algorithm
can be implemented on a low-cost 8-bit micro.

In this article I explain the tradeoffs between bit rate and quality that
are important in determining if you can use an 8-bit controller in the product.
I also present the details of the origin as well as features of the ADPCM
algorithm. Finally, I cover methods of integrating the microcontroller into
the application as a speech encoder/decoder peripheral or as a complete
speech-processing subsystem.

When choosing a speech processor, you must first determine the desired quality
of the speech reproduction. A speech-processing system attempts to balance
the quality of the reconstructed speech with the bit rate of the
encoding/decoding. In most cases, speech quality degrades as the bit rate
drops.

The search for a happy medium between bit rate and quality has filled volumes.
A high bit rate, high-quality speech processor implies a sophisticated algorithm
that is computationally intensive with long encoding/decoding delays (i.e.,
requires the use of a DSP or special audio processor device).

An 8-bit microcontroller, then, is not a solution for every application,
but it can provide reasonably good quality at medium-to-low
bit rates. These tradeoffs between bit rate, quality, and the complexity
of the system can be summarized by the following questions:

- What level of speech degradation can be tolerated?

- What is the highest bit rate a system can tolerate (in terms of
  bandwidth)?

- What are the limitations on operating frequency, printed circuit board
  area, and power consumption?

- How much can you afford to spend on the speech subsystem?

Unfortunately, one answer can't satisfy all these questions. However,
cost seems to drive most decisions.

Cost is the main factor behind bit rate. Lower bit rates are desirable because
they lower the operating bandwidth as well as the memory required to store
a fixed amount of speech, which in turn lowers cost.
Figure 1 shows a graph of speech quality versus bit rate.

Figure 1A designer must make tradeoffs between bit rate and
quality of reconstructed speech. After defining these two parameters, the
selection of a speech coding algorithm can be made.

A typical system might sample speech with a 12-bit ADC at a rate of 8 kHz,
which is more than sufficient to preserve signal quality. At this rate (i.e.,
96 kbps), 1 min. of storage requires 720 KB.
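These figures follow from simple arithmetic; the sketch below (an illustration, not code from the article) makes it easy to check other sample rates and word sizes.

```c
#include <stdint.h>

/* Raw (uncompressed) speech bit rate: bits per sample times samples per second. */
static uint32_t bit_rate_bps(uint32_t sample_rate_hz, uint32_t bits_per_sample) {
    return sample_rate_hz * bits_per_sample;
}

/* Bytes of storage needed for a given duration at a given bit rate. */
static uint32_t storage_bytes(uint32_t bps, uint32_t seconds) {
    return bps / 8u * seconds;
}
```

At 8 kHz and 12 bits, `bit_rate_bps` gives the 96 kbps quoted above, and one minute at that rate comes to 720,000 bytes.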

To transmit the information over a communications channel requires something
higher than 96 kbps to permit supplemental information (e.g., start-of-frame
indicators, channel number). These requirements are beyond the scope of most
applications and can be reduced by using speech coding.

Speech-coding techniques for reducing the bit rate fall into two categories.
The first method is called waveform coding.

There is a higher probability of a speech signal taking a small value rather
than a large value. So, a speech processor can reduce the bit rate by quantizing
the smaller samples with finer step sizes and the large samples with coarse
step sizes.

The bit rate can be reduced further by using an inherent characteristic of
speech: there is a high correlation between consecutive speech samples.
Rather than encode the speech signal itself, the difference between consecutive
samples can be encoded. This relatively simple method is repeated on each
sample with little overhead from one sample to the next. An example of a
waveform algorithm is ADPCM.
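The difference-coding idea can be shown in a few lines. This toy sketch (not the article's code) encodes each sample as its delta from the previous one; for correlated speech samples the deltas are small and need fewer bits.

```c
#include <stdint.h>

/* First-order difference coding: each output is the change from the
 * previous input sample. The decoder accumulates the deltas back. */
void delta_encode(const int16_t *in, int16_t *out, int n) {
    int16_t prev = 0;
    for (int i = 0; i < n; i++) {
        out[i] = (int16_t)(in[i] - prev);
        prev = in[i];
    }
}

void delta_decode(const int16_t *in, int16_t *out, int n) {
    int16_t prev = 0;
    for (int i = 0; i < n; i++) {
        prev = (int16_t)(prev + in[i]);
        out[i] = prev;
    }
}
```

ADPCM builds on this by predicting the next sample and adaptively quantizing the prediction error instead of the raw delta.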

The other way to reduce bit rate is to analyze the speech signal according
to a model of the vocal tract. The speech remains relatively constant over
short intervals and a set of parameters (e.g., pitch and amplitude) can define
that interval of speech. These parameters are then stored or transferred
over the communication channel.

This technique requires significant processing on the incoming signal as
well as memory to store and analyze the speech interval. Examples of this
type of processor (called a vocoder or hybrid coder) are linear predictive
coding (LPC) or code-excited linear predictive coding (CELP).

Quality is difficult to define or even measure. The goal of a measurement
is to completely describe the quality of a speech processor in a single number.
This measurement should be reliable across all measurement platforms as well
as speech algorithms.

Unfortunately, however, measurements are split into two camps: subjective and objective.
Subjective tests measure how a listener perceives the speech. Objective tests
compare the original speech against the reconstructed output and make
measurements based on signal-to-noise ratio (SNR).

The goal of a subjective test is to represent the personal opinions of a
listener about the reconstructed speech in a single number. The listener
evaluates speech segments based on intelligibility or signal degradations
(e.g., nasal, muffled, hissing, buzzing, and so forth). Several subjective
tests exist, such as the diagnostic rhyme test (DRT), mean opinion score (MOS),
and diagnostic acceptability measure (DAM). Table 1 shows the MOS score and bit
rate for some common speech processors.

Coder name   Algorithm type    Bit rate (kbps)   MOS
G.711        log PCM           64                4.3
G.721        ADPCM             32                4.1
G.723        CELP              5.6 & 6.4         3.9
G.726        ADPCM             16, 24, 32, 40    -, 3.7, 3.9, 3.9
G.727        ADPCM             16, 24, 32, 40    -, 3.7, 3.9, 3.9
G.728        Low-delay CELP    16                4.0
FS 1015      LPC-10            2.4               2.3
FS 1016      CELP/MELP         4.8/3.2           2.4/3.5
GSM          RPE-LTP           13                3.5
             MBE               4.8               3.7

Table 1To help reduce the decision-making process, designers
should rely on speech coder test results such as MOS, DAM, or SNR. Typically,
the lower bit rate algorithms are significantly more complex than the higher
bit rate ones.

As I said, objective testing usually involves SNR measurements. SNR is a
measurement of how closely the reconstructed speech follows the original
signal. The speech signal is broken up into smaller segments, and the SNR
is measured. All the SNR measurements are averaged together to get an overall
SNR measurement for the speech signal.

Although this measurement is sensitive to variations in gain and delay, it
cannot account for the properties of the human ear. The input to the speech
processor is usually a sine wave or narrow-band noise waveform to maintain
a repeatable test for all systems.

Because determining the quality of the speech processor is not as easy as
picking the best number, both kinds of tests should be used to identify the
best processor for your application. The best method may be to sit and listen
to the outputs of the speech processor and simply select the one that you
like the best. After all, quality is not a measured parameter but rather
a listener-perceived parameter.

ADPCM is a waveform coding technique that attempts to code signals without
any knowledge about how the signal was created. This implies that a waveform
coder can be applied to other forms of data besides speech (e.g., video).
In general, these coders are simple, with bit rates above 16 kbps. Anything
lower degrades the reconstructed speech.

ADPCM is based on two principles of speech. Because there is a high correlation
between consecutive speech samples, a relatively simple algorithm could be
used to predict what the next sample might be, based on previous samples.

When the predicted sample was compared to the real sample, it was found that
the resulting error signal had a lower variance than the original speech
samples and could therefore be quantized with fewer bits. It was also found
that no side information about the predictor would have to be sent if the
prediction was based on the quantized samples rather than on the incoming
speech signal.

The result was differential pulse code modulation (DPCM).
Further studies showed that if the predictor and quantizer were made to be
adaptive (i.e., smaller samples are quantized using smaller steps and
larger samples with larger steps), then the reconstructed speech more closely
matched the original speech.

This adaptation helps the speech processor handle changes in the incoming
speech signal more effectively. Thus ADPCM was born, first standardized as
ITU-T G.721 and later as G.726 and G.727. Figure 2 shows a block diagram of the encoder and
decoder portions of ADPCM. Notice that both the encoder and decoder share
the same quantizer and predictor.

Figure 2Because the decoder block is embedded in the encoder,
the ADPCM algoritm does not need to send or store any additional side information
with the compressed data.

Most DSP manufacturers can show some type of speech algorithm that has been
implemented for their architecture. Very few 8-bit microcontroller manufacturers
can say the same due to the horsepower required to implement the speech coding
algorithms.

The ADPCM algorithm discussed in this article was developed by the now defunct
Interactive Multimedia Association (IMA) based on an Intel DVI variation
of the standard G.726. Normally, this algorithm is computationally demanding,
but the IMA version reduces the floating-point math
and complex mathematical functions to simple arithmetic and table lookups.

A 16-bit 2s complement speech sample is converted into a 4-bit ADPCM
code. The algorithm uses approximately 600 words of program memory and 13
bytes of data memory. Almost any 8-bit microcontroller can implement this
algorithm thanks to the small amount of resources required.

The source code gives the complete ADPCM encode and decode routines written
for use with Microchip's assembler (MPASM). One detail the source code leaves
to the user: before each message is recorded or played, the registers
PrevSampleL, PrevSampleH, and PrevIndex must be cleared.
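The MPASM listing itself is not reproduced here, but the IMA algorithm it implements can be sketched in C. The step-size and index tables are the standard IMA values; the state struct mirrors the PrevSampleH:PrevSampleL pair and PrevIndex register, and zero-initializing it corresponds to clearing those registers. This is an illustration, not the article's shipping source.

```c
#include <stdint.h>

/* Standard IMA ADPCM tables: 89-entry step-size table plus the index
 * adjustment applied after each 4-bit code. */
static const int16_t step_table[89] = {
        7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
       19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
       50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
      130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
      337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
      876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
     2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
     5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};
static const int8_t index_adjust[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};

/* Mirrors the PrevSampleH:PrevSampleL pair and PrevIndex register in the
 * MPASM version; clear to zero before each new message. */
typedef struct { int16_t prev_sample; int8_t index; } adpcm_state_t;

uint8_t adpcm_encode(adpcm_state_t *s, int16_t sample) {
    int step = step_table[s->index];
    int diff = sample - s->prev_sample;
    uint8_t code = 0;
    if (diff < 0) { code = 8; diff = -diff; }
    /* Quantize |diff| into three magnitude bits, accumulating the value the
     * decoder will reconstruct so the predictor tracks the decoder exactly. */
    int qdiff = step >> 3;
    if (diff >= step) { code |= 4; diff -= step; qdiff += step; }
    step >>= 1;
    if (diff >= step) { code |= 2; diff -= step; qdiff += step; }
    step >>= 1;
    if (diff >= step) { code |= 1; qdiff += step; }
    int pred = s->prev_sample + ((code & 8) ? -qdiff : qdiff);
    if (pred > 32767) pred = 32767; else if (pred < -32768) pred = -32768;
    s->prev_sample = (int16_t)pred;
    s->index += index_adjust[code];
    if (s->index < 0) s->index = 0; else if (s->index > 88) s->index = 88;
    return code;
}

int16_t adpcm_decode(adpcm_state_t *s, uint8_t code) {
    int step = step_table[s->index];
    int qdiff = step >> 3;
    if (code & 4) qdiff += step;
    if (code & 2) qdiff += step >> 1;
    if (code & 1) qdiff += step >> 2;
    int pred = s->prev_sample + ((code & 8) ? -qdiff : qdiff);
    if (pred > 32767) pred = 32767; else if (pred < -32768) pred = -32768;
    s->prev_sample = (int16_t)pred;
    s->index += index_adjust[code & 0x0F];
    if (s->index < 0) s->index = 0; else if (s->index > 88) s->index = 88;
    return s->prev_sample;
}
```

Because the encoder's predictor is driven by the quantized difference rather than the raw input, a decoder starting from the same cleared state reconstructs exactly the samples the encoder predicted, which is why no side information needs to be stored.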

A simple encoder/decoder peripheral can be implemented around a PIC12C672
or a PIC16C556A. The first thing to consider is the communication interface
between the PIC and the main processor.

Lower-end micros don't have any type of serial or parallel peripheral,
but these interfaces can be easily implemented in firmware. The complete code shows routines
that can perform I2C, SPI, and RS-232 communications with a host
processor, and Figure 3 shows a block diagram for an I2C
implementation on a PIC12C672.

Figure 3The PIC12C672 provides the smallest solution for a
serial coder peripheral. In addition to the I2C signals SDA and
SCL, this device features an interrupt and encode/decode select
signals.

Because the microcontroller is implementing the serial interface in firmware,
the application must ensure a good handshaking method to keep the micro from
overflowing. A parallel interface routine is much easier to develop than
the serial protocols, and Figure 4 shows an example of the parallel interface
to a PIC16C556A.

Figure 4The PIC 16C556A provides a cost-effective
parallel-interface solution to a speech coder peripheral. In addition to
the standard parallel interface signals, it provides an interrupt and
encode/decode select signals.

The master I2C routine uses approximately 77 words of program
memory and 5 bytes of data memory. MPASM must also be used to assemble this
file.

One consideration when designing a system based around this routine is the
transfer rate. If the PIC is the master of the interface, then the transfer
rate is solely determined by the clock source to the microcontroller. If
the PIC is a slave on the interface, then the transfer rate depends on the
clock source as well as the firmware overhead to sample the incoming data.

The SPI slave routine uses approximately 16 words of program memory and 2
bytes of data memory. The same consideration concerning clock rate applies
to this routine as well. Because of the overhead of sampling the SDI pin,
the minimum SPI clock period in slave mode is 18 instruction cycles,
where the instruction rate is the oscillator frequency divided by four.
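Under that constraint, the highest usable SPI clock rate can be estimated directly. The calculation below is a sketch built on the 18-cycle figure from the text.

```c
#include <stdint.h>

/* Maximum SPI slave clock rate: the PIC executes Fosc/4 instructions per
 * second, and each bit requires at least 18 instruction cycles of firmware
 * overhead, so the clock can run no faster than (Fosc/4)/18. */
static uint32_t max_spi_hz(uint32_t fosc_hz) {
    return (fosc_hz / 4u) / 18u;
}
```

At a 20-MHz oscillator this works out to roughly 278 kHz, a useful sanity check when budgeting the host-side transfer rate.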

The RS-232 routine uses approximately 54 words of program memory and 3 bytes
of data memory. Although you should check to make sure that the micro has
plenty of overhead, the transfer rate of RS-232 is usually much lower than
the PIC's oscillator frequency.

This routine only requires the user to define the oscillator frequency and
the transfer rate. Several equations allow MPASM to calculate the necessary
delays for bit times.
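The bit-time calculation those equations perform amounts to the following. The exact form of the MPASM equations is not shown in the article, so this is an illustration of the underlying arithmetic.

```c
#include <stdint.h>

/* Instruction cycles per RS-232 bit: the PIC executes Fosc/4 instructions
 * per second, so one bit time spans (Fosc/4)/baud instruction cycles. The
 * firmware's delay loop must burn roughly this many cycles between bits. */
static uint32_t cycles_per_bit(uint32_t fosc_hz, uint32_t baud) {
    return (fosc_hz / 4u) / baud;
}
```

For example, a 4-MHz oscillator gives one million instruction cycles per second, so 9600 bps leaves about 104 cycles per bit for the delay loop plus the sampling code.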

After the communication protocol is chosen, you have to put all the pieces
together. First you need to implement some type of data request from the
main processor to the micro (for master) or from the PIC to the main processor
(for slave).

The micro must control the flow of data to/from the main processor because
the communication interface is implemented in firmware and not hardware.
Otherwise, data may be lost. For a slave implementation, a single I/O line
from the PIC connected to an external interrupt pin on the host processor
easily accomplishes this.

The other important piece of information is the type of operation to be
performed: encode or decode. This step can be accomplished two ways. First,
a unique command from the host processor to the microcontroller can set the
operation to follow. The host processor then initiates an encode or decode
sequence by sending the command for encode or decode.

For an encode sequence, the host processor sends two 16-bit, 2's complement
samples to the PIC. The PIC then responds with two 4-bit ADPCM codes packed
into one byte. A decode sequence reverses the order: one byte of ADPCM codes
is sent to the PIC, which responds with two 16-bit, 2's complement
samples.
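Packing two 4-bit codes into one byte is a pair of shifts and masks. The nibble order below is an assumption for illustration; the article does not specify which code occupies which nibble.

```c
#include <stdint.h>

/* Pack two 4-bit ADPCM codes into one byte (first code in the low nibble;
 * the nibble order is an assumption, not specified in the article). */
static uint8_t pack_codes(uint8_t first, uint8_t second) {
    return (uint8_t)((first & 0x0F) | ((second & 0x0F) << 4));
}

static void unpack_codes(uint8_t byte, uint8_t *first, uint8_t *second) {
    *first  = byte & 0x0F;
    *second = (byte >> 4) & 0x0F;
}
```

Whichever order is chosen, the encoder and decoder must of course agree on it.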

The second method is to use an I/O line from the host to the PIC to indicate
an encode operation (I/O pin pulled low) or a decode operation (I/O pin pulled
high). Note that encode and decode operations should not be mixed together.

All of the data to encode or decode should be sent consecutively to the micro.
Once all of the data has been processed, the host processor can change the
type of operation to be performed.

This requirement is due to the fact that the ADPCM algorithm processes the
next data based on previous data. Anytime the operation is switched, the
encoder or decoder is initialized to a cleared state.

One other consideration is the selection of clock source to drive the PIC.
The PICs oscillator structure is flexible so either an external clock
from the host processor or a local oscillator can be connected to it.

If your application has one system clock that drives all devices on the board,
this same signal can be driven into the oscillator input on the PIC. Otherwise,
a standard oscillator circuit can be used to provide the clock signal.

You can also use a PICmicro as a complete speech-processing subsystem. The
PIC16C77x devices are an ideal choice for this because of the 12-bit
ADC and 10-bit PWM peripherals. The new PIC18Cxxx can implement stereo
record and playback at an 8-kHz sample rate because of its optimized instruction
set, architecture, and 40-MHz operation.

The PIC can communicate to the host processor via any serial interface or
even a simple keypad that implements play, record, next message, and previous
message. Figure 5 shows a simplified block diagram of the speech subsystem
based on a PIC16C77x device.

The microphone input must be both filtered and amplified before entering
the microcontroller. This input might be designed in two stages.

First, an amplifier stage with some limited automatic gain control provides
somewhere between 40 and 60 dB of gain. The filter stage might be a fourth-order
low-pass filter with a 4-kHz corner frequency for an 8-kHz sample rate. The PIC samples the incoming
signal at 8 kHz and compresses each 12-bit sample down to four bits.

The memory size is determined by the amount of record time desired. At 8
kHz, the system generates 32 kbps of data (8000 samples/s × 4 bits/sample).
Therefore, 1 min. of record time requires 240 KB.

An ideal match for this type of system is the Toshiba TC58A040F 4M ×
1 NAND flash-memory device. It stores approximately 131 s of speech at an
8-kHz sample rate and uses SPI as the communications interface.
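The record-time arithmetic can be checked the same way as the raw-storage numbers; the sketch below (an illustration, not the article's code) reproduces both the per-minute figure and the flash capacity estimate.

```c
#include <stdint.h>

/* Compressed bit rate: 4-bit ADPCM codes at the given sample rate. */
static uint32_t adpcm_bps(uint32_t sample_rate_hz) {
    return sample_rate_hz * 4u;
}

/* Seconds of speech that fit in a memory of the given size in bits. */
static uint32_t record_seconds(uint32_t memory_bits, uint32_t bps) {
    return memory_bits / bps;
}
```

At 8 kHz the compressed stream is 32 kbps, one minute occupies 240,000 bytes, and a 4-Mbit device holds about 131 s, matching the figures above.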

You now have a choice to make on the speech output circuit. Although a DAC
makes sense in some applications, the PICs onboard 10-bit PWM peripheral
can also be used to lower cost without giving up quality.

Admittedly, the DAC has better quality than the PWM, but with the right
filtering, the PWM module can provide good results. This filter can be a
fourth-order low-pass filter with a 4-kHz corner (and can be a copy of the input filter).
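Driving the 10-bit PWM from a decoded 16-bit sample is a matter of offsetting and scaling. The mapping below is a sketch; real firmware would write the result into the PIC's PWM duty-cycle registers, and the exact scaling is an assumption.

```c
#include <stdint.h>

/* Map a signed 16-bit decoded sample onto a 10-bit PWM duty value:
 * shift to an unsigned range, then keep the top 10 bits. */
static uint16_t sample_to_pwm10(int16_t sample) {
    uint16_t u = (uint16_t)(sample + 32768);  /* 0..65535 */
    return u >> 6;                            /* 0..1023  */
}
```

With the output filter smoothing the PWM waveform, the duty cycle's average value reproduces the audio signal.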

The final circuit, the speaker amplifier, is extremely application
dependent. You may want to drive a speaker or a set of headphones. Many
companies, including National Semiconductor and TI, make amplifiers specifically
for driving speakers or headphones.

Although some applications need the high bit rate and high-quality speech
algorithms, most can use one like mine. Don't underestimate the power
of the 8-bit microcontroller. Given the right device, the medium bit-rate
algorithms can be implemented successfully without a DSP or specialized audio
device.

Improvements to the 8-bit architecture, operating speed, instruction set,
and memory sizes have allowed the migration of low-end DSP applications
to the 8-bit world. If you've never used a PWM module to generate speech,
try it. You might be surprised.

Rodger Richey has worked for Microchip for more than four years in principal
engineer and senior applications engineer positions. You may reach him at
rodger.richey@microchip.com.
