
Codec 2

Introduction

Codec 2 is an open source speech codec designed for communications quality speech between 700 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio. It fills a gap in open source voice codecs beneath 5000 bit/s and is released under the GNU Lesser General Public License (LGPL).

The Codec 2 project also contains several modems (FDMDV, COHPSK and mFSK) carefully designed for digital voice over HF radio; GNU Octave simulation code to support the codec and modem development; and FreeDV – an open source digital voice protocol that integrates the modems, codecs, and FEC. FreeDV is available as a GUI application, an open source library (FreeDV API), and in hardware (the SM1000 FreeDV adaptor).

The motivations behind the Codec 2 project are summarised in this blog post.

Individuals can support Codec 2 development by helping out with coding, testing, and documentation, or by buying an SM1000. Companies can support Codec 2 through paid contract development.

Notes: Thank you very much, Armin, for providing the MELPe samples. The AMBE samples were generated using a DV-Dongle, a USB device containing the DVSI AMBE2000 chip. The LPC-10 samples were generated using the Spandsp library.

Here are some samples with acoustic background noise, similar to what would be experienced when driving a truck. As you can see (well, hear), background noise is a tough test for low bit rate vocoders. They achieve high compression rates by being highly optimised for human speech, at the expense of performance with non-speech signals like background noise and music. Note that Codec 2 has just one voicing bit, unlike mixed excitation algorithms like AMBE and MELP. The MELPe sample has the noise suppression option enabled.

Patches and Git

Please submit patches against codec2-dev/freedv-dev using diff -ruN, which can be conveniently generated using svn diff --patch-compatible.

There are 3rd party Git mirrors of Codec2 and FreeDV. Use them at your own risk.

GIT IS NOT SUPPORTED!!!

All patches, support questions, etc. need to be addressed against the SVN repository above. Please do not email me (David) or the codec2-dev mailing list suggesting we change to Git. I get these emails every week. I use Git for other projects and understand the arguments, really I do. However, I have good reasons for using SVN for now. Please respect my choice. I know you mean well and really want to help, so please read Can I Help instead; we have many other problems that do need solving.

How it Works

At linux.conf.au 2012 I presented a graphical description of how Codec 2 works, see the Links section below. This is a really gentle introduction.

Codec 2 uses “harmonic sinusoidal speech coding”. Sinusoidal coding was developed at MIT Lincoln Laboratory in the mid-1980s, starting with some gentlemen called R.J. McAulay and T.F. Quatieri. I worked on these codec algorithms for my PhD during the 1990s. Sinusoidal coding is a close relative of the xMBE codec family, and sinusoidal codecs often use mixed voicing models similar to those used in MELP.

Speech is modelled as a sum of sinusoids:

/* sample n of the frame: sum of L harmonics of the fundamental Wo */
for (m = 1; m <= L; m++)
    s[n] += A[m]*cos(Wo*m*n + phi[m]);

The sinusoids are multiples of the fundamental frequency Wo (omega-naught), hence the name "harmonic sinusoidal coding". For each frame, we analyse the speech signal and extract a set of parameters:

Wo, {A}, {phi}

Where Wo is the fundamental frequency (also known as the pitch), { A } is a set of L amplitudes and { phi } is a set of L phases. L is chosen to be equal to the number of harmonics that can fit in a 4 kHz bandwidth:

L = floor(pi/Wo)

Wo is specified in radians normalised to 4 kHz, such that pi radians = 4 kHz. The fundamental frequency in Hz is:

F0 = (8000/(2*pi))*Wo

We then need to encode (quantise) Wo, { A }, { phi } and transmit them to a decoder which reconstructs the speech. A frame might be 10-20ms in length so we update the parameters every 10-20ms (100 to 50 Hz update rate).

The speech quality of the basic harmonic sinusoidal model is pretty good, close to transparent. It is also relatively robust to Wo estimation errors. Unvoiced speech (e.g. consonants) is well modelled by a bunch of harmonics with random phases. Speech corrupted with background noise also sounds OK; the background noise doesn't introduce any grossly unpleasant artifacts.
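A sketch of the random-phase idea for unvoiced frames, assuming the phases are simply drawn uniformly from [-pi, pi] using the C library rand(), which is chosen here only for simplicity. The function name random_phases is hypothetical.

```c
#include <stdlib.h>

#define PI_D 3.14159265358979323846

/* Hypothetical helper: fill phi[1..L] with random phases in [-pi, pi]
   (phi[0] unused, matching the 1..L harmonic indexing used above) */
void random_phases(float phi[], int L)
{
    for (int m = 1; m <= L; m++) {
        double u = (double)rand() / RAND_MAX;      /* uniform in [0, 1] */
        phi[m] = (float)(PI_D * (2.0 * u - 1.0));  /* map to [-pi, pi] */
    }
}
```

Summing the harmonics with these phases gives a noise-like waveform whose spectral envelope still follows the amplitudes { A }.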

As the parameters are quantised to a low bit rate and sent over the channel, the speech quality drops. The challenge is to achieve a reasonable trade off between speech quality and bit rate.

Codec 2 Block Diagrams

Here are some block diagrams that illustrate the major signal processing elements for a fully quantised configuration of Codec 2. This example includes the LPC correction bit which was a feature of the 2550 bit/s version.

[Figure: Codec 2 encoder block diagram]

[Figure: Codec 2 decoder block diagram]

These figures were explained in a presentation I gave at the DCC 2011 conference; for more information, see the video of that talk.

Example Bit Allocation

Parameter                        bits/frame
Spectral magnitudes (LSPs)               36
Joint Pitch and Energy                    8
Voicing (updated each 10 ms)              2
Spare                                     2
Total                                    48

At a 20 ms update rate, 48 bits/frame gives 48/0.02 = 2400 bit/s.
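The arithmetic behind the example allocation can be written out directly. A minimal sketch; the struct and function names are hypothetical, and the field sizes are simply those of the example allocation above:

```c
#include <stddef.h>

/* One entry of a frame's bit allocation */
struct bit_field { const char *name; int bits; };

/* The example 2400 bit/s allocation */
static const struct bit_field alloc[] = {
    { "Spectral magnitudes (LSPs)", 36 },
    { "Joint pitch and energy",      8 },
    { "Voicing (each 10 ms)",        2 },
    { "Spare",                       2 },
};

/* Total bits per frame */
int bits_per_frame(const struct bit_field *f, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += f[i].bits;
    return total;
}

/* Bit rate in bit/s for a given frame period in ms */
int bit_rate(int bits, int frame_ms)
{
    return bits * 1000 / frame_ms;
}
```

bit_rate(30, 20) likewise reproduces the 1500 bit/s figure for a 30-bit amplitude quantiser mentioned under Challenges.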

Challenges

The tough bits of this project are:

1. Parameter estimation, in particular voicing estimation.

2. Reduction of a time-varying number of parameters (L changes with Wo each frame) to a fixed number of parameters required for a fixed bit rate. The trick here is that { A } tend to vary slowly with frequency, so we can "fit" a curve to the set of { A } and send parameters that describe that curve.

3. Discarding the phases { phi }. In most low bit rate codecs, phases are discarded and synthesised at the decoder using a rule-based approach. This also implies the need for a "voicing" model, as voiced speech (vowels) tends to have a different phase structure to unvoiced speech (consonants). The voicing model needs to be accurate (not introduce distortion) and relatively low bit rate.

4. Quantisation of the amplitudes { A } to a small number of bits while maintaining speech quality. For example 30 bits/frame at a 20ms frame rate is 30/0.02 = 1500 bits/s, a large part of our 2400 bit/s "budget".

5. Performance with different speakers and background noise conditions. This is where you come in: as Codec 2 develops, please send me samples of its performance with various speakers and background noise conditions, and together we will improve the algorithm. This approach proved very powerful when developing Oslec. One of the cool things about open source!

Can I help?

Yes.

Not all of this project is DSP. There are many general C coding tasks like GUI development, porting between Octave and C, refactoring, code review, writing user applications, testing, and even patent review.

For coding you need C skills, some time, an interest (but not expertise) in DSP, the ability to work as part of a team, and determination to finish the job. I will work closely with you to answer questions and support you. You will learn a lot and contribute to the open source future of communications.

For non C coders you can also help out with documentation, configuration management, testing, and promotion, for example giving a presentation at your local club or user group.

If you are a radio amateur, use FreeDV. Start a local FreeDV net. Use the FreeDV API in your SDR project. Use SDR software and hardware that supports FreeDV. Help make open source digital radio popular.

I will happily accept sponsorship for this project. For example research grants, or development contracts from companies interested in seeing an open source low bit rate speech codec. For the price of one AMBE/MELP license a lot of Codec 2 development can be sponsored.

Is it Patent Free?

I think so. Much of the work is based on old papers from the 1960s, 70s and 80s, and the PhD thesis work used as a baseline for this codec was original. A nice little mini project would be to audit the patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and compare.

Proprietary codecs typically have small, novel parts of the algorithm protected by patents. However proprietary codecs also rely heavily on large bodies of public domain work. The patents cover perhaps 5% of the codec algorithms. Proprietary codec designers did not invent most of the algorithms they use in their codec. Typically, the patents just cover enough to make designing an interoperable codec very difficult. These also tend to be the parts that make their codecs sound good.

However there are many ways to make a codec sound good, so we simply need to choose and develop other methods.

Is Codec2 compatible with xMBE or MELP?

Nope. I don't think it's possible to build a compatible codec without infringing patents or having access to commercial-in-confidence information. We are pushing new boundaries where closed source can't follow, such as innovative integration between codecs and modems.

27 Jan 2012, Codec 2 talk at linux.conf.au 2012 (voted best talk of the conference!) Video and Slides. This talk has a really easy to understand graphical description of Codec 2, a discussion on patent free codecs, and the strong links between Ham Radio and the Open Source movement. More on linux.conf.au 2012 in this blog post.

4 March 2012, Jean-Marc Valin has done some great work on joint VQ of the frame to frame pitch and gain differences, documented in his blog post A Pitch-Energy Quantizer for Codec2.