Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec
created by the Xiph.Org Foundation. Well, actually we were discussing continuations over an XMPP audio chat
and the quality was a bit bad. I mentioned CELT just as a joke, but a few minutes later both of us had somehow
downloaded and compiled the CELT codec.

Basically the idea was the following:

Get a sound stream from the microphone

Pipe it through the CELT encoder

Use netcat to transfer the encoded sound stream over the internet

Pipe the received sound stream through the CELT decoder

Play the decoded sound on the other computer

Well, it was just for fun, but it did work extraordinarily well. We grabbed the mic output with the parec command, and
pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:

parec --latency-msec=1 | pacat --latency-msec=1

The --latency-msec option makes sure the latency of pulse audio is not too high. If you omit it, pulse audio will choose a
very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.

Now we put CELT into the game. Fortunately the CELT library from the Xiph foundation already contains a fully functional
encoder and decoder program. celtenc in the tools directory of the library is the encoder, celtdec the decoder. We
both just compiled the library (with ./configure and make) and used the encoder and decoder directly from the tools
directory. With CELT in the game our local test command looked like this:
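A minimal sketch of that local loopback, assuming celtenc and celtdec are on the PATH (we ran them straight from the tools directory) and that both tools accept `-` for stdin and stdout. The `--bitrate` spelling and the `-` arguments are written from memory of the tools' help text, so check `celtenc --help` before relying on them. The function name `loopback` is made up for this example.

```shell
# Record from the mic, encode with CELT, decode right away and play it back.
# Assumed: "-" means stdin/stdout for both CELT tools, --bitrate is in kbit/s.
loopback() {
    parec --latency-msec=1 \
        | celtenc --stereo --bitrate 50 - - \
        | celtdec --stereo - - \
        | pacat --latency-msec=1
}
```

Calling `loopback` starts the pipeline and Ctrl+C stops it.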

Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here is the --stereo
parameter of the CELT tools. The pulse audio tools use 2 channels by default, so the --stereo option is needed to match
that. Alternatively you can use the --channels=1 parameter of parec and pacat to make pulse audio use just one
channel and then omit the --stereo option.

Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite ok. Lower bitrates still sounded good
but increased the latency in our case, probably not because of the codec but because of the audio frame size. We tried to
reduce this but it did not work. Anyway, 50 KBit/s was a value we were happy with.

Now the most important part of all: move the data over the internet. Well, thanks to netcat that is a piece of cake. We first
did it in one direction and later on extended it into a real bidirectional voice chat.
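A sketch of the one-way version, under a few assumptions: the netcat on both machines is the OpenBSD flavour (traditional netcat wants `nc -l -p 50123` instead), the CELT tools take `-` for stdin and stdout, and `--bitrate` is the right spelling of the bitrate option. The function names are made up, and `$1` plays the role of [target-ip].

```shell
PORT=50123   # arbitrary port, pick whatever you like

# Run on the listening computer: receive the encoded stream, decode, play.
listen_and_play() {
    nc -l "$PORT" | celtdec --stereo - - | pacat --latency-msec=1
}

# Run on the sending computer: record, encode, send to the listener.
# $1 = IP address of the listening computer
record_and_send() {
    parec --latency-msec=1 | celtenc --stereo --bitrate 50 - - | nc "$1" "$PORT"
}
```

The listener has to be started first, otherwise the sender has nothing to connect to.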

Basically we just put a netcat pair between the celtenc and celtdec programs. [target-ip] is the IP address of the
listener computer. 50123 is just a random port number; you can use whatever port you like. The idea here is that
the listener starts the netcat server and receives the encoded sound stream.

Now, let's make it two-way, so it will actually be useful for something. An easy approach would be to run the same two
commands again, just in the other direction. That is also the best solution we could come up with.

However, it seems a bit wasteful to use two connections when we just send one data stream and receive another one. We
can use one netcat connection for that:
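A sketch of the single-connection setup, assuming OpenBSD-style netcat (`nc -l PORT`), CELT tools that take `-` for stdin and stdout, and `--bitrate` as the bitrate option. The function names are made up, and `$1` is the IP address of computer A.

```shell
PORT=50123

# Computer A: netcat listens. Its stdin is our encoded mic signal,
# its stdout is whatever the other side sends us.
chat_listen() {
    parec --latency-msec=1 | celtenc --stereo --bitrate 50 - - \
        | nc -l "$PORT" \
        | celtdec --stereo - - | pacat --latency-msec=1
}

# Computer B: the same pipeline, but netcat connects to computer A.
# $1 = IP address of computer A
chat_connect() {
    parec --latency-msec=1 | celtenc --stereo --bitrate 50 - - \
        | nc "$1" "$PORT" \
        | celtdec --stereo - - | pacat --latency-msec=1
}
```

Computer A runs `chat_listen` first, then computer B runs `chat_connect` with A's address.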

Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back
the received sound stream. I couldn't resist drawing a nice little diagram of this setup:

The data flow of the sound streams between the programs and over the internet

The dashed lines represent CELT encoded data, while normal lines are raw uncompressed audio. The problem with this setup
is the latency. As soon as computer A runs the command, parec and celtenc will start to record and encode data, even
if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B
connects it will first receive all buffered data. For example, if computer B runs the command 2 seconds after computer A, there
will be a 2 second delay in the sound stream from computer A to B (but only in that direction).

We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked
together in about 4 hours it's quite good. Most of our time went into working around NAT problems of our routers anyway. Maybe
I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. :)

"Using CELT application developers can build software that allows musicians to perform together across the Internet, or simply build great sounding telephony systems. Why shouldn't your telephone sound as good as your stereo?"

50 KBit/s sounded lossless to me. We also tried 25 KBit/s and it sounded the same. At least I could not tell the difference. The problem, however, was that lower bitrates added a very noticeable delay to the stream, even without any network. I suppose the reason for this is that we used normal pipes to move the data from the encoder to the decoder. With a low bitrate it takes much longer for the pipe buffer to be filled and handed on to the next process. 50 KBit/s was just a bitrate where this effect was not that disturbing.

Just for the sake of bitrate I did a small test. I used the following command from above and tested some different bitrates:
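Roughly this, with the bitrate as a parameter. It is a sketch that assumes celtenc and celtdec are on the PATH, take `-` for stdin and stdout, and that `--bitrate` expects kbit/s; the function name `loopback_at` is made up.

```shell
# Local loopback at a chosen bitrate.
# $1 = bitrate in kbit/s, e.g. 50, 25, 15, 10 or 5
loopback_at() {
    parec --latency-msec=1 \
        | celtenc --stereo --bitrate "$1" - - \
        | celtdec --stereo - - \
        | pacat --latency-msec=1
}
```

For example `loopback_at 25`, then Ctrl+C and try the next value.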

• 50, 25 and 15 KBit/s sounded the same to me. But well, my ears are not the best.
• With 10 KBit/s it started to sound a bit dull. Still better than the XMPP voice chat we used before (Empathy with SPEEX8000 I think).
• 5 KBit/s was really distorted. Still understandable but not really human like.

Please keep in mind that this was still stereo, not mono. With 5 KBit/s I got a latency of about 14 seconds. With 50 KBit/s it was about 2 seconds. So using pipes is not a very good idea. ;)

Anyway, please keep in mind that we hacked it together in a few hours without really knowing Pulse Audio or CELT. We just did it for fun and not to replace any real voice chat.

Pulseaudio is crap, just use OSS' vmix device.
The demo encoders/decoders are not trimmed for low latency, the libraries work properly with low bitrates. Use another implementation.
Don't try to be too clever, your bidirectional netcat stuff is not actually worth it. Listen on both receiving sides and you won't have that long delay you've been talking about.
And this shouldn't have taken 4 hours!

Sorry, I don't know OSS. I know that audio systems and especially Pulse Audio are discussed quite often. It simply was installed on our computers (running Ubuntu) and so we used it. There's nothing more to it than that.

Regarding the encoder latency: From the encoder output I got the impression that it emits 20ms frames. However I haven't looked into the I/O code of the encoder.

I just fired up strace and watched the encoder a bit, and it looks like it only writes the data in 4 KByte chunks. Seems like libogg buffers the data until a page is filled. I replaced the ogg_stream_pageout() call in the encoder with ogg_stream_flush() and it does the trick. Still not perfect but much better. Of course this makes absolutely no sense for a normal encoder. I definitely agree with you here, the encoder is not made for low latency stuff. Anyway, if we really wanted to build a serious tool here we would have written a small C program and used libcelt together with some UDP transport layer. But it was a just-for-fun experiment, nothing more. So we just glued together some tools that were already available. Thanks to the command line this works quite well for the effort we invested. ;)
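To put rough numbers on those 4 KByte chunks, here is a back-of-the-envelope sketch. It assumes exactly 4096 bytes per write and a constant bitrate, which are both simplifications, and the function name `page_fill_ms` is made up.

```shell
# Milliseconds needed to produce 4096 bytes at a given bitrate.
# 4096 bytes * 8 = 32768 bits; at N kbit/s that takes 32768 / N ms.
# $1 = bitrate in kbit/s (integer division, so the result is rounded down)
page_fill_ms() {
    echo $(( 4096 * 8 / $1 ))
}

for kbps in 50 25 15 10 5; do
    echo "$kbps kbit/s: $(page_fill_ms "$kbps") ms per 4 KByte chunk"
done
```

That gives about 0.65 seconds per chunk at 50 KBit/s and about 6.5 seconds at 5 KBit/s. So the delays I measured (about 2 and 14 seconds) look at least plausible if a couple of such buffers add up along the way.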

Using the one-way version twice, once for each direction, will get rid of the uncontrollable delay. I've written above that this is the best setup we came up with. I probably didn't emphasize it enough. I only mentioned the two-way version with one socket because that is the way I think it should have been. Using two sockets looks like a waste to me. Again, something that would be trivial to do right if we had written a small C program. I hope that was what you meant with "listen on both receiving sides".

Most of the time went into working around NAT. Neither of us had proper access to our routers, so we could not establish a direct connection. In the end we used a socat bridge on another server to exchange data. An SSH tunnel might have been easier from the get-go. All in all about 3 hours went into that, I guess. The pulse audio tools, the CELT encoder and decoder as well as netcat were all a matter of minutes. We spent some more minutes on trying some parameters of the encoder and decoder (rate, framesize, etc.) but without much effect.
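For the curious, a relay like that can be sketched with socat alone. This is an assumption about how such a bridge might look, not the exact command we ran. The port numbers and the function name `run_bridge` are made up.

```shell
PORT_A=50123   # computer A connects to this port
PORT_B=50124   # computer B connects to this port

# Run on a server both sides can reach. socat accepts one connection on
# each port and copies the data between them in both directions.
run_bridge() {
    socat "TCP-LISTEN:${PORT_A},reuseaddr" "TCP-LISTEN:${PORT_B},reuseaddr"
}
```

Both computers then only make outgoing connections to the server (A to the first port, B to the second), which works from behind NAT without touching the router.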

Guys, is there some simple solution for those of us who want to use CELT but don't know about compiling etc.? Something as simple as Lucy (http://www.luci.eu/?page_id=15)? With Lucy Live and the AAC-HE codec you get 300ms delay.
