What Could Go Wrong: Asynchronous Serial Edition

It’s the easiest thing in the world — simple, straightforward serial data. It’s the fallback communication protocol for nearly every embedded system out there, and so it’s one that you really want to work when the chips are down. And yet! When you need it most, you may discover that even asynchronous serial can cost you a few hours of debugging time and add a few gray hairs to your scalp.

In this article, I’m going to cover most (all?) of the things that can go wrong with asynchronous serial protocols, and how to diagnose and debug this most useful of data transfer methods. The goal is to make you aware enough of what can go wrong that when it does, you’ll troubleshoot it systematically in a few minutes instead of wasting a few hours.

The Groundwork

Imagine that you’ve got eight bits of data that you want to send me, electronically. If we have eight wires (plus ground) between us, you can simply flip your eight switches and put high or low voltages on each wire. If I’m on the other end with some LEDs, I’ll just read off which ones light up and we’re done. But eight wires is a lot of copper. So instead you decide to send one bit at a time, using just one wire (plus ground). That’s the essence of serial communication — bits are sent in series by varying the voltage on a wire precisely over time.

Sounds easy, but now we have some choices to make. How fast do you send each bit? Does a lit-up LED represent a 1 or a 0? How will I know when your message starts or stops? And finally, if we’re both going to send data to each other, we’ll need two wires. How do we know which one I’m sending on and which one you’re sending on? Each of these choices is a place to get things wrong, and for bugs to creep in.

RX/TX

That last point, which wires transmit data in which direction, is a surprisingly common source of confusion, so it’s a good place to start debugging.

“RX” and “TX” stand for “receive” and “transmit” respectively. Most serial communications systems will have one of each. Often setup goes something like this: you’ll find yourself connecting “GND” on one device up to “GND” of the other. Maybe they’ll also share a power rail, so you’ll connect “VCC” of one to “VCC” on the other. And then, on a roll, you’ll connect “RX” on one device up to “RX” on the other.

And that’s mistake number one. Both devices are expecting to receive data on their “RX” line, so they both just sit there waiting while the two “TX” lines end up talking over each other. No, the “right” way to do it is to connect the “RX” port of one device up to the “TX” port of the other and vice-versa. That’s just logical, right? You’ll sometimes see the pins labelled “TXD” and “RXD” instead, where the “D” stands for “data”. Either way, the names describe things from this device’s perspective: “TX” is the pin that this device transmits on.

Whatever you call it, connecting a port called “TX” to a port called “RX” causes trouble in modern CAD programs, where you name the network rather than the individual ports. What do you call a wire that connects both devices’ “GND” pins? “GND” is a good name. What do you call the wire that connects “TX” to “RX”? How about the one that connects “RX” to “TX”? Confusion reigns.

(Note that SPI, which has its own issues that we’ll get to next time, calls these lines “master in, slave out” and “master out, slave in”. The line names are consistent, and if you know which device you’re looking at, you know instantly which direction the data is flowing. That’s much better.)

So the first debugging question to ask yourself is whether or not you’ve properly crossed the signal lines. And even if you have, try swapping them anyway because even if you’re not confused, you can’t be sure that the engineer upstream of you wasn’t. (We’ve seen it happen.)

Baud Rate

We’ve got the wiring straight, so how about the speed at which you’re sending (and receiving) data? This matters, because if you see a high voltage on your wire for a while, you need to know how many bits that “while” was supposed to represent. If I send you four zeros, you’ll see a constant voltage for twice as long as if I sent you two, but we have to agree on a timebase so that you can be certain I didn’t just send two zeros, or eight.

The number of bit signals sent per second is called the baud rate, and it’s something that we just have to agree on. This means that both the sender and receiver have to have fairly accurate clocks onboard so that they can keep the same time.

Be on the lookout for baud rates like 2400, 9600, 38400, and 115200. If you don’t know the baud rate of your target device, it doesn’t hurt to try them all out.

Autobauding

There is a clever trick that you or your device can play if you don’t know the baud rate ahead of time. If you receive a few bytes of data, you can keep track of the lengths of time that the voltage is constant on the line, and find their greatest common divisor. For instance if you see a high voltage for 208 μs, and then a low for 104 μs, and finally a high for 312 μs, it’s a good bet that a bit period is 104 μs long, and that corresponds to 9600 baud. If it’s more like 8.7 μs, that’s 115,200 baud. Solved.

[Figure: 9600 baud]

[Figure: 115,200 baud]
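Here is a minimal sketch of that trick in Python. Real measurements are noisy, so instead of taking an exact greatest common divisor it tries a list of standard baud rates from slowest to fastest and returns the first one whose bit period divides every measured duration to within a tolerance. The function names, the list of candidate rates, and the 10% tolerance are my own assumptions for illustration, not part of any standard.

```python
# Candidate rates to test, slowest first (an assumed, non-exhaustive list).
STANDARD_BAUD_RATES = [1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200]


def fits(width_us, bit_us, tolerance):
    """True if width_us is close to a whole number (>= 1) of bit periods."""
    n_bits = round(width_us / bit_us)
    return n_bits >= 1 and abs(width_us / bit_us - n_bits) <= tolerance


def guess_baud(pulse_widths_us, tolerance=0.1):
    """Guess the baud rate from constant-level durations in microseconds.

    Returns the slowest standard rate whose bit period fits every pulse,
    or None if nothing fits. Slowest-first matters: any faster rate whose
    bit period is a whole fraction of the true one would fit as well.
    """
    for baud in STANDARD_BAUD_RATES:
        bit_us = 1_000_000 / baud
        if all(fits(width, bit_us, tolerance) for width in pulse_widths_us):
            return baud
    return None


print(guess_baud([208, 104, 312]))  # the example from the text
```

One caveat: if the sample happens to contain no single-bit pulse at all, this will guess too slow a rate, so feed it a reasonably long capture.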

Voltage Levels

You’ve got the “RX” and “TX” lines straight, and you’ve figured out the baud rate, so you’re well on your way to receiving and transmitting data. The question now becomes how to interpret it. Put another way, is a high voltage a 1 or is a high voltage a 0?

RS-232 vs TTL Voltages

You wouldn’t think this would be confusing, but alas, history conspires against us. RS-232, the most popular serial standard of old, used positive and negative voltages (from 3 V to 15 V and from -3 V to -15 V) to signal 0 and 1 respectively. Yes, that’s right. A 1 is sent with a negative voltage, and the higher voltage corresponds to a 0.

Cut to the present, where single-sided signalling is more common. In TTL serial, the higher voltage (3.3 or 5 V) is taken to be a 1, and the low voltage (0 V) is taken as a 0. So the answer to the question of how to interpret the voltages as numbers is: it depends. Modern, but still RS-232-style, signaling will use 0 V for a 1 and 5 V for a 0, while TTL serial will do just the opposite.

[Figure: 3.3 V TTL]

[Figure: 5 V “RS-232”-style]

The good news is that it’s possible to tell these two cases apart with an LED (or a multimeter if you’re fancy). Both RS-232 and TTL systems start off with the “TX” port of a device sending a 1 level as default. If the “TX” line idles high, you’re looking at a TTL system. If it idles low, it’s more than likely using the RS-232 polarity.
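If you would rather write the rule down than remember it, the idle-level test might look like the sketch below. The -3 V limit comes from the RS-232 range mentioned earlier; the 2.0 V and 0.8 V cutoffs are my assumption (they are commonly quoted TTL input thresholds), and the function name and return labels are made up for this example.

```python
def classify_idle_level(idle_volts):
    """Guess the signalling style from the voltage on an idle TX line.

    An idle line sends a 1, so the idle voltage reveals the polarity.
    Thresholds other than -3 V are illustrative assumptions.
    """
    if idle_volts <= -3.0:
        return "rs232"            # true RS-232: a 1 is a negative voltage
    if idle_volts >= 2.0:
        return "ttl"              # TTL serial: a 1 is the high level
    if abs(idle_volts) < 0.8:
        return "rs232-polarity"   # idles near 0 V: RS-232 polarity at logic levels
    return "unknown"              # in between: measure again


for volts in (3.3, 0.0, -9.0):
    print(volts, classify_idle_level(volts))
```

Keep in mind that a line that reads 0 V could also simply be unpowered or disconnected, so treat that result as a hint rather than proof.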

Inverter Circuit

If you’ve got an FTDI USB-to-serial cable, or one of the clone devices like the CP2102 or the … , you’re 100% in TTL-serial-land. Good news. If you need to interface with another device that uses RS-232 levels, however, you’ve got a little work to do.

Here is an RS-232-to-TTL converter circuit that works for modest baud rates, and even takes care of the voltage-level shifting for you. So you can connect your 3.3 V-sensitive ESP8266 circuits to an old -15 V to 15 V line printer and all will work just fine. This is one for your tool-belt. It’s not strictly compliant, because it doesn’t swing to -12 V (or whatever), but it gets the polarity right and will work with most devices.

If you need something to interface with ancient RS-232 equipment, you can pick up a chip (MAX232 or equivalent) that creates the higher voltages for you. Indeed, if you crack open an RS-232 converter, you’ll sometimes see a USB-TTL serial chip paired with a MAX232. (The cheap ones simply invert the signal and are no better than the two-transistor circuit above. You’ve been warned.)

Start Bits, Stop Bits, Parity, and Endianness

You can’t just send voltages down the wire. You have to know when the signal starts and stops, and what data to look for. Collectively, this is called framing. Most serial systems will use the same “8N1” frames, but when they don’t, it’s worth knowing about. These three characters correspond to the number of bits sent at once, the parity bit, and the number of stop bits respectively. Let’s take that apart.

The number of bits of data sent per packet is self-explanatory, and most serial protocols send data one byte at a time, so this isn’t usually a problem. But on really old gear, you’ll sometimes see seven bits — ASCII only uses seven bits after all. Anyway, the number of bits per packet is the “8” in “8N1”.

Now let’s think about the “start” and “stop” bits. Because the sending device’s “TX” port (and the receiver’s “RX”) idle high (for TTL), you can’t start off your data transmission with a 1 — how would you tell if it got sent? So a start bit, always low, begins the packet of data.

If you send one byte, it ends with the line high for a long time, and it’s pretty easy to tell where it ends. If you send two bytes, and the second starts with a low start bit, you need to send at least one high stop bit at the end of the first packet. (When you send one byte, the stop bit blends in to the background high state of the TX line.) If you’re keeping count, that’s a minimum of ten signals to send eight bits: one start bit and at least one stop bit. But some systems send two stop bits, so again you have to specify. The number of stop bits is the “1” in “8N1”.
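To put a number on that framing overhead, here is a quick back-of-the-envelope sketch. It just divides the baud rate by the number of bit times per frame; the function and parameter names are mine, and it assumes bytes are sent back to back with no idle gaps between them.

```python
def bytes_per_second(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Best-case payload rate: one start bit plus data, parity and stop bits."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame


print(bytes_per_second(9600))               # 8N1: ten bit times per byte
print(bytes_per_second(9600, stop_bits=2))  # 8N2: eleven bit times per byte
```

So at 9600 baud with 8N1 framing, the best you can do is 960 bytes per second, and a second stop bit costs you roughly another 9%.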

Here is the message “*\n” (a star, ASCII 42, followed by a line feed, ASCII 10) being sent with one and two stop bits respectively. In binary, that’s 00101010 00001010. The stop bits (one and two respectively) show up between these two bytes, and there’s a single start bit before each as well. See if you can puzzle that out.

And this brings us to the “parity bit” in the middle, which is high or low depending on whether the number of ones in the data is even or odd. The choice to encode even as a 1 or odd as a 1 is arbitrary, so to take advantage of the parity bit as an error-detection mechanism, you need to know which is which. It’s specified as “N” for no parity, or “E” or “O” for even and odd parity respectively. When there is a parity bit, it’s added after the data, just before the stop bit(s), and it’s chosen so that the total number of 1s, data plus parity, comes out even or odd to match. That’s the error-detection scheme: if you’re using even parity and you see three 1s, including the parity bit, you know there was a transmission error.
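The parity rule is short enough to sketch in a few lines of Python. The function names here are hypothetical, and the mode letters follow the “N”, “E”, “O” convention described above.

```python
def parity_bit(byte, mode):
    """Return the parity bit to send for one data byte, or None for mode "N"."""
    ones = bin(byte & 0xFF).count("1")
    if mode == "N":
        return None
    if mode == "E":
        return ones % 2        # pad so the total count of 1s is even
    if mode == "O":
        return 1 - ones % 2    # pad so the total count of 1s is odd
    raise ValueError("mode must be 'N', 'E' or 'O'")


def parity_ok(byte, received_parity, mode):
    """Check a received byte against the parity bit that came with it."""
    return mode == "N" or parity_bit(byte, mode) == received_parity


# '*' is 00101010, which has three 1s, so even parity adds a fourth.
print(parity_bit(0x2A, "E"))    # 1
print(parity_ok(0x2A, 0, "E"))  # False: three 1s in total under even parity
```

Note that a single parity bit only catches an odd number of flipped bits. Two errors in the same frame cancel out and slip through.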

There’s one final confusion that you’ll fortunately almost never see: the issue of endianness. The serial numeric data could be sent either with the least-significant bit coming first in time, or the most-significant. The good news is that TTL and RS-232 serial data is almost always least-significant bit first (or “little-endian”) but some other serial protocols, like the Internet protocols, send their serial data big-end first. The bad news is that scopes display data left-right as it comes in, and we write numbers most-significant-bit first, so you’ll have to reverse the bit patterns in your head when reading the scope shots. (Those four zeros in the line feed character should help orient you.)
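If reversing bits in your head gets old, a few lines of Python can do it for you. This is a sketch under some assumptions: no parity, eight data bits, TTL polarity (1 means high), and one sample per bit already extracted from the scope trace. The function names are made up for the example.

```python
def frame_bits(byte, stop_bits=1):
    """Line levels for one byte in the order they appear in time."""
    data = [(byte >> i) & 1 for i in range(8)]  # least-significant bit first
    return [0] + data + [1] * stop_bits         # start bit, data, stop bit(s)


def byte_from_frame(bits):
    """Recover the byte from levels read left to right off the scope."""
    if len(bits) < 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error: bad start or stop bit")
    # The first data bit to arrive has weight 1, the last has weight 128.
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


print(frame_bits(0x2A))                   # '*': [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(byte_from_frame(frame_bits(0x0A)))  # 10, the line feed
```

Compare the printed frame for the star with its binary value 00101010: the eight data bits in the middle read 01010100, which is the same pattern backwards.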

The Protocol Layer: Line Endings

By now, you’ve got the signals all straightened out. You’d think that there’s nothing more that could go wrong. But wait! Because serial comms have evolved over time, there are two (maybe three) possible ways to signal the end of a line of data. The line-endings issue is familiar if you’ve copied text files across Unix, Windows, and older MacOS machines. Each one (naturally) uses a different standard. Confusion among these three traditions has also invaded the world of embedded devices. If the receiving program is waiting for one of these, and you’re sending the other, it won’t know when you’re done and will just sit there.

The short version is that you might need to send a Line feed (LF, ASCII 10) or a Carriage return (CR, ASCII 13) or both (CR+LF) before the other endpoint responds. Most terminal programs will let you set this on the fly, both for sending and receiving, so it’s no big deal to troubleshoot by hand. But if you’re already unsure of what your microcontroller code is doing, and you can’t see any of this to tweak, it might not occur to you that you’ve got a line-ending issue. And of course nothing stops anyone from using their own specific end-of-line character in their protocol. Sigh.
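On the code side, one defensive approach is to be strict about what you send and forgiving about what you accept. The sketch below shows one way that might look in Python; the helper names are mine, and the “AT” command is just a stand-in for whatever your device expects.

```python
import re

LINE_ENDINGS = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}


def with_line_ending(payload, ending="LF"):
    """Strip any existing CR/LF from the end and append the chosen ending."""
    return payload.rstrip(b"\r\n") + LINE_ENDINGS[ending]


def split_lines(buffer):
    """Split received bytes on CR, LF or CRLF.

    Returns (complete_lines, remainder); the remainder is the unfinished
    tail that should be kept until more data arrives.
    """
    parts = re.split(rb"\r\n|\r|\n", buffer)
    return parts[:-1], parts[-1]


print(with_line_ending(b"AT", "CRLF"))        # b'AT\r\n'
print(split_lines(b"OK\r\nAT\rhi\npartial"))  # ([b'OK', b'AT', b'hi'], b'partial')
```

One wrinkle: if a CR+LF pair is split across two reads, this produces an extra empty line, so either ignore empty lines or hold back a trailing CR until the next chunk shows up.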

Summary

This concludes our guide to troubleshooting serial lines, and I’ve covered pretty much all of the possible variables: getting the lines right, selecting the proper baud rate, figuring out the line polarity (TTL- or RS-232-style), data length, stop bits, parity, and the line ending. That’s a lot that can go wrong all at once if you’re just trying to get some data out of an opaque microcontroller system. Knowing all of the possible factors, however, gives you a foothold — a checklist that you can use to make sure that everything is working the way you think it should be.

Most of the time, it’s not so bad. You’re going to be running into 8N1 at one of the standard baud rates. Make sure your wires are crossed, and test the idle voltages on the transmit lines to establish the polarity. Then, you can mess around with different baud rates. If this doesn’t save you, try the line endings. And if you’re still stuck, break out the ‘scope and dig into the signals.