AVR: Battling bizarre bugs

Do you ever get the feeling that a compiler is acting weird just to mess with your head? That’s how it felt yesterday.

I was tinkering with the code for my HexMonitor. This uses an ATMEGA328P microcontroller to read values in from an eight-bit data bus or a 16-bit address bus (selectable via a switch) and display them on a four-digit 7-segment display. It’s a tool I’ve built for my glacially slow Zolatron 64 self-build computer project. (It’s not the computer that’s slow, it’s the project. Well, okay, it’s both.) It will also form the basis of my investigation into whether I prefer Eagle or KiCad for PCB design.

And it’s working. I wrote the code in C using Atmel Studio 7 and all was ticking along fine.

Then I decided to rewrite it in C++. Why? Because I like classes. I like the idea of encapsulated, black-box design that can be reused. It’s who I am. And yes, it’s quite possible that the C++ version is less efficient, but at my level of coding I don’t care. Writing separate classes for the 74HC165 and 74HC595 shift register chips helped me clarify in my mind how these chips work, and which bits of my code relate purely to the functioning of those chips and not to the logic of the main program.

The fact is that I will probably stick with the C version of the code when I come to actually use the HexMonitor board – going all object-oriented was just a bit of fun and exploration.

And it worked – for a while.

I got the class for the 74HC595 serial-to-parallel shift register working first. It was much easier than I expected. Then I wrote the code for the 74HC165 and, to my delight, that worked pretty much straight away. Then I made one very slight change.

It wouldn’t compile. Here’s what I saw in the error list.

Not very informative, other than saying it’s a problem with the linker. So I dug into the output of the compiler. This is the relevant bit:

Yes, it’s the dreaded ‘truncated to fit’ error. You could spend hours Googling this issue. In fact, I did. And it revealed much gnashing of teeth and rending of garments.

Many of these tales of woe date back some years. Indeed, in a number of the blog and forum posts I found, the conclusion was that the error was due to a compiler (or linker) bug which, allegedly, was fixed in 2005. Oh really?

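The message, it seems, refers to a limit on how far a jump-like instruction can reach in the machine code output. The linker has worked out the distance to the target and found it too big for the field in the instruction that has to hold it. None of which is very helpful to me.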
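To put rough numbers on that limit, here is a small desktop-side sketch. The field widths are the ones in the AVR instruction set: conditional branches such as BRNE carry a 7-bit signed word offset, and RJMP/RCALL carry a 12-bit one. The helper names (`bytesToWords`, `fitsSigned`) are made up for illustration, and the sketch ignores the detail of exactly which address the offset is measured from. It does not claim to identify which instruction tripped the linker in this case.

```cpp
#include <cassert>
#include <cstdint>

// Widths of the signed word-offset fields in AVR PC-relative instructions.
constexpr int kBranchBits  = 7;   // conditional branches such as BRNE and BREQ
constexpr int kRelJumpBits = 12;  // RJMP and RCALL

// AVR instructions are addressed in 16-bit words, so a byte distance
// between two code addresses is halved before it is encoded.
// (Hypothetical helper, not part of any toolchain.)
inline std::int32_t bytesToWords(std::int32_t byteDistance) {
    assert(byteDistance % 2 == 0);  // code addresses are word-aligned
    return byteDistance / 2;
}

// True if wordOffset fits in a two's-complement field of `bits` bits.
constexpr bool fitsSigned(std::int32_t wordOffset, int bits) {
    return wordOffset >= -(std::int32_t{1} << (bits - 1)) &&
           wordOffset <= (std::int32_t{1} << (bits - 1)) - 1;
}
```

With these, `fitsSigned(bytesToWords(200), kBranchBits)` is false while `fitsSigned(bytesToWords(200), kRelJumpBits)` is true: a target 200 bytes away is out of reach for a conditional branch but comfortably in range for RJMP.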

But at least that output is trying to help. It even tells me which line it thinks is causing the problem – line 78 in ShiftReg74hc165.cpp.

What you’re looking at here are two of the methods of my ShiftReg74HC165 class. Line 78, which is the one that seems to offend, takes a pin on the ATMEGA328P low. For what it’s worth, I use this exact same method/function elsewhere (including the class for the 74HC595) without any problems.

Sure enough, if I comment out line 78, the code compiles. It just doesn’t work. But there’s actually nothing wrong with that code.

So I started to wonder if jumping to the _setPin() method from within the _clockIn() method was causing the problem – if the code of the method was just out of range for a jump when the program is reduced to machine code. I tried replacing all the instances of _setPin() in the _clockIn() method with direct instructions, rather than using the method. For example, instead of:

_setPin(shift_LD, LOW);

I used:

*port &= ~(1 << shift_LD);
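For comparison, a helper along the lines of _setPin() might look something like the sketch below. This is an assumption about its shape, not the actual method from the class: the free function `setPin`, its parameter list and the `LOW`/`HIGH` constants are stand-ins. The port is passed as a plain pointer so the same code can be exercised on a desktop with an ordinary variable in place of a real port register.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the levels used in the post (assumed to be simple flags).
constexpr bool LOW  = false;
constexpr bool HIGH = true;

// Hypothetical helper: drive one bit of an 8-bit port register high or low.
// On the real chip `port` would point at a register such as PORTB; here it
// can point at any volatile byte.
inline void setPin(volatile std::uint8_t* port, std::uint8_t pin, bool level) {
    assert(pin < 8);  // an 8-bit port only has pins 0 to 7
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << pin);
    if (level) {
        *port = static_cast<std::uint8_t>(*port | mask);   // set the bit
    } else {
        *port = static_cast<std::uint8_t>(*port & ~mask);  // clear the bit
    }
}
```

The `else` branch does the same job as the direct `*port &= ~(1 << shift_LD);` line above, so swapping one for the other should change how the code is laid out in flash but not what it does to the pin.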

I started with the lines that take pins low, because that’s what the offending line 78 is doing. But nothing helped. So I did it with the lines that take pins high too. Still no joy.

Now, I know what some of you are thinking: you’re out of program space or memory. But no. Here’s what the compiler tells me:

Program Memory Usage : 1994 bytes 6.1 % Full

Data Memory Usage : 43 bytes 2.1 % Full

Gah! None of the suggestions I found online, such as adding the -lm flag, did anything to help. But I kept coming back to the fact that the error was connected somehow to where the _clockIn() method is calling the _setPin() method. (As you can see, it does that a lot.)

So I decided to move things around. I first moved the line that says _setPin(shift_CLK, LOW) – line 88 in the above code – to the end of the FOR loop, just after where that same pin is set high. This also involved adding a line, after the FOR loop, to set the shift_CLK pin high again, because that’s the default state it needs to be in.

That didn’t help. Same problem.

Then I copied the _setPin(shift_CLK, LOW) line back to where it was to start with, on line 88. Note I said ‘copied’. I left the same command in the place I’d just moved it to, intending to delete it later. For now, I just wanted to see the effect.

The program now works. And it bothers me mightily. When it gets to the end of the FOR loop, the code sets that shift_CLK pin low and then loops around to immediately set it low again. And because it no longer exits the FOR loop with the pin set high, I have to add another call to _setPin() to achieve that. As far as I’m concerned, I have two pointless lines of code in that method. But it seems to be what it takes to get it to compile. Go figure…
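For anyone trying to picture the loop, here is a rough desktop sketch of the ordering as it stood before any of the shuffling: clock low at the top of each pass, clock high at the bottom, so the pin leaves the loop in its idle-high state. Everything in it is an assumption for illustration. `Fake165` is a toy model of the 74HC165 (load the parallel inputs, shift on a rising clock edge, input H comes out first), and `clockIn` here is a free function, not the real _clockIn() method.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a 74HC165 so the loop can run on a desktop machine.
struct Fake165 {
    std::uint8_t inputs = 0;    // levels on the parallel inputs, bit 7 = input H
    std::uint8_t shiftReg = 0;  // internal shift register
    bool clk = true;            // clock idles high, as described in the post

    void load() { shiftReg = inputs; }  // models a pulse on the load pin
    bool serialOut() const { return (shiftReg & 0x80) != 0; }  // output QH
    void setClk(bool level) {
        if (level && !clk) {  // a rising edge shifts the register by one bit
            shiftReg = static_cast<std::uint8_t>(shiftReg << 1);
        }
        clk = level;
    }
};

// Read eight bits, most significant first.
inline std::uint8_t clockIn(Fake165& chip) {
    assert(chip.clk);  // expects the clock to be idle-high on entry
    chip.load();
    std::uint8_t value = 0;
    for (int i = 0; i < 8; ++i) {
        chip.setClk(false);  // clock low at the top of the pass
        value = static_cast<std::uint8_t>((value << 1) |
                                          (chip.serialOut() ? 1 : 0));
        chip.setClk(true);   // rising edge: the next bit moves to QH
    }
    return value;            // clock is left high, its default state
}
```

Setting `inputs` to 0xA5 and calling `clockIn` returns 0xA5 with the clock left high. The working-but-ugly arrangement described above does the same reads with an extra low at the bottom of each pass and an extra high after the loop, which is why those two lines look redundant.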
