Correcting falsehoods in tech documentation, lessons learned

Category Archives: tools

Vim has a (oftentimes) nice feature of remembering where in a file you were when you open it again. Which is all well and good, but it’s based on filenames, and git commit messages are always the same. So it remembers the line number where I finished my last commit message. I always want git commit messages to open up on the first line. Here’s some vimrc magic to make it Do The Right Thing™

I’ve been using the GCC ARM Embedded toolchain for STM32 development on linux for a while now. It’s maintained by ARM, it’s available for linux, windows and osx, and it’s just a zip of binaries. Untar, add to your path, and you’re golden. With the new 4.7 release (2012q4) it includes some decent code size improvements, and is generally just a one stop shop for getting a toolchain.

However, the linux builds don’t have python support in GDB. This isn’t necessarily a bad thing, but people are starting to put together some nice tools that plugin to GDB that rely on python support. There’s some experimental SWV/SWO support, there’s some neat support for printf without printf and, what I’m trying to do, dump a lot of data buffers straight out of ram on the target device for analysis in python.

So, I tried building it myself. I first tried just downloading recent gdb sources and adding python support, after setting the target to arm-none-eabi. but….. that didn’t play well with openocd, so I went back to trying to build the G-A-E provided sources instead.

I have a problem were two applications running on an OpenWrt router are using too much CPU time. I can reproduce the problem fairly well, but I don’t have any easy obvious clues as to the cause yet. Worse, it’s behaves more like a memory leak. The task stays the same, but CPU usage slowly creeps up over an hour or so. I’ve tried setting up the same applications on a linux desktop, and I can make it happen, but it’s so very very slow. So while that was running, I looked into CPU profiling on linux, and particularly, which of them would run on OpenWrt. So immediately things like valgrind/cachegrind and dtrace and friends are ruled out. I scratched google cpu profiler as being too invasive, and gprof as having too many criticisms. Also, I’d never used any of these tools before, and I didn’t want to spend all day testing them individually. Searching for CPU profiling linux kept turning up two applications, oprofile and perf. First step then was, “which of these, if any, is already packaged for OpenWrt?”

Background requirements / assumptions

You’re building your own OpenWrt images here…

I started testing this on Backfire, but then moved to Attitude Adjustment when I started working with perf.

You are free to recompile the code under test, and aren’t trying to test a live system

Turns out the both are, but you have to jump some hoops. In “make menuconfig“, you need to got to “Gobal build settings” and enable “Compile the kernel with profiling enabled“. That’s all you need for oprofile, but for perf, go also to “Advanced configuration options (for developers)” and enable “Show broken platforms / packages“. I’m working on the atheros platform (ar2317), and apparently (I’m told on IRC) it’s a packaging problem more than a broken tool. It works on atheros at least, YMMV.

Continuing, you need to turn on the libelf1 library under “Libraries“, because the perf package doesn’t select it properly for you. Finally, under “Development” enable oprofile, perf or both. (When I started this, I didn’t have BROKEN packages enabled, so just used oprofile, and later used perf. If perf works, I’d only use perf, and skip oprofile altogether. I won’t talk about oprofile any further, but this was all you needed to get it to work)

So, that will make an image that can run the tools, but all binaries are (rightfully) stripped on OpenWrt by default in the build process, so you won’t get particularly useful results. In theory, you can record performance data on the router, and then generate the reports on your linux desktop with all the symbols, but I never worked out how to do this.

So, we want to run non-stripped binaries for anything we want to actually investigate. One way of doing this is the following magic incantation in your OpenWrt buildroot (Provided by jow_laptop on IRC)

make package/foo/{clean,compile} V=99 CONFIG_DEBUG=y

You can then copy the binary from [buildroot]/staging_dir/target-arch_uClibc-0.x.xx.x/root-platform/usr/bin/foo onto your device and run it. Actually, if you only want to analyse a single program, that may be enough for you. You can copy that non-stripped binary to the ramfs on /tmp, and just start profiling…

Not much of a difference. We can see calls into our own code, but well, we don’t spend much time there :) Really, to get useful output, you need non-stripped libraries, and the kernel symbols. You’re never going to fit all of that on a ramdisk on your OpenWrt router. It’s time for some remote filesystems.

I used sshfs for this, which was very easy to setup, but it’s pretty cpu heavy on a small router. NFS probably would have been a better choice. However you do it, mount your OpenWrt buildroot somewhere on your router.

Then, even without any changes in how you call your application, you can add the kernel symbol decoding, just add the -k parameter. “perf report -k /tmp/remote/build_dir/linux-atheros/linux-3.3.8/vmlinux” Now we get the kernel chunks broken out, like in this snippet.

We’re still missing symbols from uClibc, and any other user libraries we may call. (like libjson there) Fortunately, when we rebuilt our package with CONFIG_DEBUG=y earlier, it rebuilt the libraries it depends on in debug too. But now, we need to make sure we call our binary using the correct debug libraries. (This is the bit I think should be doable offline, but this is the way I got it to work at least!)

This is setting up the library path to use the non-stripped libraries. We then start perf using this process ID. (/tmp/remote is where I mounted my OpenWrt buildroot)

We can now see into uClibc, and into user libraries like libjson. (Trust me, I just didn’t include _alllll_ the output.) This is not actually the same point in the program, so don’t read too much into the actual numbers here. If you want, you can comment that this program is spending far too much time printing something to the console. (which is true!)

That’s it. You can now get symbol level information on what’s happening and how much. Whether this is useful is a different story, but this was fairly complicated to setup and test out, so I thought this might be helpful to others at some point.

Eagle is only provided as a 32bit package for linux, even as of version 6.3. I’m still using 5.x, for compatibility reasons, so I was trying to get it installed on my newish Fedora 17 64 bit install. Eagle’s #1 FAQ item is how to do this, but it’s for fedora 10, and some of the packages have changed. You also don’t need to install as many, as some of them will be pulled in as dependencies.

I was looking for a few bytes extra flash today, and realized that some old AVR code I had, which used uint8_t extensively for loop counters and indexes (dealing with small arrays) might not be all that efficient on the STM32 Cortex-M3.

So, I went over the code and replaced all places where the size of the counter wasn’t really actually important, and made some comparisons. I was compiling the exact same c file in both cases, with only a type def changing between runs.

I would personally say that it looks like ARM still has some work to go on optimizations. If _least8 and _fast8 take up more space than int it’s not really as polished as the avr-gcc code yet. For me personally, as this code no longer has to run on both AVR and STM32, I’ll just use int.

So, after extending this a bit, my original conclusion about the fast_ types not being fully optimized with arm-gcc were wrong. It’s more that, on AVR, your “don’t care” counters should be unsigned for smaller size, while on STM32, they should be signed (Though I still think it’s dodgy that int_least8_t resulted in bigger code than int_fast8_t) Also, even if signed is better in the best case, the wrong signed is also the worst case. Awesome.

Which is fine and all, I can switch to the c/c++ spelling check section. But this fails to address the massive fucking pile of fail this is. WHY ON EARTH IS THERE A SEPARATE SPELL CHECK SETTING FOR C/C++?!

I mean, why?! This is not like formatting and tabs where you might want different per language settings. But spell check settings based on whether I’m in C, python, java or lua?! HELL NO!