tech blog, another one …

Tag Archives: Linux

For some side project I wanted to look at the schematics of the Adafruit PowerBoost 500C, which happened to be made with Autodesk EAGLE.

Having run EAGLE on Debian 9 (stretch) for a while now without great hassle, I did not expect much difficulties, I was wrong. First I downloaded the tarball from their download site. Don’t worry, there’s still the “free” version for hobbyists, however it’s not Free Software, but precompiled binaries for amd64 architecture, better than nothing.

After extracting, I tried to start it like this and got the first error:

1

2

3

4

alex@lemmy ~/opt/eagle-9.6.0 % ./eagle

terminate called after throwing an instance of 'std::runtime_error'

what(): locale::facet::_S_create_c_locale name not valid

[1] 20775 abort ./eagle

There’s no verbose or debug option. And according to the comments on the blog post “How to Install Autodesk EAGLE On Windows, Mac and Linux” the problem also affects other users. I vaguely remembered somewhere deep in the back of my head, I already had this problem some time ago on another machine, and tried something not obvious at all. My system locale is German, looked like this before:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

alex@lemmy ~ % locale

LANG=de_DE.UTF-8

LANGUAGE=

LC_CTYPE="de_DE.UTF-8"

LC_NUMERIC="de_DE.UTF-8"

LC_TIME="de_DE.UTF-8"

LC_COLLATE="de_DE.UTF-8"

LC_MONETARY="de_DE.UTF-8"

LC_MESSAGES="de_DE.UTF-8"

LC_PAPER="de_DE.UTF-8"

LC_NAME="de_DE.UTF-8"

LC_ADDRESS="de_DE.UTF-8"

LC_TELEPHONE="de_DE.UTF-8"

LC_MEASUREMENT="de_DE.UTF-8"

LC_IDENTIFICATION="de_DE.UTF-8"

LC_ALL=

alex@lemmy ~ % locale -a

C

C.UTF-8

de_DE.utf8

POSIX

As you can see, no English locale, so I added one. On Debian you do it like this. Result below:

1

2

3

4

5

6

7

8

% sudo dpkg-reconfigure locales

alex@lemmy ~/opt/eagle-9.6.0 % locale -a

C

C.UTF-8

de_DE.utf8

en_US.utf8

POSIX

This seems like a typical “works on my machine” problem from an US developer, huh? Next try starting eagle, you’ll see the locale problem is gone, the next error appears:

That’s the problem if you don’t build from source against the libraries of the system. In that case EAGLE ships a shared object libxcb-dri3.so.0, which itself is linked against the libxcb.so.1 from my host system, and those don’t play well together. I found a solution to that in a forum thread “Can’t run EAGLE on Debian 10 (testing)” and it is using the ld_preload trick:

This is about profiling your applications on your embedded Linux target or let’s say finding the spots of high CPU usage, which is a common concern in practice. For an extensive overview see Linux Performance by Brendan Gregg. We will focus on viewing flame graphs with a tool called hotspot here, based on performance data recorded with perf. This proved to be helpful enough to solve most of the performance issues I had lately.

Installing the Tools

We have two sides here: the embedded Linux target and your Linux workstation host. For your computer you need to install hotspot. In Debian it is available from version 10 (buster). You can build from source of course, I did that with Debian 9 (stretch) a while ago. IIRC there are instructions for that upstream. Or you build it from the deb-src package from Debian unstable (sid) by following this BuildingTutorial.1

The embedded target part needs basically two parts. You have to set some options in the kernel config and you need the userland tool perf. For ptxdist here’s what I did:

Add -fno-omit-frame-pointer to global CFLAGS and CXXFLAGS

Enable PTXCONF_KERNEL_TOOL_PERF

Note: I had to update my kernel from v4.9 to v4.14, otherwise I got build errors when building perf.

Configuring the Kernel

I won’t quote the whole kernel config here, but I have a diff on what I had to set to make perf record useful things. These options are probably important, at least I had those on in my debug sessions (others might also be needed):

CONFIG_KALLSYMS

CONFIG_PERF_EVENTS

CONFIG_UPROBES

CONFIG_STACKTRACE

CONFIG_FTRACE

Using it

For embedded use, I basically followed the instructions of upstream hotspot. You might however want to dive a little into the options of perf, because it is a very powerful tool. What I did to record on the target was basically this to get samples from my daemon application mydaemon for 30 seconds:

1

perf record--call-graph dwarf--pid=$(pgrep mydaemon)sleep30

This can produce quite a lot of data, so use it with short times first to not fill your filesystem. Luckily I had enough space on the flash memory of the embedded target available. Then just follow what the hotspot README says: copy the file and your kernel symbols to your host and call hotspot with the right options to your sysroot. This was the call I used (from a subfolder of my ptxdist BSP, where I copied those files to):

tl;dr: Please upstream developers: Do not ever change what you already published, but make an additional version with your fix. This causes less trouble for people building your stuff.

As some of you might have noticed: I’m a little into embedded Linux software and contribute to some of the build systems around, mainly to buildroot (for fli4l) and to ptxdist (at work). This is a very special kind of fun meaning constantly trying to fix things gone wrong. Today is a day where the temperatures outside I’m stuck, because someone else fucked up his stuff.

Last week I built myself an image for testing iperf on a BeagleBone Black with the current buildroot master. This got me a tarball iperf-2.0.9.tar.gz from some mirror server. This worked.

Today I upgraded a ptxdist BSP from some older state, I think 2016.12.0, to the recent ptxdist 2017.06.0 which included an upgrade of the package iperf from 2.0.5 to 2.0.9. This got me a complaint about an invalid checksum. Those embedded build systems contain checksums for tarballs, buildroot uses mostly sha256, while ptxdist still uses md5. This is mostly to ensure transport integrity, but it also triggers when the upstream tarball changes. Which it should not.

So now the checksums in buildroot master from today are still from buildroot changeset 2016.05-1497-g11cc12e from 2016-07-29:

Those are the very same to the file I have locally. Note: both ptxdist and buildroot download archives to the same shared folder here. The md5sum of this file is 1bb3a1d98b1973aee6e8f171933c0f61 and ptxdist aborts with a warning this sum would not match. Well in ptxdist the iperf package was changed on 2016-12-19 last time, also upgrading from iperf 2.0.5 and here the changeset is ptxdist-2016.12.0-10-gd661f64 and the md5sum expected: 351b018b71176b8cb25f20eef6a9e37c. This is the same you can see today on sf.net, but why is it different from the one above?

To find out I downloaded the file currently available on sf.net, which was last changed 2016-09-08, after buildroot included the package update. The great tool diffoscope showed me, a lot of the content between those two archives was changed. But why?

Now this is the point where I’m not sure whether to just get pissed, deeply sighing, or trying to fix the mess for those build systems. The clean way would be upstream releasing some new tarball, either the one or the other or even make a new release, named 2.0.9a or whatever.

What are possible solutions?

Wait for upstream to make a clean, new release. (And hope this doesn’t get changed in the future.)

Upgrade those hashes in buildroot. This obviously breaks old versions of buildroot.

According to the buildroot IRC channel, they want their package to be updated, even if older releases will break. And they said they have a fallback and use their own mirror, so that’s where my first package may have come from.

There’s an embedded Linux board on my desk, where a LED is connected to some GPIO pin. Everything is set up properly through device tree and with a recent kernel 4.9.13 the usual LED trigger mechanisms work fine, so no problem using heartbeat or just switching the LED on and off.

Now I wanted to have the LED blink with certain patterns and it turns out, this is quite easy given you know how. You have to set LEDS_TRIGGER_TIMER in your kernel config first. Now go to the sysfs folder of your LED, here it is:

1

cd/sys/class/leds/status

Have a look at the available triggers:

1

2

$cat trigger

[none]timer oneshot mtd nand-disk heartbeat gpio default-on panic

Switch to the timer trigger:

1

$echo timer>trigger

Now two new files appear, delay_on and delay_off. Per default both contain the value 500 which lets the LED blink with 1 Hz. Without further looking into the trigger code or searching for documentation I assume those values are the on and off times in milliseconds. So have the LED blink with a certain frequency the following formular could be used:

Did you ever read the GCC documentation part Warning Options? If not, you may know the -Wall option. Yeah well, it enables a lot of options, but not literally all possible warnings. In my opinion setting -Wall should be the minimum you should set in every project, but there are more. You can also set -Wextra which enables even more warnings, but as you now might guess, still not all. Missing is especially one option, this post is about and the following describes why I consider it important to set: -Wcast-align

So what does the GCC doc say about it?

Warn whenever a pointer is cast such that the required alignment of the target is increased. For example, warn if a char * is cast to an int * on machines where integers can only be accessed at two- or four-byte boundaries.

In case you can not imagine what this means, let me explain. For example there are 32bit-CPUs out there which access memory correctly only at 32bit boundaries. This is to my knowledge by design. Let’s say you have some byte stream starting at an arbitrary aligned memory offset and it contains bytes starting from 0, followed by 1, then 2 and so on like this:

1

0001020304050607

Now you set an uint32_t pointer to a non aligned address and dereference it. What would you expect? To help you a little, I have a tiny code snippet for demonstration:

Note the last three containing some random bytes from memory behind our buffer! This is the output you get on a amd64 standard PC with little endian format (compiled on Debian GNU/Linux with some GCC 4.9.x).

This comes from an embedded Linux target with an AT91SAM9G20 Arm CPU, which is ARM9E family and ARMv5TEJ architecture or lets just say armv5 or older Arm CPU. Here It runs as little endian and was compiled with a GCC 4.7.x cross compiler.

Well those 32bit integers look somehow reordered, as if the CPU would shuffle the bytes of the word we point into? If you’re not aware of this this means silent data corruption on older Arm platforms! You can set the -Wcast-align option to let the compiler warn you, you may try this by yourself with the above snippet and your favorite cross compiler. Note: the warning does not solve the corruption issue, it just warns you to fix your code.

When reading the FAQ by Arm itself on this topic it’s not quite clear what the supposed behavior is, but what is clear is the following: unaligned access is not supported on older Arm CPUs up to ARM9 family or ARMv5 architecture.

Another point is interesting: even if the CPU supports unaligned access, whether it’s hard coded or an optional thing you must switch on first, it will give you a performance penalty. And coming back to my PC: this is also true for other processor families like Intel or AMD, although on recent processors it might not be that bad.

So what could or should we do as software developers? Assuming there are still a lot of old processors out there and architectures you might not know, and you never know where your code will end up: design your data structures and network protocols with word alignment in mind! If you have to deal with legacy stuff or bad protocols you can not change, you still have some other possibilities, have a look at The ARM Structured Alignment FAQ or search the net on how to let your kernel handle this.

If you want to handle it in code, memcpy() is one possibility. Assume we want to access a 32bit integer at offset 2, we could do it like this:

1

2

3

4

uint8_t *buf;

uint32_t myint;

memcpy(&amp;myint,buf+2,sizeof(uint32_t));

And as said in the topic and above: turn on the -Wcast-align option!

(If you don’t want to be too scared about silent data corruption on your new IoT devices with those cheap old processors and your freshly compiled board support package, you might not want to turn it on on all those existing software out there. You might get a little depressed … )

Sometimes I’m in the mood to destruct my devices. Luckily some drivers may have an ioctl for that:

Most devices can perform operations beyond simple data transfers; user space must often be able to request, for example, that the device lock its door, eject its media, report error information, change a baud rate, or self desctruct.