design and implementation of software-intensive systems

The biggest psychological impediment to my work on wireless sensor network frameworks is data. What do I collect? What do I do with it? In short, why do I even want a WSN? (If you can answer that, but want help making it happen, please get in touch through one of the options on my about page.)

Collecting data that’s never looked at is a waste of time, so good tool support for viewing and analyzing time-series data is important. The pieces underlying this are the databases and the graphs.

The traditional database approach is RRDTool: it's been around forever and does the job very well.

I'm not in love with its API, which involves providing text representations of the observations through an argc/argv command-line-style interface even from within C; on the surface that introduces a lot of overhead. But "I just don't like it" is an inadequate reason to reject a time-proven solution, and a text-based API cuts both ways: I recently got a patch merged to RRD that significantly simplifies specification of archive retention, but the new feature can't be used in collectd configurations because the parameters are no longer the integer counts that collectd stores and sorts before reformatting them into RRD arguments.
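To make the argc/argv flavor concrete: librrd's rrd_update() takes command-style text arguments, so even a C caller formats each observation as text first. A minimal sketch of that formatting step (the values are made up; the actual call would pass the result in an argv array to rrd_update):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// librrd mirrors the command line even in C: an update is passed as
// text arguments such as {"update", "power.rrd", "1400000000:42.5"}.
// This helper builds the "<timestamp>:<value>" observation string.
std::string rrd_observation(long timestamp, double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%ld:%.1f", timestamp, value);
    return buf;
}
```

Every sample thus takes a number-to-text conversion on the way in and a text-to-number conversion inside RRD, which is the overhead complained about above.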

Where RRDTool falls down is in the display of the data; really, in constructing the specification that defines the graphs. The rrdgraph tool that comes with it actually makes some very powerful graphs, but it's in desperate need of a GUI to help select among data sets, combine them, change the time bounds, and so on. There's Cacti, but it wants to do data collection too; I don't like kitchen-sink solutions, and it's not as good as collectd. The days of the fat client are past; there are some web apps like Collectd Graph Panel that make it a bit easier, but not enough to really be nice to use.

This is what I'm talking about: dashboards with pre-configured graphs showing the information I want to see, automatically updated as the data comes in; a web application to dynamically construct graphs, adding and removing from all available data sets, applying transformations to each source in turn, etc.

Turns out the graphite project provides three layers:

whisper and ceres are back-end databases for time series data, similar to RRDTool

carbon is a daemon infrastructure that receives samples over a network connection and dispatches them to whisper or ceres based on a text name, which by convention encodes a hierarchy such as collectd.server1.cpu0.load.

graphite-web is what generates the graphs; it can read data from whisper, ceres, and RRD databases.
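Carbon's ingestion side is pleasantly simple: its plaintext protocol is one sample per line, "<metric> <value> <timestamp>\n", written to the carbon-cache listener (TCP port 2003 by default). A sketch of the line formatting, using the naming convention above (socket handling omitted):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One carbon plaintext sample: "metric.path value epoch-seconds\n".
std::string carbon_line(const std::string& metric, double value, long when) {
    char buf[160];
    std::snprintf(buf, sizeof buf, "%s %g %ld\n", metric.c_str(), value, when);
    return buf;
}
```

Anything that can open a TCP socket and print a line can feed carbon, which is much of its appeal for ad-hoc collectors.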

It’s all written in Python and runs as a virtual host under django, so it’s pretty easy to configure on Ubuntu 12.04 or 14.04. Even getting django to run in Apache isn’t nearly as hard as everybody makes it out to be.

Graphite’s capabilities are awesome.

Carbon has some really neat architectural features including automated creation of databases as metrics arrive (with customized retentions based on the metric name), centralized aggregation of metrics, and daemons to perform meta-aggregation and relaying to other servers.

RRDTool, for its part, is not designed for irregular updates. In short, incoming data to RRD gets interpolated to align with the primary data point timestamps. If you don't get data often enough to do that interpolation, RRDTool can't do the alignment, and data gets dropped. I'm sympathetic to this issue, though it doesn't affect my use cases as much as it does, say, StatsD and other system-monitoring applications.

On the other hand, whisper has a few problems of its own:

Each file stores only one metric, which wastes storage when a sensor provides multiple metrics (e.g. temperature, humidity, pressure, wind direction and speed) and multiple consolidations (AVERAGE, MAX, MIN over various retentions)

As a consequence, when aggregating for lower-resolution periods only one consolidation function is allowed (normally "average"). You lose the extreme values (such as daily highs and lows) unless you configure carbon to create separate databases for those. (Carbon does support this if carbon-aggregator is used, but that's another daemon and another point of failure.)
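The loss is easy to see in a sketch: consolidate the same samples with "average" versus "max" and the daily high disappears from the averaged archive (helper names are illustrative, not whisper's API):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Downsampling a period of samples to a single archived value.
// An AVERAGE-only archive keeps 20 from {10, 30, 20}; the high of
// 30 survives only if a separate MAX consolidation is also stored.
double consolidate_avg(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

double consolidate_max(const std::vector<double>& v) {
    return *std::max_element(v.begin(), v.end());
}
```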

In my case, sensors have a limited ability to store data locally, so if the off-sensor database stops functioning and I’m told about it in time I can restart things and back-fill the missing material. This is exactly what nagios is for, but figuring out when the last update was received by a whisper database requires an O(n) search because the value isn’t stored in the database header, even though a related problem was a motivation for rejecting RRDTool!

The biggest problem with carbon, and the graphite project as a whole, is lack of leadership and active management. Carbon has over two hundred open issues and pull requests, some of which have apparently been addressed but left open. There was a major rework called "megacarbon", but that's been dormant for six months. There are incompatible changes being made on the 0.9.x maintenance branch relative to master. whisper is supposed to be superseded by ceres, but requests for information on project status and schedule go unanswered for months. If you noticed that the link I gave above for graphite points to an outdated page, that's because the new one doesn't have the FAQ or any of the material that tells people why they should even care about the project.

This is an unfortunate but common failing of open source projects, where nobody's compensated for their effort and maintenance naturally drops off when the original developer can only respond "it works for me" or "I don't even use that software anymore".

Regardless of all that: RRDTool is a robust tool that's worked for years and continues to be maintained; a dozen enhancement patches I submitted were promptly integrated for the next release. As an open source solution for web access to real-time data display, I don't think graphite-web would survive a serious challenge from an actively-managed alternative, but it works well enough that there's really no motivation to develop a competitor.

I’ve set up permanently running rrdtool, graphite, and collectd systems on both my stable and development servers. They’re already recording whole-house power consumption at 1Hz from a TED 5000, interior temperature and humidity from a daemon running on a raspberry pi (stored in RRD databases because I care about that data), and collectd statistics from internal hosts (stored in whisper databases because it’s easier to customize the retention periods and I don’t care about that data).

Since I finally have a way to visualize the data, and I have all these wireless microcontrollers and sensors, it’s probably time to start collecting more.

A forum discussion on Stellarisiti raised the question of how to achieve short delays on a Cortex-M microcontroller. Specifically, delays on the order of cycles, where the overhead of calling a vendor-supplied library routine exceeds the desired delay. The difficulty arises from an earlier observation that ARM documents the NOP instruction as being usable only for alignment, and makes no promises about how it impacts execution time. In fact, ARM specifies that its use may decrease execution time, miraculous though that might be.

I felt the lines of argument lacked evidence, and accepted a challenge to investigate. This post covers the details of the experiment and its result; the forum discussion provides additional information including an explanation of “hint instruction”, the effect of “architected hint”, and why the particular alternative delay instructions were selected.

The experiment I proposed was the following:

Timing will be performed by reading the cycle count register, executing an instruction sequence, then reading the cycle counter. The observation will be the difference between the two counter reads.

The sequence will consist of zero or one context instruction followed by zero or more (max 7) delay instructions.

The only context instruction tested will be a bit-band write of 1 to SYSCTL->RCGCGPIO, enabling a GPIO module that had not been enabled prior to the sequence.

The two candidate delay instructions will be NOP and MOV R8,R8

Evaluation will be performed on an EK-TM4C123GXL experimenter board using gcc-arm-none-eabi-4_8-2013q4 with the following flags:
-Wall -Wno-main -Werror -std=c99 -ggdb -Os -ffunction-sections -fdata-sections -mthumb -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=softfp

The implementation will be in C using BSPACM, with the generated assembly code inspected to ensure the sequences as defined above are what has been tested
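The measurement pattern in the first bullet, and one subtlety of it, can be sketched on a host. On the Cortex-M the counter read is the 32-bit DWT->CYCCNT register; plain function arguments stand in for the two register reads here:

```cpp
#include <cassert>
#include <cstdint>

// Pattern: t0 = CYCCNT; <sequence>; t1 = CYCCNT; observe t1 - t0.
// CYCCNT is a free-running 32-bit counter, so the delta is computed
// in unsigned arithmetic; modular subtraction stays correct even if
// the counter wrapped between the two reads.
uint32_t cycles_elapsed(uint32_t t0, uint32_t t1) {
    return t1 - t0;  // well-defined modulo 2^32
}
```

For example, reads of 0xFFFFFFF0 and then 0x10 still yield a delta of 0x20.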

The predictions I made prior to starting work were:

Null hypothesis (my bet): There will be no measurable cycle count difference in any test cases that vary only in the selected delay instruction. I.e., there is no pipeline difference on the Cortex-M4.

“Learn something” result (consistent with my previous claims but not my expectations): For cases where N>0, one cycle fewer will be measured in sequences using NOP than in sequences using MOV R8,R8. I have no prediction whether the context instruction will impact this behavior. I.e., on the Cortex-M4 only one NOP instruction may be absorbed.

“Surprise me” result (still consistent with my previous claims but demonstrating a much higher level of technology in Cortex-M4 than I would predict): A difference in more than one cycle will be observed between any two cases that vary only in the selected delay instruction, but the difference has an upper bound less than the sequence length. I.e., the pipeline is so deep multiple decoded instructions can be dropped without impacting execution time.

“The universe is borked” result (can’t happen): The duration of a sequence involving NOP is constant regardless of sequence length, while the duration of the sequence involving MOV R8,R8 is (in the limit) linear in sequence length. I.e., the CPU is able to decode and discard an arbitrary number of NOP instructions in constant time.

Naturally, things turned out to be a little more complex, but I believe the results are enlightening. The code is available in this github gist.

Here’s the output from the test program:

Test program output

May 22 2014 06:16:24
System clock 16000000 Hz
Before GPIO context insn: 21
After GPIO context insn: 23
After GPIO context restored: 21
Null context, NOP: 1 2 3 4 5 6 7 8 1
Null context, MOV: 1 3 4 5 6 7 8 9 1
GPIO context, NOP: 7 7 10 10 10 11 12 13 7
GPIO context, MOV: 7 7 10 10 10 11 12 13 7

So what does this say?

First, note that I’ve added diagnostics to confirm that the GPIO context instruction does what it’s supposed to do (enable an unused GPIO module), and that the instruction to reset the context works. Second, the results for each test show the cycle times for the context followed by zero, one, two, …, seven, and zero delay instructions.

Let’s expand the empty, one, and two delay versions of each case to see what it is we’ve timed. These are extracted from main.dis-Os in the gist. Here’s the null context with NOP:

Null context with NOP

 35:main.c **** t0 = BSPACM_CORE_CYCCNT();
 35 0000 214B  ldr r3, .L2
 36 0002 5A68  ldr r2, [r3, #4]
 36:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
 39 0004 5968  ldr r1, [r3, #4]
 40 0006 8A1A  subs r2, r1, r2
 42 0008 0260  str r2, [r0]
 38:main.c **** t0 = BSPACM_CORE_CYCCNT();
 44 000a 5A68  ldr r2, [r3, #4]
 39:main.c **** DELAY_INSN_NOP();
 48 000c 00BF  nop
 40:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
 53 000e 5968  ldr r1, [r3, #4]
 54 0010 8A1A  subs r2, r1, r2
 56 0012 4260  str r2, [r0, #4]
 42:main.c **** t0 = BSPACM_CORE_CYCCNT();
 58 0014 5A68  ldr r2, [r3, #4]
 43:main.c **** DELAY_INSN_NOP(); DELAY_INSN_NOP();
 62 0016 00BF  nop
 65 0018 00BF  nop
 44:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
 70 001a 5968  ldr r1, [r3, #4]
 71 001c 8A1A  subs r2, r1, r2
 73 001e 8260  str r2, [r0, #8]

Good, so it’s doing what we expect in the most basic case. What about MOV R8,R8?

Null context with MOV R8, R8

 80:main.c **** t0 = BSPACM_CORE_CYCCNT();
240 0000 214B  ldr r3, .L5
241 0002 5A68  ldr r2, [r3, #4]
 81:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
244 0004 5968  ldr r1, [r3, #4]
245 0006 8A1A  subs r2, r1, r2
247 0008 0260  str r2, [r0]
 83:main.c **** t0 = BSPACM_CORE_CYCCNT();
249 000a 5A68  ldr r2, [r3, #4]
 84:main.c **** DELAY_INSN_MOV();
253 000c C046  mov r8, r8
 85:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
258 000e 5968  ldr r1, [r3, #4]
259 0010 8A1A  subs r2, r1, r2
261 0012 4260  str r2, [r0, #4]
 87:main.c **** t0 = BSPACM_CORE_CYCCNT();
263 0014 5A68  ldr r2, [r3, #4]
 88:main.c **** DELAY_INSN_MOV(); DELAY_INSN_MOV();
267 0016 C046  mov r8, r8
270 0018 C046  mov r8, r8
 89:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
275 001a 5968  ldr r1, [r3, #4]
276 001c 8A1A  subs r2, r1, r2
278 001e 8260  str r2, [r0, #8]

Good: those differ only in the delay instruction, and it’s the same number of octets in the instruction stream.

Now let’s see what the bitband assignment does to the instruction sequence when followed by NOP:

Bitband assignment of 1 with NOP delays

125:main.c **** t0 = BSPACM_CORE_CYCCNT();
444 0000 394A  ldr r2, .L8
126:main.c **** CONTEXT_INSN_GPIO();
446 0002 3A4B  ldr r3, .L8+4
447 0004 0121  movs r1, #1
122:main.c **** {
449 0006 30B5  push {r4, r5, lr}
125:main.c **** t0 = BSPACM_CORE_CYCCNT();
455 0008 5468  ldr r4, [r2, #4]
458 000a 1960  str r1, [r3]
127:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
461 000c 5568  ldr r5, [r2, #4]
462 000e 2C1B  subs r4, r5, r4
464 0010 0460  str r4, [r0]
128:main.c **** RESTORE_CONTEXT_INSN_GPIO();
466 0012 0024  movs r4, #0
467 0014 1C60  str r4, [r3]
130:main.c **** t0 = BSPACM_CORE_CYCCNT();
469 0016 5468  ldr r4, [r2, #4]
131:main.c **** CONTEXT_INSN_GPIO();
472 0018 1960  str r1, [r3]
132:main.c **** DELAY_INSN_NOP();
475 001a 00BF  nop
133:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
480 001c 5368  ldr r3, [r2, #4]
481 001e 1B1B  subs r3, r3, r4
482 0020 4360  str r3, [r0, #4]
134:main.c **** RESTORE_CONTEXT_INSN_GPIO();
484 0022 324B  ldr r3, .L8+4
485 0024 0021  movs r1, #0
486 0026 1960  str r1, [r3]
136:main.c **** t0 = BSPACM_CORE_CYCCNT();
488 0028 5168  ldr r1, [r2, #4]
137:main.c **** CONTEXT_INSN_GPIO();
491 002a 0122  movs r2, #1 // *** OOPS
492 002c 1A60  str r2, [r3]
138:main.c **** DELAY_INSN_NOP(); DELAY_INSN_NOP();
495 002e 00BF  nop
498 0030 00BF  nop
139:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
503 0032 2D4A  ldr r2, .L8
504 0034 5368  ldr r3, [r2, #4]
505 0036 5B1A  subs r3, r3, r1
506 0038 8360  str r3, [r0, #8]
140:main.c **** RESTORE_CONTEXT_INSN_GPIO();
508 003a 2C4B  ldr r3, .L8+4
509 003c 0021  movs r1, #0
511 003e 1960  str r1, [r3]

Don’t be misled: although the C source shows the read of the cycle counter occurring before some overhead instructions (e.g. the push), the actual read doesn’t occur until offset 8. So what’s being timed is what we want.

Finally, here’s the bitband assignment with MOV R8,R8 as the delay instruction:

Bitband assignment of 1 with MOV R8, R8 delays

188:main.c **** t0 = BSPACM_CORE_CYCCNT();
726 0000 394A  ldr r2, .L11
189:main.c **** CONTEXT_INSN_GPIO();
728 0002 3A4B  ldr r3, .L11+4
729 0004 0121  movs r1, #1
185:main.c **** {
731 0006 30B5  push {r4, r5, lr}
188:main.c **** t0 = BSPACM_CORE_CYCCNT();
737 0008 5468  ldr r4, [r2, #4]
740 000a 1960  str r1, [r3]
190:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
743 000c 5568  ldr r5, [r2, #4]
744 000e 2C1B  subs r4, r5, r4
746 0010 0460  str r4, [r0]
191:main.c **** RESTORE_CONTEXT_INSN_GPIO();
748 0012 0024  movs r4, #0
749 0014 1C60  str r4, [r3]
193:main.c **** t0 = BSPACM_CORE_CYCCNT();
751 0016 5468  ldr r4, [r2, #4]
194:main.c **** CONTEXT_INSN_GPIO();
754 0018 1960  str r1, [r3]
195:main.c **** DELAY_INSN_MOV();
757 001a C046  mov r8, r8
196:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
762 001c 5368  ldr r3, [r2, #4]
763 001e 1B1B  subs r3, r3, r4
764 0020 4360  str r3, [r0, #4]
197:main.c **** RESTORE_CONTEXT_INSN_GPIO();
766 0022 324B  ldr r3, .L11+4
767 0024 0021  movs r1, #0
768 0026 1960  str r1, [r3]
199:main.c **** t0 = BSPACM_CORE_CYCCNT();
770 0028 5168  ldr r1, [r2, #4]
200:main.c **** CONTEXT_INSN_GPIO();
773 002a 0122  movs r2, #1 // *** OOPS
774 002c 1A60  str r2, [r3]
201:main.c **** DELAY_INSN_MOV(); DELAY_INSN_MOV();
777 002e C046  mov r8, r8
780 0030 C046  mov r8, r8
202:main.c **** *dp++ = BSPACM_CORE_CYCCNT() - t0;
785 0032 2D4A  ldr r2, .L11
786 0034 5368  ldr r3, [r2, #4]
787 0036 5B1A  subs r3, r3, r1
788 0038 8360  str r3, [r0, #8]
203:main.c **** RESTORE_CONTEXT_INSN_GPIO();
790 003a 2C4B  ldr r3, .L11+4
791 003c 0021  movs r1, #0
793 003e 1960  str r1, [r3]

So now that we’ve seen what’s being timed, let’s look at results again:

Null context timing results, both delay types

Null context, NOP: 1 2 3 4 5 6 7 8 1
Null context, MOV: 1 3 4 5 6 7 8 9 1

NOP consistently introduces a one-cycle delay, which is what us old-timers would expect an opcode named “NOP” to do. The MOV R8,R8 instruction also introduces a one-cycle delay, but only when it can be pipelined; a single instance in isolation takes two cycles.

This result requires a little analysis. If you look at the code, the instruction sequences with zero and one delay instruction are what we want to time. With two delay instructions, the compiler happens to have loaded the RHS of the bitband store operation into a register within the timed sequence, at the line marked *** OOPS in the listings above.

From experience with MSP430 I normally use -Os when compiling, since that enables optimizations designed to reduce code size. These optimizations tend to be a little weak; when -O2 is used instead of -Os the compiler is smarter and doesn’t do the load within the timed sequence:

-O2 results, null and bitband context, both delay types

Null context, NOP: 1 2 3 4 5 6 7 8 1
Null context, MOV: 1 3 4 5 6 7 8 9 1
GPIO context, NOP: 7 7 7 7 7 8 9 10 7
GPIO context, MOV: 7 7 7 7 7 8 9 10 7

You can go look at main.dis-O2 to check out what’s being timed here, but I claim it’s exactly what should be timed.

What this shows is that the peripheral bitband write takes six cycles to complete (subtracting the 1 cycle timing overhead), and the delay instruction gets absorbed into that regardless of which type of delay instruction is used. (Why it takes six cycles is a different question. A bitband write to an SRAM address instead of the peripheral register took five. I don’t know whether the pipeline has six/seven stages, or something else is stalling the CPU.)
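As an aside on the context instruction itself: the alias address for a peripheral bit-band write is pure arithmetic, per the ARMv7-M architecture. A sketch of the standard mapping (the addresses in the checks are illustrative, not the RCGCGPIO address):

```cpp
#include <cassert>
#include <cstdint>

// ARMv7-M peripheral bit-band mapping: each bit in the region
// 0x40000000-0x400FFFFF has a 32-bit alias word; storing 0 or 1 to
// the alias clears or sets just that bit, atomically.
uint32_t periph_bitband_alias(uint32_t byte_addr, unsigned bit) {
    return 0x42000000u
         + ((byte_addr - 0x40000000u) * 32u)  // 32 alias bytes per byte
         + (bit * 4u);                        // 4 alias bytes per bit
}
```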

My conclusions:

Don’t muck about trying to be clever: for a one-cycle delay just use __NOP(), the ARM CMSIS standard spelling for an inline function that emits the NOP instruction. Where it has an effect, it’s a one-cycle effect. Where it doesn’t, other instructions don’t behave any better.

The effect of the pipeline is much bigger than I anticipated: not only does the Cortex-M take advantage of the permission granted by the architected hint that __NOP() can be dropped from the execution stage, the impact of the peripheral write eliminates the difference between a one-cycle and a two-cycle instruction.

What this really means is that attempts to do small (1-3) cycle delays have fragile dependencies on the surrounding instructions, which in turn depend on the compiler and its optimization flags. If you’re getting a hard fault because you manipulate a module register too quickly after enabling the module, insert a __NOP() or two and see if it works. If the exact cycle count of the code you write is critical, you’re going to have to analyze it in context.

You can stick with an existing, “well understood” system, and assume that you’re safe because it passes what you think is important to test. Or you can keep up to date with what’s provided by a vendor (who sees a lot more use cases and variations than you do). This is a management choice.

All I can say is that, in my own multi-decade experience, the biggest long-term source of destabilization comes not from regular updates to the current toolchain, but from staying with old tools until something happens that forces you to make a multi-version jump to a new compiler. (And I agree that a new version is a new compiler and cannot just be assumed to work; this is why one should develop complete regression suites with test harnesses to check the “can’t happen but actually did once” situations.) I can’t see what happens in proprietary systems, but it’s been many years since an update to GCC has resulted in my discovery of an undesirable behavioral change that wasn’t ultimately a bug in my own code, with the fix improving quality for that code and all code I’ve worked on since.

If you’re operating in a regulated environment where the cost of updating/certifying is prohibitive, so be it. Best approach in that case is to keep with the toolchain used for original release of the product, and release new products with the most recent toolchain so you’re always taking advantage of the best available solution at the time.

I’m not saying there’s a universally ideal policy, e.g. that you should always use the current toolchain. I am saying that a shop that develops and releases new products using old toolchains without a strong reason behind that decision is not using best practices and is likely to produce an inferior product. If management thinks they’re saving money and reducing risk by not updating, there’s a good chance they’re being short-sighted.

I first got a Wolverine (MSP430FR5969) chip back in August 2012 by badgering TI to send them to me and Daniel Beer so the open source toolchain comprising mspgcc and mspdebug would support them when the launchpad was released. That lash-up has been in BSP430 since October 2012, but wasn’t really usable. As of today, full support for EXP430FR5969 has been added to BSP430.

The device has a standard 14-pin JTAG header, and also supports the eZ-FET emulator through the micro-B USB connector as the first of the two interfaces (generally showing up as /dev/ttyACM0 on Linux). This emulator can be used with the ezfet driver under mspdebug. Which I suppose is good since you don’t need the MSP-FET430UIF, but check this comparison:

llc[287]$ time mspdebug ezfet "prog app.elf" > /dev/null

real 0m34.338s
user 0m0.064s
sys  0m0.236s

llc[288]$ time mspdebug tilib "prog app.elf" > /dev/null

real 0m8.095s
user 0m0.032s
sys  0m0.016s

llc[289]$ msp430-size app.elf
   text    data     bss     dec     hex filename
  30011     542    1370   31923    7cb3 app.elf

Four times faster with the FET430UIF. 15 seconds is overhead, the rest is just that eZ-FET is slower per byte transferred. I lived with this for six hours before I was comfortable enough to try the same fix I used for the EXP430F5529LP: disconnect everything, plug in only the EXP430FR5969, then run:

Shell

# verify you get a notice saying the firmware needs to be updated
mspdebug tilib
# do the update
sudo mspdebug tilib --allow-fw-update

Yes, you do have to do that as root even if you have udev rules allowing user-level access to the device: during the firmware update the vendor/product IDs change and those rules won’t apply. Now I can put the FET430UIF back in the closet.

Other miscellaneous items of interest:

In the current silicon and user’s guide, PM5CTL0.LOCKLPM5 powers up set. Unless explicitly cleared, no GPIO configuration takes effect. This includes useful things like “I’m alive” LED blinks. Since that behavior wasn’t present in the original FR58xx user’s guide or the old XSPFR5969 chips I got two years ago, I spent a few minutes wondering why the board didn’t blink after programming. KatiePier at 43oh to the rescue and now BSP430 clears that bit when the board is initialized. (And yes, of course you can prevent it from doing so, or hook in before it happens, so you can properly handle wakeups from LPM x.5.)
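A minimal sketch of that unlock; the register bit is modeled by a plain variable here (on the device it's the PM5CTL0 register, and the LOCKLPM5 bit value matches the MSP430 headers):

```cpp
#include <cassert>
#include <cstdint>

// On FR58xx/FR59xx parts, PM5CTL0.LOCKLPM5 (bit 0) powers up set;
// GPIO configuration has no effect until the bit is cleared.
const uint16_t LOCKLPM5 = 0x0001;

void unlock_gpio(uint16_t& pm5ctl0) {
    pm5ctl0 = static_cast<uint16_t>(pm5ctl0 & ~LOCKLPM5);
}
```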

The micro-B USB port provides a back-channel UART that shows up as the second interface (/dev/ttyACM1). This uses EUSCI A0, while EUSCI A1 is connected to the standard launchpad UART pins (A.3 and A.4).

One of BSP430’s board-raising applications sends the MSP430 clock signals to headers so they can be externally validated; that’d be MCLK (the master CPU clock); SMCLK (sub-master for peripheral clocks); and ACLK (auxiliary clock). The EXP430FR5969 does not make any of these easy to get to: they’re only accessible on the JTAG header.

I’m unable to set the clock speed above 8 MHz; the next setting is 16 MHz, and an oscillator fault is generated immediately after CSCTL3.DIVM is cleared to produce an undivided MCLK (the power-up setting divides by 8). This is not an erratum listed in SLAZ473E, but 16MHz is the maximum listed operating speed. I was able to run at 12MHz by using a 24MHz DCOCLK with DIVM_2.

Each device has a unique 128-bit number in the TLV region with tag 0x15.

The ADC12_B is a 32-channel converter, and the standard internal temperature and voltage inputs are disabled by default (and are not on the traditional channels 10 and 11). The new revision E chips are subject to erratum ADC40, described as

The ADC module may return large errors in its conversion results. The probability of conversion results with large errors varies depending on temperature and VCC.

and notes that the workaround is “None”. I’m seeing 5% errors randomly on temperature and voltage reads with all three reference voltage levels.

So: The device works, but is clearly still experimental (as indicated by the X430FR5969 part number, and the warning sheet included in the box). Good enough to bring BSP430 up to twelve supported platforms, though.

I started work on PyXB just over five years ago. At the time, Python 3.0 had just come out, but was far too new to hassle with, so I made Python 2.4 the minimum required version.

In September 2011 people started to hint they’d like Python 3 support, but it looked like it’d be an awful lot of work, and nobody asked officially, so I just kept it in the back of my mind. In June 2012 the noise was getting harder to ignore, so I logged the request but didn’t take it further.

Over the next year or so PyXB’s unicode support got stronger, and I started understanding exactly how much easier it’d be to do XML with a proper distinction between text (i.e., unicode) and data (i.e, octet sequences). Python 2 did this poorly, but the difference is deeply embedded in Python 3. In September 2013 I finally created a branch for Python 3 off the 1.2.3 release. This involved running 2to3 over the source then running a second script to fix the resulting errors. This was good enough to make available for folks who could build from the repository, but couldn’t support packaging a version because converting the source was too complex to run on an end-user’s machine.

While investigating an installation problem that ultimately turned out to be a bug in pip I discovered six. Six is a single module, released under the MIT license, that can be integrated into a Python package to allow the same source code to work under both Python 2 and Python 3. No more running 2to3. No more fixing up the mess 2to3 makes when it changes pyxb.utils.unicode to pyxb.utils.str.

As of today, the next branch of PyXB passes all tests using Python versions 2.6 up through 3.4.0rc1 without source-code changes. Well, ok, some unit tests fail because whitespace in formatted XML changed in 2.7; the unittest.TestCase.assertRaises context-manager feature isn’t handled in 2.6, 3.0, or 3.1; and I haven’t tested 3.0.1 because hacking its configure script so it can build a functional hashlib module on Ubuntu 12.04 isn’t worth the effort. Nonetheless, PyXB itself works fine.

There’s more work to be done. A packaged PyXB includes generated bindings for about 186 namespaces. When building from the repository those can be generated with the same Python that’ll be running them, so they might include Unicode literals which aren’t going to work across the gap where Python 3 didn’t support the unicode prefix (u'text') until version 3.3. But the big hurdle has been overcome, and the next PyXB release should support all Python versions from 2.6 onward.

While looking into DocBook recently, I discovered that GNU emacs finally has a high-quality XML editing mode that includes validation of the documents. nXML mode is integrated into emacs23, and it comes with RELAX NG grammars to support docbook editing, though only for DocBook 4.2.

This is a great start, but when I tried using it with xmllint from libxml2 to validate some schemas supported by PyXB it said they were invalid. There are a variety of subtle issues the original version didn’t quite get right (and a few cases where my example schema were wrong). I’ve updated the schema to fix those issues, and made it available on github.

emacs nXML comes with an XSLT RELAX NG schema, but only for version 1.0. As XSLT 3 is nearly complete at the time I’m writing this, I was hoping to find support to validate against other XSLT versions as well. Turns out Norman Walsh has provided a unified solution for XSLT 1.0, 2.0, and 3.0 on github.

So: To support XSD and XSLT editing with nXML in Emacs 23, I put this in my .emacs file:

.emacs snippet for nXML support

;; nXML mode customization
(add-to-list 'auto-mode-alist '("\\.xsd\\'" . xml-mode))
(add-to-list 'auto-mode-alist '("\\.xslt\\'" . xml-mode))
(add-hook 'nxml-mode-hook
          '(lambda ()
             (make-local-variable 'indent-tabs-mode)
             (setq indent-tabs-mode nil)
             (add-to-list 'rng-schema-locating-files
                          "~/.emacs.d/nxml-schemas/schemas.xml")))

I copied the original schema from the rng4xsd and xslt-relax-ng repositories, and used Trang to convert from the standard RELAX NG XML syntax to the compact syntax used by nXML. Then the following goes into ~/.emacs.d/nxml-schemas/schemas.xml:

Generic algorithms in C++ operate by substituting specific types into templates that use features of the underlying types. Optimized implementations of algorithms can be selected when the type parameters satisfy certain constraints. A standard technique for this is overload selection via tag dispatching, but maybe you want a more abstract solution.

As shown previously, std::enable_if can be used to select an overload only when types are assignable. The condition can be arbitrarily complex; taking advantage of decltype and std::declval you could even check whether a particular member function can be called and will return a particular type:
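A sketch of such a check, with illustrative names: the first overload participates in resolution only when T has a size() member whose result converts to std::size_t, and the trailing int/long parameters prioritize it for a literal 0 argument:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Valid only when std::declval<T>().size() compiles and converts to
// std::size_t; for a call with a literal 0, int is an exact match.
template <typename T>
auto length_of(const T& t, int)
    -> decltype(static_cast<std::size_t>(std::declval<T>().size())) {
    return t.size();
}

// Fallback, reached via the int -> long standard conversion when the
// overload above is removed by SFINAE.
template <typename T>
std::size_t length_of(const T&, long) {
    return 0;
}
```

Calling length_of(x, 0) picks the first overload for containers and the fallback for, say, an int.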

The problem is that you need something to distinguish and prioritize the different overloads, so that when multiple candidates are possible the best one is selected. In this case we used int and long. The blog post Remastered enable_if covers the options well, but the final word is in a followup post, Beating overload resolution into submission, which presents a solution with unbounded extensibility. It’s the one at the bottom of that post, if you don’t want to read the whole thing (though you should).

All the heavy lifting was done in those posts. I want to add a little value because some things about the solution leave me discomforted:

There’s a magic number 10 used to ground the template recursion, based on an assumption that no more than 10 overloads are necessary;

There’s a distinct type that’s used for the “none-of-the-above” situation, which in my view is non-orthogonal;

The example doesn’t work with clang: it compiles without warning, but prints “fizzbuzz” for every N.

I reduced this to LLVM bug 18677, which was promptly marked as a duplicate of LLVM bug 11723, showing that the bug has been present for about two years, so it’s clearly not a priority. In fact, it’s mentioned in the fine print of Remastered. The upshot is: using a variadic parameter pack to make signatures distinct without requiring an instance is a neat trick, but if you want to be portable to clang you can’t use it.

Fortunately, the solution in Beating shows us how to get an unbounded number of unique types that form an overload hierarchy without caring whether the template parameter is unique, so all we need do is change the canonical form back to the more traditional default template parameter:
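A sketch of that form (do_assign and the tag names are mine; the two-level tag hierarchy here stands in for the full prioritization scheme): with a tag argument making the signatures distinct, a plain defaulted std::enable_if parameter suffices and no variadic pack is needed, so the clang bug is avoided:

```cpp
#include <string>
#include <type_traits>

struct fallback {};
struct preferred : fallback {};

// Chosen when T2 is assignable to T1&: the enable_if default template
// parameter removes it from the candidate set otherwise.
template <typename T1, typename T2,
          typename = typename std::enable_if<
              std::is_assignable<T1&, T2>::value>::type>
bool do_assign(T1& t1, T2 t2, preferred)
{
    t1 = t2;
    return true;
}

// Fallback: reached via derived-to-base conversion of the tag.
template <typename T1, typename T2>
bool do_assign(T1&, T2, fallback)
{
    return false;
}

template <typename T1, typename T2>
bool do_assign(T1& t1, T2 t2)
{
    return do_assign(t1, t2, preferred{});
}
```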

This has the added benefit (IMO) of removing the custom template alias and directly using only constructs for which the meaning is well-defined and in the standard namespace.

The other two concerns are dealt with simply by inverting the priority indicated by the unsigned template parameter: let 0 be the lowest priority, which becomes what you use for the default case. To start the recursion, just keep track of the maximum overload count required for the particular function you need to support. Here’s the whole solution:

A trivial change, but this allows us to put the overload_weight template into a library and use it for multiple functions without having to change it if somebody adds more alternatives than were anticipated.

At C++Now 2012 Marshall Clow presented Generic Programming in C++: A Real World Example which addressed the addition of a hex/unhex pair of functions to Boost.Utility. A future post may address why I think the design for this specific feature took a wrong turn right at the start, but as a pedagogical example of intermediate C++ generic programming it’s worth viewing.

The design includes an algorithm which expects a template parameter to provide certain capabilities. The original solution used std::enable_if to disable the definition when those requirements were not met. Around 00:45:00, Stephan T. Lavavej pointed out that disabling unacceptable overloads with std::enable_if produces obscure errors uninterpretable by mortal users because the compiler won’t find a match, and that a cleaner solution is an outer function with a static_assert that invokes an inner function that implements the algorithm. After a very inconvenient interruption, a comment from somebody I didn’t recognize at 00:47:40 pointed out that not all compilers terminate template expansion on the static_assert failure, so using this approach you get the static assert diagnostic followed by the no-matching-function diagnostics. The commenter went on to propose a workaround where the inner function takes a bool argument, constructed in the outer function from the std::enable_if calculation, which bypasses the body if the expansion is not valid. Unfortunately the audio is unintelligible and I can’t figure out what technique was being recommended (did he say “mpl::bool_” or “template bool”; is the flag a template parameter or a function parameter; …).

All that’s the topic of this post. You can get the full source for the examples at this github gist.

So let’s start with a simple example. Here’s a generic algorithm that assigns one value to another:

no-check

template <typename T1, typename T2>
void useit(T1& t1, T2 t2)
{
    t1 = t2;
}

Here’s code that invokes it, but with types that don’t satisfy the expectations of the algorithm:
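A reconstruction of such a call site (the gist has the original; the bad call is left commented out because it cannot compile, and gcc responds with a "no match for 'operator='" diagnostic followed by a long list of rejected std::basic_string candidates):

```cpp
#include <string>

// The generic algorithm again, unchanged.
template <typename T1, typename T2>
void useit(T1& t1, T2 t2)
{
    t1 = t2;
}

std::string caller()
{
    std::string s;
    int* p = nullptr;
    static_cast<void>(p);
    // useit(s, p);  // error: no viable assignment from int*
    useit(s, "ok");  // fine: const char* assigns to std::string
    return s;
}
```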

That’s not something I want my users to have to cope with. Sure, it says what the problem is, but there’s a lot of detail that’s just distracting, and it’d be a lot worse with more complex types in a more complex algorithm.

So: Assume we take the original approach from the talk and disable the generic algorithm when the types are not assignable:
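A sketch of the disabled version, using std::is_assignable as the condition (the talk's actual predicate may differ): the overload simply vanishes when T2 cannot be assigned to T1, so the bad call now fails with "no matching function for call to 'useit'":

```cpp
#include <string>
#include <type_traits>

// The defaulted enable_if parameter removes this overload from the
// candidate set when the assignment would be ill-formed.
template <typename T1, typename T2,
          typename std::enable_if<
              std::is_assignable<T1&, T2>::value, int>::type = 0>
void useit(T1& t1, T2 t2)
{
    t1 = t2;
}
```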

The diagnostic is shorter, and somewhat helpful because the conditional is so simple, but still obscure and indirect.

What STL appeared to propose was to add a static assert which verifies the expectations of the parameter and emits a diagnostic when they aren’t satisfied, then delegates to the original version:

sa-check

#include <type_traits>

template <typename T1, typename T2>
void useit_(T1& t1, T2 t2)
{
    t1 = t2;
}

/* Generate a diagnostic if the expectations aren't met, but defer the
 * mis-use to another function */
template <typename T1, typename T2>
void useit(T1& t1, T2 t2)
{
    using template_types_ok = std::is_assignable<T1&, T2>;
    static_assert(template_types_ok::value, "cannot assign T2 to T1");
    useit_(t1, t2);
}

This is the same technique addressed in this blog post. And, just as the anonymous commenter in the video warned, the static assert failure didn’t prevent gcc from going on to produce the non-helpful cascading SFINAE errors:

I don’t know what the unrecognized commenter intended as the solution, but my reconstruction is the following: put the static assert in the user-called function, then delegate to a hidden overloaded implementation that provides the working algorithm only when the constraints are met, and provides a stub with no errors when they aren’t:

Walking through, in line 21 we alias template_types_ok to a type that’s equivalent to either std::true_type or std::false_type depending on whether or not the algorithm requirements are satisfied by the type parameters. In line 22 we check the satisfiability at compile-time and provide a user-level description of any failed expectation. Then in line 23 we use the type that represents the satisfiability to select an implementation that won’t have compile-time errors. That one of the implementations wouldn’t work at runtime is irrelevant, because it’s selected only when the static assert prevents compilation from succeeding.
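Reconstructed as a compact sketch (names are mine, and std::is_assignable stands in for the algorithm's real requirement check; the gist has the fuller version the line numbers above refer to):

```cpp
#include <string>
#include <type_traits>

// Requirements hold: the real algorithm.
template <typename T1, typename T2>
void useit_(std::true_type, T1& t1, T2 t2)
{
    t1 = t2;
}

// Requirements fail: a stub that compiles for any types but is never
// reached in a program that compiles, since the static_assert fires.
template <typename T1, typename T2>
void useit_(std::false_type, T1&, T2)
{
}

template <typename T1, typename T2>
void useit(T1& t1, T2 t2)
{
    using template_types_ok = typename std::is_assignable<T1&, T2>::type;
    static_assert(template_types_ok::value, "cannot assign T2 to T1");
    useit_(template_types_ok{}, t1, t2);
}
```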

For a couple days I worked around this by casting the integer literal to a type that satisfied the calls, but eventually I got fed up.

So I looked for alternatives. I found fault with the first two choices, but joy with the third. Herein are some examples with discussion of what they reveal about the choices. The files are available as a github gist.

The Test Criteria

Three specific assertions were found to cause trouble with various solutions, so the examples used below show all of them:

Comparing a std::string’s size() with an integer literal;

Pointer-equality testing for char* values;

Comparing a floating point result to a specific absolute accuracy.

In addition, these criteria are relevant:

Verbosity: how much boilerplate do you have to add that isn’t really part of your test?

Installation overhead: is it easy to build the library for specific compiler flags or is the assumption that you build it once and share it? This matters when playing with advanced language feature flags such as -std=c++1y, which can affect linking test cases together.

Assertion levels: when a test fails can you control whether the test keeps going or aborts (e.g., when following assertions would be invalid if the first fails).

Assertion comparisons: can you express specific relations (not equal, greater than) or is it mostly a true/false capability?

CppUnit

CppUnit comes with a standard configure/make/make install build process which installs the headers and the support library into the proper directories within a toolchain prefix. You need to provide a main routine to invoke the test driver.

CppUnit provides only one level of assertion: the test case aborts when it fails. It also has limited ability to express specific requirements (for example, there is CPPUNIT_ASSERT_EQUAL(x,y) but no CPPUNIT_ASSERT_NOT_EQUAL(x,y)).

Here’s what the tests look like with CppUnit:

cpput_eval.cc

#include <cppunit/extensions/HelperMacros.h>
#include <string>
#include <cmath>

class testStringStuff : public CppUnit::TestFixture
{
protected:
    void testBasic()
    {
        const char* const cstr{"no\0no\0"};
        const std::string str("text");
        CPPUNIT_ASSERT_EQUAL(std::size_t{4}, str.size());
        CPPUNIT_ASSERT(cstr != (cstr + 3));
    }

private:
    CPPUNIT_TEST_SUITE(testStringStuff);
    CPPUNIT_TEST(testBasic);
    CPPUNIT_TEST_SUITE_END();
};
CPPUNIT_TEST_SUITE_REGISTRATION(testStringStuff);

class testFloatStuff : public CppUnit::TestFixture
{
protected:
    void testBasic()
    {
        CPPUNIT_ASSERT_DOUBLES_EQUAL(11.045, std::sqrt(122.0), 0.001);
    }

private:
    CPPUNIT_TEST_SUITE(testFloatStuff);
    CPPUNIT_TEST(testBasic);
    CPPUNIT_TEST_SUITE_END();
};
CPPUNIT_TEST_SUITE_REGISTRATION(testFloatStuff);

There’s a lot of overhead, what with the need to define and register the suites, though it didn’t really bother me until I saw what other frameworks require. And I did have to do that irritating explicit cast to get the size comparison to compile.

The output is terse and all tests pass:

testFloatStuff::testBasic : OK
testStringStuff::testBasic : OK
OK (2)

Boost.Test

Boost is a federated collection of highly-coupled but independently maintained C++ libraries covering a wide range of capabilities. It includes Boost.Test, the unit test framework used by boost developers themselves.

Boost.Test can be used as a header-only solution, but I happened to install it in library form. This gave me a default main routine for invocation, though I did have to have a separate object file with preprocessor defines which incorporated it into the executable.

Boost.Test also supports three levels of assertion. WARN is a diagnostic only; CHECK marks the test as failing but continues; and REQUIRE marks the test as failing and stops the test. There are also a wide variety of conditions (EQUAL, NE, GT, …), each of which is supported for each level.

Here’s what the tests look like with Boost.Test:

butf_eval.cc

#include <boost/test/unit_test.hpp>
#include <string>
#include <cmath>

BOOST_AUTO_TEST_CASE(StringStuffBasic)
{
    const std::string str("text");
    float fa[2];
    const char* const cstr{"no\0no\0"};
    BOOST_CHECK_EQUAL(4, str.size());
    BOOST_CHECK_NE(fa, fa + 1);
    BOOST_CHECK_NE(cstr, cstr + 3);
}

BOOST_AUTO_TEST_CASE(FloatStuffBasic)
{
    BOOST_CHECK_CLOSE(11.045, std::sqrt(122), 0.001);
}

This is much more terse than CppUnit, and seems promising. Here’s what happens when it runs:

Boost.Test silently treats the char* pointers as though they were strings, and does a string comparison instead of a pointer comparison. Which is not what I asked for, and not what BOOST_CHECK_NE(x,y) will do with other pointer types.

Boost.Test also does not provide a mechanism for absolute difference in floating point comparison. Instead, it provides two relative solutions: BOOST_CHECK_CLOSE(v1,v2,pct) checks that v1 and v2 are no more than pct percent different (e.g. 10 would be 10% different), while BOOST_CHECK_CLOSE_FRACTION(v1,v2,frac) does the same thing but using fractions of a unit (e.g. 0.1 would be 10% different). Now, you can argue that there’s value in a relative error calculation. But to have two of them, and not have an absolute error check: that doesn’t work for me.
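To make the complaint concrete, here is a sketch of the three semantics (my own helpers, not Boost's implementation; Boost's relative comparison is more careful about which operand scales the tolerance):

```cpp
#include <cmath>

// The absolute check I wanted:
bool close_absolute(double v1, double v2, double delta)
{
    return std::fabs(v1 - v2) <= delta;
}

// Roughly what BOOST_CHECK_CLOSE expresses: tolerance in percent.
bool close_percent(double v1, double v2, double pct)
{
    return std::fabs(v1 - v2) <= std::fabs(v2) * (pct / 100.0);
}

// Roughly what BOOST_CHECK_CLOSE_FRACTION expresses: the same
// tolerance as a fraction of a unit, so 0.1 here means 10 there.
bool close_fraction(double v1, double v2, double frac)
{
    return close_percent(v1, v2, frac * 100.0);
}
```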

Boost.Test also has a few other issues. The released version has not been updated for four years, while the development version used internally by the Boost project has many changes, which are expected to be released at some point in the future. On the boost developers mailing list the documentation is generally agreed to be difficult to use; that consensus has produced a rewritten version (which, honestly, is what I had to use to try it out).

All in all, I don’t feel comfortable depending on Boost.Test.

Google Test

Google Test is another cross-platform unit test framework, which supports a companion mocking framework to support unit testing of capabilities that are not stand-alone.

The code comes with configure/make/install support, but also provides a single-file interface allowing it to be built easily within the project being tested with the same compiler and options as the code being tested. You do need a separate main routine, but it’s a two-liner to initialize the tests and run them all.

Google Test supports two levels of assertion: failure of an ASSERT aborts the test, while failure of an EXPECT fails the test but continues to check additional conditions. It also provides a wide variety of conditions.

Here’s what the tests look like with Google Test:

gt_eval.cc

#include <gtest/gtest.h>
#include <string>
#include <cmath>

TEST(StringStuff, Basic)
{
    const std::string str("text");
    const char* const cstr{"no\0no\0"};
    ASSERT_EQ(4, str.size());
    ASSERT_NE(cstr, cstr + 3);
}

TEST(FloatStuff, Basic)
{
    ASSERT_NEAR(11.045, std::sqrt(122.0), 0.001);
}

Even more terse than Boost.Test, because it doesn’t use something like GTEST_TEST or GTEST_ASSERT_EQ. To avoid conflict with user code I normally expect framework tools to provide their interfaces within a namespace (literally for C++, or by using a standard identifier prefix where that wouldn’t work). Both CppUnit and Boost.Test do this for their macros, but for unit test code that doesn’t get incorporated into an application I think it’s ok that this isn’t done.

And here’s what you get when running it:

[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from StringStuff
[ RUN      ] StringStuff.Basic
[       OK ] StringStuff.Basic (0 ms)
[----------] 1 test from StringStuff (0 ms total)
[----------] 1 test from FloatStuff
[ RUN      ] FloatStuff.Basic
[       OK ] FloatStuff.Basic (0 ms)
[----------] 1 test from FloatStuff (0 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (0 ms total)
[  PASSED  ] 2 tests.

A little more verbose than I’m accustomed to from CppUnit, but it’s tolerable. The most important bit is that the last line tells you the overall success, so you only need to scroll up if something didn’t work.

Conclusions

Summarizing the individual tests for each criterion, with the preferable answers (from my subjective perspective) marked with an asterisk:

Feature                        CppUnit     Boost.Test                Google Test
Handles size_t/int compares    no          yes*                      yes*
Handles char* compares         yes*        no                        yes*
Handles absolute float delta   yes*        no                        yes*
Verbosity                      high        low*                      low*
Installation                   toolchain   header-only or toolchain  project*
Assertion Levels               one         three*                    two
Assertion Conditions           few         every*                    many

So I’m now happily using Google Test as the unit test framework for new C++ projects.

In fact, I’ve also started to use Google Mock, which turns out to be even more cool and eliminates the biggest limitation on unit testing: what to do if the routine being tested normally needs a heavy-weight and uncontrollable supporting infrastructure to satisfy its API needs. But I can’t really add anything beyond what you can find on their wiki, so I’ll leave it at that.