When you send colors, the 1 bits are fairly easy to encode, but the 0 bits require that you reliably hold a pin high for just 250-400 nanoseconds. Too short, and the LED will think your 0 bit was a blip of noise; too long, and it will think your 0 is a 1. Using timer peripherals is a reasonable solution, but it requires a clock faster than 16MHz, and we won’t be able to use interrupts because it takes about 20-30 clock cycles for the STM32 to jump to an interrupt handler. At 72MHz it takes my code about 300-400 nanoseconds to reach an interrupt handler, and that’s just not fast enough.

There are ways to make it faster, but this is also a good example of how difficult it can be to calculate ahead of time how long your C code will take to execute. Between compiler optimizations and hardware realities like Flash wait-states and the overhead of pushing/popping registers for function calls, the easiest way to tell how long your code takes to run is often to simply run it and check.

Pulseview diagram of ‘101’ in Neopixel. I can’t be sure, but I think the ‘0’ pulse might be about 375 nanoseconds long.

Which brings us to the topic of this tutorial – we are going to write a simple program which uses an STM32 timer peripheral to draw colors to ‘Neopixel’ LEDs. Along the way, we will debug the program’s timing using Sigrok and Pulseview with an affordable 8-channel digital logic analyzer. These are available for $10-20 from most online retailers; try Sparkfun, or Amazon/eBay/Aliexpress/etc. I don’t know why Adafruit doesn’t sell these; maybe they don’t want to carry cheap generics in the same category as Salae analyzers. Anyways, go ahead and install Pulseview, brush up a bit on STM32 timers if you need to, and let’s get started!

A Quick Neopixel Test Program

So let’s start by writing a quick program to send a stream of 10101010 bytes with the Neopixel interface timing. We don’t know how long each pulse will turn out to be yet, but the basic logic will look the same as the state machine from the FPGA example. We send 24 bits of color for each LED, and we can send each bit of color with a simple process:

1. Set a timer to wait for X ticks. (X will be lower for a 0, higher for a 1.)

2. Pull the GPIO pin high.

3. Start the timer.

4. Wait for the timer’s counter to be >= X.

5. Pull the GPIO pin low.

6. Move on to the next bit.
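The steps above can be sketched in C. To be clear, this is a hedged sketch rather than the exact code from the repository: the timer (TIM2), the GPIO port and pin (B3), and the CMSIS-style register names are assumptions for illustration.

```c
#include <stdint.h>
#include "stm32l4xx.h"  // CMSIS device header (assumed target: STM32L4)

// Send one Neopixel bit on pin B3 (assumed) using TIM2 (assumed).
// `ticks` is lower for a 0 bit, higher for a 1 bit.
static inline void send_bit(uint16_t ticks) {
  TIM2->CNT = 0;                   // (1) counter will count up to `ticks`
  GPIOB->ODR |=  (1u << 3);        // (2) pull the pin high
  TIM2->CR1 |= TIM_CR1_CEN;        // (3) start the timer
  while (TIM2->CNT < ticks) {};    // (4) wait for counter >= ticks
  GPIOB->ODR &= ~(1u << 3);        // (5) pull the pin low
  TIM2->CR1 &= ~TIM_CR1_CEN;       // stop the timer; (6) caller moves on
}
```

Every instruction between steps 2 and 5 adds to the measured high-time, which is why the tick values end up smaller than a naive calculation would suggest.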

This process is difficult to get working with a slow-ish chip like a 16MHz AVR or MSP430 core because steps 3-4 often take too long, but ARM Cortex-M cores like the STM32 can usually run fast enough to manage it. Note that you will need to temporarily disable hardware interrupts if your program uses them, because any unexpected delays between the pin toggles can throw things off.

The example code that I put on Github tries to support some F0, F1, and L4 chips, but to save space I’ll just present code written for an STM32L432KC ‘Nucleo’ board. ST sells them for about $11. I was thinking of using one of the timer’s PWM ‘one-shot’ modes with a different duty cycle for 0s and 1s, but I wanted something that would work on any GPIO pin, not just ones that are mapped to timer peripherals. So:
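A minimal sketch of the setup might look like this, with hedged names: I’m assuming TIM2 clocked from the core clock with no prescaler, and pin B3 as a push-pull output; the real code on Github supports more chips and pins.

```c
#include "stm32l4xx.h"  // CMSIS device header (assumed target: STM32L4)

// Enable peripheral clocks (STM32L4 register names).
RCC->AHB2ENR  |= RCC_AHB2ENR_GPIOBEN;   // GPIO port B
RCC->APB1ENR1 |= RCC_APB1ENR1_TIM2EN;   // TIM2

// Configure pin B3 as a general-purpose push-pull output.
GPIOB->MODER &= ~(0x3u << (3 * 2));
GPIOB->MODER |=  (0x1u << (3 * 2));

// The timer counts core clock ticks directly: no prescaler,
// and an auto-reload value high enough that it never wraps mid-bit.
TIM2->PSC = 0;
TIM2->ARR = 0xFFFF;
```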

I guess you could also use a loop of asm("nop"); instructions if you’re going to guess-and-check, but however you decide to do things, it starts with a guess of how many time units a 0 and a 1 will take. And you only really need to worry about the ‘high’ signal length, because the ‘low’ signal can be stretched out as long as you send another bit before the ‘latch’ time, which I think is around 10-50 microseconds for WS2812Bs and 50-100 microseconds for SK6812s.

Pulseview

Now we need to figure out how long each pulse actually lasts, and adjust them to fit the 350 and 900 nanosecond high-times that are expected for a 0 and 1 bit respectively. An oscilloscope is a good way to do this if you have one and know how to use it, but the sort of digital logic analyzer that I mentioned earlier is good enough for this purpose. You can get started pretty easily by installing Pulseview, which is a graphical interface built around the Sigrok project. Sparkfun has a good tutorial with instructions for installing and setting up these programs.

Once you have your logic analyzer plugged in and open in Pulseview, you can connect its ground cable to your microcontroller’s ground, and plug the signal wire into one of the channel cables; I usually pick channel 0 or 1, but it doesn’t matter. Make sure that the program is set to collect samples at 24MHz (the fastest supported by these devices), and click the ‘Run’ button with a grey circle next to it. The program should collect however many samples you asked for (the default is 1 million) and draw them in the main display:

‘1’ and ‘0’ Timing signals.

If you zoom in you can see that 1 pulses take longer than 0 pulses, and you can see a general timescale along the top of the screen, but it’s not immediately clear how long each pulse lasted. Pulseview supports a variety of secondary views which can automatically translate signals into other formats, so let’s add one to this view. Click the ‘Protocol Decoder’ menu (circled in purple), and click on the option that says ‘Timing’. A new bar should be added to the bottom of the display:

Pulseview View

You can move this signal up and down in the view by clicking and dragging its label along the left sidebar – I moved mine to the top so that it was close to the ‘channel 0’ signal. And you can configure it by double-clicking on the label. I set the ‘averaging period’ to zero so that it only showed the duration of each pulse.

Timing decoder configuration menu

And now it’s easy to see how long each pulse lasted. The 0 pulse took too long, and so did the 1 pulse. I’m actually not sure if a 1 pulse has a maximum high-time – the LED chip might just wait for a falling edge – but I’d like to keep each bit close to 1.25 microseconds because less idle CPU time is usually better:

Pulseview timing view of some too-long pulses.

Changing the values from 90 and 30 to 50 and 10 seems to get better results of 375 / 958 nanoseconds – then we just need to send 24 color bits and a latching signal instead of an endless stream of 10101010. The 32-bit coding for a color is 00GGRRBB (Green/Red/Blue), so this code should send one pixel of purple:

After building and uploading that, the first LED in a strand turns purple. Hooray:

One Purple Neopixel. New band name?

Desperate Times

This isn’t a great way to do things, but it worked fine for the nicer Cortex-M3 and -M4 chips which could run at 72MHz – the STM32L432KC can actually do 80MHz, but I ran it at the same 72MHz speed as the cheap and popular STM32F103C8 to demonstrate how the timings worked out differently even with the same core clock speed. The L432 is newer, and it includes more options for caching and speeding up flash access. It also uses a Cortex-M4 core with a larger instruction set, so even though both chips run at 72MHz, the STM32L432 performs better in the actual application here; the STM32F103 barely even has time to run one cycle of its ‘while’ loop.

With so much variance between similar chips made by the same manufacturer, you can see why I sometimes reach for the oscilloscope instead of the calculator for things like this. A few hundred nanoseconds don’t usually matter, but here it’s the difference between the right color and a blinding white light. And along those lines, when I tried to get this working on the very affordable STM32F030K6, I found that its maximum 48MHz clock speed was a bit too slow to reliably send a 0 – it ended up taking about 500 nanoseconds to toggle the pin with the register write and while loop in between.

There are a few ways that this code could be faster. I could put the code in RAM and run it from there, or maybe write a section of assembly code to quickly check the counter register. But I didn’t want to change the code too much, so it seemed easiest to just try running the chip a little faster. ST says not to do this all over the reference manual, so this may or may not damage the chip – try it at your own risk.

But if you look at the FLASH_ACR register description in the reference manual, (section 3.5.1) it says that there are 3 bits available for setting the number of wait states. It only lists 000 (0-24MHz) and 001 (24-48MHz) as available options, but I tried using 010 (2 wait-states for 48-72MHz?) with a PLL multiplication of 16 for its HSI/2 input for a 64MHz core clock speed, and that seemed to work. I wonder how fast you could go? It sounds like it depends on what peripherals you want to use, but it looks like these STM32 chips might be significantly faster than they appear for general-purpose applications.
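The clock setup I tried looks roughly like this. The register names are CMSIS-style for an STM32F0, the two-wait-state latency value is undocumented, and this sketch is exactly the at-your-own-risk part:

```c
#include "stm32f0xx.h"  // CMSIS device header (assumed target: STM32F0)

// Set LATENCY[2:0] = 0b010, i.e. 2 Flash wait-states. Only 000 and 001
// are documented options on this chip.
FLASH->ACR = (FLASH->ACR & ~0x7u) | 0x2u;

// PLL input = HSI/2 = 4 MHz, multiplier = 16 -> 64 MHz core clock.
RCC->CFGR |= RCC_CFGR_PLLMUL16;
RCC->CR   |= RCC_CR_PLLON;
while (!(RCC->CR & RCC_CR_PLLRDY)) {};    // wait for the PLL to lock
RCC->CFGR  = (RCC->CFGR & ~RCC_CFGR_SW) | RCC_CFGR_SW_PLL;  // SYSCLK = PLL
```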

Conclusions

This isn’t exactly an elegant way to figure out how long your code will take to run, and I don’t usually like solutions that rely on “magic numbers”. But it is easy and fast, so when you find yourself on a tight timeline that just needs some goddamned pretty sparkly colors yesterday, these logic analyzers are nice tools to have.
