copyleft hardware planet

December 12, 2017

AltOS 1.8.3 — TeleMega version 3.0 support and bug fixes

AltOS is the core of the software
for all of the Altus Metrum products. It
consists of firmware for our cc1111, STM32L151, STM32F042, LPC11U14 and ATtiny85
based electronics and Java-based ground station software.

This is a minor release of AltOS, including support for our new
TeleMega v3.0 board and a selection
of bug fixes.

Announcing TeleMega v3.0

TeleMega is our top-of-the-line
flight computer with a 9-axis IMU, 6 pyro channels, uBlox Max 7Q GPS and a
40mW telemetry system. Version 3.0 is feature compatible with version
2.0, incorporating a new higher-performance 9-axis IMU in place of the
former 6-axis IMU and separate 3-axis magnetometer.

AltOS 1.8.3

In addition to support for TeleMega v3.0 boards, AltOS
1.8.3 contains some important bug
fixes for all flight computers. Users are advised to upgrade their
devices.

Ground testing the additional pyro channels on EasyMega and
TeleMega could result in a sticky 'fired' status which would
prevent those channels from firing on future flights.

December 04, 2017

The Netdev 2.2 conference took place in Seoul, South Korea. As we work on a diversity of networking topics at Free Electrons as part of our Linux kernel contributions, Free Electrons engineers Alexandre Belloni and Antoine Ténart went to Seoul to attend lots of interesting sessions and to meet with the Linux networking community. Below, they report on what they learned from this conference, by highlighting two talks they particularly liked.

David S. Miller gave a keynote about reducing the size of core structures in the Linux kernel networking core. The idea behind his work is to use smaller structures, which has many benefits in terms of performance: fewer cache misses occur, and fewer memory resources are needed. This is especially true in the networking core, where small changes may have an enormous impact and improve performance a lot. Another argument, from his maintainer's perspective, is maintainability: smaller structures usually mean less complexity.

He presented five techniques he used to shrink the networking core data structures. The first one was to identify members of common base structures that are only used in sub-classes, as these members can easily be moved out and not impact all the data paths.

The second one makes use of what David calls “state compression”, i.e. understanding the real width of the information stored in data structures and packing flags together to save space. In his view a boolean should take a single bit, whereas in the kernel it requires far more space than that. While this is fine for many uses, it makes sense to compress such data in critical structures.

Then David S. Miller spoke about unused bits in pointers: in the kernel, pointers have 3 low-order bits that are never used. He argued that these bits can hold 3 boolean values and should be used to reduce core data structure sizes. This technique, like state compression, can be applied by introducing helpers to safely access the data.

Another technique he used was to unionize members that aren’t used at the same time. This further reduces structure sizes by not keeping areas of memory that are never used during identified steps in the networking stack.

Finally, he showed us the last technique he used: replacing pointers with lookup keys when the objects can be found cheaply from their index. While this cannot be applied to every object, it helped shrink some data structures.

While going through all these techniques, he gave many examples to help the audience understand what can be saved and how effective each change was. This was overall a great talk, highlighting a critical aspect we do not always think of when writing drivers, one that can lead to big performance improvements.

Jason A. Donenfeld presented WireGuard, his new and shiny L3 network tunneling mechanism for Linux. After two years of development, this in-kernel, formally proven cryptographic protocol is ready to be submitted upstream for its first rounds of review.

The idea behind WireGuard is to provide, with a small code base, a simple interface to establish and maintain encrypted tunnels. Jason gave a demo that was impressive in its simplicity, securely connecting two machines in moments, something that can be a real pain with OpenVPN or IPsec. Under the hood, the mechanism transports encrypted packets in UDP over either IPv4 or IPv6, using modern cryptographic principles. Authentication is similar to what SSH uses: static private/public key pairs. One particularly nice design choice is that WireGuard is exposed as a stateless interface to the administrator, whereas the protocol itself is stateful and timer-based; this allows devices to go into sleep mode without having to care about tunnel state.

One of the difficulties in getting WireGuard accepted upstream is its cryptographic needs, which do not match what the kernel cryptographic framework can provide. Jason knows this and plans to first send patches reworking the cryptographic framework so that his module integrates nicely with the in-kernel APIs. The first RFC patches for WireGuard should be sent at the end of 2017 or the beginning of 2018.

We look forward to seeing WireGuard hit the mainline kernel, allowing everybody to establish secure tunnels in an easy way!

Jason A. Donenfeld at Netdev 2.2

Conclusion

Netdev 2.2 was again an excellent experience for us. It was an (almost) single-track format, running alongside the workshops, which meant we did not miss any session. The technical content let us dive deeply into the inner workings of the network stack and stay up to date with current developments.

Thanks for organizing this and for the impressive job, we had an amazing time!

November 26, 2017

It’s unusual to find a ware without a clear winner, and reading through the comment thread I found a lot of near-misses but none of them close enough for me to declare a winner.

The Ware for October 2017 is a Minibar Systems automated hotel minibar (looks like a “SmartCube 40i”, “The Minibar of the Future”). During an overnight layover, I decided to check the minibar for snack options, but upon pulling what I thought was the handle for the fridge, lo and behold a tray of electronics presented itself. My friend, extremely amused by my enthusiastic reaction, snapped this picture of me adding the ware to my catalog:

During the closing session of this conference, Free Electrons CEO Michael Opdenacker received from the hands of Tim Bird, on behalf of the ELCE committee, an award for his continuous participation in the Embedded Linux Conference Europe. Indeed, Michael has participated in all 11 editions of ELCE without interruption. He has been very active in promoting the event, especially through the video recording effort that Free Electrons carried out in the early years of the conference, as well as through the numerous talks given by Free Electrons.

Free Electrons is proud to see its continuous commitment to knowledge sharing and community participation be recognized by this award!

According to Linux Kernel Patch statistics, Free Electrons contributed 111 patches to this release, making it the 24th contributing company by number of commits: a somewhat lower than usual contribution level from our side. At least, Free Electrons cannot be blamed for trying to push more code into 4.14 because of its Long Term Support nature!

The main highlights of our contributions are:

On the RTC subsystem, Alexandre Belloni made as usual a number of fixes and improvements to various drivers, especially the ds1307 driver.

On the NAND subsystem, Boris Brezillon did a number of small improvements in various areas.

On the support for Marvell platforms

Antoine Ténart improved the ppv2 network driver used by the Marvell Armada 7K/8K SoCs: support for 10G speed and TSO support are the main highlights. In order to support 10G speed, Antoine added a driver in drivers/phy/ to configure the common PHYs in the Armada 7K/8K SoCs.

Thomas Petazzoni also improved the ppv2 network driver by adding support for TX interrupts and per-CPU RX interrupts.

Grégory Clement contributed some patches to enable NAND support on Armada 7K/8K, as well as a number of fixes in different areas (GPIO fix, clock handling fixes, etc.)

Maxime Ripard contributed the support for a new board, the BananaPI M2-Magic. Maxime also contributed a few fixes to the Allwinner DRM driver, and a few other misc fixes (clock, MMC, RTC, etc.).

Quentin Schulz contributed the support for the power button functionality of the AXP221 (PMIC used in several Allwinner platforms)

On the support for Atmel platforms, Quentin Schulz improved the clock drivers to properly support the Audio PLL, which allowed the Atmel audio drivers to be fixed. He also fixed suspend/resume support in the Atmel MMC driver to support the deep sleep mode of the SAMA5D2 processor.

In addition to making direct contributions, Free Electrons is also involved in the Linux kernel development by having a number of its engineers act as Linux kernel maintainers. As part of this effort, Free Electrons engineers have reviewed, merged and sent pull requests for a large number of contributions from other developers:

November 16, 2017

Recently, our customer Senic asked us to integrate an Over-The-Air (OTA) update mechanism in their embedded Linux system, and after some discussion, they ended up choosing Mender. This article will detail an example of Mender’s integration and how to use it.

What is Mender?

Mender is an open source remote updater for embedded devices. It is composed of a client installed on the embedded device, and a management server installed on a remote server. However, the server is not mandatory as Mender can be used standalone, with updates triggered directly on the embedded device.

In order to offer a fallback in case of failure, Mender uses a double partition layout: the device has at least 2 rootfs partitions, one active and one inactive. Mender deploys an update on the inactive partition, so that in case of an error during the update process, the active partition remains intact. If the update succeeds, it switches to the updated partition: the active partition becomes inactive and the inactive one becomes the new active one. As the kernel and the device tree are stored in the /boot folder of the root filesystem, it is possible to easily update an entire system. Note that Mender needs at least 4 partitions:

bootloader partition

data persistent partition

rootfs + kernel active partition

rootfs + kernel inactive partition

It is, of course, customizable if you need more partitions.

Two reference devices are supported: the BeagleBone Black and a virtual device. In our case, the board was a Nanopi-Neo, which is based on an Allwinner H3.

Mender provides a Yocto Project layer containing all the necessary classes and recipes to make it work. The most important thing to know is that it will produce an image ready to be written to an SD card to flash empty boards. It will also produce “artifacts” (files with .mender extension) that will be used to update an existing system.

Installation and setup

In this section, we will see how to set up the Mender client and server for your project. Most of the instructions are taken from the Mender documentation, which we found well detailed and really pleasant to read. We’ll simply summarize the most important steps.

Server side

The Mender server will allow you to remotely update devices. The server can be installed in two modes:

demo mode: used to test a demo server. It can be nice if you just want to quickly deploy a Mender solution, for testing purposes only. It includes a demo layer that simplifies the setup and configures a default Mender server on the localhost of your workstation.

production mode: used for production. We will focus on this mode as we wanted to use Mender in a production context. This mode allows you to customize the server configuration: IP address, certificates, etc. Because of that, some configuration will be necessary (which is not the case in demo mode).

In order to install the Mender server, you should first install Docker CE and Docker Compose. Have a look at the corresponding Docker instructions.

Final configuration

This final configuration will link the generated keys with the Mender server. All the modifications will be in the prod.yml file.

Locate the storage-proxy service in prod.yml and set its alias to your domain name, in our case s3.foobar.com, under networks.mender.aliases

Locate the minio service. Set MINIO_ACCESS_KEY to “mender-deployments” and MINIO_SECRET_KEY to a generated password (with e.g. $ apg -n1 -a0 -m32)

Locate the mender-deployments service. Set DEPLOYMENTS_AWS_AUTH_KEY and DEPLOYMENTS_AWS_AUTH_SECRET to the values of MINIO_ACCESS_KEY and MINIO_SECRET_KEY respectively. Set DEPLOYMENTS_AWS_URI to point to your domain, such as https://s3.foobar.com:9000
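The relevant prod.yml entries end up looking roughly like this (an abridged sketch with placeholder values, not the full file):

```yaml
storage-proxy:
    networks:
        mender:
            aliases:
                - s3.foobar.com

minio:
    environment:
        MINIO_ACCESS_KEY: mender-deployments
        MINIO_SECRET_KEY: <password generated with apg>

mender-deployments:
    environment:
        DEPLOYMENTS_AWS_AUTH_KEY: mender-deployments
        DEPLOYMENTS_AWS_AUTH_SECRET: <password generated with apg>
        DEPLOYMENTS_AWS_URI: https://s3.foobar.com:9000
```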

Start the server

Make sure that the domain names you have defined (mender.foobar.com and s3.foobar.com) are accessible, potentially by adding them to /etc/hosts if you’re just testing.

By default, Mender assumes that your storage device is /dev/mmcblk0, that mmcblk0p1 is your boot partition (containing the bootloader), that mmcblk0p2 and mmcblk0p3 are your two root filesystem partitions, and that mmcblk0p5 is your data partition. If that’s the case for you, then everything is fine! However, if you need a different layout, you need to update your machine configuration. The Mender client retrieves which storage device to use from the MENDER_STORAGE_DEVICE variable (which defaults to mmcblk0). The partitions themselves should be specified using MENDER_BOOT_PART, MENDER_ROOTFS_PART_A, MENDER_ROOTFS_PART_B and ROOTFS_DATA_PART. If you need to change the default storage device or partition layout, edit these variables in your machine configuration according to your needs. Here is an example for /dev/sda:
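For instance, the machine configuration could contain (a sketch; adapt the partition numbers to your own layout):

```
MENDER_STORAGE_DEVICE = "/dev/sda"
MENDER_BOOT_PART = "/dev/sda1"
MENDER_ROOTFS_PART_A = "/dev/sda2"
MENDER_ROOTFS_PART_B = "/dev/sda3"
ROOTFS_DATA_PART = "/dev/sda5"
```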

Do not forget to update the artifact name in your local.conf, for example:

MENDER_ARTIFACT_NAME = "release-1"

As described in Mender’s documentation, Mender stores the artifact name in its artifact image. It must be unique, which is what we expect since an artifact represents a release tag or a delivery. Note that if you forget to update it and upload an artifact with the same name as an existing one in the web UI, it will not be taken into account.

U-Boot configuration tuning

Some modifications to U-Boot are necessary to be able to perform a rollback (i.e. use a different partition after an unsuccessful update).

Mender needs BOOTCOUNT support in U-Boot. This creates a bootcount variable that is incremented on each reboot (and reset to 1 after a power-on reset). Mender uses this variable in its rollback mechanism.
Make sure to enable it in your U-Boot configuration. This will most likely require a patch to your board's .h configuration file, enabling:
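For instance (a sketch; the exact option names depend on your U-Boot version, so check them against your tree):

```c
/* in your board's include/configs/<board>.h */
#define CONFIG_BOOTCOUNT_LIMIT	/* enable the bootcount mechanism */
#define CONFIG_BOOTCOUNT_ENV	/* store bootcount in the U-Boot environment */
```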

Tune your U-Boot environment to use Mender’s variables. Here are some examples of the modifications to be done. Set the root= kernel argument to use ${mender_kernel_root}, set the bootcmd to load the kernel image and Device Tree from ${mender_uboot_root} and to run mender_setup. Make sure that you are loading the Linux kernel image and Device Tree file from the root filesystem /boot directory.
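As an illustration, the environment could look like this (a sketch: the console, load addresses and file names are placeholders for your board, while mender_setup, mender_kernel_root and mender_uboot_root are provided by Mender's U-Boot integration):

```
=> setenv bootcmd 'run mender_setup; setenv bootargs console=ttyS0,115200 root=${mender_kernel_root} rootwait; load ${mender_uboot_root} ${kernel_addr_r} /boot/zImage; load ${mender_uboot_root} ${fdt_addr_r} /boot/my-board.dtb; bootz ${kernel_addr_r} - ${fdt_addr_r}'
=> saveenv
```

Note the single quotes: they defer variable expansion until bootcmd actually runs, after mender_setup has set the mender_* variables.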

Mender’s client recipe

As stated in the introduction, Mender has a client, in the form of a userspace application, that runs on the target. Mender’s layer has a Yocto recipe for it, but it does not have our server certificates. To establish a connection between the client and the server, the certificates have to be installed in the image. For that, a bbappend recipe will be created. It also allows us to perform additional Mender configuration, such as defining the server URL.
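A sketch of such a bbappend (paths and the URL are examples for our hypothetical foobar.com setup; check the variable names against your meta-mender version):

```
# recipes-mender/mender/mender_%.bbappend
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"

# Ship our self-signed server certificate in the image
SRC_URI_append = " file://server.crt"

do_install_append() {
    install -d ${D}${sysconfdir}/mender
    install -m 0444 ${WORKDIR}/server.crt ${D}${sysconfdir}/mender/server.crt
}

# Point the client at our production server
MENDER_SERVER_URL = "https://mender.foobar.com"
```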

Recompile the image, and we should now have everything we need to update a system. Do not hesitate to run through the integration checklist; it is a really convenient way to check whether everything is correctly configured (or not).

To be more robust and secure, you can sign your artifacts to be sure that they come from a trusted source. If you want this feature, have a look at this documentation.

Usage

Standalone mode

To update an artifact using the standalone mode (i.e. without a server), here are the commands to use. You will need to adapt them to your needs.
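With the Mender 1.x client, the update can be fetched and written to the inactive partition in one command (the URL and artifact name are placeholders for wherever your artifact is served):

```
$ mender -rootfs https://s3.foobar.com/artifacts/release-2.mender
```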

You can also use the mender command to start an update from a local .mender file, provided by a USB key or SD card.

Once finished, you will have to reboot the target manually:

$ reboot

After the first reboot, you will be on the new active partition (if the previous one was /dev/mmcblk0p2, you should be on /dev/mmcblk0p3). Check the kernel version, artifact name or command line:

$ uname -a
$ cat /etc/mender/artifact_info
$ cat /proc/cmdline

If you are okay with this update, you will have to commit the modification; otherwise the update will not be persistent, and once you reboot the board, Mender will roll back to the previous partition:

$ mender -commit

Using Mender’s server UI

The Mender server UI provides a management interface to deploy updates on all your devices. It knows about all your devices, their current software version, and you can plan deployments on all or a subset of your devices. Here are the basic steps to trigger a deployment:

The first time, you will have to authorize the device. You will find it in your “dashboard” or in the “devices” section.

After authorizing it, the server will retrieve device information such as the current software version, MAC address, network interface, and so on.

To update a partition, you will have to create a deployment using an artifact.

Upload the new artifact in the server UI using the “Artifacts” section

Deploy the new artifact using the “deployment” or the “devices” section. You can follow the status of the deployment in the “status” field; it will go through “installing”, “rebooting”, etc. The board will reboot and the partition should be updated.

Troubleshooting

Here are some issues we faced when we integrated Mender for our device. The Mender documentation also has a troubleshooting section, so have a look at it if you are facing issues. Otherwise, the community seems to be active, even though we did not need to interact with it since Mender worked like a charm when we tried it.

Adjusting the systemd service startup

By default, the Mender systemd service is ordered after the systemd “resolved” service. On our target device, the network was only available via WiFi. We had to wait for the wlan0 interface to be up and automatically connected to a network before starting Mender’s service; otherwise it fails with a “network unreachable” error. To solve this issue, which is specific to our platform, we made the service depend on “network-online.target” to be sure that a network is available:
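In practice, this can be a systemd drop-in for the Mender service, along these lines (a sketch):

```
# /etc/systemd/system/mender.service.d/10-network-online.conf
[Unit]
Wants=network-online.target
After=network-online.target
```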

To solve this issue, update the date on your board and make sure your RTC is correctly set.

Device deletion

While testing Mender’s server (version 1.0), we always used the same board and ran into the issue that the board was already registered in the server UI but had a different Device ID (which Mender uses to identify devices). Because of that, the server kept rejecting the authentication. The next release of the Mender server offers the possibility to remove a device, so we updated the Mender server to the latest version.

Deployments not taken into account

Note that the Mender client checks every 30 minutes by default whether a deployment is available for the device. During testing, you may want to reduce this period, which you can do in Mender’s configuration file using the UpdatePollIntervalSeconds variable.
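/etc/mender/mender.conf is a JSON file, so during testing it could for instance contain (other fields omitted):

```
{
    "UpdatePollIntervalSeconds": 30
}
```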

Conclusion

Mender is an OTA updater for embedded devices. It has great documentation in the form of tutorials, which makes the integration easy. While testing it, the only issues we hit were related to our custom platform or were already covered in the documentation. Deploying it on a board was not difficult; only some U-Boot/kernel and Yocto Project modifications were necessary. All in all, Mender worked perfectly fine for our project!

November 15, 2017

As discussed in our previous blog post, Free Electrons had a strong presence at the Embedded Linux Conference Europe, with 7 attendees, 4 talks, one BoF and one poster during the technical show case.

In this blog post, we would like to highlight a number of talks from the conference that we found interesting. Each Free Electrons engineer who attended the conference has selected one talk, and gives his/her feedback about this talk.

uClibc Today: Still Makes Sense – Alexey Brodkin

Talk selected by Michael Opdenacker

Alexey Brodkin, an active contributor to the uClibc library, shared recent updates about this C library, trying to show people that the project was still active and making progress, after a few years during which it appeared to be stalled. Alexey works for Synopsys, the makers of the ARC architecture, which uClibc supports.

If you look at the repository for uClibc releases, you will see that since version 1.0.0, released in 2015, the project has made 26 releases on a predictable schedule. The project also runs runtime regression tests on all its releases, which wasn’t done before. The developers have also added support for 4 new architectures (arm64 in particular), and uClibc remains the default C library that Buildroot proposes.

Alexey highlighted that in spite of the competition from the musl library, which has caused several projects to switch from uClibc to musl, uClibc still makes a lot of sense today. As a matter of fact, it supports more hardware architectures than glibc and musl do, as it’s the only one to support platforms without an MMU (such as noMMU ARM, Blackfin, m68k, Xtensa), and the library size is still smaller than what you get with musl (though a static hello_world program is much smaller with musl, if you have a close look at the library comparison tests he mentioned).

Alexey noted that the uClibc++ project is still alive too, and used in OpenWRT/Lede by default.

An SoC is made of multiple IP blocks from different vendors. In some cases, the source or model of a hardware block is neither documented nor advertised by the SoC vendor. However, since there are only very few vendors of a given IP block, chances are high that your SoC vendor’s undocumented IP block is compatible with a known one.

With his experience in developing drivers for multiple IP blocks present in Allwinner SoCs, and as a maintainer of those same SoCs, Chen-Yu first explained that SoC vendors often either embed some vendor’s licensed IP blocks in their SoCs, adding the glue around them for platform- or SoC-specific hardware (clocks, resets and control signals), or clone IP blocks with the same logic but some twists (missing, obfuscated or rearranged registers).

To identify the IP block, we can dig into the datasheet or the vendor BSP and compare those with well-documented datasheets, such as the ones for the NXP i.MX6, TI KeyStone II or the Zynq UltraScale+ MPSoC, or with mainline drivers. Asking the community is also a good idea, as someone might have encountered an IP with the same behaviour before and can help identify it quicker.

Good identifiers for IPs can be register layouts or names, along with the DMA logic and descriptor format. The unlucky ones who have been provided only a binary blob can look for symbols in it, which may help slightly in the process.

He also mentioned that identifying an IP block is often the result of the developer’s experience in identifying IPs, and other times just pure luck. Unfortunately, there are times when someone cannot identify the IP and writes a complete driver, just to be told by someone else in the community that this IP reminds them of another one, in which case the work done may simply be thrown away. That’s where the community plays a strong role, helping us in our quest to identify an IP.

Chen-Yu then went on with the presentation of the different ways to handle the multiple variants of an IP block in drivers. He said that the core logic of an IP’s drivers is usually implemented as a library, and that each variant has a driver with its own resources and extra setup that uses this library. Also, a good practice is to use booleans to select features of IP blocks instead of keying off the ID of each variant.
For IPs whose registers have been altered, the way to go is usually to write a table of register offsets, or to use regmaps when bitfields are also modified. When the IP block differs a bit too much, custom callbacks should be used.

He ended his talk with feedback from his experience with multiple IP blocks (UART, USB OTG, GMAC, EMAC and HDMI) present in Allwinner SoCs, and the differences developers had to handle when adding support for them.

printk(): The Most Useful Tool is Now Showing its Age – Steven Rostedt & Sergey Senozhatsky

Talks selected by Boris Brezillon. Boris also covered the related talk “printk: It’s Old, What Can We Do to Make It Young Again?” from the same speakers.

Maybe I should be ashamed to say it, but printk() is one of the basic tools I use to debug kernel code, and I must say it has done the job so far, so when I saw these presentations about improving printk() I was a bit curious. What could be wrong with printk()’s implementation?
Before attending the talks, I had never dug into printk()’s code, because it just worked for me, but what I thought was a simple piece of code turned out to be a complex infrastructure, with a locking scheme that makes you realize how hard things get when several CPUs are involved.

At its core, printk() is supposed to store logs into a circular buffer and push new entries to one or several consoles. In his first talk, Steven Rostedt walked through the history of printk() and explained why it became more complex when support for multiple CPUs appeared. He also detailed why printk() is not re-entrant and the problems this causes when it is called from an NMI handler. He finally went through some fixes that made the situation a bit better, and introduced the second half of the talk, given by Sergey Senozhatsky.

Note that between these two presentations, the printk() rework was discussed at the Kernel Summit, so Sergey already had some feedback on his proposals. While Steven’s presentation focused mainly on the main printk() function, Sergey gave a bit more detail on why printk() can deadlock; one of the reasons preventing deadlocks is so complicated is that printk() delegates the ‘print to console’ part to console drivers, which have their own locking schemes. To address that, he proposes to move away from the callback approach and let console drivers poll for new entries in the console buffer instead, which would remove part of the locking issues. The problem with this approach is that it brings even more uncertainty about when the logs are printed on the consoles, and one of the nice things about printk() in its current form is that the log is likely to be printed on the output before printk() returns (which helps a lot when you debug things).

More robust I2C designs with a new fault-injection driver – Wolfram Sang

Talk selected by Miquèl Raynal

Although Wolfram had a lot of trouble starting his presentation for lack of a proper HDMI adapter, he gave an illuminating talk about how, as the I2C subsystem maintainer, he would like to strengthen the robustness of I2C drivers.

He first explained some basics of the I2C bus, like the START and STOP conditions, and introduced us to a few errors he regularly spots in drivers. For instance, some badly written drivers use a START and STOP sequence where a “repeated START” is needed. This is very bad because another master on the bus could, in this little spare idle delay, decide to grab the medium and send its own message. The message right after the repeated START would then not have the expected effect. Of course, plenty of other errors can happen: a stalled bus (SDA or SCL stuck low), lost arbitration, faulty bits… All these situations are usually due to incorrect sequences being sent by the driver.

To avoid so much pain debugging the obscure situations where this happens, he decided to use an extended I2C-gpio interface to access SDA and SCL from two external GPIOs, and this way force faulty situations by simply pinning one line (or both) high or low and seeing how the driver reacts. The I2C specification and framework provide everything needed to get out of a faulty situation; it is just a matter of using it (sending a STOP condition, clocking 9 times, operating a reset, etc.).

Wolfram is aware that his approach is quite conservative, but he is really scared of breaking users with random quirks, so he used this talk to explain his point of view and the solutions he wants to promote.

Two questions, which you might have a hard time hearing in the recording, were also interesting. The first person asked if he had ever considered using a “default faulty chip” designed to do this kind of fault injection by itself, to see how the host reacts and behaves. Wolfram said buying hardware is too much effort for debugging, so he was more motivated to get something very easy and straightforward to use. Someone else asked if he had thought about multi-client situations; from Wolfram’s point of view, all clients are in the same state whether the bus is busy or free, and should not misbehave if we clock 9 times.

HDMI 4k Video: Lessons Learned – Hans Verkuil

Talk selected by Maxime Ripard

Having worked recently on a number of display-related drivers, it was quite natural to go and see what I will probably be working on in the near future.

Hans started by talking about HDMI in general, and the various signals that go through it. He then went on with what was basically a war story about all the mistakes, gotchas and misconceptions he encountered while working on a video-conference box for Cisco. He covered the hardware itself, but also lower-level aspects, such as the clock frequencies needed to operate properly, the various signals you can look at for debugging, and the issues that may come with the associated encodings and/or color spaces, especially when you want to support as many displays as you can. He also pointed out flaws in the specifications that might lead to implementation inconsistencies. He concluded with the flaws of various HDMI adapters, the issues that might arise when using them on various OSes, and how to work around them when possible.

The Serial Device Bus – Johan Hovold

Talk selected by Thomas Petazzoni

Johan started his talk at ELCE by exposing the problem with how serial ports (UARTs) are currently handled in the Linux kernel. Serial ports are handled by the TTY layer, which allows user-space applications to send and receive data with whatever is connected on the other side of the UART. However, the kernel doesn’t provide a good mechanism to model the device that is connected at the other side of the UART, such as a Bluetooth chip. Due to this, people have resorted either to writing user-space drivers for such devices (which falls short when those devices need additional resources such as regulators, GPIOs, etc.) or to developing a specific TTY line discipline in the kernel. The latter also doesn’t work very well, because a line discipline needs to be explicitly attached to a UART to operate, which requires a user-space program such as the hciattach tool used in Bluetooth applications.

In order to address this problem, Johan picked up the work initially started by Rob Herring (Linaro) on a serial device bus (serdev for short), which turns the UART into a proper bus, with bus controllers (UART controllers) and devices connected to this bus, very much like other buses in the Linux kernel (I2C, SPI, etc.). serdev was initially merged in Linux 4.11, with some improvements merged in follow-up versions. Johan then described in detail how serdev works. First, there is a TTY port controller which, instead of registering the traditional TTY character device, registers the serdev controller and its slave devices. Thanks to this, one can describe child nodes of the UART controller node in the Device Tree, and they will be probed as serdev slaves. There is then a serdev API allowing the implementation of serdev slave drivers that can send and receive data over the UART. A few drivers already use this API: hci_serdev, hci_bcm, hci_ll, hci_nokia (Bluetooth) and qca_uart (Ethernet).
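As an illustration, a UART-attached Bluetooth chip can be described as a child node of the UART controller in the Device Tree (a sketch based on the hci_bcm serdev driver; the UART label and GPIO numbers are made up for the example):

```
&uart1 {
	status = "okay";

	bluetooth {
		compatible = "brcm,bcm43438-bt";
		shutdown-gpios = <&gpio 30 GPIO_ACTIVE_HIGH>;
		max-speed = <921600>;
	};
};
```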

We found this talk very interesting, as it clearly explained what the use case for serdev is and how it works, and it should become a very useful subsystem for many embedded applications that use UART-connected devices.

GStreamer for Tiny Devices – Olivier Crête

Talk selected by Grégory Clement

The purpose of this talk was to show how to shrink GStreamer to make it fit on an embedded Linux device. First, Olivier Crête introduced what GStreamer is; the introduction was very high level but well done. Then, after presenting the problem, he showed step by step how he managed to reduce the footprint of a GStreamer application enough to fit on his device.

The first part focused on features specific to GStreamer, such as how to build only the needed plugins. Most of the remaining tricks shown could be applied to any C or C++ application. The talk was pretty short, so there was no useless or boring part. Moreover, the speaker himself was good and dynamic.

To conclude, it was a very pleasant talk that taught, step by step, how to reduce the footprint of an application, whether it is based on GStreamer or not.

November 08, 2017

There’s an Internet controversy going on between Dale Dougherty, the CEO of Maker Media, and Naomi Wu (@realsexycyborg), a Chinese Maker and Internet personality. Briefly, Dale Dougherty tweeted a single line questioning Naomi Wu’s authenticity, which is destroying Naomi’s reputation and livelihood in China.

In short, I am in support of Naomi Wu. Rather than let the Internet speculate on why, I am sharing my perspectives on the situation preemptively.

As with most Internet controversies, it’s messy and emotional. I will try my best to outline the biases and issues I have observed. Of course, everyone has their perspective; you don’t have to agree with mine. And I suspect many of my core audience will dislike and disagree with this post. However, the beginning of healing starts with sharing and listening. I will share, and I respectfully request that readers read the entire content of this post before attacking any individual point out of context.

The key forces I see at play are:

Prototype Bias – how assumptions based on stereotypes influence the way we think and feel

Idol Effect – the tendency to assign exaggerated capabilities to public figures and celebrities

Power Asymmetry – those with more power have more influence, and should be held to a higher standard of accountability

Guanxi Bias – the tendency to give foreign faces more credibility than local faces in China

All these forces came together in a perfect storm this past week.

1. Prototype Bias

If someone asked you to draw a picture of an engineer, who would you draw? As you draw the figure, the gender assigned is a reflection of your mental prototype of an engineer – your own prototype bias. Most will draw a male figure. Society is biased to assign high-level intellectual ability to males, and this bias starts at a young age. Situations that don’t fit into your prototypes can feel threatening; studies have shown that men defend their standing by undermining the success of women in STEM initiatives.

The bias is real and pervasive. For example, my co-founder in Chibitronics, Jie Qi, is female. The company is founded on technology that is a direct result of her MIT Media Lab PhD dissertation. She is the inventor of paper electronics. I am a supporting actor in her show. Despite laying this fact out repeatedly, she still receives comments and innuendo implying that I am the inventor or more influential than I really am in the development process.

Any engineer who observes a bias in a system and chooses not to pro-actively correct for it is either a bad engineer or they stand to benefit from the bias. So much of engineering is about compensating, trimming, and equalizing imperfections out of real systems: wrap a feedback loop around it, and force the error function to zero.

So when Jie and I stand on stage together, prototype bias causes people to assume I’m the one who invented the technology. Given that I’m aware of the bias, does it make sense to give us equal time on the stage? No – that would be like knowing there is uneven loss in a channel and then being surprised when certain frequency bands are suppressed by the time it hits the receivers. So, I make a conscious and deliberate effort to showcase her contributions and to ensure her voice is the first and last voice you hear.

Naomi Wu (pictured below) likely challenges your prototypical ideal of an engineer. I imagine many people feel a cognitive dissonance juxtaposing the label “engineer” or “Maker” with her appearance. The strength of that dissonant feeling is proportional to the amount of prototype bias you have.

I’ve been fortunate to experience breaking my own prototypical notions that associate certain dress norms with intelligence. I’m a regular at Burning Man, and my theme camp is dominated by scientists and engineers. I’ve discussed injection molding with men in pink tutus and learned about plasmonics from half-naked women. It’s not a big leap for me to accept Naomi as a Maker. I’m glad she’s challenging these biases. I do my best engineering when sitting half-naked at my desk. I find shirts and pants to be uncomfortable. I don’t have the strength to challenge these social norms, and secretly, I’m glad someone is.

Unfortunately, prototype bias is only the first challenge confronted in this situation.

2. Idol Effect

The Idol Effect is the tendency to assign exaggerated capabilities to public figures and celebrities. The adage “never meet your childhood hero” is a corollary of the Idol Effect – people have inflated expectations about what celebrities can do, so it’s often disappointing when you find out they are humans just like us.

One result of the Idol Effect is that people feel justified taking pot shots at public figures for their shortcomings. For example, I have had the great privilege of working with Edward Snowden. One of my favorite things about working with him is that he is humble and quick to correct misconceptions about his personal abilities. Because of his self-awareness of his limitations, it’s easier for me to trust his assertions, and he’s also a fast learner because he’s not afraid to ask questions. Notably, he’s never claimed to be a genius, so I’m always taken aback when intelligent people pull me aside and whisper in my ear, “You know, I hear Ed’s a n00b. He’s just using you.” Somehow, because of Ed’s worldwide level of fame that’s strongly associated with security technology, people assume he should be a genius level crypto-hacker and are quick to point out that he’s not. Really? Ed is risking his life because he believes in something. I admire his dedication to the cause, and I enjoy working with him because he’s got good ideas, a good heart, and he’s fun to be with.

Because I also have a public profile, the Idol Effect impacts me too. I’m bad at math, can’t tie knots, a mediocre programmer…the list goes on. If there’s firmware in a product I’ve touched, it’s likely to have been written by Sean ‘xobs’ Cross, not me. If there’s analytics or informatics involved, it’s likely my partner wrote the analysis scripts. She also edits all my blog posts (including this one) and has helped me craft my most viral tweets – because she’s a genius at informatics, she can run analyses on how to target key words and pick times of day to get maximum impact. The fact that I have a team of people helping me polish my work makes me look better than I really am, and people tend to assign capabilities to me that I don’t really have. Does this mean I am a front, fraud or a persona?

I imagine Naomi is a victim of Idol Effect too. Similar to Snowden, one of the reasons I’ve enjoyed interacting with Naomi is that she’s been quick to correct misconceptions about her abilities, she’s not afraid to ask for help, and she’s a quick learner. Though many may disapprove of her rhetoric on Twitter, please keep in mind English is her second language — her sole cultural context in which she learned English was via the Internet by reading social media and chat rooms.

Based on the rumors I’ve read, it seems fans and observers have inflated expectations for her abilities, and because of uncorrected prototype bias, she faces extra scrutiny to prove her abilities. Somehow the fact that she almost cuts her finger using a scraper to remove a 3D print is “evidence” that she’s not a Maker. If that’s true, I’m not a Maker either. I always have trouble releasing 3D prints from print stages. They’ve routinely popped off and flown across the room, and I’ve almost cut my fingers plenty of times with the scraper. But I still keep on trying and learning – that’s the point. And then there’s the suggestion that because a man holds the camera, he’s feeding her lines.

When a man harnesses the efforts of a team, they call him a CEO and give him a bonus. But when a woman harnesses the efforts of a team, she gets accused of being a persona and a front. This is uncorrected Prototype Bias meeting unrealistic expectations due to the Idol Effect.

The story might end there, but things recently got a whole lot worse…

3. Power Asymmetry

“With great power comes great responsibility.”
-from Spider-Man

Power is not distributed evenly in the world. That’s a fact of life. Not acknowledging the role power plays leads to systemic abuse, like those documented in the Caldbeck or Weinstein scandals.

Editors and journalists – those with direct control over what gets circulated in the media – have a lot of power. Their thoughts and opinions can reach and influence a massive population very quickly. Rumors are just rumors until media outlets breathe life into them, at which point they become an incurable cancer on someone’s career. Editors and journalists must be mindful of the power they wield and be held accountable when it is misused.

As CEO of Maker Media and head of an influential media outlet, especially among the DIY community, Dale Dougherty wields substantial power. So a tweet promulgating the idea that Naomi might be a persona or a fake does not land lightly. In the post-truth era, it’s especially incumbent upon traditional media to double-check rumors before citing them in any context.

What is personally disappointing is that Dale reached out to me on November 2nd with an email asking what I thought about an anonymous post that accused Naomi of being a fake. I vouched for Naomi as a real person and as a budding Maker; I wrote back to Dale that “I take the approach of interacting with her like any other enthusiastic, curious Maker and the resulting interactions have been positive. She’s a fast learner.”

Yet Dale decided to take an anonymous poster’s opinion over mine (despite a long working relationship with Make), and a few days later on November 5th he tweeted a link to the post suggesting Naomi could be a fake or a fraud, despite having evidence to the contrary.

So now Naomi, already facing prototype bias and idol-effect expectations, gets a big media personality with substantial power propagating rumors that she is a fake and a fraud.

But wait, it gets worse because Naomi is in China!

4. Guanxi Bias

In China, guanxi (关系) is everything. Public reputation is extremely hard to build, and quick to lose. Faking and cloning is a real problem, but it’s important to not lose sight of the fact that there are good, hard-working people in China as well. So how do the Chinese locals figure out who to trust? Guanxi is a major mechanism used inside China to sort the good from the bad – it’s a social network of credible people vouching for each other.

For better or for worse, the Chinese feel that Western faces and brands are more credible. The endorsement of a famous Western brand carries a lot of weight; for example Leonardo DiCaprio is the brand ambassador for BYD (a large Chinese car maker).

Maker Media has a massive reputation in China. From glitzy Maker Faires to the Communist party’s endorsement of Maker-ed and Maker spaces as a national objective, an association, or the lack thereof, with Maker Media can make or break a reputation. This is no exception for Naomi. Her uniqueness as a Maker combined with her talent at marketing has enabled her to do product reviews and endorsements as a source of income.

However, for several years she’s been excluded from the Shenzhen Maker Faire lineup, even for events she should have been a shoo-in for: wearables, Maker fashion shows, 3D printing. Despite this lack of endorsement, she’s built her own social media follower base both inside and outside of China, and built a brand around herself.

Unfortunately, when the CEO of Maker Media, a white male leader of an established American brand, suggested Naomi was a potential fake, the Internet inside China exploded on her. Sponsors cancelled engagements with her. Followers turned into trolls. She can’t be seen publicly with men (because others will say the males are the real Maker, see “prototype bias”), and as a result faces a greater threat of physical violence.

A single innuendo, amplified by Power Asymmetry and Guanxi Bias, on top of Idol Effect meshed against Prototype Bias, has destroyed everything a Maker has worked so hard to build over the past few years.

If someone spread lies about you and destroyed your livelihood – what would you do? Everyone would react a little differently, but make no mistake: at this point she’s got nothing left to lose, and she’s very angry.

Reflection

Although Dale has issued a public apology about the rumors, the apology fixes her reputation about as much as saying “sorry” repairs a vase smashed on the floor.

Image: Mindy Georges CC BY-NC

At this point you might ask — why would Dale want to slander Naomi?

I don’t know the background, but prior to Dale’s tweet, Naomi had aggressively dogged Dale and Make about Make’s lack of representation of women. Others have noted that Maker Media has a prototype bias toward white males. Watch this analysis by Leah Buechley, a former MIT Media Lab Professor:

Dale could have recognized and addressed this core issue of a lack of diversity. Instead, Dale elected to endorse unsubstantiated claims and destroy a young female Maker’s reputation and career.

Naomi has a long, uphill road ahead of her. On the other hand, I’m sure Dale will do fine – he’s charismatic, affable, and powerful.

When I sit and think, how would I feel if this happened to the women closest to me? I get goosebumps – the effect would be chilling; the combination of pervasive social biases would overwhelm logic and fact. So even though I may not agree with everything Naomi says or does, I have decided that in the bigger picture, hiding in complicit silence on the sidelines is not acceptable.

We need to acknowledge that prototype bias is real; if equality is the goal, we need to be proactive in correcting it. Just because someone is famous doesn’t mean they are perfect. People with power need to be held accountable in how they wield it. And finally, cross-cultural issues are complicated and delicate. All sides need to open their eyes, ears, and hearts and realize we’re all human. Tweets may seem like harmless pricks to the skin, but we all bleed when pricked. For humanity to survive, we need to stop pricking each other lest we all bleed to death.

On the reinstatement of rights

The enforcement statement then specifically expresses the view of the
signatories on the specific aspect of license termination. Among
legal scholars, particularly in the US, there is a strong opinion
that if the rights under the GPLv2 are terminated due to
non-compliance, the infringing entity needs an explicit reinstatement
of rights from the copyright holder. The enforcement statement now
basically states that the signatories believe the rights should
automatically be reinstated if the license violation ceases within 30
days of the violator being notified of it.

To people like me living in the European (and particularly the
German) legal framework, this has little to no implication. The
prevailing legal position has been that any user, even an infringing
one, can automatically obtain a new license as soon as he stops
violating it: he simply obtains (actually or notionally) a new copy
of the source code, at which point he again receives a new license
from the copyright holders, as long as he fulfills the license
conditions.

So my personal opinion, as a non-lawyer active in GPL compliance, is
that the reinstatement statement changes little to nothing in the
jurisdiction I operate in. It merely documents that other developers
intend to take a similar approach in other jurisdictions.

As the Software Freedom Conservancy (SFC) has publicly disclosed
on their website,
it appears that Software Freedom Law Center (SFLC) has filed for a
trademark infringement lawsuit against SFC.

SFLC launched SFC in 2006, and has helped and endorsed SFC in the
past.

This lawsuit is hard to believe. What has this community come to, if
its various members, who all used to be respected equally, start
filing lawsuits against each other?

It's of course not known what kind of negotiations might have
happened out of court before the actual lawsuit was filed.
Nevertheless, one would have hoped that the parties could talk to
each other, and that mutual respect for working on different aspects,
possibly with slightly different strategies, would have resulted in a
less confrontational approach to resolving any dispute.

To me, this story looks like one in which there can only be losers,
on all sides, and by no means limited to the two entities in
question.

On lwn.net, some people, including high-ranking members of the FOSS
community, have started to spread conspiracy theories as to whether
there is any secret scheming behind the scenes, particularly by the
Linux Foundation pushing SFLC to cause trouble for SFC and its
possibly-not-overly-enjoyed-by-everyone enforcement activities.

I think this is complete rubbish. I have never had the impression
that the LF is opposed to license enforcement in the first place, nor
do I have remotely enough imagination to see them engage in such
malicious scheming.

What motivates SFLC and/or Eben to attack their former offspring is,
however, inexplicable to a bystander. One hopes there is no
connection to his departure from the FSF about one year ago, where he
had served as general counsel for more than two decades.

Other videos and slides

Conclusion

With its special one-track format, an attendance limited to 100 people, an excellent choice of talks and nice social events, Kernel Recipes remains a very good conference that we really enjoyed. Embedded Recipes, which was in its first edition this year, followed the same principle, with the same success. We’re looking forward to attending next year’s editions, and hopefully contributing a few talks as well. See you all at Embedded and Kernel Recipes in 2018!

November 02, 2017

Free Electrons participated in the Embedded Linux Conference Europe last week in Prague. With 7 engineers attending, 4 talks, one BoF and a poster at the technical showcase, we had a strong presence at this major conference of the embedded Linux ecosystem. All of us had a great time at this event, attending interesting talks and meeting numerous open-source developers.

In this first blog post about ELCE, we want to share the slides and videos of the talks we have given during the conference.

SD/eMMC: New Speed Modes and Their Support in Linux – Gregory Clement

Since the introduction of the original “default” (DS) and “high speed” (HS) modes, the SD card standard has evolved by introducing new speed modes, such as SDR12, SDR25, SDR50, SDR104, etc. The same happened to the eMMC standard, with the introduction of new high speed modes named DDR52, HS200, HS400, etc. The Linux kernel has obviously evolved to support these new speed modes, both in the MMC core and through the addition of new drivers.

This talk will start by introducing the SD and eMMC standards and how they work at the hardware level, with a specific focus on the new speed modes. With this hardware background in place, we will then detail how these standards are supported by Linux, see what is still missing, and what we can expect to see in the future.

An Overview of the Linux Kernel Crypto Subsystem – Boris Brezillon

The Linux kernel has long provided cryptographic support for in-kernel users (like the network or storage stacks) and has been pushed to open these cryptographic capabilities to user-space along the way.

But what is exactly inside this subsystem, and how can it be used by kernel users? What is the official userspace interface exposing these features and what are non-upstream alternatives? When should we use a HW engine compared to a purely software based implementation? What’s inside a crypto engine driver and what precautions should be taken when developing one?

These are some of the questions we’ll answer throughout this talk, after having given a short introduction to cryptographic algorithms.

Buildroot: What’s New? – Thomas Petazzoni

Buildroot is a popular and easy to use embedded Linux build system. Within minutes, it is capable of generating lightweight and customized Linux systems, including the cross-compilation toolchain, kernel and bootloader images, as well as a wide variety of userspace libraries and programs.

Since our last “What’s new” talk at ELC 2014, three and a half years have passed, and Buildroot has continued to evolve significantly.

After a short introduction about Buildroot, this talk will go through the numerous new features and improvements that have appeared over the last years, and show how they can be useful for developers, users and contributors.

Whether it is because of a lack of documentation, or because we don’t know where to look or where to start, it is not always easy to get started with U-Boot or Linux, or to know how to port them to a new ARM platform.

Based on experience porting modern versions of U-Boot and Linux on a custom Freescale/NXP i.MX6 platform, this talk will offer a step-by-step guide through the porting process. From board files to Device Trees, through Kconfig, device model, defconfigs, and tips and tricks, join this talk to discover how to get U-Boot and Linux up and running on your brand new ARM platform!

October 31, 2017

Back in April 2017, Free Electrons engineer Antoine Ténart participated in NetDev 2.1, the most important conference discussing Linux networking support. After the conference, Antoine published a summary of it, reporting on the most interesting talks and topics that were discussed.

Next week, NetDev 2.2 takes place in Seoul, South Korea, and this time around, two Free Electrons engineers will be attending the event: Alexandre Belloni and Antoine Ténart. We are getting more and more projects with networking related topics, and therefore the wide range of talks proposed at NetDev 2.2 will definitely help grow our expertise in this field.

Do not hesitate to get in touch with Alexandre or Antoine if you are also attending this event!

October 30, 2017

Chantal van Tour and Jeroen Koelemeij have made new measurements of the performance of the White Rabbit switch [1] with Mattia Rizzi’s Low Jitter Daughterboard (LJD) [2] integrated, enhanced even further with an additional clean-up oscillator. They describe their remarkably good results in the article “Sub-Nanosecond Time Accuracy and Frequency Distribution” [3]:

The LJD improves the stability by about a factor of five, in good agreement with results earlier obtained by Rizzi et al. The LJD implementation improves the root mean-square (rms) phase jitter, integrated over the range 1 Hz – 100 kHz, from 13 ps to 2.9 ps.… While the LJD implementation already significantly improves the default WR Slave phase noise in the low-frequency range, the PLO (clean-up oscillator) leads to a further suppression of noise by 30 to 40 dB over the frequency range 10 Hz – 100 kHz... As a result, the rms jitter of the 10 MHz output is as low as 0.18 ps over the range 1 Hz – 100 kHz, an improvement by nearly two orders of magnitude over default WR.…The results presented here suggest that the improved WR system may in certain cases provide a good alternative for hydrogen masers. This may be true in particular for situations in which the observational phase coherence is limited by atmospheric conditions, rather than by local-oscillator stability. Another advantage of a WR-based frequency distribution system is that it also distributes phase. In other words, all nodes in the WR network will have the same phase, which may allow reducing the ‘search window’ in initial fringe searches.

Note that the results with the additional clean-up oscillator require the White Rabbit Grandmaster (GM) to be locked to a sufficiently stable oscillator, such as a rubidium or cesium clock, or a hydrogen maser. The clean-up oscillator will not be able to track the free-running clock of a WR master, which is too unstable. With only the Low jitter daughterboard it will work fine on a free-running GM.

We will discuss how to proceed to make this hardware available via commercial partners, likely as special low-jitter versions of the White Rabbit Switch.

Vivado was empowering because instead of having to code up a complex SoC in Verilog, I could use their pseudo-GUI/TCL interface to create a block diagram that largely automated the task of building the AXI routing fabric. Furthermore, I could access Xilinx’s extensive IP library, which included a very flexible DDR memory controller and a well-vetted PCI-express controller. Because of this level of design automation and available IP, a task that would have taken perhaps months in Verilog alone could be completed in a few days with the help of Vivado.

The downsides of Vivado are that it’s not open source (free to download, but not free to modify), and that it’s not terribly efficient or speedy. Aside from the ideological objections to the closed-source nature of Vivado, there are some real, pragmatic impacts from the lack of source access. At a high level, Xilinx makes money selling FPGAs – silicon chips. However, to attract design wins they must provide design tools and an IP ecosystem. The development of this software is directly subsidized by the sale of chips.

This creates an interesting conflict of interest when it comes to the efficiency of the tools – that is, how good they are at optimizing designs to consume the least amount of silicon possible. Spending money to create area-efficient tools reduces revenue, as it would encourage customers to buy cheaper silicon.

As a result, the Vivado tool is pretty bad at optimizing designs for area. For example, the PCI express core – while extremely configurable and well-vetted – has no way to turn off the AXI slave bridge, even if you’re not using the interface. Even with the inputs unconnected or tied to ground, the logic optimizer won’t remove the unused gates. Unfortunately, this piece of dead logic consumes around 20% of my target FPGA’s capacity. I could only reclaim that space by hand-editing the machine-generated VHDL to comment out the slave bridge. It’s a simple enough thing to do, and it had no negative effects on the core’s functionality. But Xilinx has no incentive to add a GUI switch to disable the logic, because the extra gates encourage you to “upgrade” by one FPGA size if your design uses a PCI express core. Similarly, the DDR3 memory core devotes 70% of its substantial footprint to a “calibration” block. Calibration typically runs just once at boot, so the logic is idle during normal operation. With an FPGA, the smart thing to do would be to run the calibration, store the values, and then jam the pre-measured values into the application design, thus eliminating the overhead of the calibration block. However, I couldn’t implement this optimization since the DDR3 block is provided as an opaque netlist. Finally, the AXI fabric automation – while magical – scales poorly with the number of ports. In my most recent benchmark design done with Vivado, 50% of the chip is devoted to the routing fabric, 25% to the DDR3 block, and the remainder to my actual application logic.

Tim mentioned that he thought the same design when using LiteX would fit in a much smaller FPGA. He has been using LiteX to generate the FPGA “gateware” (bitstreams) to support his HDMI2USB video processing pipelines on various platforms, ranging from the Numato-Opsis to the Atlys, and he even started a port for the NeTV2. Intrigued, I decided to port one of my Vivado designs to LiteX so that I could do an apples-to-apples comparison of the two design flows.

LiteX is a soft-fork of Migen/MiSoC – a python-based framework for managing hardware IP and auto-generating HDL. The IP blocks within LiteX are completely open source, and so can be targeted across multiple FPGA architectures. However, for low-level synthesis, place & route, and bitstream generation, it still relies upon proprietary chip-specific vendor tools, such as Vivado when targeting Artix FPGAs. It’s a little bit like an open source C compiler that spits out assembly, so it still requires vendor-specific assemblers, linkers, and binutils. While it may seem backward to open the compiler before the assembler, remember that for software, an assembler’s scope of work is simple — primarily within well-defined 32-bit or so opcodes. However, for FPGAs, the “assembler” (place and route tool) has the job of figuring out where to place single-bit primitives within an “opcode” that’s effectively several million bits long, with potential cross-dependencies between every bit. The abstraction layers, while parallel, aren’t directly comparable.

Let me preface my experience with the statement that I have a love-hate relationship with Python. I’ve used Python a few times for “recreational” projects and small tools, and for driving bits of automation frameworks. But I’ve found Python to be terribly frustrating. If you can use their frameworks from the ground-up, it’s intuitive, fun, even empowering. But if your application isn’t naturally “Pythonic”, woe to you. And I have a lot of needs for bit-banging, manipulating binary files, or grappling with low-level hardware registers, activities that are decidedly not Pythonic. I also spend a lot of time fighting with the “cuteness” of the Python type system and syntax: I’m more of a Rust person. I like strictly typed languages. I am not fond of novelties like using “-1” as the last-element array index and overloading the heck out of binary operators using magic methods.

Surprisingly, I was able to get LiteX up and running within a day. This is thanks in large part to Tim’s effort to create a really comprehensive bootstrapping script that checks out the git repo, all of the submodules (thank you!), and manages your build environment. It just worked; the only bump I encountered was a bit of inconsistent documentation on installing the Xilinx toolchain (for Artix builds you need to grab Vivado; for Spartan, ISE). The whole thing ate about 19 GiB of hard drive space, of which 18 GiB is the Vivado toolchain.

I was rewarded with a surprisingly powerful and mature framework for defining SoCs. Thanks to the extensive work of the MiSoC and LiteX crowd, there are already IP cores for DRAM, PCI express, Ethernet, video, a softcore CPU (your choice of or1k or lm32) and more. To be fair, I haven’t been able to load these on real hardware and validate their spec-compliance or functionality, but they seem to compile down to the right primitives so they’ve got the right shape and size. Instead of AXI, they’re using Wishbone for their fabric. It’s not clear to me yet how bandwidth-efficient the MiSoC fabric generator is, but the fact that it’s already in use to route 4x HDMI connections to DRAM on the Numato-Opsis would indicate that it’s got enough horsepower for my application (which only requires 3x HDMI connections).

As a high-level framework, it’s pretty magical. Large IP instances and corresponding bus ports are allocated on-demand, based on a very high level description in Python. I feel a bit like a toddler who has been handed a loaded gun with the safety off. I’m praying the underlying layers are making sane inferences. But, at least in the case of LiteX, if I don’t agree with the decisions, it’s open source enough that I could try to fix things, assuming I have the time and gumption to do so.

For my tool flow comparison, I implemented a simple 2x HDMI-in to DDR3 to 1x HDMI-out design in both Vivado and in LiteX. Creating the designs is about the same effort on both flows – once you have the basic IP blocks, instantiating bus fabric and allocation of addressing is largely automated in each case. Vivado is superior for pin/package layout thanks to its graphical planning tool (I find an illustration of the package layout to be much more intuitive than a textual list of ball-grid coordinates), and LiteX is a bit faster for design creation despite the usual frustrations I have with Python (up to the reader’s bias to decide whether it’s just that I have a different way of seeing things or if my intellect is insufficient to fully appreciate the goodness that is Python).

Pad layout planning in Vivado is aided by a GUI

Example of LiteX syntax for pin constraints

But from there, the experience between the two diverges rapidly. The main thing that’s got me excited about LiteX is the speed and efficiency of its high-level synthesis. LiteX produces a design that uses about 20% of an XC7A50 FPGA with a runtime of about 10 minutes, whereas Vivado produces a design that consumes 85% of the same FPGA with a runtime of about 30-45 minutes.

Significantly, LiteX tends to “fail fast”, so syntax errors or small problems with configurations become obvious within a few seconds, if not a couple of minutes. Vivado, however, tends to “fail late” – a small configuration problem may not pop up until about 20 minutes into the run, due to the clumsy way it manages out-of-context block synthesis and build dependencies. This means that despite my frustrations with the Python syntax, the penalty paid for small errors is much less in terms of time – so overall, I’m more productive.

But the really compelling point is the efficiency. The fact that LiteX generates more efficient HDL means I can potentially shave a significant amount of cost out of a design by going to a smaller FPGA. Remember, both LiteX and Vivado use the same back-end for low-level synthesis and place and route. The difference is entirely in the high-level design automation – and this is a level that I can see being a good match for a Python-based framework. You’re not really designing hardware with Python (eventually it all turns into Verilog) so much as managing and configuring libraries of IP, something that Python is quite well suited for. To wit, I dug around in the MiSoC libraries a bit and there seem to be some serious logic designs using this Python syntax. I’m not sure I want to wrap my head around this coding style, but the good news is I can still write my leaf cells in Verilog and call them from the high-level Python integration framework.

So, I’m cautiously proceeding to use LiteX as the main design flow going forward for NeTV2. We’ll see how the bitstream proves out in terms of timing and functionality once my next generation hardware is available, but I’m optimistic. I have a few concerns about how debugging will work – I’ve found the Xilinx ILA cores to be extremely powerful tools and the ability to automatically reverse engineer any complex design into a schematic (a feature built into Vivado) helps immensely with finding timing and logic bugs. But with a built-in soft CPU core, the “LiteScope” logic analyzer (with sigrok support coming soon), and fast build times, I have a feeling there is ample opportunity to develop new, perhaps even more powerful methods within LiteX to track down tricky bugs.

My final thought is that LiteX, in its current state, is probably best suited for people trained to write software who want to design hardware, rather than for people classically trained in circuit design who want a tool upgrade. The design idioms and intuitions built into LiteX pull strongly from the practices of software designers, which means a lot of “obvious” things are left undocumented that will throw outsiders (e.g. hardware designers like me) for a loop. There’s no question about the power and utility of the design flow – so, as the toolchain matures and documentation improves, I’m optimistic that this could become a popular design flow for hardware projects of all magnitudes.

Interested? Tim has suggested the following links for further reading:

Presentation of the Long Term Supported releases of Buildroot, a topic we also presented in a previous blog post

Appearance of the new top-level utils/ directory, containing various utilities directly useful for the Buildroot user, such as test-pkg, check-package, get-developers or scanpypi

Removal of $(HOST_DIR)/usr/, as everything has been moved up one level to $(HOST_DIR), to make the Buildroot SDK/toolchain more conventional

Document the new organization of the skeleton package, now split into several packages, to properly support various init systems. A new diagram has been added to clarify this topic.

List all package infrastructures that are available in Buildroot, since their number is growing!

Use SPDX license codes for licensing information in packages, which is now mandatory in Buildroot

Remove the indication that dependencies of host (i.e. native) packages are derived from the dependencies of the corresponding target package, since it’s no longer the case

Indicate that the check for hashes has been extended to also allow checking the hash of license files. This makes it possible to detect changes in the license text.

Update the BR2_EXTERNAL presentation to cover the fact that multiple BR2_EXTERNAL trees are now supported.

Use the new relocatable SDK functionality that appeared in Buildroot 2017.08.

The practical labs have of course been updated to use Buildroot 2017.08, but also Linux 4.13 and U-Boot 2017.07, to remain current with upstream versions. In addition, they have been extended with two additional steps:

Booting the Buildroot generated system using TFTP and NFS, as an alternative to the SD card we normally use

Using genimage to generate a complete and ready to flash SD card image

The Ware for September 2017 is a WP 5007 Electrometer. I’ll give this one to Ingo, for the first mention of an electrometer. Congrats, email me for your prize! And @zebonaut, agreed, polystyrene caps FTW :)

October 19, 2017

Sometimes one finds an interesting problem and is surprised that
there is not a multitude of blog posts, StackOverflow answers or the
like about it.

A (I think) not so uncommon problem when working with datagram sockets
is that you may want to know the local IP address that the OS/kernel
chooses when sending a packet to a given destination.

With an unbound UDP socket, you basically send and receive packets
with any number of peers from a single socket. When sending a packet to
destination Y, you simply pass the destination address/port into the
sendto() socket function, and the OS/kernel will figure out which of
its local IP addresses will be used for reaching this particular
destination.

If you're a dumb host with a single default router, then the answer to
that question is simple. But in any reasonably non-trivial use case,
your host will have a variety of physical and/or virtual network devices
with any number of addresses on them.

Why would you want to know that address? Because maybe you need to
encode that address as part of a packet payload. In our current use
case, it is OsmoMGW, which implements MGCP, the IETF Media Gateway
Control Protocol.

So what can you do? You can actually create a new "trial" socket, not
bind it to any specific local address/port, but connect() it to the
destination of your IP packets. Then you do a getsockname(), which
will give you the local address/port the kernel has selected for this
socket. And that's exactly the answer to your question. You can now
close the "trial" socket and have learned which local IP address the
kernel would use if you were to send a packet to that destination.

At least on Linux, this works. While getsockname() is standard BSD
sockets API, I'm not sure how portable it is to use it on a socket that
has not been explicitly bound by a prior call to bind().

October 13, 2017

One dirty secret of hardware is that a profitable business isn’t just about design innovation, or even product cost reduction: it’s also about how efficiently one can move stuff from point A to B. This explains the insane density of hardware suppliers around Shenzhen; it explains the success of Ikea’s flat-packed furniture model; and it explains the rise of Amazon’s highly centralized, highly automated warehouses.

Unfortunately, reverse logistics – the system for handling returns & exchanges of hardware products – is not something on the forefront of a hardware startup’s agenda. In order to deal with defective products, one has to ship a product first – an all-consuming goal. However, leaving reverse logistics as a “we’ll fix it after we ship” detail could saddle the venture with significant unanticipated customer support costs, potentially putting the entire business model at risk.

This is because logistics are much more efficient in the “forward” direction: the cost of a centralized warehouse to deliver packages to an end consumer’s home address is orders of magnitude less than it is for a residential consumer to mail that same parcel back to the warehouse. This explains the miracle of Amazon Prime, when overnighting a pair of hand-knit mittens to your mother somehow costs you $20. Now repeat the hand-knit mittens thought experiment and replace it with a big-screen TV that has to find its way back to a factory in Shenzhen. Because the return shipment can no longer take advantage of bulk shipping discounts, the postage to China is likely more than the cost of the product itself!

Because of the asymmetry in forward versus reverse logistics cost, it’s generally not cost effective to send defective material directly back to the original factory for refurbishing, recycling, or repair. In many cases the cost of the return label plus the customer support agent’s time will exceed the cost of the product. This friction in repatriating defective product creates opportunities for unscrupulous middlemen to commit warranty fraud.

The basic scam works like this: a customer calls in with a defective product and gets sent a replacement. The returned product is sent to a local processing center, where it may be declared unsalvageable and slated for disposal. However, instead of a proper disposal, the defective goods “escape” the processing center and are resold as new to a different customer. The duped customer then calls in to exchange the same defective product and gets sent a replacement. Rinse, lather, repeat, and someone gets rich quick selling scrap at full market value.

Similarly, high-quality counterfeits can sap profits from companies. Clones of products are typically produced using cut-rate or recycled parts but sold at full price. What happens when customers then find quality issues with the clone? That’s right – they call the authentic brand vendor and ask for an exchange. In this case, the brand makes zero money on the customer but incurs the full cost of supporting a defective product. This kind of warranty fraud is pandemic in smart phones and can cost producers many millions of dollars per year in losses.

High-quality clones, like the card on the left, can cost businesses millions of dollars in warranty fraud claims.

Serial numbers help mitigate these problems, but it’s easy to guess a simple serial number. More sophisticated schemes tie serial numbers to silicon IDs, but that necessitates a system which can reliably download the serialization data from the factory. This might seem a trivial task but for a lot of reasons – from failures in storage media to human error to poor Internet connectivity in factories – it’s much harder than it seems to make this happen. And for a startup, losing an entire lot of serialization data due to a botched upload could prove fatal.

As a result, most hardware startups ship products with little to no plan for product serialization, much less a plan for reverse logistics. When the first email arrives from an unhappy customer, panic ensues, and the situation is quickly resolved, but by the time the product arrives back at the factory, the freight charges alone might be in the hundreds of dollars. Repeat this exercise a few dozen times, and any hope for a profitable run is rapidly wiped out.

I’ve wrestled with this problem on and off through several startups of my own and finally landed on a solution that looks promising: it’s reasonably robust, fraud-resistant, and dead simple to implement. The key is the bitmark – a small piece of digital data that links physical products to the blockchain.

Most people are familiar with blockchains through Bitcoin. Bitcoin uses the blockchain as a public ledger to prevent double-spending of the same virtual coin. This same public ledger can be applied to physical hardware products through a bitmark. Products that have been bitmarked can have their provenance tracked back to the factory using the public ledger, thus hampering cloning and warranty fraud – the physical equivalent of double-spending a Bitcoin.

One of my most recent hardware startups, Chibitronics, has teamed up with Bitmark to develop an end-to-end solution for Chibitronics’ newest microcontroller product, the Chibi Chip.

As an open hardware business, we welcome people to make their own versions of our product, but we can’t afford to give free Chibi Chips to customers that bought cut-rate clones and then report them as defective for a free upgrade to an authentic unit. We’re also an extremely lean startup, so we can’t afford the personnel to build a full serialization and reverse logistics system from scratch. This is where Bitmark comes in.

Bitmark has developed a turn-key solution for serialization and reverse logistics triage. They issue us bitmarks as lists of unique, six-word phrases. The six-word phrases are less frustrating for users to type in than strings of random characters. We then print the phrases onto labels that are stuck onto the back of each Chibi Chip.

Bitmark claim code on the back of a Chibi Chip

We release just enough of these pre-printed labels to the factory to run our authorized production quantities. This allows us to trace a bitmark back to a given production lot. It also prevents “ghost shifting” – that is, authorized factories producing extra bootleg units on a midnight shift that are sold into the market at deep discounts. Bitmark created a website for us where customers can then claim their bitmarks, thus registering their product and making it eligible for warranty service. In the event of an exchange or return, the product’s bitmark is updated to record this event. Then if a product fails to be returned to the factory, it can’t be re-claimed as defective because the blockchain ledger would evidence that bitmark as being mapped to a previously returned product. This allows us to defer the repatriation of the product to the factory. It also enables us to use unverified third parties to handle returned goods, giving us a large range of options to reduce reverse logistics costs.

Bitmark also plans to roll out a site where users can verify the provenance of their bitmarks, so buyers can check if a product’s bitmark is authentic and if it has been previously returned for problems before they buy it. This increases the buyer’s confidence, thus potentially boosting the resale value of used Chibi Chips.

For the cost and convenience of a humble printed label, Bitmark enhances control over our factories, enables production lot traceability, deters cloning, prevents warranty fraud, enhances confidence in the secondary market, and gives us ample options to streamline our reverse logistics.

Of course, the solution isn’t perfect. A printed label can be peeled off one product and stuck on another, so people could potentially just peel labels off good products and resell the labels to users with broken clones looking to upgrade by committing warranty fraud. This scenario could be mitigated by using tamper-resistant labels. And for every label that’s copied by a cloner, there’s one victim who will have trouble getting support on an authentic unit. Also, if users are generally lax about claiming their bitmark codes, it creates an opportunity for labels to be sparsely duplicated in an effort to ghost-shift/clone without being detected; but this can be mitigated with a website update that encourages customers to immediately register their bitmarks before using the web-based services tied to the product. We also have to exercise care in handling lists of unclaimed phrases because, until a customer registers their bitmark claim phrase in the blockchain, the phrases have value to would-be fraudsters.

But overall, for the cost and convenience, the solution outperforms all the other alternatives I’ve explored to date. And perhaps most importantly for hardware startups like mine that are short on time and long on tasks, printing bitmarks is simple enough for us to implement that it’s hard to justify doing anything else.

October 12, 2017

I wanted to monitor my chickens remotely and, after some web digging, found the open source software Home Assistant [1], which allowed me to use cheap devices like a Raspberry Pi and some D-Link/Digoo cameras to monitor them via the web.
It doesn't end there: I also bought a Sonoff device and, after reflashing it, got it working with Home Assistant's MQTT support, so now I can also turn on lights and fans.
There is also automation support, so when I'm at home my Bluetooth presence turns the lights on; small details like automatically switching lights off help make your life easier.
There is probably more to do; the next challenge is voice :-)
[1] http://home-assistant.io/

October 09, 2017

First of all, I wouldn't have expected netfilter to be that relevant
next to all the other [core] networking topics at netdevconf. Secondly,
I've not been doing any work on netfilter for about a decade now, so my
memory is a bit rusty by now ;)

Speaking of Rusty: timing-wise, there is a nice coincidence in that
I'll be able to meet up with him in Berlin later this month, i.e. hopefully
we can spend some time reminiscing about old times and see what kind of useful
input he has for the keynote.

I'm also asking my former colleagues and successors in the netfilter
project to share with me any noteworthy events or anecdotes,
particularly also covering the time after my retirement from the core
team. So if you have something that you believe shouldn't be missing
from a keynote on netfilter project history: please reach out to me by
e-mail ASAP and let me know about it.

To try to fend off the elder[ly] statesmen image that goes along with
being invited to give keynotes about the history of projects you were
working on a long time ago, I also submitted an actual technical talk:
TTCN-3 and Eclipse Titan for testing protocol stacks, in
which I'll cover my recent journey into TTCN-3 and TITAN land, and how I
think those tools can help us in the Linux [kernel] networking
community to productively produce tests for the various protocols.

As usual for netdevconf, there are plenty of other exciting talks in
the schedule.

I'm very much looking forward to both visiting Seoul again, as well as
meeting lots of the excellent people involved in the Linux networking
subsystems. See ya!

October 08, 2017

On this occasion, a number of the key people managed to gather for an
anniversary dinner in Taipei. Thanks to everyone who could make it; it
was very good to see them together again. Sadly, far from everyone
could attend. You have been missed!

The award for the most crazy attendee of the meeting goes out to my
friend Milosch, who has actually flown from
his home in the UK to Taiwan, only to meet up with old friends and
attend the anniversary dinner.

October 05, 2017

Buildroot is a widely used embedded Linux build system. A large number of companies and projects use Buildroot to produce customized embedded Linux systems for a wide range of embedded devices. Most of those devices are now connected to the Internet, and therefore subject to attacks if the software they run is not regularly updated to address security vulnerabilities.

The Buildroot project publishes a new release every three months, with each release providing a mix of new features, new packages, package updates, build infrastructure improvements… and security fixes. However, until earlier this year, as soon as a new version was published, the maintenance of the previous version stopped. This means that in order to stay up to date in terms of security fixes, users essentially had two options:

Update their Buildroot version regularly. The big drawback is that they get not only security updates, but also many other package updates, which may be problematic when a system is in production.

Stick with their original Buildroot version, carefully monitor CVEs and security vulnerabilities in the packages they use, and update the corresponding packages themselves, which obviously is a time-consuming process.

Starting with 2017.02, the Buildroot community has decided to offer one long term supported release every year: 2017.02 will be supported for one year in terms of security updates and bug fixes, until 2018.02 is released. The usual three-month release cycle still applies, with 2017.05 and 2017.08 already released, but users interested in a stable Buildroot version that is kept updated for security issues can stay on 2017.02.

Since 2017.02 was released on February 28th, 2017, six minor versions were published on a fairly regular basis, almost every month, except in August:

With about 60 to 130 commits between each minor version, it is relatively easy for users to check what has been changed, and evaluate the impact of upgrading to the latest minor version to benefit from the security updates. The commits integrated in those minor versions are carefully chosen with the idea that users should be able to easily update existing systems.

In total, those six minor versions include 526 commits, of which 183 commits were security updates, representing roughly one third of the total number of commits. The other commits have been:

140 commits to fix build issues

57 commits to bump versions of packages for bug fixes. These almost exclusively include updates to the Linux kernel, using its LTS versions. For other packages, we are more conservative and generally don’t upgrade them.

17 commits to address issues in the licensing description of the packages

The Buildroot community has already received a number of bug reports, patches and suggestions specifically targeting the 2017.02 LTS version, which indicates that developers and companies have started to adopt this LTS version.

Therefore, if you are interested in using Buildroot for a product, you should probably consider using the LTS version! We very much welcome feedback on this version, and help in monitoring the security vulnerabilities affecting software packages in Buildroot.

October 04, 2017

In case you're wondering about the lack of activity not only on this
blog but also in git repositories, mailing lists and the like: I've
been on vacation since September 13. It's my usual "one month in Taiwan"
routine, during which I spend some time in Taipei, but also take several
long motorbike tours around mostly rural Taiwan.

October 01, 2017

Whilst attending the latest ION GNSS+ conference, I received confirmation that Prof. Kai Borre passed away this summer. He was a very important reference to me, especially in the early stages of my career, and I am sure many other radio-navigation, geodesy and DSP engineers will agree. I knew him quite well, and since I could not find an epitaph for him anywhere, I feel compelled to leave my tribute to him here, hoping others will intimately share my feeling.

Rest in peace Kai. My gratitude, for inspiring me until your very last moment.

The ware for August 2017 is the controller IC for a self-flashing (two-pin, T1 case) RGB LED. It’s photographed through the lens of the LED, which is why the die appears so distorted. Somehow, Piotr — the first poster — guessed it on the first try without much explanation. Congrats, email me for your prize!

September 28, 2017

After Toulouse and Orange, Lyon is the third city chosen for opening a Free Electrons office. Since September 1st of this year (2017), Alexandre Belloni and Grégory Clement have been working there, more precisely in Oullins, close to the subway and the train station. It is the first step in making the Lyon team grow, with the opportunity to welcome interns and engineers.

Their new desks are already crowded by many boards running our favorite system.

September 27, 2017

As most people know, getting GPU-based 3D acceleration to work on ARM platforms has always been difficult, due to the closed nature of the support for such GPUs. Most vendors provide closed-source binary-only OpenGL implementations in the form of binary blobs, whose quality depends on the vendor.

This situation is getting better through vendor-funded initiatives like the ones for the Broadcom VC4 and VC5, or through reverse engineering projects like Nouveau on Tegra SoCs, Etnaviv on Vivante GPUs, and Freedreno on Qualcomm’s Adreno. However, there are still GPUs for which you do not have the option of using a free software stack: PowerVR from Imagination Technologies and Mali from ARM (even though there is some progress on the reverse engineering effort).

Allwinner SoCs use either a Mali GPU from ARM or a PowerVR from Imagination Technologies, and therefore, support for OpenGL on those platforms using a mainline Linux kernel has always been a problem. This is further complicated by the fact that Allwinner is mostly interested in Android, which uses a different C library, preventing the use of Android binaries on traditional glibc-based systems (except through libhybris).

However, we are happy to announce that Allwinner gave us clearance to publish the userspace binary blobs that allow OpenGL to be supported on Allwinner platforms that use a Mali GPU from ARM, using a recent mainline Linux kernel. Of course, those are closed-source binary blobs and not a nice fully open-source solution, but they nonetheless allow everyone to have OpenGL support working, while taking advantage of all the benefits of a recent mainline Linux kernel. We have successfully used those binary blobs on customer projects involving the Allwinner A33 SoC, and they should work on all Allwinner SoCs using the Mali GPU.

In order to get GPU support to work on your Allwinner platform, you will need:

The kernel-side driver, available on Maxime Ripard’s Github repository. This is essentially the Mali kernel-side driver from ARM, plus a number of build and bug fixes to make it work with recent mainline Linux kernels.

The Device Tree description of the GPU. We introduced Device Tree bindings for Mali GPUs in the mainline kernel a while ago, so that Device Trees can describe such GPUs. Such a description has been added for the Allwinner A23 and A33 SoCs as part of this commit.

The userspace blob, which is available in the Free Electrons GitHub repository. It currently provides the r6p2 version of the driver, with support for both fbdev and X11 systems. Hopefully, we’ll gain access to newer versions in the future, with additional features (such as GBM support).

If you want to use it in your system, the first step is to have the GPU definition in your Device Tree if it’s not already there. Then, you need to compile the kernel module against your kernel sources.

You should be all set. Of course, you will have to link your OpenGL applications or libraries against those user-space blobs. You can check that everything works using OpenGL test programs such as es2_gears for example.

September 21, 2017

This four-camera stereo rig prototype is capable of measuring distances thousands of times the camera baseline over a wide (60 by 45 degrees) field of view. With 150 mm between the lenses it provides ranging data at 200 meters with 10% accuracy; production units will have higher accuracy. The initial implementation uses software post-processing, but the core part of the software (the tile processor) is designed as an FPGA simulation and will be moved to the actual FPGA of the camera for real-time applications.

Scroll down or just hyper-jump to Scene viewer for the links to see example images and reconstructed scenes.

Background

Most modern advances in visual 3d reconstruction are related to structure from motion (SfM), where high-quality models are generated from image sequences, including those from uncalibrated cameras (such as cellphone ones). Other fast-growing applications depend on active ranging with either LIDAR scanning technology or time-of-flight (ToF) sensors.

Each of these methods has its limitations, and while widespread smartphone cameras have attracted most of the interest in algorithm and software development, there are some applications where narrow-baseline technology (where the distance between the sensors is much smaller than the distance to the objects) has advantages.

Such applications include autonomous vehicle navigation, where other objects in the scene are moving, 3-d data is needed immediately (not after the complete image sequence is captured), and the elements to be ranged are ahead of the vehicle, so previous images would not help much. ToF sensors are still very limited in range (a few meters), and scanning LIDAR systems are either slow to update or have a very limited field of view. Passive (visual-only) ranging may be desired for military applications, where the system should stay invisible by not shining lasers around.

Technology snippets

Narrow baseline and subpixel resolution

The main challenge for narrow-baseline systems is that the distance resolution is much worse than the lateral one. The minimal resolved 3d element, the voxel, is very far from resembling a cube (whereas 2d pixels are usually squares): with the dimensions we use – pixel size 0.0022 mm, lens focal length f = 4.5 mm and a baseline of 150 mm – such a voxel at 100 m distance is 50 mm high by 50 mm wide and 32 meters deep. The good thing is that while the lateral resolution generally is just one pixel (it can be better only with additional knowledge about the object), the depth resolution can, under reasonable assumptions, be improved by an order of magnitude using subpixel resolution. This is possible when there are multiple shifted images of the same object (which for such a high range-to-baseline ratio can safely be assumed fronto-parallel) and every object is represented in each image by multiple pixels. With 0.1 pixel resolution in disparity (the shift between the two images), the depth dimension of the voxel at 100 m distance is 3.2 meters. And as we need multiple-pixel objects for subpixel disparity resolution, the voxel lateral dimensions increase (there is a way to restore the lateral resolution to a single pixel in most cases). With a fixed-size window for image matching – we use an 8×8 pixel grid (16×16 pixel overlapping tiles), similar to what is used by some image/video compression algorithms (such as JPEG) – the voxel dimensions at 100 meter range become 0.4 m x 0.4 m x 3.2 m. Still not a cube, but the difference is significantly less dramatic.

Subpixel accuracy and the lens distortions

Matching images with subpixel accuracy requires that the optical distortion of each lens is known and compensated with the same or better precision. The most popular way to represent lens distortion is the radial distortion model, where the relation between the distorted and the ideal pin-hole camera image is expressed as a polynomial of the point radius; in polar coordinates the angle stays the same while the radius changes. Fisheye lenses are better described with an “f-theta” model, where the linear radial distance in the focal plane corresponds to the angle between the lens axis and the ray to the object.

Such radial models provide accurate results only with ideal lens elements assembled so that the axis of each individual element precisely matches the axes of the others, both in position and orientation. In real lenses each optical element has minor misalignments, and that limits the radial model. For the lenses we have dealt with, on 5 MPix sensors, it was possible to get down to 0.2–0.3 pixels, so we supplemented the radial distortion described by the analytical formula with a table-based residual image correction. This correction reduced the minimal resolved disparity to 0.05–0.08 pixels.
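A hypothetical sketch of this two-stage correction: an analytical polynomial radial model, plus a per-lens residual correction bilinearly interpolated from a coarse calibration grid. The coefficients and grid values here are made up for illustration, not actual calibration data:

```python
import numpy as np

# Stage 1: polynomial radial model. In polar coordinates the angle is
# unchanged; only the radius is rescaled by a polynomial in r.
# Coefficients k are illustrative, not calibration values.
def radial_undistort(x, y, k=(0.0, -0.05, 0.01)):
    r = np.hypot(x, y)
    scale = 1.0 + k[0] * r + k[1] * r**2 + k[2] * r**3
    return x * scale, y * scale

# Stage 2: table-based residual correction, stored on a lower-resolution
# grid (one entry per grid_step pixels) and bilinearly interpolated.
def residual_correction(x, y, grid_dx, grid_dy, grid_step):
    gx, gy = x / grid_step, y / grid_step
    i, j = int(gx), int(gy)
    fx, fy = gx - i, gy - j
    def bilin(g):
        return (g[j, i] * (1 - fx) * (1 - fy) + g[j, i + 1] * fx * (1 - fy)
                + g[j + 1, i] * (1 - fx) * fy + g[j + 1, i + 1] * fx * fy)
    return x + bilin(grid_dx), y + bilin(grid_dy)
```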

Fixed vs. variable window image matching and FPGA

Modern multi-view stereo systems that work with wide baselines use elaborate algorithms with variable-size matching windows, down to single pixels, and aggregate data from neighbor pixels at later processing stages; this lets them handle the occlusions and perspective distortions that make paired images different. In a narrow baseline system, ranging objects at distances hundreds to thousands of times the baseline, the difference in perspective distortion between the images is almost always very small. And as the only way to get subpixel resolution requires matching many pixels at once anyway, using fixed-size image tiles instead of individual pixels does not reduce the flexibility of the algorithm much.

Processing fixed-size image tiles promises a significant advantage: hardware-accelerated pixel-level tile processing combined with higher-level software that operates on per-tile rather than per-pixel data. Tile processing can be implemented within the FPGA-friendly stream-processing paradigm, leaving decision making to the software. Matching image tiles may use methods similar to those of image and especially video compression, where motion vector estimation resembles the calculation of disparity between stereo images, so similar algorithms may be used, such as phase-only correlation (PoC).
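A minimal phase-only correlation sketch: the shift between two tiles appears as the location of the correlation peak. The real tile processor works on windowed, CLT-transformed tiles; a plain FFT is used here for brevity:

```python
import numpy as np

def phase_correlation_shift(a, b, eps=1e-9):
    """Integer-pixel shift of b relative to a, from the PoC peak."""
    cross = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    cross /= np.abs(cross) + eps              # keep only the phase
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap peaks past the midpoint around to negative shifts
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(0)
tile = rng.normal(size=(16, 16))
# a circularly shifted copy plays the role of the second camera's tile
print(phase_correlation_shift(tile, np.roll(tile, (2, -3), axis=(0, 1))))
```

For an exact circular shift of a textured tile the PoC peak recovers the shift exactly; subpixel disparity would come from interpolating around the peak, as described later in the text.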

Two dimensional array vs. binocular and inline camera rigs

Stereo cameras and fixed-baseline multi-view stereo rigs are usually binocular systems, with just two sensors. Less common systems have more than two lenses positioned along the same line; such configurations improve the useful camera range (the ability to measure both near and far objects) and reduce ambiguity when dealing with periodic object structures. Even less common are rigs where the individual cameras form a 2d structure.

In this project we used a camera with 4 sensors located in the corners of a square, so they are not co-linear. Correlation-based matching depends on detailed texture in the matched areas of the images – perfectly uniform objects produce no data for depth estimation. Additionally, some common types of image detail are unsuitable for certain orientations of the camera baselines. A vertical concrete pole is easily correlated by two horizontally positioned cameras, but if the baseline is turned vertical, the same binocular rig fails to produce a disparity value. The same is true when trying to capture horizontal features with a horizontal binocular system – such predominantly horizontal features are common when viewing nearly flat horizontal surfaces at high angles of incidence (almost parallel to the view direction).

With four cameras we process four image pairs – 2 horizontal (top and bottom) and 2 vertical (right and left) – and, depending on the application requirements for a particular image region, it is possible to combine the correlation results of all 4 pairs, or the horizontal and vertical ones separately. When all 4 baselines have equal length it is easier to combine the image data before calculating the precise location of the correlation maximums: 2 pairs can be combined directly, and the other 2 after rotating the tiles by 90 degrees (swapping the X and Y directions, i.e. transposing the tiles' 2d arrays).
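The combination step can be sketched as follows: with equal baselines the two horizontal pair correlations are summed directly, and the two vertical ones are transposed first so the disparity axes of all four coincide before locating the maximum (toy arrays, for illustration only):

```python
import numpy as np

def combine_pair_correlations(top, bottom, left, right):
    # horizontal pairs add directly; vertical pairs are transposed
    # (X and Y swapped) so all four disparity axes coincide
    return top + bottom + left.T + right.T

# toy correlations: horizontal pairs peak at a column offset of 3,
# vertical pairs at the same offset along rows
h = np.zeros((8, 8)); h[0, 3] = 1.0
v = np.zeros((8, 8)); v[3, 0] = 1.0
combined = combine_pair_correlations(h, h, v, v)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(combined), combined.shape))
print(peak)   # (0, 3) – all four peaks align after transposition
```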

Image rectification and resampling

Many implementations of multi-view stereo processing start with image rectification: correction for perspective and lens distortions and projection of the individual images onto a common plane. Such projection simplifies correlation-based tile matching, but as it involves resampling the images, it either reduces resolution or requires upsampling, increasing the required memory size and processing complexity.

This implementation does not require full de-warping of the images and the related resampling with fractional pixel shifts. Instead we split the geometric distortion of each lens into two parts:

common (average) distortion of all four lenses approximated by analytical radial distortion model, and

small residual deviation of each lens image transformation from the common distortion model

The common radial distortion parameters are used to calculate the matching tile location in each image; the integer-rounded pixel shifts of the tile centers are used directly when selecting input pixel windows, while the fractional pixel remainders are preserved and combined with the other image shifts in the FPGA tile processor. Matching is performed in this common distorted space, and the tile grid is also mapped to this representation, not to a fully rectified rectilinear image.

Small individual lens deviations from the common distortion model are smooth 2-d functions over the image plane; they are interpolated from calibration data stored on a lower-resolution grid.

We use low-distortion, sorted lenses with matching focal lengths to make sure that the scale mismatch between the image tiles stays below the target subpixel interval (0.1 pix) over the tile size. The low-distortion requirement extends the distance range toward near objects, because with higher disparity values the matching tiles in the different images land in differently distorted areas. Focal length matching makes it possible to use the modulated complex lapped transform (CLT) which, like the discrete Fourier transform (DFT), is invariant to shifts but not to scaling (log-polar coordinates are not applicable here, as such a transformation would destroy the shift invariance).

Matching of images acquired with almost identical lenses is rather insensitive to the lens aberrations that degrade image quality (mostly reducing sharpness), especially in the peripheral image areas. Aberration correction is still needed to get sharp textures in the resulting 3d models over the full field of view, as the resolution of modern sensors is usually better than what the lenses can provide. Correction can be implemented with space-variant deconvolution (different kernels for different areas of the image); we routinely use it for post-processing of Eyesis4π images. The DCT-based implementation is described in an earlier blog post.

Space-variant deconvolution kernels can absorb (be combined with, during calibration processing) the individual lens deviations from the common distortion model described above, so aberration correction and image rectification to the common image space can be performed simultaneously using the same processing resources.

Two dimensional vs. single dimensional matching along the epipolar lines

A common approach to matching image pairs is to reduce the two-dimensional correlation to a single-dimensional task by correlating pixels along the epipolar lines, which are simply horizontal lines for horizontally built binocular systems with parallel optical axes. Aggregation of the correlation maximum locations between neighbor parallel pixel lines is performed in the image pixel domain after each line is processed separately.

For tile-based processing it is beneficial to perform a full 2-d correlation: the phase correlation is computed in the frequency domain, and after the pointwise multiplication during aberration correction the image tiles are already available in the 2d frequency domain. Two-dimensional correlation implies aggregation of data from multiple scan lines, so it can tolerate (and be used to correct) small lens misalignments, and with appropriate filtering it can be used to detect and match linear features.

Implementation

Prototype camera

The experimental camera looks similar to Elphel's regular H-camera – we just incorporated the different sensor front ends (3d CAD model) that are used in Eyesis4π and added adjustment screws to align the optical axes of the lenses (heading and tilt) and the orientations of the image sensors (roll). The sensors are 5 Mpix 1/2″ format ON Semiconductor MT9P006, the lenses – Evetar N125B04530W.

We selected lenses with focal lengths matching within 1%, and calibrated the camera using our standard camera rotation machine and target pattern. As we do not yet have production adjustment equipment and software, the adjustment took several iterations: calibrating the camera and measuring the extrinsic parameters of each sensor front end, rotating each of the adjustment screws by spreadsheet-calculated values, then re-running the whole calibration process. Finally the calibration results – radial distortion parameters, SFE extrinsic parameters, vignetting and deconvolution kernels – were converted to a form suitable for run-time application (currently, during post-processing of the captured images).

Figure 2. Camera block diagram

This prototype still uses 3d-printed parts, and such mounts proved not stable enough, so we had to add field calibration and write code for bundle adjustment of the individual imager orientations from the 2-d correlation data of each of the 4 image pairs.

Camera performance depends on the actual mechanical stability; software compensation can only partially mitigate the misalignment problem, and the precision of the distance measurements was reduced when the cameras went off by more than 20 pixels after being carried in a backpack. Nevertheless scene reconstruction remained possible.

Software

Multi-view stereo rigs are capable of capturing dynamic scenes, so our goal is a real-time system with most of the heavy-weight processing done in the FPGA.

One of the major challenges is how to combine the parallel and stream processing capabilities of the FPGA with the flexibility of software needed to implement advanced 3d reconstruction algorithms. Our approach is to use an FPGA-based tile processor that performs uniform operations on lists of “tiles” – fixed square overlapping windows in the images. The FPGA processes tile data at the pixel level, while the software operates on whole tiles.

The initial implementation does not contain actual FPGA processing; so far we have only tested in the FPGA some of the core functions – the two-dimensional 8×8 DCT-IV needed for both the 16×16 CLT and ICLT. The current code consists of two separate parts: one (the tile processor) simulates what will be moved to the FPGA and handles image tiles at the pixel level, while the other is what will remain software – it operates at the tile level and does not deal with individual pixels. The two parts interact through shared system memory; the tile processor has exclusive access to the dedicated image buffer and calibration data.

Each tile task specifies:

the center disparity, so each of the 4 image tiles will be shifted accordingly, and

the code of operation(s) to be performed on that tile.

Figure 4. Correlation processor

The tile processor performs all or some (depending on the tile operation codes) of the following operations:

Reads the tile tasks from the shared system memory.

Calculates locations and loads image and calibration data from the external image buffer memory (using on-chip memory to cache data, as the overlapping nature of the tiles makes each pixel participate on average in 4 neighbor tiles).

Converts tiles to frequency domain using CLT based on 2d DCT-IV and DST-IV.

Performs aberration correction in the frequency domain by pointwise multiplication by the calibration kernels.

Calculates correlation-related data (Figure 4) for the tile pairs – resulting in tile disparity and disparity confidence values for all pairs combined, and/or more specific correlation types – by pointwise multiplication, inverse CLT to the pixel domain, filtering, and local maximum extraction by quadratic interpolation or windowed center-of-mass calculation.

Calculates a combined texture for the tile (Figure 5), using the alpha channel to mask out pixels that do not match – this is how single-pixel lateral resolution is effectively restored after aggregating individual pixels into tiles. Textures can be combined using only the programmed shifts according to the specified disparity, or using the additional shift calculated in the correlation module.

Calculates other integral values for the tiles (Figure 5), such as the per-channel number of mismatched pixels – such data can be used for quick second-level correlation runs (using tiles instead of pixels) to determine which 3d volumes potentially contain objects and so need regular (pixel-level) matching.

A single tile processor operation deals with the scene objects that would be projected onto this tile's 16×16 pixel square on the sensor of a virtual camera located at the center between the four actual physical cameras. A single pass over the tile data is limited not just laterally, but in depth as well, because for the tiles to correlate they have to overlap significantly: 50% overlap corresponds to a correlation offset range of ±8 pixels, while better correlation contrast needs 75% overlap, or ±4 pixels. The tile processor thus “probes” not all the voxels that project to the same 16×16 window of the virtual image, but only those that belong to a certain distance range – the distances corresponding to disparities within ±4 pixels of the value provided for the tile.

That means a single processing pass over a tile captures data in a disparity space volume, or macro-voxel, 8 pixels wide by 8 pixels high by 8 pixels deep (considering the central part of the overlapping volumes), and capturing the whole scene may require multiple passes over the same tile with different disparities. There are ways to avoid a full-range disparity sweep (in 8 pixel increments) for all tiles – following surfaces and detecting occlusions and discontinuities, or second-level correlation of tiles instead of individual pixels.
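A full-range sweep for one tile can be sketched as below; `measure_tile` stands in for a single tile-processor pass and is a hypothetical placeholder, not the actual API:

```python
# Hypothetical sketch of a full-range disparity sweep for one tile: the tile
# processor is re-run with the center disparity stepped in 8-pixel increments,
# each pass only resolving disparities within about +/-4 pixels of its center.

def disparity_sweep(measure_tile, max_disparity, step=8.0):
    """measure_tile(center) -> (disparity, confidence); keep the best pass."""
    best_disp, best_conf = 0.0, -1.0
    center = 0.0
    while center <= max_disparity:
        disp, conf = measure_tile(center)       # one tile-processor pass
        if conf > best_conf:
            best_disp, best_conf = disp, conf
        center += step
    return best_disp, best_conf

# toy tile: true disparity 13.4 px, measurable only within +/-4 px of a center
def toy_measure(center, true_disp=13.4):
    if abs(center - true_disp) <= 4.0:
        return true_disp, 1.0
    return center, 0.1                          # no real correlation peak

print(disparity_sweep(toy_measure, 32.0))       # (13.4, 1.0)
```

The surface-following and second-level-correlation strategies mentioned above replace this brute-force sweep with a much smaller set of candidate centers.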

Another reason for multi-pass processing of the same tile is to refine the disparity measured by correlation. When dealing with subpixel coordinates of the correlation maximums – located either by quadratic approximation or by some form of center-of-mass evaluation – the calculated values may be biased, and the disparity histograms reveal modulation with the pixel period. A second, “refine” pass, in which individual tiles are shifted by the disparity measured in the previous pass, reduces the residual offset of the correlation maximum to a fraction of a pixel and mitigates this type of bias. A tile shift here means a combination of an integer pixel shift of the source images and a fractional (within ±0.5 pixel) shift performed in the frequency domain by multiplication by a cosine/sine phase rotator.
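The integer/fractional split and the frequency-domain phase rotator can be sketched as follows; a plain FFT stands in for the CLT here, for brevity:

```python
import numpy as np

def split_shift(s):
    """Integer part for window selection, fractional remainder in [-0.5, 0.5]."""
    i = int(np.round(s))
    return i, s - i

def fractional_shift(tile, dy, dx):
    """Shift a tile by a fraction of a pixel via a cosine/sine phase rotator."""
    h, w = tile.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    rot = np.exp(-2j * np.pi * (ky * dy + kx * dx))   # phase rotator
    return np.fft.ifft2(np.fft.fft2(tile) * rot).real

# shifting a pure cosine by a fraction of a pixel matches the analytic result
x = np.arange(16)
tile = np.cos(2 * np.pi * x / 16)[None, :].repeat(16, axis=0)
shifted = fractional_shift(tile, 0.0, 0.25)
expected = np.cos(2 * np.pi * (x - 0.25) / 16)
print(np.allclose(shifted[0], expected))   # True
```

For band-limited content the frequency-domain shift is exact, which is what makes the refine pass free of additional resampling loss.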

The total processing time and/or required FPGA resources depend linearly on the number of tile processor operations, and the software can use several methods to reduce this number. In addition to the two approaches mentioned above (following surfaces and second-level correlation), it may be possible to reduce the field of view to a smaller area of interest, or to predict the current frame's scene from the previous frames (as in video compression) – the tile processor paradigm preserves the flexibility of the various algorithms that may be used in the 3d scene reconstruction software stack.

Scene viewer

The index page shows a map (you may select from several providers) with markers at the locations of the captured scenes. On the left there is a vertical ribbon of thumbnails – you may scroll it with the mouse wheel or by dragging.

Thumbnails are shown only for the markers that fit on screen, so zooming in on the map may reduce the number of visible thumbnails. When you select a thumbnail, the corresponding marker opens on the map, and one or several scenes are shown – one line per scene (identified by a Unix timestamp with fractional seconds) captured at that location.

The scene that matches the selected thumbnail is highlighted (the 4th line in Figure 6). Some scenes have different versions of the reconstruction from the same source images – they are listed on the same line (like the first line in Figure 6). The links lead to the viewers of the selected scene/version.

Figure 7. Selection of the map / satellite imagery provider

We do not have ground truth models of the captured scenes built with active scanners. Instead, as the most interesting part is ranging distant objects (hundreds of meters), it is possible to use publicly available satellite imagery and match it to the captured models. We had an ideal view from the Elphel office window – each crack in the pavement was visible in the satellite images, so we could match them with the 3d model of the scene. Unfortunately they ruined it recently by replacing the asphalt :-).

The scene viewer combines an x3dom representation of the 3d scene with a re-sizable overlapping map view. You may switch the map imagery provider by clicking on the map icon as shown in Figure 7.

The scene and map views are synchronized, and there are several ways to navigate in either the 3d or the map area:

use scroll wheel over the 3d area to change camera zoom (field of view is indicated on the map);

drag with middle button pressed in the 3d view to move camera perpendicular to the view direction;

drag the camera icon (green circle) on the map to move the camera horizontally;

toggle ⇅ button and move the camera vertically;

press a hotkey t over the 3d area to reset to the initial view: set azimuth and elevation same as captured;

press a hotkey r over the 3d area to set view azimuth as captured, elevation equal to zero (horizontal view).

Figure 8. 3D model to map comparison

Comparison of the 3d scene model and the map uses ball markers. By default the markers are one meter in diameter; the size can be changed on the settings (⋮) page.

Moving the pointer over the 3d area with the Ctrl key pressed makes the ball follow the cursor at the distance where the view line intersects the nearest detected surface in the scene. It simultaneously moves the corresponding marker over the map view and indicates the measured distance.

Ctrl-click places the ball marker on the 3d scene and on the map. It is then possible to drag the marker over the map and read the ground truth distance. Dragging the marker over the 3d scene updates its location on the map, but not the other way around; in edit mode the mismatch data is used to adjust the captured scene location and orientation.

The program settings used during reconstruction limit the scene far distance to z = 1000 meters; all more distant objects are considered to be at infinity. X3D allows imagery at infinity through the backdrop element, but it is not flexible enough and is not supported by some other programs. In most models we place the infinity textures on a large billboard at z = 10,000 meters, and that is where the ball marker will appear if placed on the sky or other far objects.

Figure 9. Settings and link to four images

The settings page (⋮) shown in Figure 9 has a link to the four-image viewer (Figure 10). These four images correspond to the captured views and are almost the “raw images” used for scene reconstruction. They were subject to optical aberration correction and are partially rectified – rendered as if captured by a single camera with strictly polynomial radial distortion.

Such images are not actually used in the reconstruction process; they are rendered only for debug and demonstration purposes. The equivalent data exists in the tile processor only as an intermediate result in frequency-domain form, and was subject to strictly linear processing (to avoid possible unintended biases), so the images show a residual locally-checkerboard pattern caused by the Bayer mosaic filter (discussed in an earlier blog post). Textures generated from the combination of all four images have a significantly lower contrast of this pattern. It is possible to add some non-linear filtering at the very last stage of texture generation.

Each scene model has a download link for an archive containing the model itself as an *.x3d file, Wavefront *.obj and *.mtl files, and the corresponding RGBA textures as PNG images. Initially I missed the fact that the x3d and obj formats have opposite surface normal directions for the same triangular faces, so almost half of the Wavefront files still have incorrect (opposite direction) surface normals.

Results

Our initial plan was to test the algorithms for the tile processor before implementing them in the FPGA. The tile processor provides data for the disparity space image (DSI) – the confidence of having a certain disparity at a specified 2d position in the image – and also generates texture tiles.

When the tile processor code was written and tested, we still needed software to visualize the results. The DSI itself seemed promising (much better coverage than in my earlier experiments with binocular images), but when I tried to convert the textured tiles directly into a viewable x3d model, it was a big disappointment. The result did not look like a 3d scene – there were many narrow triangles that made sense only when viewed almost exactly from the camera's actual location; a small lateral viewpoint movement, and the image fell apart into something unrecognizable.

I was not able to find ready-to-use code, and the plan to write a quick demo for the tile processor and the generated DSI seemed less and less realistic. Eventually it took at least three times longer to get somewhat usable output than to develop the DCT-based tile processor code itself.

The current software is still incomplete and lacks many needed features (it does not even cut off the background, so wires over the sky steal a lot of surrounding space), and it runs slowly (several minutes per scene), but it provides a starting point for evaluating the performance of a long-range 4-camera multi-view stereo system. Much of the intended functionality does not work without more parameter tuning, but we decided to postpone improvements to the next stage (when we will have mechanically more stable cameras) and instead capture many very different scenes, process them in batch mode (keeping the same parameter values for all new scenes) and see what the output is.

As soon as the program was able to produce a somewhat coherent 3d model from the very first image set captured through the Elphel office window, Oleg Dzhimiev started development of the web application that matches the models with the map data. After adding more image sets I noticed that the camera calibration did not hold. Each individual sub-camera performed nicely (they use a thermally compensated mechanical design), but their extrinsic parameters did change, and we had to add code for field calibration that uses the images themselves. The best disparity measurement accuracy over the field of view still requires the camera poses to match those used at full calibration, so later scenes with more developed misalignment (>20 pixels) are less precise than the earlier ones (captured in Salt Lake City).

We do not yet have an established method of measuring ranging precision at different object distances – the disparity values are calculated together with a confidence, and in lower-confidence areas the accuracy is lower, including places where no ranging is possible at all due to the complete absence of visible detail in the images. Instead it is possible to compare distances in the various scene models to those on the map and see where such a camera is useful. With 0.1 pixel disparity resolution and a 150 mm baseline we should be able to measure 300 m distances with 10% accuracy, and for many captured scene objects it is already not much worse. We have now placed orders to machine the new camera parts needed to build a more mechanically stable rig. In parallel with upgrading the hardware, we will start migrating the tile processor code from Java to Verilog.
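The quoted figure – 10% at 300 m with 0.1 pixel disparity resolution and a 150 mm baseline – follows from the same stereo geometry used earlier; a small check:

```python
# Relative range error of a stereo rig: dZ/Z = Z * pixel * disparity_error / (f * B).
# Numbers are the ones given in the text.

def relative_range_error(z_mm, disp_err_pix=0.1, pixel_mm=0.0022,
                         f_mm=4.5, baseline_mm=150.0):
    return z_mm * pixel_mm * disp_err_pix / (f_mm * baseline_mm)

print(round(relative_range_error(300_000) * 100, 1))   # 9.8 (percent)
```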

And what’s next?Elphel goal is to provide our users with the high performance hackable products and freedom to modify them in the ways and for the purposes we could not imagine ourselves. But it is fun to fantasize about at least some possible applications:

Obviously, self-driving cars – an increased number of cameras arranged in a 2d pattern (a square) results in significantly more robust matching even with low-contrast textures. It does not depend on sequential scanning and provides simultaneous data over a wide field of view. The calculated confidence of the distance measurements tells when alternative (active) ranging methods are needed – that would help avoid the infamous accident in which a self-driving car went under a truck.

Visual odometry for the drones would also benefit from the higher robustness of image matching.

Rovers on Mars or other planets using low-power, passive (vision-based) scene reconstruction.

Maybe self-flying passenger multicopters in the heavy 3d traffic? Sure they will all be equipped with some transponders, but what about aerial roadkills? Like a flock of geese that forced water landing.

High speed boating or sailing over uneven seas with active hydrofoils that can look ahead and adjust to the future waves.

Landing on the asteroids for physical (not just Bitcoin) mining? With 150 mm baseline such camera can comfortably operate within several hundred meters from the object, with 1.5 m that will scale to kilometers.

Cinematography: post-production depth of field control that would easily beat even the widest format optics, HDR with a pair of 4-sensor cameras, some new VFX?

Multi-spectral imaging where more spatially separate cameras with different bandpass filters can be combined to the same texture in the 3d scene.

Capturing underwater scenes and measuring how far the sea creatures are above the bottom.

September 12, 2017

The software PTP Track Hound, which can capture and analyze PTP network traffic, now understands White Rabbit TLVs – so Track Hound can now sniff the tracks that the White Rabbit leaves behind. Track Hound is made freely available by Meinberg. One may want to know, however, that the source code is not available under an open licence.

Free Electrons engineers Quentin Schulz, Maxime Ripard and Miquèl Raynal will also be attending the event, which means that 7 people from Free Electrons will participate in ELCE!

In addition to the main ELCE conference, Thomas Petazzoni will take part in the Buildroot Developers Days, a 2-day hackathon organized on the Saturday and Sunday before ELCE, and in the Device Tree workshop organized on Thursday afternoon.

Once again, we’re really happy to participate to this conference, and looking forward to meeting again with a large number of Linux kernel and embedded Linux developers!

This release gathers 13006 non-merge commits, amongst which 239 were made by Free Electrons engineers. According to the LWN article on 4.13 statistics, this makes Free Electrons the 13th contributing company by number of commits, the 10th by lines changed.

The most important contributions from Free Electrons for this release have been:

In the RTC subsystem

Alexandre Belloni introduced a new method for registering RTC devices, with one step for the allocation and one for the registration itself, which makes it possible to solve race conditions in a number of drivers.

Alexandre Belloni added support for exposing the non-volatile memory found in some RTC devices through the Linux kernel nvmem framework, making them usable from userspace. A few drivers were changed to use this new mechanism.

In the MTD/NAND subsystem

Boris Brezillon did a large number of fixes and minor improvements in the NAND subsystem, both in the core and in a few drivers.

Thomas Petazzoni contributed support for on-die ECC, specifically with Micron NANDs. This makes it possible to use the ECC calculation capabilities of the NAND chip itself, as opposed to software ECC (calculated by the CPU) or ECC done by the NAND controller.

Thomas Petazzoni contributed a few improvements to the FSMC NAND driver, used on ST Spear platforms. The main improvement is support for the ->setup_data_interface() callback, which allows configuring optimal timings in the NAND controller.

Support for Allwinner ARM platforms

Alexandre Belloni improved the sun4i PWM driver to use the so-called atomic API and support hardware read out.

Antoine Ténart improved the sun4i-ss cryptographic engine driver to support the Allwinner A13 processor, in addition to the already supported A10.

Maxime Ripard contributed HDMI support for the Allwinner A10 processor (in the DRM subsystem) and a number of related changes to the Allwinner clock support.

Quentin Schulz improved the support for battery charging through the AXP20x PMIC, used on Allwinner platforms.

Support for Atmel ARM platforms

Alexandre Belloni added suspend/resume support for the Atmel SAMA5D2 clock driver. This is part of a larger effort to implement the backup mode for the SAMA5D2 processor.

Alexandre Belloni added suspend/resume support in the tcb_clksrc driver, used as clocksource and clockevent device on Atmel SAMA5D2.

Alexandre Belloni cleaned up a number of drivers, removing support for non-DT probing, which is possible now that the AVR32 architecture has been dropped. Indeed, the AVR32 processors used to share the same drivers as the Atmel ARM processors.

Alexandre Belloni added the core support for the backup mode on Atmel SAMA5D2, a suspend/resume state with significant power savings.

Boris Brezillon switched Atmel platforms to use the new binding for the EBI and NAND controllers.

Boris Brezillon added support for timing configuration in the Atmel NAND driver.

Quentin Schulz added suspend/resume support to the Bosch m_can driver, used on Atmel platforms.

Support for Marvell ARM platforms

Antoine Ténart contributed a completely new driver (3200+ lines of code) for the Inside Secure EIP197 cryptographic engine, used in the Marvell Armada 7K and 8K processors. He also subsequently contributed a number of fixes and improvements for this driver.

Antoine Ténart improved the existing mvmdio driver, used to communicate with Ethernet PHYs over MDIO on Marvell platforms to support the XSMI variant found on Marvell Armada 7K/8K, used to communicate with 10G capable PHYs.

Antoine Ténart contributed minimal support for 10G Ethernet in the mvpp2 driver, used on Marvell Armada 7K/8K. For now, the driver still relies on low-level initialization done by the bootloader, but additional changes in 4.14 and 4.15 will remove this limitation.

Grégory Clement added a new pinctrl driver to configure the pin-muxing on the Marvell Armada 37xx processors.

Grégory Clement did a large number of changes to the clock drivers used on the Marvell Armada 7K/8K processors to prepare the addition of pinctrl support.

Grégory Clement added support for Marvell Armada 7K/8K to the existing mvebu-gpio driver.

Thomas Petazzoni added support for the ICU, a specialized interrupt controller used on the Marvell Armada 7K/8K, for all devices located in the CP110 part of the processor.

Free Electrons engineers are not only contributors, but also maintainers of various subsystems in the Linux kernel, which means they are involved in the process of reviewing, discussing and merging patches contributed to those subsystems:

Maxime Ripard, as the Allwinner platform co-maintainer, merged 113 patches from other contributors

Boris Brezillon, as the MTD/NAND maintainer, merged 62 patches from other contributors

Alexandre Belloni, as the RTC maintainer and Atmel platform co-maintainer, merged 57 patches from other contributors

Grégory Clement, as the Marvell EBU co-maintainer, merged 47 patches from other contributors

September 02, 2017

There's a new project currently undergoing crowd funding that might be
of interest to the former Openmoko community: The Purism Librem 5
campaign.

Similar to Openmoko a decade ago, they are
aiming to build a FOSS smartphone based on GNU/Linux, without any
proprietary drivers/blobs on the application processor, from
bootloader to userspace.

Furthermore (just like Openmoko) the baseband processor is fully
isolated, with no shared memory and with the Linux-running application
processor being in full control.

They go beyond what we wanted to do at Openmoko by offering hardware
kill switches for camera/phone/baseband/bluetooth. During the Openmoko
days we assumed it was sufficient to simply control all those bits from
the trusted Linux domain, but of course once that domain is compromised,
a physical kill switch provides a completely different level of security.

I wish them all the best, and hope they can leave a better track record
than Openmoko. Sure, we sold some thousands of phones, but the company
quickly died, and the state of software was far from end-user-ready. I
think the primary obstacles/complexities are verification of the
hardware design as well as the software stack all the way up to the UI.

The budget of ~ 1.5 million seems extremely tight from my point of view,
but then I have no information about how much Puri.sm is able to invest
from other sources outside of the campaign.

If you're a FOSS developer with a strong interest in a Free/Open
privacy-first smartphone, please note that they have several job openings, from
Kernel Developer to
OS Developer
to UI Developer.
I'd love to see some talented people at work in that area.

It's a bit of a pity that almost all of the actual technical details are
unspecified at this point (except RAM/flash/main-cpu). There are no
details on the cellular modem/chipset used, nor on the camera, the
bluetooth chipset, the wifi chipset, etc. This might be an indication
of the early stage of their planning. I would have expected those
questions to be ironed out before looking for funding - but then,
it's their campaign and they can run it as they see fit!

I for my part have just put in a pledge for one phone. Let's see what
will come of it. In case you feel motivated by this post to join in:
Please keep in mind that any crowdfunding campaign bears significant
financial risks. So please make sure you made up your mind and don't
blame my blog post for luring you into spending money :)

September 01, 2017

Cellular modems have existed for decades and come in many shapes and kinds. They contain the cellular
baseband processor, RF frontend, protocol stack software and anything else required to communicate with a
cellular network. Basically a phone without display or input.

During the last decade or so, the vast majority of cellular modems come as LGA modules, i.e. a small PCB with
all components on the top side (and a shielding can), which has contact pads on the bottom so you can solder
it onto your mainboard. You can obtain them from vendors such as Sierra Wireless, u-blox, Quectel, ZTE,
Huawei, Telit, Gemalto, and many others.

In most cases, the vendors now also solder those modules to small adapter boards to offer the same product
in mPCIe form-factor. Other modems are directly manufactured in mPCIe or NGFF aka m.2 form-factor.

As long as those modems were still 2G / 2.5G / 2.75G, the main interconnection with the host (often some
embedded system) was a serial UART. The audio input/output for voice calls was made available as analog
signals, ready to connect a microphone and speaker, as that is what the cellular chipsets were designed for
in smartphones. In the Openmoko phones we also interfaced the audio of the cellular modem in analog form,
exactly for that reason.

From 3G onwards, the primary interface towards the host is now USB, with the modem running as a USB device.
If your laptop contains a cellular modem, you will see it show up in the lsusb output.

From that point onwards, it would have made a lot of sense to simply expose the audio via USB as well:
offer a multi-function USB device that has both the virtual serial ports for AT commands and the network
device for IP, and add a USB Audio function to it. It would simply show up as a "USB sound card" to the host,
with all standard drivers working as expected. Sadly, nobody seems to have implemented this, at least not in
a supported production version of their product.

Instead, what some modem vendors have implemented as an ugly hack is the transport of 8kHz 16bit PCM samples
over one of the UARTs. See for example the Quectel UC-20 or the Simcom SIM7100 which implement such a method.
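
As a rough sanity check (my own arithmetic, not from any vendor datasheet), the raw bandwidth such a PCM-over-UART hack needs is easy to estimate:

```python
# Back-of-the-envelope check: bandwidth needed to carry one
# 8 kHz / 16-bit PCM audio stream over a UART.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 16

payload_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 128 kbit/s of raw audio
# A UART puts 10 bits on the wire per 8 data bits (start + stop bit),
# so the line rate must be at least 25% above the payload rate.
uart_bps = payload_bps * 10 // 8                 # 160 kbit/s minimum

print(payload_bps, uart_bps)
```

So any UART running at 230400 bps or faster has headroom for one such stream, which is presumably why the vendors got away with this hack.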

All the others largely ignore software access to the audio stream. One wonders why that is.
From a software and systems architecture perspective it would be super easy. Instead, what most vendors do
is expose a digital PCM interface. This is suboptimal in many ways:

there is no mPCIe standard on which pins PCM should be exposed

no standard product (like laptop, router, ...) with mPCIe slot will have anything connected to those PCM
pins

Furthermore, each manufacturer / modem seems to support a different subset or dialect of the PCM interface in
terms of

voltage (almost all of them are 1.8V, while mPCIe signals normally are 3.3V logic level)

master/slave (almost all of them insist on being a clock master)

sample format (alaw/ulaw/linear)

clock/bit rate (mostly 2.048 MHz, but can be as low as 128kHz)

frame sync (mostly short frame sync that ends before the first bit of the sample)
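
The relation between the bit clock, the frame sync and the sample width in the list above can be sketched with a little arithmetic (the 8 kHz frame rate is the rate mentioned earlier in the post):

```python
def slots_per_frame(bit_clock_hz: int, frame_rate_hz: int = 8_000) -> int:
    """Number of bit slots between two consecutive frame syncs."""
    assert bit_clock_hz % frame_rate_hz == 0
    return bit_clock_hz // frame_rate_hz

# Common 2.048 MHz bit clock: 256 bit slots per 8 kHz frame, of which a
# single 16-bit sample occupies only the first 16 after the frame sync.
print(slots_per_frame(2_048_000))   # 256
# Slowest clock mentioned, 128 kHz: the frame is exactly one 16-bit sample.
print(slots_per_frame(128_000))     # 16
```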

It's a real nightmare, when it could be so simple. If they implemented USB-Audio, you could plug a cellular
modem into any board with a mPCIe slot and it would simply work. As they don't, you need a specially designed
mainboard that implements exactly the specific dialect/version of PCM of the given modem.

By the way, the most "amazing" vendor seems to be u-blox. Their modems support PCM audio, but only in the
solder-type version: they simply didn't route those signals to the mPCIe connector, making audio impossible
to use when using a connectorized modem. How inconvenient.

Summary

If you want to access the audio signals of a cellular modem from software, then you either

have standard hardware and pick one very specific modem model, hoping it remains available for as long as your application needs it, or

On the Osmocom mpcie-breakout board and the sysmocom
QMOD board we have exposed the PCM related pins on
2.54mm headers to allow a separate board to pick up that PCM and offer it to the host system. However,
such a board hasn't been developed so far.

For many years I've been fascinated by the XMOS XCore architecture. It offers a surprisingly refreshing alternative
to virtually any other classic microcontroller architecture out there. However, despite reading a lot about it
years ago, being fascinated by it, and even giving a short informal presentation about it once, I've so far
never used it. Too much "real" work imposes a high barrier to spending time learning about new architectures,
languages, toolchains and the like.

Introduction into XCore

Rather than having lots of fixed-purpose built-in "hard core" peripherals for interfaces such as SPI, I2C,
I2S, etc., the XCore controllers have a combination of

xCONNECT links that can be used to connect multiple processors over 2 or 5 wires per direction

channels as a means of communication (similar to sockets) between threads, whether on the same xCORE or
a remote core via xCONNECT

an extended C (xC) programming language to make use of parallelism, channels and the I/O ports

In spirit, it is like a 21st century implementation of some of the concepts established first with
Transputers.

My main interest in XMOS has been the flexibility that you get in implementing not-so-standard electronics
interfaces. For regular I2C, UART, SPI, etc. there is of course no such need. But every so often one
encounters an interface that's very rarely found (like the output of an E1/T1 Line Interface Unit).

Also, quite often I run into use cases where it's simply impossible to find a microcontroller with a
sufficient number of the related peripherals built-in. Try finding a microcontroller with 8 UARTs, for
example. Or one with four different PCM/I2S interfaces, which all can run in different clock domains.

The existing options of solving such problems basically boil down to either implementing it in hard-wired
logic (unrealistic, complex, expensive) or going to programmable logic with CPLD or FPGAs. While the latter
is certainly also quite interesting, the learning curve is steep, the tools anything but easy to use and the
synthesising time (and thus development cycles) long. Furthermore, your board design will be more complex as
you have that FPGA/CPLD and a microcontroller, need to interface the two, etc (yes, in high-end use cases
there's the Zynq, but I'm thinking of several orders of magnitude less complex designs).

Of course one can also take a "pure software" approach and go for high-speed bit-banging. There are some ARM
SoCs that can toggle their pins fast enough; people have reported rates like 14 MHz being possible on a
Raspberry Pi. However, when running a general-purpose OS in parallel, this kind of speed is hard to sustain
reliably over the long term, and the related software implementations are going to be anything but nice to write.

So the XCore looks like a nice alternative for a lot of those use cases: where you want a
microcontroller with more programmability in terms of its I/O capabilities, but don't want to go as far as
full-on FPGA/CPLD development in Verilog or VHDL.

My current use case

My current use case is to implement a board that can accept four independent PCM inputs (all in slave mode,
i.e. clock provided by external master) and present them via USB to a host PC. The final goal is to have
a board that can be combined with the sysmoQMOD and
which can interface the PCM audio of four cellular modems concurrently.

While XMOS is quite strong in the Audio field and you can find existing examples and app notes for I2S and
S/PDIF, I couldn't find any existing code for a PCM slave of the given requirements (short frame sync, 8kHz
sample rate, 16bit samples, 2.048 MHz bit clock, MSB first).
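
To make the requirements above concrete, here is a behavioural sketch in Python rather than xC (the function name and the bit-list representation are my own; this models only the deserialization, not the real port and clock-block setup):

```python
# Behavioural Python model (not xC) of the PCM slave's deserialization:
# short frame sync, 8 kHz frames, 16-bit samples shifted in MSB first.
# Clock recovery is abstracted away: inputs are per-bit-slot lists.
def pcm_slave_rx(fs_bits, data_bits):
    samples = []
    i = 0
    while i + 16 < len(data_bits):
        if fs_bits[i]:                       # short frame sync pulse ends
            word = 0                         # before the first sample bit
            for bit in data_bits[i + 1:i + 17]:
                word = (word << 1) | bit     # MSB first
            if word >= 0x8000:               # interpret as two's complement
                word -= 0x10000
            samples.append(word)
            i += 17
        else:
            i += 1
    return samples

# One 256-slot frame (2.048 MHz clock / 8 kHz sync) carrying sample 0x1234:
to_bits = lambda w: [(w >> (15 - k)) & 1 for k in range(16)]
frame_fs = [1] + [0] * 255
frame_data = [0] + to_bits(0x1234) + [0] * 239
print(pcm_slave_rx(frame_fs, frame_data))   # [4660]
```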

I wanted to get a feeling for how well one can implement the related PCM slave. In order to test the slave, I
decided to develop the matching PCM master and run the two against each other. Despite never having written
any code for XMOS before, nor having used any of the toolchain, I was able to implement the PCM master and PCM
slave in something like ~6 hours, including simulation and verification. Sure, one can certainly do that
in much less time once familiar with the tools, programming environment, language, etc., but I
think it's not bad.

The biggest problem was that the clock phase for a clocked output port cannot be configured, i.e. the XCore
insists on always clocking out a new bit at the falling edge, while my use case of course required the
opposite: clocking out new signals at the rising edge. I had to use a second clock block to generate the
inverted clock in order to achieve that goal.

The good parts

Documentation excellent

I found the various pieces of documentation extremely useful and very well written.

Fast progress

I was able to make fast progress in solving the first task using the XMOS / Xcore approach.

Soft Cores developed in public, with commit log

You can find plenty of soft cores that XMOS has been developing on github at https://github.com/xcore,
including the full commit history.

This type of development is a big improvement over what most vendors of smaller microcontrollers like
Atmel are doing (infrequent tar-ball code-drops without commit history). And in the case of the classic uC
vendors, we're talking about drivers only. In the XMOS case it's about the entire logic of the peripheral!

You can for example see that for their I2C core, the very active commit history goes back to January 2011.

xSIM simulation extremely helpful

The xTIMEcomposer IDE (based on Eclipse) contains extensive tracing support and an extensible, near cycle
accurate simulator (xSIM). I implemented a PCM master and a PCM slave in xC and was able to simulate the
program while looking at the waveforms of the logic signals between the two.

The bad parts

Hard to get XCore chips

While the product portfolio on the XMOS website looks
extremely comprehensive, the vast majority of the parts is not available from stock at distributors. You
won't even get samples, and lead times are 12 weeks (!). If you check at Digi-Key, they have listed a total of
302 different XMOS controllers, but only 35 of them are in stock, and only 15 of those are USB capable. With
other distributors like Farnell it's even worse.

I've seen this with other semiconductor vendors before, but never to such a large extent. Sure, some
packages/configurations are not standard products, but having only 11% of the portfolio actually available is
pretty bad.

In such situations, where it's difficult to convince distributors to stock parts, it would be a good idea for
XMOS to stock parts themselves and provide samples / low quantities directly. Not everyone is able to order
large trays and/or willing to wait 12 weeks, especially during the R&D phase of a board.

Extremely limited number of single-bit ports

In the smaller / lower pin-count parts, like the XU[F]-208 series in QFN/LQFP-64, the number of usable,
exposed single-bit ports is ridiculously low. Out of the total 33 I/O lines available, only 7 can be used as
single-bit I/O ports. All other lines can only be used for 4-, 8-, or 16-bit ports. If you're dealing
primarily with serial interfaces like I2C, SPI, I2S, UART/USART and the like, those parallel ports are of no
use, and you have to go for a mechanically much larger part (like XU[F]-216 in TQFP-128) in order to have a
decent number of single-bit ports exposed. Those parts also come with twice the number of cores, memory, etc.,
which you don't need for slow-speed serial interfaces...

Insufficient number of exposed xLINKs

The smaller parts like XU[F]-208 only have one xLINK exposed. Of what use is that? If you don't have at least
two links available, you cannot daisy-chain them to each other, let alone build more complex structures like
cubes (at least 3 xLINKs).

So once again you have to go to much larger packages, where you will not use 80% of the pins or resources,
just to get the required number of xLINKs for interconnection.

Change to a non-FOSS License

XMOS deserved a lot of praise for releasing all their soft IP cores as Free / Open Source Software
on github at https://github.com/xcore. The license has basically been a 3-clause BSD license. This was a good move, as it meant that anyone
could create derivative versions, whether proprietary or FOSS, and there would be virtually no license
incompatibilities with whatever code people wanted to write.

However, to my very big disappointment, more recently XMOS seems to have changed their policy on this.
New soft cores (released at https://github.com/xmos as opposed to the old https://github.com/xcore) are made
available under a non-free license. This license
is nothing like the BSD 3-clause license or any other Free Software or Open Source license. It restricts the
license to use of the code together with an XMOS product, requires the user to contribute fixes back to XMOS and
contains references to import and export control. This license is incompatible with probably any FOSS license
in existence, making it impossible to write FOSS code on XMOS while using any of the new soft cores released
by XMOS.

But even beyond that license change, not even all code is provided in source code format anymore. The new
USB library (lib_usb) is provided as binary-only library, for example.

If you know anyone at XMOS management or XMOS legal with whom I could raise this topic of license change
when transitioning from older sc_* software to later lib_* code, I would appreciate this a lot.

Proprietary Compiler

While a lot of the toolchain and IDE is based on open source (Eclipse, LLVM, ...), the actual xC compiler is
proprietary.

The time transfer to Metsähovi, Kirkkonummi, occurs from the UTC-laboratory at VTT MIKES Metrology in Otaniemi via optical fibre using the White Rabbit protocol. VTT MIKES Metrology has been an early adopter of the White Rabbit technology for time transfer across long distances. White Rabbit was developed at CERN, the European Organization for Nuclear Research.

The measurements show, for example, how the travel time of light each way in a 50-kilometre fibre optic cable varies by approx. 7 nanoseconds within a 24-hour period as temperature changes affect the properties of the fibre optic cable, particularly its length.

The uncertainty of time transfer is expected to be 100 ps or better. The precision of frequency transfer is currently approx. 15 digits.
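
A quick back-of-the-envelope check of those numbers (the fibre group index of roughly 1.468 is my assumption for standard single-mode fibre, not stated in the article):

```python
# Sanity check of the quoted 7 ns diurnal variation on a 50 km fibre link.
C = 299_792_458            # speed of light in vacuum, m/s
N_FIBRE = 1.468            # assumed group index of silica fibre
LENGTH_M = 50_000          # 50 km link

one_way_delay_s = LENGTH_M * N_FIBRE / C    # ~245 microseconds one way
variation_s = 7e-9                          # ~7 ns over 24 hours
fractional = variation_s / one_way_delay_s  # ~3e-5, i.e. tens of ppm

print(one_way_delay_s, fractional)
```

In other words, the temperature-driven change the measurement resolves is only a few parts in 100,000 of the total propagation delay, which illustrates why sub-100 ps time transfer requires continuously measuring the link rather than calibrating it once.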

August 29, 2017

The Linux Plumbers conference has established itself as a major conference in the Linux ecosystem, discussing numerous aspects of the low-level layers of the Linux software stack. Linux Plumbers is organized around a number of micro-conferences, plus a number of more regular talks.

If you’re attending this conference, or are located in the Los Angeles area, and want to meet us, do not hesitate to drop us a line at info@free-electrons.com. You can also follow the Free Electrons Twitter feed for updates during the conference.

August 25, 2017

Since v5.0 was released we have found a few problems in the WR Switch software package. This new v5.0.1 release does not include new functionality but contains important hotfixes to v5.0. The FPGA bitstream used in v5.0.1 is exactly the same as in 5.0, therefore the same calibration values apply. As for any other release, you can find all the links to download the firmware binaries and manuals on our v5.0.1 release wiki page.

August 19, 2017

Automatic Testing in Osmocom

So far, in many Osmocom projects we have unit tests next to the code.
Those unit tests exercise the code on a per-C-function basis, and
typically call the respective function directly from a small test
program executed at make check time. The actual main program (like
OsmoBSC or OsmoBTS) is not executed at that time.

We also have VTY testing, which specifically tests that the VTY
has proper documentation for all nodes of all commands.

Then there's a big gap, and we have osmo-gsm-tester for testing a full
cellular network end-to-end. It includes physical GSM modems, a coaxial
distribution network, attenuators, splitters/combiners, real BTS hardware
and the logic to run the full network, from OsmoBTS to the core - both for
OsmoNITB and OsmoMSC+OsmoHLR based networks.

However, I think a lot of testing falls somewhere in between, where you
want to run the program-under-test (e.g. OsmoBSC), but you don't want to
run the MS, BTS and MSC that normally surround it. You want to test it
by emulating the BTS on the Abis side and the MSC on the A side, and just
test Abis and A interface transactions.

For this kind of testing, I have recently started to investigate
available options and tools.

The somewhat difficult part is that they are implemented in Scheme,
using the guile interpreter/compiler, as well as a C-language based
execution wrapper, which in turn is called by another guile wrapper
script.

I've also cleaned up the Dockerfiles and related image generation for
the osmo-stp-master, m3ua-test and sua-test images, as well
as some scripts to actually execute them on one of the builders. You
can find the related Dockerfiles as well as associated Makefiles in
http://git.osmocom.org/docker-playground

The end result after integration with Osmocom jenkins can be seen in the
following examples on jenkins.osmocom.org
for M3UA
and for SUA

Triggering the builds is currently periodic once per night, but we could
of course also trigger them automatically at some later point.

I've also packaged the GGSN and the test cases each into separate Docker
containers called osmo-ggsn-latest and ggsn-test. Related
Dockerfiles and Makefiles can again be found in
http://git.osmocom.org/docker-playground - together with an Eclipse
TITAN Docker base image based on Debian Stretch, called debian-stretch-titan

Further Work

I've built some infrastructure for Gb (NS/BSSGP), VirtualUm and other
testing, but have yet to build Docker images and the related jenkins
integration for it. Stay tuned about that. Also, many more actual
test cases are required. I'm very much looking forward to any
contributions.

Bleeding edge toolchain updates

All our bleeding edge toolchains have been updated, with the latest version of the toolchain components:

gcc 7.2.0, which was released 2 days ago

glibc 2.26, which was released 2 weeks ago

binutils 2.29

gdb 8.0

Those bleeding edge toolchains are now based on Buildroot 2017.08-rc2, which brings a nice improvement: the host tools (gcc, binutils, etc.) are no longer linked statically against gmp, mpfr and other host libraries. They are dynamically linked against them with an appropriate rpath encoded into the gcc and binutils binaries to find those shared libraries regardless of the installation location of the toolchain.

However, due to gdb 8.0 requiring a C++11 compiler on the host machine (at least gcc 4.8), our bleeding edge toolchains are now built in a Debian Jessie system instead of Debian Squeeze, which means that at least glibc 2.14 is needed on the host system to use them.

The only toolchains for which the tests are not successful are the MIPS64R6 toolchains, due to the Linux kernel not building properly for this architecture with gcc 7.x. This issue has already been reported upstream.

Stable toolchain updates

We haven’t changed the component versions of our stable toolchains, but we made a number of fixes to them:

The armv7m and m68k-coldfire toolchains have been rebuilt with a fixed version of elf2flt that makes the toolchain linker directly usable. This fixes building the Linux kernel using those toolchains.

The mips32r5 toolchain has been rebuilt with NaN 2008 encoding (instead of NaN legacy), which makes the resulting userspace binaries actually executable by the Linux kernel, which expects NaN 2008 encoding on mips32r5 by default.

Most mips toolchains for musl have been rebuilt, with Buildroot fixes for the creation of the dynamic linker symbolic link. This has no effect on the toolchain itself, but allows the tests under Qemu to work properly and validate the toolchains.

Other improvements

Each architecture now has a page that lists all toolchain versions available. This makes it easy to find a toolchain that matches your requirements (in terms of gcc version, kernel headers version, etc.). See All aarch64 toolchains for an example.

Amplifier S/N 005 was assembled and tested, housing PDA2017.07 and FDA2017.07 boards. Initial phase-noise tests show good performance, similar to the previous generation of the board. The new PDA2017.07 design using the IDT5PB1108 has a very fast rise-time (to be measured), possibly a quite low output impedance (to be fixed?) and a preliminary channel-to-channel output skew of max 250 ps.

At the end of 2016, the MIPI consortium has finalized the first version of its I3C specification, a new communication bus that aims at replacing older busses like I2C or SPI. According to the specification, I3C gets closer to SPI data rate while requiring less pins and adding interesting mechanisms like in-band interrupts, hotplug capability or automatic discovery of devices connected on the bus. In addition, I3C provides backward compatibility with I2C: I3C and legacy I2C devices can be connected on a common bus controlled by an I3C master.

For more details about I3C, we suggest reading the MIPI I3C Whitepaper, as unfortunately MIPI has not publicly released the specifications for this protocol.

For the last few months, Free Electrons engineer Boris Brezillon has been working with Cadence to develop a Linux kernel subsystem to support this new bus, as well as Cadence’s I3C master controller IP. We have now posted the first version of our patch series to the Linux kernel mailing list for review, and we already received a large number of very useful comments from the kernel community.

Free Electrons is proud to be pioneering the support for this new bus in the Linux kernel, and hopes to see other developers contribute to this subsystem in the near future!

The ware for July 2017 is a PMT (photomultiplier tube) module. I’d say wrm gets the prize this month, for getting that it’s a PMT driver first, and for linking to a schematic. :) That’s an easy way to win me over. Gratz, email me to claim your prize!

August 08, 2017

Preface

Cellular systems ever since GPRS are using a tunnel based architecture to provide IP
connectivity to cellular terminals such as phones, modems, M2M/IoT devices and the
like. The MS/UE establishes a PDP context between itself and the GGSN on the other
end of the cellular network. The GGSN then is the first IP-level router, and the
entire cellular network is abstracted away from the User-IP point of view.

This architecture didn't change with EGPRS, and not with UMTS, HSxPA and even
survived conceptually in LTE/4G.

While the concept of a PDP context / tunnel exists to de-couple the
transport layer from the structure and type of data inside the tunneled
data, the primary user plane so far has been IPv4.

In Osmocom, we made sure that there are no impairments / assumptions
about the contents of the tunnel, so OsmoPCU and OsmoSGSN do not care at
all what bits and bytes are transmitted in the tunnel.

The only Osmocom component dealing with the type of tunnel and its
payload structure is OpenGGSN.
The GGSN must allocate the address/prefix assigned to each individual
MS/UE, perform routing between the external IP network and the cellular
network and hence is at the heart of this. Sadly, OpenGGSN was an
abandoned project for many years until Osmocom adopted it, and it only
implemented IPv4.

This is actually a big surprise to me. Many of the users of the Osmocom
stack are from the IT security area. They use the Osmocom stack to
test mobile phones for vulnerabilities, analyze mobile malware and the
like. As any penetration tester should be interested in analyzing all
of the attack surface exposed by a given device-under-test, I would have
assumed that testing just on IPv4 would be insufficient and over the
past 9 years, somebody should have come around and implemented the
missing bits for IPv6 so they can test on IPv6, too.

In reality, nobody appears to have shared this line of thinking and
invested a bit of time in growing the tools we use. Or if they did, they
didn't share the related code.

In June 2017, Gerrie Roos submitted a patch for OpenGGSN IPv6 support that raised hopes about soon
being able to close that gap. However, at closer sight it turns out
that the code was written against a more than 7 years old version of
OpenGGSN, and it seems to primarily focus on IPv6 on the outer
(transport) layer, rather than on the inner (user) layer.

OpenGGSN IPv6 PDP Context Support

So in July 2017, I started to work on IPv6 PDP support in OpenGGSN.

Initially I thought How hard can it be? It's not like IPv6 is new to
me (I joined the 6bone under 3ffe prefixes back in the 1990s and worked on
IPv6 support in ip6tables ages ago). And aside
from allocating/matching longer addresses, what kind of complexity does
one expect?

After my initial attempt at an implementation, partially misled by the
patch that was contributed against that 2010-or-older version of
OpenGGSN, I'm surprised how wrong I was.

In IPv4 PDP contexts, the process of establishing a PDP context is
simple: maintain a pool of IPv4 addresses, allocate one to each MS/UE,
and provide DNS server information via tunneled IPCP.

So I implemented the identical approach for IPv6: maintain a pool of
IPv6 addresses, allocate one, and use IPCP for DNS. And nothing worked,
for several reasons:

IPv6 PDP contexts assign a /64 prefix, not a single address or a
smaller prefix

The End User Address that's part of the Signalling plane of Layer 3
Session Management and GTP is not the actual address, but just serves
to generate the interface identifier portion of a link-local IPv6
address

IPv6 stateless autoconfiguration is used with this link-local IPv6
address inside the user plane, after the control plane signaling to
establish the PDP context has completed. This means the GGSN needs to
parse ICMPv6 router solicitations and generate ICMPv6 router
advertisements.
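
The second point can be illustrated with a short sketch (the helper name is my own; it assumes a 16-byte End User Address whose low 64 bits carry the interface identifier):

```python
import ipaddress

def link_local_from_eua(eua: bytes) -> ipaddress.IPv6Address:
    """Derive the link-local address from the 16-byte End User Address:
    only the low 64 bits (the interface identifier) are used, prefixed
    with fe80::/64; the high 64 bits of the EUA are ignored."""
    assert len(eua) == 16
    iid = int.from_bytes(eua[8:], "big")
    return ipaddress.IPv6Address((0xfe80 << 112) | iid)

eua = ipaddress.IPv6Address("2001:db8::1:2:3:4").packed
print(link_local_from_eua(eua))   # fe80::1:2:3:4
```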

To make things worse, the stateless autoconfiguration is modified in
some subtle ways to make it different from the normal SLAAC used on
Ethernet and other media:

the timers / lifetimes are different

only one prefix is permitted

only a prefix length of 64 is permitted

A few days later I implemented all of that, but it still didn't work.
The problem was with DNS server addresses. In IPv4, the 3GPP protocols
simply tunnel IPCP frames for this. This makes a lot of sense, as IPCP
is designed for point-to-point interfaces, and that is exactly what a
PDP context is.

In IPv6, the corresponding IP6CP protocol does not have the capability
to provision DNS server addresses to a PPP client. WTF? The IETF
seriously requires implementations to do DHCPv6 over PPP, after
establishing a point-to-point connection, only to get DNS server
information?!? Some people suggested an IETF draft to change this,
but the draft expired in 2011 and we're still stuck.

While 3GPP permits the use of DHCPv6 in some scenarios, support for it
in phones/modems is not mandatory. Rather, the 3GPP has come up
with their own mechanism for communicating DNS server IPv6
addresses during PDP context activation: the use of containers as part
of the PCO Information Element used in L3-SM and GTP (see Section
10.5.6.3 of 3GPP TS 24.008). By the way, they also specified the same
mechanism for IPv4, so there are now two competing methods for
provisioning IPv4 DNS server information: IPCP and the new method.
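
As a rough illustration of the container mechanism, here is a minimal encoder sketch; the container ID 0x0003 ("DNS Server IPv6 Address") and the leading configuration-protocol octet 0x80 are my reading of TS 24.008 section 10.5.6.3 and should be treated as illustrative, not authoritative:

```python
import ipaddress

def pco_dns_ipv6(dns: str) -> bytes:
    """Sketch of a PCO contents encoding carrying one DNS server IPv6
    address container (container ID and header octet are assumptions)."""
    addr = ipaddress.IPv6Address(dns).packed          # 16-byte address
    container = (0x0003).to_bytes(2, "big")           # container ID
    container += bytes([len(addr)]) + addr            # length + contents
    return bytes([0x80]) + container                  # ext bit + protocol

pco = pco_dns_ipv6("2001:4860:4860::8888")
print(pco.hex())   # 20 bytes: 80 0003 10 <address>
```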

In any case, after some more hacking, OpenGGSN can now also provide
DNS server information to the MS/UE. And once that was implemented,
I had actual live user IPv6 data over a full Osmocom cellular stack!

Summary

We now have working IPv6 user IP in OpenGGSN. Together with the rest
of the Osmocom stack you can operate a private GPRS, EGPRS, UMTS or
HSPA network that provides end-to-end transparent, routed IPv6
connectivity to mobile devices.

All in all, it took much longer than needed, and the following
questions remain in my mind:

why did the IETF not specify IP6CP capabilities to configure DNS
servers?

why the complex two-stage address configuration with PDP EUA
allocation for the link-local address first and then stateless
autoconfiguration?

why don't we simply allocate the entire prefix via the End User
Address information element on the signaling plane? Surely next
to the 16-byte address we could have put one byte for the prefix length?

why do I see duplicate address detection style neighbour solicitations
from Qualcomm based phones on what is a point-to-point link with
exactly two devices: the UE and the GGSN?

why do I see link-layer source address options inside the ICMPv6
neighbor and router solicitation from mobile phones, when that option
is specifically not to be used on point-to-point links?

why is the smallest prefix that can be allocated a /64? That's such a
waste for a point-to-point link with a single device on the other end,
and in times of billions of connected IoT devices it will just
encourage the use of non-public IPv6 space (i.e. SNAT/MASQUERADING)
while wasting large parts of the address space.

Some of those choices would have made sense if the system had been made
fully compatible with normal IPv6 as used e.g. on Ethernet. But
implementing ICMPv6 router and neighbor solicitation without getting
any benefit, such as the ability to have multiple prefixes or prefixes
of different lengths, I just don't understand why anyone ever thought
that was worthwhile.

You can find the code at http://git.osmocom.org/openggsn/log/?h=laforge/ipv6
and the related ticket at https://osmocom.org/issues/2418

July 31, 2017

For the past few months we have been working with INCAA Computers BV on a new WR Switch Production Test Suite. This system allows verifying, during production or after delivery, that all the components of the WR Switch hardware work properly. Please check the WRS PTS wiki page for all the sources and documentation.

The Ware for June 2017 is an ultrasonic delay line. I picked this beauty up while wandering the junk shops of Akihabara. There’s something elegant about the Old Ways that’s simply irresistible to me…back when the answer to all hard problems was not simply “transform it into the software domain and then compute the snot out of it”.

July 19, 2017

The video workshop Alex and I gave was one of the best I have delivered: all 15 attendees got to take home a working CHA/V module they built in the class. It's a hacked VGA signal generator that basically allows you to build a simple video synth by adding some home-brew or off-the-shelf oscillators. We had a great mix of attendees, all from really interesting backgrounds and super engaged. Alex, as usual, did a nicely paced video synthesis tutorial, and I then led the theory and building part of the class. We rounded up with Alex leading a discussion around historical video synthesis work, and then proceeded to enjoy the evening concerts that were also part of the fantastic Brighton modular meet. (Pics 3+9 here are from Fabrizio D'Amico, who runs Video Hack Space.) Thanks to Andrew for organising the amazing meet which hosted the workshop, Matt for making our panels last minute, George for helping us out on the day, and Steve from Thonk for supplying some components for the kits.

July 18, 2017

During the last couple of days, I've been working on completing, cleaning up and merging a Virtual Um interface
(i.e. a virtual radio layer) between OsmoBTS and OsmocomBB. After I started the implementation and left it at an early
stage in January 2016, Sebastian Stumpf completed it in early 2017, with some subsequent
fixes and improvements by me. The combined result allows us to run a complete GSM network with 1-N BTSs and
1-M MSs without any actual radio hardware, which is of course excellent for all kinds of testing scenarios.

The Virtual Um layer is based on sending L2 frames (blocks) encapsulated via GSMTAP UDP multicast packets.
There are two separate multicast groups, one for uplink and one for downlink. The multicast nature simulates
the shared medium and enables any simulated phone to receive the signal from multiple BTSs via the downlink
multicast group.
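The GSMTAP datagrams on those multicast groups (GSMTAP's registered UDP port is 4729) are easy to inspect. Below is a sketch of decoding the fixed 16-byte GSMTAP v2 header that prefixes each L2 block; the field layout follows libosmocore's gsmtap.h, but treat it as illustrative rather than a reference implementation:

```python
import struct
from collections import namedtuple

# GSMTAP v2 fixed header: 16 bytes, big-endian. hdr_len is counted in
# 32-bit words, so the payload (the L2 block) starts at hdr_len * 4.
GsmtapHeader = namedtuple(
    "GsmtapHeader",
    "version hdr_len type timeslot arfcn signal_dbm snr_db "
    "frame_number sub_type antenna_nr sub_slot",
)

def parse_gsmtap(datagram: bytes):
    fields = struct.unpack("!BBBBHbbIBBBB", datagram[:16])
    header = GsmtapHeader(*fields[:11])  # the final octet is reserved
    payload = datagram[header.hdr_len * 4:]
    return header, payload
```

Pointing a tool like this (or simply Wireshark, which dissects GSMTAP natively) at the downlink group shows every simulated BTS transmission.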

In OsmoBTS, this is implemented via the new osmo-bts-virtual BTS model.

In OsmocomBB, this is realized by adding virtphy, a virtual L1, which speaks the same
L1CTL protocol that is used between the
real OsmocomBB Layer 1 and the Layer 2/3 programs such as mobile and the like.

Now many people would argue that GSM without the radio and actual handsets is no fun. I tend to agree, as I'm
a hardware person at heart and I am not a big fan of simulation.

Nevertheless, this forms the basis for all kinds of automated (regression) testing, covering layers
and interfaces that osmo-gsm-tester cannot reach, as the latter uses a black-box proprietary mobile phone
(modem). It is also pretty useful if you travel a lot and don't want to carry around a BTS and phones
all the time, or want to get some development done on airplanes or in other places where operating a radio
transmitter is not really a (viable) option.

If you're curious and want to give it a shot, I've put together some setup instructions at the
Virtual Um page of the Osmocom Wiki.

July 15, 2017

I’ve often said that there are no secrets in hardware — you just need a bigger, better microscope.

I think I’ve found the limit to that statement. To give you an idea, here’s the “lightbulb” that powers the microscope:

It’s the size of a building, and it’s the Swiss Light Source. Actually, not all of that building is dedicated to this microscope, just one beamline of an X-ray synchrotron capable of producing photons at an energy of 6.5keV (X-rays) at a flux of close to a billion coherent photons per second — but still, it’s a big light bulb. It might be a while before you see one of these popping up in a hacker’s garage…err, hangar…somewhere.

The result? One can image, in 3-D and “non-destructively” (e.g., without having to delayer or etch away dielectrics), chips down to a resolution of 14.6nm.

That’s a pretty neat trick if you’re trying to reverse engineer modern silicon.

You can read the full article at Nature (“High Resolution non-destructive three-dimensional imaging of integrated circuits” by Mirko Holler et al). I’m a paying subscriber to Nature so I’m supposed to have access to the article, but at the moment, their paywall is throwing a null pointer exception. Once the paywall is fixed you can buy a copy of the article to read, but in the meantime, SciHub seems more reliable.

July 11, 2017

Recently we had an inquiry whether our cameras are capable of streaming low latency video. The short answer is yes, the camera’s average output latency for 1080p at 30 fps is ~16 ms. It is possible to reduce it to almost 0.5 ms with a few changes to the driver.

However, the total latency of the system, from capture to display, includes delays caused by the network, PC, software and display.

In the results of the experiment (similar to this one), these delays contribute the most (around 40-50 ms) to the stream latency – at least for the given equipment.

Goal

Measure the total latency of a live stream over network from 10393 camera.

tEXP < 1 ms – typical exposure time outdoors. The display used in the test is bright enough to allow a 1.7 ms exposure with the gains maxed.

Compressor

The compressor is implemented in the FPGA and works three times faster than the sensor readout, but needs a stripe of 20 rows in memory. Thus, the compressor will finish ~20/3*tROW after the whole image is read out.

tCMP = 20/3*tROW

Summary

tCAM = tERS + tEXP + tCMP

Since the image is read and compressed by the FPGA logic of the Zynq, and this pipeline has been simulated, we can be confident in these numbers.
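The latency model can be written out as a few lines of Python. Note the assumptions here are mine, not from the post: tERS is approximated as the full-frame readout (rows * tROW), and the row time used below is a made-up placeholder, so the numbers it produces are purely illustrative:

```python
# Sketch of the camera-side latency model t_CAM = t_ERS + t_EXP + t_CMP.
# Assumptions (not from the post): t_ERS ~= rows * t_ROW, and the row
# time passed in is a placeholder, not the sensor's real value.

def t_cam_ms(t_row_us: float, rows: int, t_exp_ms: float) -> float:
    t_ers = rows * t_row_us / 1000.0        # full-frame readout, ms (assumption)
    t_cmp = 20 / 3 * t_row_us / 1000.0      # compressor trails readout by ~20/3 * t_ROW
    return t_ers + t_exp_ms + t_cmp

# Placeholder numbers only: 10 us row time, 720 rows, 1.7 ms exposure.
print(round(t_cam_ms(10.0, 720, 1.7), 2))
```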

Table 4: Average output latency + exposure

  Resolution    tCAM, ms
  720p          9.9
  1080p         17.9

Stopwatch accuracy

The stopwatch itself is not accurate. For simplicity, we will rely on the camera’s internal clock, which timestamps every image, and take the JavaScript timer readings as unique labels, without caring what time they actually show.

Results

Fig.2 1080p 30fps

Fig.3 720p 60fps

GStreamer showed the best results among the tested programs.
Since the camera’s fps is discrete, the result is a multiple of 1/fps (see this article):

30 fps => 33.3 ms

60 fps => 16.7 ms
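Since an on-screen event only becomes observable at the next captured frame, the measurement quantizes to the frame period, which is just 1000/fps:

```python
# Latency measured via on-screen frames is quantized to the frame period.
def frame_period_ms(fps: float) -> float:
    return 1000.0 / fps

assert round(frame_period_ms(30), 1) == 33.3   # 30 fps => 33.3 ms steps
assert round(frame_period_ms(60), 1) == 16.7   # 60 fps => 16.7 ms steps
```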

  Resolution/fps    Total latency, ms    Network+PC+SW latency, ms
  720p@60fps        33.3-50              23.4-40.1
  1080p@30fps       33.3-66.7            15.4-48.8

Possible improvements

Camera

Currently, the driver waits for the interrupt from the compressor indicating that the image is fully compressed and ready for transfer. However, one does not have to wait for the whole image: the transfer can start as soon as a minimum amount of compressed data is ready.

There are 3 more interrupts related to image pipeline events. One of them is “compression started” – switching to it can reduce the output latency to (10+20/3)*tROW, i.e. 0.4 ms for 720p and 0.5 ms for 1080p.

VLC

Chrome/Firefox

July 10, 2017

Linus Torvalds released the 4.12 Linux kernel a week ago, in what is the second biggest kernel release ever by number of commits. As usual, LWN had very nice coverage of the major new features and improvements: first part, second part and third part.

LWN has also published statistics about the Linux 4.12 development cycles, showing:

Free Electrons as the #14 contributing company by number of commits, with 221 commits, between Broadcom (230 commits) and NXP (212 commits)

Free Electrons as the #14 contributing company by number of changed lines, with 16636 lines changed, just two lines less than Mellanox

Free Electrons engineer and MTD NAND maintainer Boris Brezillon as the #17 most active contributor by number of lines changed.

Our most important contributions to this kernel release have been:

On Atmel AT91 and SAMA5 platforms:

Alexandre Belloni has continued to upstream support for the SAMA5D2 backup mode, which is a very deep suspend-to-RAM state offering very nice power savings. Alexandre touched the core code in arch/arm/mach-at91 as well as pinctrl and irqchip drivers.

Boris Brezillon has converted the Atmel PWM driver to the atomic API of the PWM subsystem, implemented suspend/resume, made a number of fixes in the Atmel display controller driver, and removed the no-longer-used AT91 Parallel ATA driver.

Quentin Schulz improved the suspend/resume hooks in the atmel-spi driver to support the SAMA5D2 backup mode.

On Allwinner platforms:

Mylène Josserand has made a number of improvements to the sun8i-codec audio driver that she contributed a few releases ago.

Maxime Ripard added devfreq support to dynamically change the frequency of the GPU on the Allwinner A33 SoC.

Quentin Schulz added battery charging and ADC support to the X-Powers AXP20x and AXP22x PMICs, found on Allwinner platforms.

Quentin Schulz added a new IIO driver to support the ADCs found on numerous Allwinner SoCs.

Quentin Schulz added support for the Allwinner A33 built-in thermal sensor, and used it to implement thermal throttling on this platform.

On Marvell platforms:

Antoine Ténart contributed Device Tree changes to describe the cryptographic engines found in the Marvell Armada 7K and 8K SoCs. For now only the Device Tree description has been merged, the driver itself will arrive in Linux 4.13.

Grégory Clement has improved the Device Tree description of the Marvell Armada 3720 and Marvell Armada 7K/8K SoCs and corresponding evaluation boards: SDHCI and RTC are now enabled on Armada 7K/8K, USB2, USB3 and RTC are now enabled on Armada 3720.

Thomas Petazzoni made a significant number of changes to the mvpp2 network driver, finally adding support for the PPv2.2 version of this Ethernet controller. This made it possible to enable network support on the Marvell Armada 7K/8K SoCs.

Thomas Petazzoni contributed a number of fixes to the mv_xor_v2 dmaengine driver, used for the XOR engines on the Marvell Armada 7K/8K SoCs.

Thomas Petazzoni cleaned up the MSI support in the Marvell pci-mvebu and pcie-aardvark PCI host controller drivers, which allowed the removal of a no-longer-used MSI kernel API.

On the ST SPEAr600 platform:

Thomas Petazzoni added support for the ADC available on this platform, by adding its Device Tree description and fixing a clock driver bug

Thomas did a number of small improvements to the Device Tree description of the SoC and its evaluation board

Thomas cleaned up the fsmc_nand driver, which is used for the NAND controller driver on this platform, removing lots of unused code

In the MTD NAND subsystem:

Boris Brezillon implemented a mechanism to allow vendor-specific initialization and detection steps to be added, on a per-NAND chip basis. As part of this effort, he has split into multiple files the vendor-specific initialization sequences for Macronix, AMD/Spansion, Micron, Toshiba, Hynix and Samsung NANDs. This work will allow in the future to more easily exploit the vendor-specific features of different NAND chips.

Those were exciting times, and there was a lot of pioneering spirit:
Building a Linux based smartphone with a 100% FOSS software stack on the
application processor, including all drivers, userland, applications -
at a time before Android was known or announced. As history shows, we'd
been working in parallel with Apple on the iPhone, and Google on
Android. Of course there's little chance that a small Taiwanese company
can compete with the endless resources of the big industry giants, and
the many Neo1973 delays meant we had missed the window of opportunity to
be the first on the market.

It's sad that Openmoko (or similar projects) have not survived even as a
special-interest project for FOSS enthusiasts. Today, virtually all
options of smartphones are encumbered with way more proprietary blobs
than we could ever imagine back then.

In any case, the tenth anniversary of trying to change the amount of
Free Software in the smartphone world is worth some celebration. I'm
reaching out to old friends and colleagues, and I guess we'll have
somewhat of a celebration party both in Germany and in Taiwan (where
I'll be for my holidays from mid-September to mid-October).

July 03, 2017

The article Open Doors for Universal Embedded Design in Embedded Systems Engineering, written by Caroline Hayes, Senior Editor, reads:

Charged with finding cost-effective integration for multicore platforms, the European Union’s (EU) Artemis EMC2 project finished at the end of May this year. A further initiative with CERN could mean the spirit of co-operation and the principles of open hardware herald an era of innovation.

and

This collaboration is a new initiative. The PC/104 Consortium will provide design-in examples of new and mature boards, with a reference design, for others to use and create something new. Although the Sundance board is the only [PC/104] product on the CERN Open Hardware Repository, there will be more news in the summer, promises Christensen. “My goal is to get five designs within the first year,” he says, and he is actively working to promote to PC/104 Consortium members that there is a place where they can download—and upload—reference designs which are PC/104-compatible.

The Ware for May 2017 is the “Lorentz and Hertz” carriage board from an HP Officejet Pro 8500. Congrats to MegabytePhreak for nailing both the make and model of the printer it came from! email me for your prize.

For all embedded Linux developers, cross-compilation toolchains are part of the basic tool set, as they allow building and debugging code for a specific CPU architecture. Until a few years ago, CodeSourcery provided a lot of high-quality pre-compiled toolchains for a wide range of architectures, but has progressively stopped doing so. Linaro provides some freely available toolchains, but only targeting ARM and AArch64. kernel.org has a set of pre-built toolchains for a wider range of architectures, but they are bare-metal toolchains (they cannot build Linux userspace programs) and are updated infrequently.

This web site provides a large number of cross-compilation toolchains, available for a wide range of architectures, in multiple variants. The toolchains are based on the classical combination of gcc, binutils and gdb, plus a C library. We currently provide a total of 138 toolchains, covering many combinations of:

Versions: for each combination, we provide a stable version which uses slightly older but more proven versions of gcc, binutils and gdb, and we provide a bleeding edge version with the latest version of gcc, binutils and gdb.

After being generated, most of the toolchains are tested by building a Linux kernel and a Linux userspace, and booting them under Qemu, which allows verifying that the toolchain is minimally working. We plan to add more tests to validate the toolchains, and welcome your feedback on this topic. Of course, not all toolchains can be tested this way, because some CPU architectures are not emulated by Qemu.

The toolchains are built with Buildroot, but can be used for any purpose: build a Linux kernel or bootloader, as a pre-built toolchain for your favorite embedded Linux build system, etc. The toolchains are available in tarballs, together with licensing information and instructions on how to rebuild the toolchain if needed.

We are very much interested in your feedback about those toolchains, so do not hesitate to report bugs or make suggestions in our issue tracker!

This work was done as part of the internship of Florent Jacquet at Free Electrons.

June 15, 2017

Keep in mind: Osmocom is a much larger umbrella project, and beyond
the network-side cellular stack it is home to many different community-based
projects around open source mobile communications. All of those
started more or less as just-for-fun projects, nothing serious,
just a hobby [1]

The projects implementing the network-side protocol stacks and network
elements of GSM/GPRS/EGPRS/UMTS cellular networks are somewhat the
exception to that, as they have to some extent professionalized.
We collectively call these the Cellular Infrastructure
projects inside Osmocom. This post is about that part of Osmocom only.

History

From late 2008 through 2009, people like Holger and I were working on
bs11-abis and later OpenBSC only in our spare time. The name Osmocom
didn't even exist back then. There was a strong technical community with
contributions from Sylvain Munaut, Andreas Eversberg, Daniel Willmann,
Jan Luebbe and a few others. None of this would have been possible if
it wasn't for all the help we got from Dieter Spaar with the BS-11 [2].
We all had our dayjob in other places, and OpenBSC work was really
just a hobby. People were working on it because it was where no
FOSS hacker had gone before. It was cool. It was a big and pleasant
challenge to enter the closed telecom space as pure autodidacts.

Holger and I had been doing freelance contract development work on Open
Source projects for many years before. I was mostly doing Linux-related
contracting, while Holger had been active in all kinds of areas
throughout the FOSS software stack.

In 2010, Holger and I saw the first interest from companies in OpenBSC,
including Netzing AG and On-Waves ehf. So we were able to spend at
least some of our paid time on OpenBSC/Osmocom related contract work,
and were thus able to do less other work. We also continued to spend
tons of spare time in bringing Osmocom forward. Also, the amount of
contract work we did was only a fraction of the many more hours of spare
time.

In 2011, Holger and I decided to start the company sysmocom in order to generate more funding for the
Osmocom GSM projects by means of financing software development by
product sales. So rather than doing freelance work for companies who
bought their BTS hardware from other places (and spent huge amounts of
cash on that), we decided that we wanted to be a full solution
supplier, who can offer a complete product based on all hardware and
software required to run small GSM networks.

The only problem was: we still needed an actual BTS for that. Through
some reverse engineering of existing products we figured out who one of
the ODM suppliers for the hardware + PHY layer was, and decided to
develop the OsmoBTS software to
run on that hardware. We inherited some of the early code from work done by Andreas
Eversberg on the jolly/bts branch of OsmocomBB (thanks), but much was
missing at the time.

What followed was Holger and me working for several years for free [3], without
any salary, to complete the OsmoBTS software, build an embedded
Linux distribution around it based on OE/poky, write documentation, etc.,
and complete the first sysmocom product: the
sysmoBTS 1002

We did that not because we wanted to get rich, or because we wanted to run a
business. We did it simply because we saw an opportunity to generate
funding for the Osmocom projects and make them more sustainable and
successful, and because we believe there is a big, gaping, huge vacuum
in terms of the absence of FOSS in the cellular telecom sphere.

Funding by means of sysmocom product sales

Once we started to sell the sysmoBTS products, we were able to fund
Osmocom related development from the profits made on hardware /
full-system product sales. Every single unit sold made a big
contribution towards funding both the maintenance as well as the ongoing
development on new features.

This source of funding continues to be an important factor today.

Funding by means of R&D contracts

Probably the best and most welcome method of funding Osmocom-related
work is R&D projects in which a customer funds our work to
extend the Osmocom GSM stack in one particular area where they have a
need that the existing code cannot yet fulfill.

This kind of project is the ideal match, as it shows where the true
strength of FOSS is: Each of those customers did not have to fund the
development of a GSM stack from scratch. Rather, they only had to fund
those bits that were missing for their particular application.

Our reference for this is and has been On-Waves, who have been funding
development of their required features (and bug fixing etc.) since 2010.

We've of course had many other projects from a variety of customers
over the years. Last, but not least, we had a customer who willingly
co-funded (together with funds from NLnet foundation and lots of unpaid
effort by sysmocom) the 3G/3.5G support in the Osmocom stack.

The problem here is:

we have not been able to secure anywhere near as many of those R&D
projects within the cellular industry as we had hoped, despite believing
we have a very good foundation upon which to build. I've been writing many
exciting technical project proposals

you almost exclusively get funding for new features. It's
very hard to get funding for the core maintenance work: the
bug-fixing, code review, code refactoring, testing, etc.

So as a result, the profit margin on R&D projects is
basically used to fund (inadequately) those bits and pieces that
nobody wants to pay for.

Funding by means of customer support

There is a way to generate funding for development by providing support
services. We've had some success with this, but primarily alongside the
actual hardware/system sales - not so much in terms of pure
software-only support.

Also, providing support services from a R&D company means:

either you distract your developers by having them handle support inquiries.
This means they have less time to work on actual code, and they likely
get sidetracked by too many issues that make it hard to focus

or you have to hire separate support staff. This of course means that
the support business has to be sufficiently large to not
only cover the costs of hiring and training support staff, but also still
generate funding for the actual software R&D.

We tried the second option briefly, but have fallen back to the first
for now. There's simply not enough user/admin-type support business
to justify dedicated staff for it.

Funding by means of cross-subsidizing from other business areas

sysmocom also started to do some non-Osmocom projects in order to
generate revenue that we can feed again into Osmocom projects. I'm not
at liberty to discuss them in detail, but basically we've been doing
pretty much anything from

custom embedded Linux board designs

M2M devices with GSM modems

consulting gigs

public tendered research projects

Profits from all those areas went again into Osmocom development.

Last, but not least, we also operate the sysmocom webshop. The profit we make on those products
also is again immediately re-invested into Osmocom development.

Funding by grants

We've had some success in securing funding from the NLnet Foundation for
specific features. While this is useful, the
size of their project grants of up to EUR 30k is not a good fit for the
scale of the tasks we have at hand inside Osmocom. You may think that's
a considerable amount of money? Well, it translates to 2-3 man-months
of work at a bare cost-covering rate. At a team size of 6 developers,
you would theoretically have churned through that in two weeks. Also,
their focus is (understandably) on Internet and IT security, and not so
much on cellular communications.

There are of course other options for grants, such as government
research grants and the like. However, they require long-term planning,
they require you to match (i.e. pay yourself) a significant portion,
and basically mandate that you hire one extra person for doing all the
required paperwork and reporting. So all in all, not a particularly
attractive option for a very small company consisting of die hard engineers.

Funding by more BTS ports

At sysmocom, we've been doing some ports of the OsmoBTS + OsmoPCU
software to other hardware, and supporting those other BTS vendors with
porting, R&D and support services.

If sysmocom was a classic BTS vendor, we would not help our
"competition". However, we are not. sysmocom exists to help Osmocom,
and we strongly believe in open systems and architectures, without a
single point of failure, a single supplier for any component or any type
of vendor lock-in.

So we happily help third parties to get Osmocom running on their
hardware, either with a proprietary PHY or with OsmoTRX.

However, we expect that those BTS vendors also understand their
responsibility to share the development and maintenance effort of the
stack. Preferably by dedicating some of their own staff to work in
the Osmocom community. Alternatively, sysmocom can perform that work as
paid service. But that's a double-edged sword: We don't want to be a
single point of failure.

Osmocom funding outside of sysmocom

Osmocom is of course more than sysmocom. Even the cellular
infrastructure projects inside Osmocom are true,
community-based, open, collaborative development projects. Anyone can
contribute.

Over the years, there have been code contributions by e.g.
Fairwaves. They, too, build GSM base station hardware and use that as a
means to not only recover the R&D on the hardware, but also to
contribute to Osmocom. At some point a few years ago, there was a lot
of work from them in the area of OsmoTRX, OsmoBTS and OsmoPCU.
Unfortunately, in more recent years, they have not been able to keep up
the level of contributions.

There are other companies engaged in activities with and around Osmocom.
There's Rhizomatica, an NGO helping
indigenous communities to run their own cellular networks. They have
been funding some of our efforts, but being an NGO helping rural regions
in developing countries, they of course also don't have the deep
pockets. Ideally, we'd want to be the ones contributing to them, not
the other way around.

State of funding

We've made some progress in recent years in securing funding from players
we cannot name [4]. We're also making occasional progress in
convincing BTS suppliers to chip in their share. Unfortunately, there
are more who don't live up to their responsibility than those who do.
I might start calling them out by name one day. The wider community and
the public actually deserve to know who plays by FOSS rules and who
doesn't. That's not shaming, it's just stating bare facts.

Which brings us to:

sysmocom is in an office that's actually too small for the team,
equipment and stock. But we certainly cannot afford more space.

we cannot pay our employees what they could earn working at similar
positions in other companies. So working at sysmocom requires
dedication to the cause :)

Holger and I have invested far more time than we have ever paid
ourselves for, even more so considering the opportunity cost of what we
would have earned had we continued on our freelance Open Source hacker path

we're [just barely] managing to pay for 6 developers dedicated to
Osmocom development on our payroll based on the various funding
sources indicated above

Nevertheless, I doubt that any team this small has ever implemented an
end-to-end GSM/GPRS/EGPRS network from RAN to core with a
comparable feature set. My deepest respect to everyone involved. The
big task now is to make it sustainable.

Summary

So as you can see, there's quite a bit of funding around. However, it
always falls short of what's needed to implement all parts properly, and
is not even quite sufficient to keep maintaining the status quo in a proper
and tested way. That can often be frustrating (mostly to us, but
sometimes also to users who run into regressions and other bugs).
There's so much more potential. So many things we have wanted to add or
clean up for a long time, but too few people interested in joining
in and helping out, financially or by writing code.

One thing that is often a challenge when dealing with traditional
customers: we are not developing a product and then selling a ready-made
product. In fact, in FOSS this would be more or less suicidal: we'd
have to invest man-years upfront, and then once it is finished, everyone
could use it without having to partake in that investment.

So instead, the FOSS model requires customers/users to chip in
early during the R&D phase, in order to subsequently harvest the
fruits of that investment.

I think the lack of a FOSS mindset across the cellular/telecom
industry is the biggest constraining factor here. I saw the same thing
some 15-20 years ago in the Linux world. Trust me, it takes a lot of
dedication to the cause to endure this lack of comprehension so many
years later.

sysmocom is 100% privately held by Holger and me; we intentionally have no external investors and are proud to have never taken a bank loan. So all we could invest was our own money and, most of all, our time.

"I have yet to understand why we would open source something we think is
really good software"

This completely misses the point. FOSS is not about making a charity
donation of a finished product to the planet.

FOSS is about sharing the development costs among multiple players, and
avoiding that everyone has to reinvent the wheel.
Macro-economically, it is complete and utter nonsense that each 3GPP
specification gets implemented two dozen times, by at least a dozen
different entities. As a result, products are far more expensive
than they need to be.

If large Telco players (whether operators or equipment manufacturers)
were to collaboratively develop code just as much as they
collaboratively develop the protocol specifications, there would be no
need for replicating all of this work.

As a result, everyone could produce cellular network elements at reduced
cost, sharing the R&D expenses, and competing in key areas, such as who
can come up with the most energy-efficient implementation, or can
produce the most reliable hardware, the best receiver sensitivity, the
best and most fair scheduling implementation, or whatever else. But
some 80% of the code could probably be shared, as e.g. encoding and
decoding messages according to a given publicly released 3GPP
specification document is not where those equipment suppliers actually
compete.

So, my dear cellular operator executives: next time you're cursing
the prohibitively expensive pricing your equipment suppliers quote
you, remember that you only have to pay that much because everyone is
reinventing the wheel over and over again.

Equally, my dear cellular infrastructure suppliers: you are dying
one by one, as it's hard to develop everything from scratch. Over the
years, many of you have died. One wonders whether we might still have more
players left if some of you had started to cooperate in developing FOSS,
at least in those areas where you're not competing. You could replicate
what Linux has done in the operating system market. There's no need for
a phalanx of different proprietary flavors of Unix-like OSs: it's
way too expensive, and it's not an area in which most companies need to
or want to compete anyway.

Management Summary

You don't first develop an entire product until it is finished and
then release it as open source. That makes little economic sense in
a lot of cases, as you've already invested in developing 100% of it.
Instead, you develop a new product collaboratively as FOSS in
order to invest not 100% but maybe only 30% or even less. You
get a multiple of your R&D investment back, because you're getting
not only your own code, but all the code that other community
members implemented. You of course also get other benefits, such as
peer review of the code and more ideas (not all bright people work inside
one given company).

That article is actually a heavily opinionated post by somebody
who appears to have been pushing his own anti-FOSS agenda for some time.
The author is misinformed: the TIP has always included
projects under both FRAND and FOSS terms. As a TIP member I can
attest to that fact. I'm only referencing it here for the purpose of
that Ericsson quote.

June 13, 2017

What is Elixir?

Since 2006, we have provided a Linux source code cross-referencing online tool as a service to the community. The engine behind this website was LXR, a Perl project almost as old as the kernel itself. For the first few years, we used the then-current 0.9.5 version of LXR, but in early 2009 and for various reasons, we reverted to the older 0.3.1 version (from 1999!). In a nutshell, it was simpler and it scaled better.

Recently, we had the opportunity to spend some time on it, to correct a few bugs and to improve the service. After studying the Perl source code and trying out various cross-referencing engines (among which LXR 2.2 and OpenGrok), we decided to implement our own source code cross-referencing engine in Python.

Why create a new engine?

Our goal was to extend our existing service (support for multiple projects, responsive design, etc.) while keeping it simple and fast. When we tried other cross-referencing engines, we were dissatisfied with their relatively low performance on a large codebase such as Linux. Although we probably could have tweaked the underlying database engine for better performance, we decided it would be simpler to stick to the strategy used in LXR 0.3: get away from the relational database engine and keep plain lists in simple key-value stores.

Another reason that motivated a complete rewrite was that we wanted to provide an up-to-date reference (including the latest revisions) while keeping it immutable, so that external links to the source code wouldn’t get broken in the future. As a direct consequence, we would need to index many different revisions for each project, with potentially a lot of redundant information between them. That’s when we realized we could leverage the data model of Git to deal with this redundancy in an efficient manner, by indexing Git blobs, which are shared between revisions. In order to make sure queries under this strategy would be fast enough, we wrote a proof-of-concept in Python, and thus Elixir was born.

What service does it provide?

First, we tried to minimize disruption to our users by keeping the user interface close to that of our old cross-referencing service. The main improvements are:

We now support multiple projects. For now, we provide reference for Linux, Busybox and U-Boot.

Every tag in each project’s git repository is now automatically indexed.

The design has been modernized and now fits comfortably on smaller screens like tablets.

The URL scheme has been simplified and extended with support for multiple projects. An HTTP redirector has been set up for backward compatibility.

Elixir supports multiple projects

Among other smaller improvements, it is now possible to copy and paste code directly without line numbers getting in the way.

How does it work?

Elixir is made of two Python scripts: “update” and “query”. The first looks for new tags and new blobs inside a Git repository, parses them and appends the new references to identifiers to a record inside the database. The second uses the database and the Git repository to display annotated source code and identifier references.

The parsing itself is done with Ctags, which provides us with identifier definitions. In order to find the references to these identifiers, Elixir then simply checks each lexical token in the source file against the definition database, and if that word is defined, a new reference is added.
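The reference scan can be sketched as a token-by-token lookup against the definitions database (a simplified illustration, not Elixir's actual code; the real engine works on Git blobs and Ctags output):

```python
import re

def find_references(source, definitions):
    # Scan every lexical token in the source; record a reference
    # whenever the token is a known identifier from the definitions set.
    refs = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z_]\w*", line):
            if token in definitions:
                refs.setdefault(token, []).append(lineno)
    return refs

defs = {"kmalloc", "kfree"}                       # toy definitions "database"
code = "p = kmalloc(16, GFP_KERNEL);\nkfree(p);"
print(find_references(code, defs))                # {'kmalloc': [1], 'kfree': [2]}
```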

Like in LXR 0.3, the database structure is kept very simple so that queries don’t have much work to do at runtime, thus speeding them up. In particular, we store references to a particular identifier as a simple list, which can be loaded and parsed very fast. The main difference with LXR is that our list includes references from every blob in the project, so we need to restrict it first to only the blobs that are part of the current version. This is done at runtime, simply by computing the intersection of this list with the list of blobs inside the current version.
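The version filtering described above amounts to a simple intersection (illustrative only; the blob identifiers and the reference-list layout are assumptions):

```python
def refs_in_version(identifier_refs, version_blobs):
    # identifier_refs covers every blob in the whole project; restrict it
    # at query time to the blobs that belong to the requested version.
    blobs = set(version_blobs)
    return [(blob, line) for blob, line in identifier_refs if blob in blobs]

all_refs = [("blob1", 10), ("blob2", 42), ("blob3", 7)]
version = ["blob1", "blob3"]   # blobs reachable from the queried tag
print(refs_in_version(all_refs, version))  # [('blob1', 10), ('blob3', 7)]
```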

Finally, we kept the user interface code clearly segregated from the engine itself by making these two modules communicate through a Unix command-line interface. This means that you can run queries directly on the command-line without going through the web interface.

Elixir code example

What’s next?

Our current focus is on improving multi-project support. In particular, each project has its own quirky way of using Git tags, which needs to be handled individually.

At the user-interface level, we are evaluating the possibility of having auto-completion and/or fuzzy search of identifier names. Also, we are looking for a way to provide direct line-level access to references even in the case of very common identifiers.

On the performance front, we would like to cut the indexation time by switching to a new database back-end that provides efficient appending to large records. Also, we could make source code queries faster by precomputing the references, which would also allow us to eliminate identifier “bleeding” between versions (the case where an identifier shows up as “defined in 0 files” because it is only defined in another version).

If you think of other ways we could improve our service, don’t hesitate to drop us a feature request or a patch!

Bonus: why call it “Elixir”?

In the spur of the moment, it seemed like a nice pun on the name “LXR”. But in retrospect, we wish to apologize to the Elixir language team and the community at large for unnecessary namespace pollution.

June 09, 2017

Since April 2016, we have our own automated testing infrastructure to validate the Linux kernel on a large number of hardware platforms. We use this infrastructure to contribute to the KernelCI project, which tests the Linux kernel every day. However, the tests done by KernelCI are really basic: it's mostly about booting a basic Linux system and checking that it reaches a shell prompt.

However, LAVA, the software component at the core of this testing infrastructure, can do a lot more than just basic tests.

The need for custom tests

With some of our engineers being Linux maintainers and given all the platforms we need to maintain for our customers, being able to automatically test specific features beyond a simple boot test was a very interesting goal.

In addition, manually testing a kernel change on a large number of hardware platforms can be really tedious. Being able to quickly send test jobs that will use an image you built on your machine can be a great advantage when you have some new code in development that affects more than one board.

We identified two main use cases for custom tests:

Automatic tests to detect regressions, as KernelCI does, but with more advanced tests, including platform-specific tests.

Manual tests executed by engineers to validate that the changes they are developing do not break existing features, on all platforms.

Implementing these use cases requires several components:

An appropriate root filesystem, that contains the various userspace programs needed to execute the tests (benchmarking tools, validation tools, etc.)

A test suite, which contains various scripts executing the tests

A custom test tool that glues together the different components

The custom test tool knows all the hardware platforms available and which tests and kernel configurations apply to which hardware platforms. It identifies the appropriate kernel image, Device Tree, root filesystem image and test suite and submits a job to LAVA for execution. LAVA will download the necessary artifacts and run the job on the appropriate device.

Building custom rootfs

When it comes to testing specific drivers, dedicated testing, validation or benchmarking tools are sometimes needed. For example, bonnie++ can be used for storage device testing, while iperf is nice for network testing. As the default root filesystem used by KernelCI is really minimalist, we need to build our own, one for each architecture we want to test.

Buildroot is a simple yet efficient tool to generate root filesystems; it is also used by KernelCI to build their minimalist root filesystems. We chose to use it and made custom configuration files to match our needs.

We ended up with custom root filesystems built for ARMv4, ARMv5, ARMv7 and ARMv8, which for now embed Bonnie++, iperf, ping (not the Busybox implementation) and other small tools that aren't included in the default Buildroot configuration.

Our Buildroot fork that includes our custom configurations is available as the buildroot-ci Github project (branch ci).

The custom test tool

The custom test tool is the tool that binds the different elements of the overall architecture together.

One of the main features of the tool is to send jobs. Jobs are text files used by LAVA to know what to do with which device. As they are described in LAVA as YAML files (in the version 2 of the API), it is easy to use templates to generate them based on a single model. Some information is quite static such as the device tree name for a given board or the rootfs version to use, but other details change for every job such as the kernel to use or which test to run.
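The template approach described above can be sketched with Python's string.Template (the field names below are illustrative placeholders, not the exact LAVA v2 job schema):

```python
from string import Template

# Hypothetical, simplified LAVA job skeleton: static parts are written
# once in the template, per-job details are substituted at send time.
JOB_TEMPLATE = Template("""\
device_type: $device_type
job_name: $job_name
kernel_url: $kernel_url
dtb_url: $dtb_url
""")

job = JOB_TEMPLATE.substitute(
    device_type="beaglebone-black",
    job_name="network test on mainline",
    kernel_url="https://example.com/zImage",             # hypothetical URL
    dtb_url="https://example.com/am335x-boneblack.dtb",  # hypothetical URL
)
print(job)
```

The real tool keeps the static information (Device Tree name, rootfs version) in per-board configuration and only fills in what changes per job.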

We made the tool able to fetch the latest kernel images from KernelCI, to quickly send jobs without having to compile a custom kernel image. If the need is to test a custom image built locally, the tool is also able to send files to the LAVA server through SSH, to provide a custom kernel image.

The entry point of the tool is ctt.py, which allows creating new jobs, providing a lot of options to define the various aspects of the job (kernel, Device Tree, root filesystem, test, etc.).

The test suite

The test suite is a set of shell scripts that perform tests returning 0 or 1 depending on the result. This test suite is included inside the root filesystem by LAVA as part of a preparation step for each job.

We currently have a small set of tests:

boot test, which simply returns 0. Such a test will be successful as soon as the boot succeeds.

simple network test, that just validates network connectivity using ping
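The 0/1 pass/fail convention these scripts follow can be sketched like this (in Python rather than shell, purely as an illustration):

```python
import subprocess
import sys

def run_test(cmd, timeout=30):
    # Return 0 when the command exits successfully, 1 otherwise --
    # the same pass/fail convention the shell test scripts use.
    try:
        subprocess.run(cmd, check=True, timeout=timeout)
        return 0
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return 1

# The simple network test would amount to something like:
#   run_test(["ping", "-c", "3", "example.com"])
print(run_test([sys.executable, "-c", "pass"]))  # 0
```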

All those tests only require the target hardware platform itself. However, for more elaborate network tests, we needed two devices to interact with each other: the target hardware platform and a reference PC platform. For this, we use the LAVA MultiNode API. It allows a test to span multiple devices, which we use to perform multiple iperf sessions to benchmark the bandwidth. This test therefore has one part running on the target device (network-board) and one part running on the reference PC platform (network-laptop).

Our current test suite is available as the test_suite Github project. It is obviously limited to just a few tests for now; we hope to extend it in the near future.

First use case: daily tests

As previously stated, it’s important for us to know about regressions introduced in the upstream kernel. Therefore, we have set up a simple daily cron job that:

Sends custom jobs to all boards to validate the latest mainline Linux kernel and the latest linux-next

Aggregates results from the past 24 hours and sends emails to subscribed addresses

Updates a dashboard that displays results in a very simple page

A nice dashboard showing the tests of the Beaglebone Black and the Nitrogen6x.

Second use case: manual tests

The custom test tool ctt.py has a simple command line interface. It’s easy for someone to set it up and send custom jobs. For example:

ctt.py -b beaglebone-black -m network

will start the network test on the BeagleBone Black, using the latest mainline Linux kernel built by KernelCI. On the other hand:

will run the mmc test on the Marvell Armada 7040 and Armada 8040 development boards, using the locally built kernel image and Device Tree.

The result of the job is sent over e-mail when the test has completed.

Conclusion

Thanks to this custom test tool, we now have an infrastructure that leverages our existing lab and LAVA instance to execute more advanced tests. Our goal is now to increase the coverage, by adding more tests, and run them on more devices. Of course, we welcome feedback and contributions!

May 30, 2017

This is another one where the level of difficulty will depend on whether I cropped enough detail out of the photo to make it challenging but not impossible. If you do figure this one out quickly, I'm curious to hear which detail tipped you off!

May 29, 2017

SaFariPark is not a site to book holidays on the African plains - though with additional personal funding I am willing to add that feature. It is a software tool to read and write the digital interface of SFP/SFP+ transceiver modules. Together with a device to plug in multiple (4) SFP/SFP+ modules, creatively called MultiSFP (see Figure 1), it is a versatile tool for all your SFP needs. MultiSFP and SaFariPark have been developed by Nikhef as part of the ASTERICS program, and all is open hardware/open source.

Figure 1 - MultiSFP front panel

MultiSFP supports a 10 Gigabit capable connection to the electrical interface of each SFP. Via one USB port, each SFP's I2C bus can be exercised using SaFariPark. The software's main window (Figure 2) exposes most of the functionality, which includes:

Editing of individual fields in the SFP module

Fixing corrupted SFP EEPROM data, recalculating checksums

Showing and saving SFP+ sensor data such as TX/RX power and temperature.

Selectively copying content of one SFP module to multiple other modules

Laser tuning of optical SFP+ modules

Figure 2 - Main window of SaFariPark

Apart from this, SaFariPark allows you to dump the entire EEPROM content, and to extend the SFP+ EEPROM data dictionary with custom fields using XML. This enables users to add fields for custom or exotic SFP+ modules. As the software is written in Java, it has been verified to work on Linux and Windows. Mac has not been tested yet.

May 28, 2017

Chapter 0: Problem Statement

In an all-IP GSM network, where we use Abis, A and other interfaces
within the cellular network over IP transport, the audio of voice calls
is transported inside RTP frames. The codec payload in those RTP frames
is the actual codec frame of the respective cellular voice codec. In
GSM, there are four relevant codecs: FR, HR, EFR and AMR.

Every so often during the (by now many years of) development of
Osmocom cellular infrastructure software, it would have been useful to be
able to quickly play back the audio for analysis of given issues.

However, until now we didn't have that capability. The reason is
relatively simple: in Osmocom, we generally don't do transcoding but
simply pass the voice codec frames from left to right. They're only
transcoded inside the phones or inside some external media gateway (in
the case of larger networks).

Chapter 1: GSM Audio Pocket Knife

Back in 2010, when we were very actively working on OsmocomBB, the
telephone-side GSM protocol stack implementation, Sylvain Munaut wrote
the GSM Audio Pocket Knife (gapk) in order to be able to
convert between different formats (representations) of codec frames. In
cellular communications, everyone is coming up with their own
representation for the codec frames: the way they look on E1 as a TRAU
frame is completely different from what the RTP payload looks like, or what
the TI Calypso DSP uses internally, or what a GSM tester like the Racal
61x3 uses. The differences are mostly about the data types used,
bit-endianness, as well as padding and headers. And of course those
different formats exist for each of the four codecs :/
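As an illustration of the kind of low-level transformation such format conversions involve, here is a bit-order swap within a byte (a generic sketch, not gapk's actual code):

```python
def reverse_bits(byte):
    # Mirror the bit order within a single byte,
    # e.g. 0b00000001 -> 0b10000000 -- one of the typical
    # differences between codec frame representations.
    out = 0
    for i in range(8):
        out = (out << 1) | ((byte >> i) & 1)
    return out

print(hex(reverse_bits(0x01)))  # 0x80
```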

Last week, I picked up this subject again and added a long series of
patches to gapk:

support for variable-length codec frames (required for AMR support)

support for AMR codec encode/decode using libopencore-amrnb

support of all known RTP payload formats for all four codecs

support for direct live playback to a sound card via ALSA

All of the above can now be combined to make GAPK bind to a specified
UDP port and play back the RTP codec frames that anyone sends to that
port using a command like this:

$ gapk -I 0.0.0.0/30000 -f rtp-amr -A default -g rawpcm-s16le

I've also merged a change to OsmoBSC/OsmoNITB which allows the
administrator to re-direct the voice of any active voice channel towards
a user-specified IP address and port. Using that, you can simply
disconnect the voice stream from its normal destination and play
back the audio via your sound card.

Chapter 2: Bugs in OsmoBTS GSM-HR

While going through the exercise of implementing the above extension to
gapk, I had lots of trouble getting it to work for GSM-HR.

After some more digging, it seems there are two conflicting
specifications on how to format the RTP payload for half-rate GSM:

Those merely worked around those issues in the rtp_proxy of OsmoNITB,
rather than addressing the real issue. That's ok, they were "quick"
hacks to get something working at all during a four-day conference. I'm
now working on "real" fixes in osmo-bts-sysmo. The devil is of course
in the details, when people upgrade one BTS but not the other and want
to inter-operate, ...

It remains to be investigated how osmo-bts-trx and other osmo-bts
ports behave in this regard.

Chapter 3: Conclusions

Most definitely it is once again a very clear sign that more testing is
required. This is tricky to catch even with osmo-gsm-tester, as GSM-HR
works between two phones or even two instances of osmo-bts-sysmo, since
both sides of the implementation have the same (wrong) understanding of
the spec.

Given that we can only catch this kind of bug together with the hardware
(the DSP runs the PHY code), pure unit tests wouldn't catch it. And the
end-to-end test is also not very well suited to it. It seems to call
for something in between: something like an A-bis interface level test.

We need more (automatic) testing. I cannot say that often enough. The
big challenge is how to convince contributors and customers that they
should invest their time and money there, rather than in
yet another (not automatically tested) feature.

Credits

First things first: Credits!
The problem with credits is you usually forget somebody and that’s most likely happening here as well.
I read around quite a lot, gathered information and partially don’t even remember anymore where I read what (first).

Of course I’m impressed by the entire Tasmota project and what it enables one to do with the Itead Sonoff and similar devices.

Special thanks go to khcnz who helped me a lot in a discussion documented here.

Introduction Sonoff devices

Quite recently the Itead Sonoff series — a bunch of ESP8266-based IoT home automation devices — was brought to my attention.

The ESP8266 is a low-power consumption SoC especially designed for IoT purposes. It’s sold by Espressif, running a 32-Bit processor featuring the Xtensa instruction set (licensed from Tensilica) and having an ASIC IP core and WiFi onboard.

Those Sonoff devices using this SoC basically expect mains-voltage input, and therefore contain an AC/DC (5V) converter, the ESP8266 SoC and a relay switching the mains-voltage output.
They're sold as wall switches (“Sonoff Touch”), E27 socket adapters (“Slampher”), power sockets (“S20 smart socket”) or — in the most basic and cheapest model — all that in a simple case (“Sonoff Basic”).
They also have a bunch of sensor devices, measuring temperature, power consumption, humidity, noise levels, fine dust, etc.

Though I’m rather sceptical about the whole IoT (development) philosophy, I always was (and still am) interested in low-cost and power-saving home automation which is completely and exclusively under my control.

That implies I’m obviously not interested in some random IoT devices being necessarily connected to some Google/Amazon/Whatever cloud, even less if sensible data is transmitted without me knowing (but very well suspecting) what it’s used for.

Guess what the Itead Sonoff devices do? Exactly that! They even feature Amazon Alexa and Google Nest support! And of course you have to use their proprietary app to configure and control your devices — via the Amazon cloud.

However, as said earlier, they’re based on the ESP8266 SoC, around which a great deal of OpenSource projects evolved. For some reason especially the Arduino community pounced on that SoC, enabling a much broader range of people to play around with and program for those devices. Whether that’s a good and/or bad thing is surely debatable.

I’ll spare you the details about all the projects I ran into, there’s plenty of cool stuff out there.

I decided to go for the Sonoff-Tasmota project which is quite actively developed and supports most of the currently available Sonoff devices.

It provides an HTTP and MQTT interface and doesn’t need any connection to the internet at all. As the MQTT server (called a broker in MQTT parlance) I use mosquitto, which I’m running on my OpenWrt WiFi router.

Anyway, as I didn’t want to open and solder every device I intend to use, I took a closer look at the original firmware and its OTA update mechanism.

Protocol analysis

The first thing the device does after being configured (meaning it was set up by the proprietary app and therefore now has internet access via your local WiFi network) is to resolve the hostname `eu-disp.coolkit.cc` and attempt to establish an HTTPS connection.

Though the connection is SSL, the device doesn’t do any server certificate verification — so splitting the SSL connection and man-in-the-middling it is fairly easy.

As a side effect I ported the MITM project sslsplit to OpenWrt and created a separate “interception” network on my WiFi router. Now I only need to join that WiFi network and all SSL connections get split, their payload logged and provided on an FTP share. Intercepting SSL connections never felt easier.

Back to the protocol: We’re assuming at this point the Sonoff device was already configured (e.g. by the official WeLink app) which means it has joined our WiFi network, acquired IP settings via DHCP and has access to the internet.

The Sonoff device sends a dispatch call as HTTPS POST request to eu-disp.coolkit.cc including some JSON encoded data about itself:

which will subsequently be used for further interchange.
Payload via the established WebSocket channel continues to be encoded in JSON.
The messages coming from the device can be classified into action-requests initiated by the device (which expect acknowledgements by the server) and acknowledgement messages for requests initiated by the server.

As can be seen, action-requests initiated from the server side also have an apikey field, which can be any generated UUID except the one used by the device — as long as it’s used consistently in that WebSocket session.

Pay attention to the date format: it is some kind of ISO 8601, but the parser is really picky about it. While Python’s datetime.isoformat() function, for example, returns a string taking microseconds into account, the parser on the device will just fail parsing that. It also always expects the (actually optional) timezone to be specified as UTC and only as a trailing Z (though according to the spec “+00:00” would be valid as well).
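Based on those observations, a timestamp the device's parser should accept can be produced like this (a sketch; the exact accepted grammar is an assumption derived from the behaviour described above):

```python
from datetime import datetime, timezone

def device_timestamp(dt):
    # No microseconds, and the timezone expressed only as a trailing 'Z'.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

now = datetime.now(timezone.utc)
print(device_timestamp(now))   # e.g. 2017-07-09T13:37:00Z
print(now.isoformat())         # carries microseconds -> rejected by the device
```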

3) action: update — the device tells the server its switch status, the MAC address of the access point it is connected to, signal quality, etc.
This message also appears every time the device status changes, e.g. when it got switched on/off via the app or locally by pressing the button.

That’s it – that’s the basic handshake after the (configured) device powers up.

Now the server can tell the device to do stuff.

The sequence number is used by the device to acknowledge particular action-requests so the response can be mapped back to the actual request. It appears to be a UNIX timestamp with millisecond precision which doesn’t seem like the best source for generating a sequence number (duplicates, etc.) but seems to work well enough.

The downloadUrl field should be self-explanatory (the subsequent HTTP GET requests to those URLs contain some more data as CGI parameters, which however can be omitted).

The digest is a sha256 hash of the file and the name is the partition the file should be written to.
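Computing that digest for a self-served image is straightforward (a sketch, operating on the image bytes):

```python
import hashlib

def firmware_digest(image_bytes):
    # The 'digest' field of the upgrade action: sha256 over the image file.
    return hashlib.sha256(image_bytes).hexdigest()

print(firmware_digest(b"abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```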

Implementing server side

After some early approaches I decided to go for a Python implementation using the tornado webserver stack.
This decision was mainly based on it providing functionality for HTTP (obviously) as well as websockets and asynchronous handling of requests.

==> Trial & Error

1st attempt

As user1.1024.new.2.bin and user2.1024.new.2.bin look almost the same, let’s just use the same image for both, in this case a Tasmota build:

MOEP! Boot fails.

Reason: the Tasmota build also contains the bootloader, which the Espressif OTA mechanism doesn’t expect to be in the image.

2nd attempt

Chopping off the first 0x1000 bytes, which contain the bootloader plus padding (filled up with 0xAA bytes).
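The chop itself is a simple slice (a sketch; 0x1000 is the bootloader-plus-padding size mentioned above, and the image bytes here are synthetic):

```python
BOOT_SIZE = 0x1000  # bootloader plus 0xAA padding

def strip_bootloader(image):
    # Drop the first 0x1000 bytes so only the application image remains.
    return image[BOOT_SIZE:]

# Synthetic stand-in for a Tasmota build: padding, then the application.
fake_image = b"\xaa" * BOOT_SIZE + b"\xe9" + b"application"
print(strip_bootloader(fake_image)[:1])  # b'\xe9'
```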

MOEP! Boot fails.

Boot mode 1 and 2 / v1 and v2 image headers

The (now chopped) image and the original upgrade images appear to have different headers — even the very first byte (the files’ magic byte) differ.

The original image starts with 0xEA while the Tasmota build starts with 0xE9.

Apparently there are two image formats (called v1 and v2 or boot mode 1 and boot mode 2).
The former (older) one — used by Arduino/Tasmota — starts with 0xE9, while the latter (and apparently newer one) — used by the original firmware — starts with 0xEA.

The original bootloader only accepts images starting with 0xEA, while the bootloader provided by Arduino/Tasmota only accepts images starting with 0xE9.
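Checking which format an image uses therefore comes down to inspecting its first byte (a sketch based on the observations above):

```python
def image_format(image):
    # v1 (boot mode 1) images start with 0xE9 (Arduino/Tasmota builds),
    # v2 (boot mode 2) images start with 0xEA (original Itead firmware).
    magic = image[0]
    if magic == 0xE9:
        return "v1"
    if magic == 0xEA:
        return "v2"
    return "unknown"

print(image_format(b"\xe9\x02\x00"))  # v1
print(image_format(b"\xea\x04\x00"))  # v2
```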

3rd attempt

Converting Arduino images to v2 images

Easier said than done, as the Arduino framework doesn’t seem to be capable of creating v2 images and none of the common tools appear to have conversion functionality.

Taking a closer look at the esptool.py project, however, there seems to be (undocumented) functionality: esptool.py has the elf2image argument which — according to the source — allows switching between conversion to v1 and v2 images.

When using elf2image and also passing the --version parameter — which normally prints out the version string of the tool — the --version parameter gets redefined and then expects an argument: 1 or 2.

Besides the sonoff.ino.bin file, the Tasmota project also creates a sonoff.ino.elf which can now be used in conjunction with esptool.py and the elf2image parameter to create v2 images.

Remember the upgrade-action passed a 2-element list of download URLs to the device, having different names (user1.bin and user2.bin)?

This procedure now only works if the user1.bin image is being fetched and flashed.

Differences between user1.bin and user2.bin

The flash on the Sonoff devices is split into 2 parts (simplified!) which basically contain the same data (user1 and user2). As OTA upgrades sometimes fail for whatever reason, the upgrade will always happen on the currently inactive part, meaning that if the device is currently running the code from the user1 part, the upgrade will happen onto the user2 part.
That mechanism is not an Itead invention, but is actually provided as an off-the-shelf OTA solution by Espressif (the SoC manufacturer) itself.

For 1MB flash chips the user1 image is stored at offset 0x01000 while the user2 image is stored at 0x81000.
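The partition selection logic can be sketched like this (offsets as stated above for 1MB flash chips):

```python
USER1_OFFSET = 0x01000
USER2_OFFSET = 0x81000

def upgrade_target(running):
    # OTA always flashes the partition that is *not* currently running,
    # then tells the bootloader to switch the active partition.
    if running == "user1":
        return ("user2", USER2_OFFSET)
    return ("user1", USER1_OFFSET)

print(upgrade_target("user1"))  # ('user2', 528384)
```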

And indeed, the two original upgrade images (user1 and user2) differ significantly.

If a user2 image is flashed onto the user1 part of the flash, the device refuses to boot, and vice versa.

While there’s not much information about how user1.bin and user2.bin technically differ from each other, khcnz pointed me to an Espressif document stating:

user1.bin and user2.bin are [the] same software placed to different regions of [the] flash. The only difference is [the] address mapping on flash.

So we will now create a user1 image (without the above modification) and a user2 image (with the above modification) and convert them to v2 images with esptool.py as described above.

–> WORKS!

Depending on whether the original firmware was loaded from the user1 or user2 partition, it will fetch and flash the other image, telling the bootloader afterwards to change the active partition.

Issues

Mission accomplished? Not just yet…

Although our custom firmware is now flashed via the original OTA mechanism and running, the final setup differs in 2 major aspects (compared to flashing the device via serial):

The bootloader is still the original one

Our custom image might have ended up in the user2 partition

Each point alone already results in the Tasmota/Arduino OTA mechanism not working.
Additionally — since the bootloader stays the original one — it still only expects v2 images and still messes with us with its ping-pong mechanism.

This issue is already being addressed though and discussed on how to be solved best in the issue ticket mentioned at the very beginning.

May 26, 2017

After a massive refactoring & upgrade process, we have finally published the brand-new HDLMake 3.0 version. This version not only sports a whole set of new features, but has been carefully crafted so that the source code providing a common interface for the growing set of supported tools can be easily maintained.

These are some of the highlighted features for the new HDLMake v3.0 Release:

Updated HDL code parser and solver: the new release includes by default the usage of an embedded HDL code parser and file dependency solver to manage the synthesis and simulation process in an optimal way.

Support for Python 3.x: the new release supports both Python 2.7 and Python 3.x deployments in a single source code branch, enabling easier integration into newer OS distributions.

Native support for Linux & Windows shells: the new release not only supports Linux shells as the previous ones did, but also features native support for Windows shells such as the classic CMD prompt or the new PowerShell.

TCL based Makefiles: in order to streamline the process of supporting as many tools as possible in a hierarchical way, in the rapidly evolving world of FPGA technology and tool providers, we have adopted TCL as the common language layer used by the generated synthesis Makefiles.

Proper packaging: from HDLMake 3.0 onwards, the source code is distributed as a Python package, which allows for a much cleaner installation procedure.

May 23, 2017

Every so often I happen to be involved in designing electronics
equipment that's supposed to run reliably and remotely in inaccessible
locations, without any ability for "remote hands" to perform things like
power-cycling or the like. I'm talking about really remote locations,
possibly with no or only limited back-haul, and a very high cost of ever
sending somebody there for maintenance.

Given that a lot of computer peripherals (chips, modules, ...) use USB
these days, this is often some kind of an embedded ARM (rarely x86) SoM
or SBC, which is hooked up to a custom board that contains a USB hub
chip as well as a line of peripherals.

One of the most important lessons I've learned from experience is:
never trust reset signals / lines, always include power-switching
capability. There are many chips and electronics modules available on
the market that either have no RESET line at all, or claim to have a
hardware RESET line which you later (painfully) discover to be just a
GPIO polled by software which can get stuck, leaving no way to really
hard-reset the given component.

In the case of a USB-attached device (even though the USB
might only exist on a circuit board between two ICs), this is typically
rather easy: the USB hub is generally capable of switching the power of
its downstream ports. Many cheap USB hubs don't implement this at all,
or implement only ganged switching, but if you carefully select your USB
hub (or, in the case of a custom PCB, your hub chip), you can make sure
that it supports individual port power switching.
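At the protocol level, per-port power control is just a standard hub-class request (SET_FEATURE / CLEAR_FEATURE with the PORT_POWER feature selector from the USB 2.0 specification). A minimal, hypothetical pyusb sketch (the vendor ID and port number are placeholders, and this talks to the hub directly rather than integrating with the kernel's hub driver):

```python
# Sketch only: toggle hub port power via the standard USB 2.0 hub-class
# SET_FEATURE / CLEAR_FEATURE request. Constants are from the USB 2.0
# specification; the device lookup shown in comments is a placeholder.

USB_RT_PORT = 0x23             # class request, recipient "other" (a port)
USB_REQ_SET_FEATURE = 0x03
USB_REQ_CLEAR_FEATURE = 0x01
USB_PORT_FEAT_POWER = 8        # PORT_POWER feature selector

def port_power_request(port, enable):
    """Build (bmRequestType, bRequest, wValue, wIndex) for the transfer."""
    bRequest = USB_REQ_SET_FEATURE if enable else USB_REQ_CLEAR_FEATURE
    return (USB_RT_PORT, bRequest, USB_PORT_FEAT_POWER, port)

def set_port_power(hub_dev, port, enable):
    """Send the request to a pyusb device object representing the hub."""
    hub_dev.ctrl_transfer(*port_power_request(port, enable))

# Example usage (requires pyusb and a hub with per-port power switching):
#   import usb.core
#   hub = usb.core.find(idVendor=0x0424)   # placeholder vendor ID
#   set_port_power(hub, port=2, enable=False)
```

Note that this only covers the raw request; the cleanliness problem discussed below is precisely that there is no proper way to do this through the kernel's own hub driver.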

Now the next step is how to actually use this from your (embedded) Linux
system. It turns out to be harder than expected. After all, we're
talking about a standard feature that has been present in the USB
specifications since USB 1.x in the late 1990s. So the expectation is
that it should be straightforward to do with any decent operating
system.

I don't know how it is on other operating systems, but on Linux I
couldn't really find a proper way to do this cleanly. For
more details, please read my post to the linux-usb mailing list.

Why am I only running into this now? Is it such a strange idea? I mean,
power-cycling a device should be the simplest and most straightforward
thing to do in order to recover from any kind of "stuck state" or other
related issue. Logically enabling/disabling the port, resetting the
USB device via the USB protocol, etc. are all just "soft" forms of reset,
which at best help with USB-related issues, but not with any other part
of a USB device.

And in the case of e.g. a USB-attached cellular modem, we're actually
talking about a multi-processor system with multiple built-in
micro-controllers, at least one DSP, and an ARM core that might itself
run another Linux (to implement the USB gadget) - certainly
complex enough software that you would want to be able to power-cycle it...

May 22, 2017

In two previous blog posts, we presented the hardware and software architecture of the automated testing platform we have created to test the Linux kernel on a large number of embedded platforms.

The primary use case for this infrastructure was to participate in the KernelCI.org testing effort, which tests the Linux kernel every day on many hardware platforms.

However, since our embedded boards are now fully controlled by LAVA, we wondered if we could not only use our lab for KernelCI.org, but also provide remote control of our boards to Free Electrons engineers so that they can access development boards from anywhere. lavabo was born from this idea and its goal is to allow full remote control of the boards as it is done in LAVA: interface with the serial port, control the power supply and provide files to the board using TFTP.

The advantages of being able to access the boards remotely are obvious: allowing engineers working from home to work on their hardware platforms, avoid moving the boards out of the lab and back into the lab each time an engineer wants to do a test, etc.

User’s perspective

From a user’s point of view, lavabo is used through the eponymous lavabo command, which allows users to:

List the boards and their status: $ lavabo list

Reserve a board for lavabo usage, so that it is no longer used for CI jobs: $ lavabo reserve am335x-boneblack_01

Upload a kernel image and Device Tree blob so that they can be accessed by the board through TFTP: $ lavabo upload zImage am335x-boneblack.dtb

Connect to the serial port of the board: $ lavabo serial am335x-boneblack_01

Reset the power of the board: $ lavabo reset am335x-boneblack_01

Power off the board: $ lavabo power-off am335x-boneblack_01

Release the board, so that it can once again be used for CI jobs: $ lavabo release am335x-boneblack_01

Overall architecture and implementation

The following diagram summarizes the overall architecture of lavabo (components in green) and how it connects with existing components of the LAVA architecture.

lavabo reuses LAVA tools and configuration files

A client-server software

lavabo follows the classical client-server model: the lavabo client is installed on the machines of users, while the lavabo server is hosted on the same machine as LAVA. The server-side of lavabo is responsible for calling the right tools directly on the server machine and making the right calls to LAVA’s API. It controls the boards and interacts with the LAVA instance to reserve and release a board.

On the server machine, a specific Unix user is configured, through its .ssh/authorized_keys file, to automatically spawn the lavabo server program when someone connects. The lavabo client and server interact directly through their stdin/stdout, by exchanging JSON dictionaries. This interaction model was inspired by the Attic backup program. As a consequence, the lavabo server is not a background process that runs permanently like a traditional daemon.

Handling serial connection

Exchanging JSON over SSH works fine to allow the lavabo client to provide instructions to the lavabo server, but it doesn’t work well to provide access to the serial ports of the boards. However, ser2net is already used by LAVA and provides a local telnet port for each serial port. lavabo simply uses SSH port-forwarding to redirect those telnet ports to local ports on the user’s machine.

These additions to the LAVA API are used by the lavabo server to reserve and release boards, so that there is no conflict between CI jobs (such as the ones submitted by KernelCI.org) and the direct use of boards for remote development.

Interaction with the boards

Now that we know how the client and the server interact and also how the server communicates with LAVA, we need a way to know which boards are in the lab, on which port the serial connection of a board is exposed and what are the commands to control the board’s power supply. All this configuration has already been given to LAVA, so lavabo server simply reads the LAVA configuration files.

The last requirement is to provide files to the board, such as kernel images, Device Tree blobs, etc. Indeed, from a network point of view, the boards are located in a different subnet, not routed directly to the users' machines. LAVA already has a directory accessible through TFTP from the boards, which is one of the mechanisms used to serve files to boards. Therefore, the easiest and most obvious way is to send files from the client to the server and move them to this directory, which we implemented using SFTP.

User authentication

Since the serial port cannot be shared among several sessions, it is essential to guarantee that a board can only be used by one engineer at a time. In order to identify users, we have one SSH key per user in the .ssh/authorized_keys file on the server, each associated with a call to the lavabo-server program with a different username.
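This relies on the standard command= option of the OpenSSH authorized_keys format. A hypothetical entry (username and key are invented) might look like:

```
command="lavabo-server alice" ssh-rsa AAAAB3NzaC1yc2E... alice@example.com
```

Whatever command the client requests, sshd runs the forced command instead, so each key is firmly bound to one username.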

This allows us to identify who is reserving/releasing the boards, and make sure that serial port access, or requests to power off or reset the boards are done by the user having reserved the board.

For TFTP, the lavabo upload command automatically uploads files into a per-user sub-directory of the TFTP server. Therefore, when a file called zImage is uploaded, the board will access it over TFTP by downloading user/zImage.

Availability and installation

As you could guess from our love for FOSS, lavabo is released under the GNU GPLv2 license in a GitHub repository. Extensive documentation is available if you’re interested in installing lavabo. Of course, patches are welcome!

While the Colibri iMX7 module includes an SGTL5000 codec, one of the requirements for this project was to handle up to eight audio channels. The SGTL5000 uses I²S and handles only two channels.

I2S timing diagram from the SGTL5000 datasheet

Thankfully, the i.MX7 has multiple audio interfaces and one is fully available on the SODIMM connector of the Colibri iMX7. A TI PCM3168 was chosen for the carrier board and is connected to the second Synchronous Audio Interface (SAI2) of the i.MX7. This codec can handle up to 8 output channels and 6 input channels. It can accept multiple input formats, but TDM requires the smallest number of signals (4 signals: bit clock, word clock, data input and data output).

TDM timing diagram from the PCM3168 datasheet

The current Linux long term support version is 4.9 and was chosen for this project. It has support for both the i.MX7 SAI (sound/soc/fsl/fsl_sai.c) and the PCM3168 (sound/soc/codecs/pcm3168a.c). That’s two of the three components that are needed, the last one being the driver linking both by describing the topology of the “sound card”. In order to keep the custom code to the minimum, there is an existing generic driver called simple-card (sound/soc/generic/simple-card.c). It is always worth trying to use it unless something really specific prevents that. Using it was as simple as writing the following DT node:
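Based on the notes below, a minimal simple-audio-card node for this setup might look roughly like the following sketch (the labels &sai2 and &pcm3168 and the exact property values are assumptions):

```dts
/* Sketch only: labels and values are illustrative assumptions */
sound {
	compatible = "simple-audio-card";
	simple-audio-card,name = "imx7-pcm3168";

	/* DAI link 0: playback, SAI2 to the PCM3168 DACs */
	simple-audio-card,dai-link@0 {
		format = "left_j";
		cpu {
			sound-dai = <&sai2>;
			dai-tdm-slot-num = <8>;
			dai-tdm-slot-width = <32>;
		};
		dailink0_codec: codec {
			sound-dai = <&pcm3168 0>;
		};
		bitclock-master = <&dailink0_codec>;
		frame-master = <&dailink0_codec>;
	};

	/* DAI link 1: capture, same properties with sound-dai = <&pcm3168 1> */
};
```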

Only 4 input channels and 4 output channels are routed, because that is all the carrier board has wired.

There are two DAI links because the pcm3168 driver exposes inputs and outputs separately

As per the PCM3168 datasheet:

left justified mode is used

dai-tdm-slot-num is set to 8 even though only 4 are actually used

dai-tdm-slot-width is set to 32 because the codec takes 24-bit samples but requires 32 clocks per sample (this is solved later in userspace)

The codec is the clock master, which is usually best for clock accuracy, especially since the various SoMs on the market almost never expose the audio clock on the carrier board interface. Here, a crystal was used to clock the PCM3168.

The PCM3168 codec is added under the ecspi3 node as that is where it is connected:
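A hypothetical sketch of that node (the oscillator label and SPI frequency are invented, and the codec's required regulator supply properties are omitted for brevity):

```dts
/* Sketch only: properties below are illustrative assumptions */
&ecspi3 {
	status = "okay";

	pcm3168: audio-codec@0 {
		compatible = "ti,pcm3168a";
		reg = <0>;                  /* chip select 0 */
		spi-max-frequency = <1000000>;
		#sound-dai-cells = <1>;     /* DAI 0: playback, DAI 1: capture */
		clocks = <&codec_osc>;      /* the crystal clocking the codec */
		clock-names = "scki";
	};
};
```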

Finally, an ALSA configuration file (/usr/share/alsa/cards/imx7-pcm3168.conf) was written to ensure samples sent to the card are in the proper format, S32_LE. 24-bit samples will simply have zeroes in the least significant byte; for 32-bit samples, the codec will properly ignore the least significant byte.
The file also declares that the first subdevice is the playback (output) device and the second subdevice is the capture (input) device.

On top of that, the dmix and dsnoop ALSA plugins can be used to separate channels.

To conclude, this shows that it is possible to easily leverage existing code to integrate an audio codec in a design by simply writing a device tree snippet and maybe an ALSA configuration file if necessary.

May 04, 2017

At Free Electrons, we regularly work on networking topics as part of our Linux kernel contributions and thus we decided to attend our very first Netdev conference this year in Montreal. With the recent evolution of the network subsystem and its drivers capabilities, the conference was a very good opportunity to stay up-to-date, thanks to lots of interesting sessions.

Eric Dumazet presenting “Busypolling next generation”

The speakers and the Netdev committee did an impressive job by offering such a great schedule and the recorded talks are already available on the Netdev Youtube channel. We particularly liked a few of those talks.

Andrew Lunn, Vivien Didelot and Florian Fainelli presented DSA, the Distributed Switch Architecture, by giving an overview of what DSA is and then presenting its design. They completed their talk by discussing the future of this subsystem.

DSA in one slide

The goal of the DSA subsystem is to support Ethernet switches connected to the CPU through an Ethernet controller. The distributed part comes from the possibility to have multiple switches connected together through dedicated ports. DSA was introduced nearly 10 years ago but was mostly quiet and only recently came back to life thanks to contributions made by the authors of this talk, its maintainers.

The main idea of DSA is to reuse the available internal representations and tools to describe and configure the switches. Ports are represented as Linux network interfaces so that userspace can configure them using common tools, the Linux bridging concept is used for interface bridging and the Linux bonding concept for port trunks. A switch handled by DSA is not seen as a special device with its own control interface, but rather as a hardware accelerator for specific networking capabilities.

DSA has its own data plane where the switch ports are slave interfaces and the Ethernet controller connected to the SoC is the master one. Tagging protocols are used to direct frames to a specific port when coming from the SoC, and to identify the ingress port on reception. For example, the RX path has an extra check after netif_receive_skb() so that if DSA is used, the frame can be untagged and reinjected into the network stack RX flow.

Finally, they talked about the relationship between DSA and Switchdev, and cross-chip configuration for interconnected switches. They also exposed the upcoming changes in DSA as well as long term goals.

As part of the network performance workshop, Jesper Dangaard Brouer presented memory bottlenecks in the allocators caused by specific network workloads, and how to deal with them. The SLAB/SLUB baseline performance is found to be too slow, particularly when using XDP. One way for a driver to solve this issue is to implement a custom page recycling mechanism, and that’s what all high-speed drivers do. He then showed some data explaining why this mechanism is needed when targeting the 10G network budget.

Jesper is working on a generic solution called page pool and sent a first RFC at the end of 2016. As mentioned in the cover letter, it’s still not ready for inclusion and was only sent for early reviews. He also made a small overview of his implementation.

These two talks were given by Gilberto Bertin from Cloudflare and Martin Lau from Facebook. While they were not talking about device driver implementation or improvements in the network stack directly related to what we do at Free Electrons, it was nice to see how XDP is used in production.

XDP, the eXpress Data Path, provides a programmable data path at the lowest point of the network stack by processing RX packets directly out of the drivers’ RX ring queues. It’s quite new and is an answer to many userspace-based solutions such as DPDK. Gilberto and Martin showed excellent results, confirming the usefulness of XDP.

From a driver point of view, some changes are required to support it. RX hooks must be added as well as some API changes and the driver’s memory model often needs to be updated. So far, in v4.10, only a few drivers are supporting XDP.

David S. Miller, the maintainer of the Linux networking stack and drivers, did an interesting keynote about XDP and eBPF. The eXpress Data Path clearly was the hot topic of this Netdev 2.1 conference with lots of talks related to the concept and David did a good overview of what XDP is, its purposes, advantages and limitations. He also quickly covered eBPF, the extended Berkeley Packet Filters, which is used in XDP to filter packets.

This presentation was a comprehensive introduction to the concepts introduced by XDP and its different use cases.

Conclusion

Netdev 2.1 was an excellent experience for us. The conference was well organized, the single-track format allowed us to see every session on the schedule, and meeting with attendees and speakers was easy. The content was highly technical and an excellent opportunity to stay up-to-date with the latest changes in the kernel's networking subsystem. The conference hosted talks both about in-kernel topics and about their use in userspace, which we think is a very good approach: not focusing only on the kernel side, but also being aware of users' needs and their use cases.

OsmoDevCon is a much smaller event, typically about 20 people, and is limited
to actual developers who have a past record of contributing to any of
the many Osmocom projects.

We had a large number of presentations and discussions. In fact, so
many that the schedule of talks extended from 10am to midnight on some
days. While this is great, it also means that there was definitely too
little time for more informal conversations, chatting or even actual
work on code.

We also cover such a wide range of topics and scopes inside Osmocom that
the traditional ad-hoc scheduling approach no longer seems to be
working as it used to. Not everyone is interested in (or has time for)
all the topics, so we should group them by topic/subject
on a given day or half-day. This would enable people to attend only
those days that are relevant to them, and spend the remaining days in an
adjacent room hacking away on code.

It's sad that we only have OsmoDevCon once per year. Maybe that's
actually also something to think about. Rather than having 4 days once
per year, maybe have two weekends per year.

Overhyped Docker missing the most basic features

I've always been extremely skeptical of suddenly emerging, over-hyped
technologies, particularly if they claim to solve problems by adding
yet another layer to systems that are already sufficiently complex
themselves.

There are of course many issues with containers, ranging from replicated
system libraries to the basic underlying fact that you're giving
up on the system package manager to properly deal with dependencies.

I'm also highly skeptical of FOSS projects that are primarily driven by
one (VC-funded?) company. Especially if their offering includes a
so-called cloud service which they can stop operating at any given
point in time, or (more realistically) first get everybody to use and
then start charging for.

But well, despite all the bad things I had read about it over the years,
one day in May 2017 I finally thought let's give it a try. My problem
to solve as a test balloon is fairly simple.

My basic use case

The plan is to start OsmoSTP, the m3ua-testtool and the sua-testtool,
the latter two of which connect to OsmoSTP. By running this setup inside
containers and inside an internal network, we could then execute the entire
testsuite, e.g. during Jenkins tests, without having IP address or port
number conflicts. It could even run multiple times in parallel on one
build host, verifying different patches as part of the continuous
integration setup.

This application is not so complex. All it needs is three containers,
an internal network and some connections in between. Should be a piece
of cake, right?

But enter the world of buzzword-fueled web-4000.0 software-defined
virtualised and orchestrated container NFV + SDN voodoo: it turns out to
be impossible, at least with the preferred tools they advertise.

Dockerfiles

The part that worked relatively easily was writing a few Dockerfiles to
build the actual containers. All based on debian:jessie from the
library.

As m3ua-testtool is written in Guile and needs to build some Guile
plugin/extension, I had to actually include guile-2.0-dev and other
packages in the container, making it a bit bloated.

I couldn't immediately find a nice example Dockerfile recipe that would
allow me to build stuff from source outside of the container, and then
install the resulting binaries into the container. This seems to be a
somewhat weak spot, where more support/infrastructure would be helpful.
I guess the idea is that you simply install applications via package
feeds and apt-get. But I digress.

So after some tinkering, I ended up with three docker containers:

one running OsmoSTP

one running m3ua-testtool

one running sua-testtool

I also managed to create an internal bridged network between the
containers, so the containers could talk to one another.

However, I have to manually start each of the containers with ugly long
command line arguments, such as docker run --network sigtran --ip
172.18.0.200 -it osmo-stp-master. This is of course sub-optimal, and is
what Docker Services + Stacks should resolve.

Services + Stacks

The idea seems good: A service defines how a given container is run,
and a stack defines multiple containers and their relation to each
other. So it should be simple to define a stack with three
services, right?

Well, it turns out that it is not. Docker documents that you can
configure a static ipv4_address[1] for each service/container, but it
seems related configuration statements are simply silently
ignored/discarded [2], [3], [4].
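For illustration, this is the kind of stack file the documented ipv4_address option suggests should work, reusing the image and addresses from the docker run example above (a sketch, not a working configuration):

```yaml
# Sketch: a compose/stack file with a static address, as the
# documentation for ipv4_address suggests; when deployed as a stack,
# the static address was silently ignored.
version: "3"

networks:
  sigtran:
    ipam:
      config:
        - subnet: 172.18.0.0/16

services:
  osmo-stp:
    image: osmo-stp-master
    networks:
      sigtran:
        ipv4_address: 172.18.0.200
```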

This seems to be related to the fact that, for some strange reason,
stacks can (at least in later versions of Docker) only use overlay-type
networks, rather than the much simpler bridge networks. And while bridge
networks appear to support static IP address allocations, overlay
apparently doesn't.

I still have a hard time grasping that something that considers itself a
serious product for production use (by a company with an estimated value
of over a billion USD, not by a few hobbyists) has no support for
running containers on static IP addresses. How many applications
out there have I seen that require static IP address configuration? How
much simpler do setups get if you don't have to rely on things like
dynamic DNS updates (or DNS availability at all)?

So I'm stuck with having to manually configure the network between my
containers, and manually starting them by clumsy shell scripts, rather
than having a proper abstraction for all of that. Well done :/

Exposing Ports

Unrelated to all of the above: If you run some software inside
containers, you will pretty soon want to expose some network services
from containers. This should also be the most basic task on the planet.

However, it seems that the creators of Docker live in the early 1980s,
when only the TCP and UDP transport protocols existed. They seem to have
missed that by the late 1990s to early 2000s, protocols like SCTP and DCCP had been invented.

Now some of the readers may think 'who uses SCTP anyway'. I will give
you a straight answer: Everyone who has a mobile phone uses SCTP. This
is due to the fact that pretty much all the connections inside cellular
networks (at least for 3G/4G networks, and in reality also for many 2G
networks) are using SCTP as underlying transport protocol, from the
radio access network into the core network. So every time you switch
your phone on, or do anything with it, you are using SCTP. Not on your
phone itself, but by all the systems that form the network that you're
using. And with the drive to C-RAN, NFV, SDN and all the other
buzzwords also appearing in the Cellular Telecom field, people should
actually worry about it, if they want to be a part of the software stack
that is used in future cellular telecom systems.

Summary

After spending the better part of a day on something that seemed like
the most basic use case for running three networked containers using
Docker, I'm back to step one: most likely inventing some custom
scripts based on unshare to run my three
test programs in a separate network namespace, for isolated
test-suite execution as part of a Jenkins CI setup :/
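Such an unshare-based fallback could be sketched like this (the wrapper name and the exact program invocations are assumptions; -r requires a kernel with unprivileged user namespaces):

```shell
# Sketch only: a wrapper that runs its arguments inside a fresh network
# namespace, so IP addresses and ports cannot clash between parallel runs.
cat > run-isolated.sh <<'EOF'
#!/bin/sh
ip link set lo up    # a fresh namespace starts with loopback down
exec "$@"            # e.g. osmo-stp, m3ua-testtool, sua-testtool
EOF
chmod +x run-isolated.sh

# Each test run then gets its own namespace, for example:
#   unshare -r -n ./run-isolated.sh m3ua-testtool
# (-r maps the current user to root inside; -n creates the network ns)
```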

It's also clear that Docker apparently doesn't care much about playing a
role in the cellular telecom world, which is increasingly moving away
from proprietary, hardware-based systems (like STPs) to virtualised,
software-based systems.

With 137 patches contributed, Free Electrons is the 18th largest contributing company according to the Kernel Patch Statistics. Free Electrons engineer Maxime Ripard appears in the list of top contributors by changed lines in the LWN statistics.

Our most important contributions to this release have been:

Support for Atmel platforms

Alexandre Belloni improved suspend/resume support for the Atmel watchdog driver, I2C controller driver and UART controller driver. This is part of a larger effort to upstream support for the backup mode of the Atmel SAMA5D2 SoC.

Boris Brezillon contributed a fix for the Atmel HLCDC display controller driver, as well as fixes for the atmel-ebi driver.

Support for Allwinner platforms

Boris Brezillon contributed a number of improvements to the sunxi-nand driver.

Mylène Josserand contributed a new driver for the digital audio codec on the Allwinner sun8i SoC, as well as the corresponding Device Tree changes and related fixes. Thanks to this driver, Mylène enabled audio support on the R16 Parrot and A33 Sinlinx boards.

Maxime Ripard contributed official Device Tree bindings for the ARM Mali GPU, which allows the GPU to be described in the Device Tree of the upstream kernel, even if the ARM kernel driver for the Mali will never be merged upstream.

Maxime Ripard contributed a number of fixes for the rtc-sun6i driver.

Maxime Ripard enabled display support on the A33 Sinlinx board, by contributing a panel driver and the necessary Device Tree changes.

Maxime Ripard continued his clean-up effort, by converting the GR8 and sun5i clock drivers to the sunxi-ng clock infrastructure, and converting the sun5i pinctrl driver to the new model.

Quentin Schulz added a power supply driver for the AXP20X and AXP22X PMICs used on numerous Allwinner platforms, as well as numerous Device Tree changes to enable it on the R16 Parrot and A33 Sinlinx boards.

Support for Marvell platforms

Grégory Clement added support for the RTC found in the Marvell Armada 7K and 8K SoCs.

Grégory Clement added support for the Marvell 88E6141 and 88E6341 Ethernet switches, which are used in the Armada 3700 based EspressoBin development board.

Thomas Petazzoni contributed a number of fixes to the OMAP hwrng driver, which turns out to also be used on the Marvell 7K/8K platforms for their HW random number generator.

Thomas Petazzoni contributed a number of patches for the mvpp2 Ethernet controller driver, preparing the future addition of PPv2.2 support to the driver. The mvpp2 driver currently only supports PPv2.1, the Ethernet controller used on the Marvell Armada 375, and we are working on extending it to support PPv2.2, the Ethernet controller used on the Marvell Armada 7K/8K. PPv2.2 support is scheduled to be merged in 4.12.

Support for RaspberryPi platforms

Boris Brezillon contributed Device Tree changes to enable the VEC (Video Encoder) on all bcm283x platforms. Boris had previously contributed the driver for the VEC.

In addition to our direct contributions, a number of Free Electrons engineers are also maintainers of various subsystems in the Linux kernel. As part of this maintenance role:

May 01, 2017

My former gpl-violations.org colleague Armijn Hemel and Shane Coughlan
(former coordinator of the FSFE Legal Network) have written a book on
practical GPL compliance issues.

I've read through it (in the bath tub of course, what better place to
read technical literature), and I can agree wholeheartedly with its
contents. For those who have been involved in GPL compliance
engineering there shouldn't be much new - but for the vast majority of
developers out there who have had little exposure to the
bread-and-butter work of providing complete and corresponding source
code, it makes an excellent introductory text.

The book focuses on compliance with GPLv2, which is probably not too
surprising given that it's published by the Linux Foundation, and Linux
is licensed under GPLv2.

Given that the subject matter is Free Software, and the book is written by
long-time community members, I cannot help but notice with some surprise
that the book is released under classic copyright, All
rights reserved, with no freedoms granted to the user.

Considering the sensitive legal topics touched, I can understand the
authors' possible motivation not to permit derivative works. But
then, there still are licenses such as CC-BY-ND which prevent derivative
works but still permit users to make and distribute copies of the work
itself. I've made that recommendation / request to Shane, let's see
if they can arrange for some more freedom for their readers.

Not only were we sold out, we also had to turn down some last-minute
registrations due to the venue being beyond capacity (60
seats). People traveled from Japan, India, the US, Mexico and many
other places to attend.

We've had an amazing audience ranging from commercial operators to
community cellular operators to professional developers doing work
related to Osmocom, academia, IT security crowds and last but not least
enthusiasts/hobbyists, with whom the project[s] started.

We had very professional live streaming + video recordings courtesy
of the C3VOC team. Thanks a lot for your
support, and for having the video recordings of all talks online
the day after the event.

We also received some requests for improvements, many of which we will
hopefully consider before the next Osmocom Conference:

have a multiple day event. Particularly if you're traveling
long-distance, it is a lot of overhead for a single-day event. We of
course fully understand that. On the other hand, it was the first
Osmocom Conference, and hence it was a test balloon where it was
initially unclear if we'll be able to get a reasonable number of
attendees interested at all, or not. And organizing an event with
venue and talks for multiple days if in the end only 10 people attend
would have been a lot of effort and financial risk. But now that we
know there are interested folks, we can definitely think of a multiple
day event next time

Signs indicating venue details on the last meters: I agree, this could
have been better. The address of the venue was published, but we
could have had some signs/posters at the door pointing you to the
right meeting room inside the venue. Sorry for that.

Better internet connectivity. This is a double-edged sword. Of
course we want our audience to be primarily focused on the talks and
not distracted :P I would hope that most people are able to survive
a one day event without good connectivity, but for sure we will have
to improve in case of a multiple-day event in the future

In terms of my requests to the attendees, I have only a couple:

Participate in the discussions on the schedule/programme while it is
still possible to influence it. When we started to put together the
programme, I posted about it on the openbsc mailing list and invited
feedback. Still, most people seem to have missed the time window
during which talks could have been submitted and the schedule
influenced, before it was finalized.

Register in time. We had almost no registrations until about two
weeks ahead of the event (and I was considering cancelling it), and
then suddenly sold out in the week before the event. We've had
people who tried to book tickets, only to learn that they were
already sold out. I guess we will introduce early bird pricing and add
a very expensive last-minute ticket option next year, in order to
increase the motivation to register early and thus give us flexibility
regarding venue planning.

April 28, 2017

For those of you who aren't keeping up with my occasional Twitter/Facebook posts on the subject, I volunteer with a local search and rescue unit. This means that a few times a month I have to grab my gear and run out into the woods on zero notice to find an injured hiker, locate an elderly person with Alzheimer's, or whatever the emergency du jour is.

Since I don't have time to grab fresh food on my way out the door when duty calls, I keep my pack and load-bearing vest stocked with shelf-stable foods like energy bars and surplus military rations. Many missions are short and intense, leaving me no time to eat anything but finger-food items (Clif bars and First Strike Ration sandwiches are my favorites) kept in a vest pocket.

On the other hand, during longer missions there may be opportunities to make hot food while waiting for a medevac helicopter, ground team with stretcher, etc - and of course there's plenty of time to cook a hot dinner during training weekends. Besides being a convenience, hot food and drink helps us (and the subject) avoid hypothermia so it can be a literal life-saver.

I've been using MRE chemical heaters for this, because they're small, lightweight (20 g / 0.7 oz each), and not too pricey (about $1 each from surplus dealers). Their major flaw is that they don't get all that hot, so during cold weather it's hard to get your food more than lukewarm.

I've used many kinds of camp stoves (propane and white gas primarily) over the course of my camping, but didn't own one small enough to use for SAR. My full 48-hour gear loadout (including water) weighs around 45 pounds / 20 kg, and I really didn't want to add much more to this. The MSR Whisperlite, for example, weighs in at 430 g / 15.2 oz for the stove, fuel pump, and wind shield. Add to this 150 g / 5.25 oz for the fuel bottle, a pot to cook in, and the fuel itself and you're looking at close to 1 kg / 2 pounds all told.

I have an aluminum camp frying pan that, including lid, weighs 121 g / 4.3 oz. It seemed hard to get much lighter for something large enough that you could squeeze an MRE entree into, so I kept it.

After a bit of browsing in the local Wal-Mart, I found a tiny sheet metal folding stove that weighed 112 g / 3.98 oz empty. It's designed to burn pellets of hexamine fuel.

The stove. Ignore the aluminum foil, it was there from a previous experiment.

In my testing it worked pretty well. One pellet brought 250 ml of water from 10C to boiling in six minutes, and held it at a boil for a minute before burning out. The fuel burned fairly cleanly and didn't leave that much soot on the pot either, which was nice.

What's not so nice, however, was the fuel. According to the MSDS, hexamine decomposes upon heating or contact with skin into formaldehyde, which is toxic and carcinogenic. Combustion products include such tasty substances as hydrogen cyanide and ammonia. This really didn't seem like something that I wanted to handle, or burn, in close proximity to food! Thus began my quest for a safer alternative.

My first thought was to use tea light candles, since I already had a case of a hundred for use as fire starters. In my testing, one tea light was able to heat a pot of water from 10C to 30C in a whopping 21 minutes before starting to reach an equilibrium where the pot lost heat as fast as it gained it. I continued the test out to 34 minutes, at which point it was a toasty 36C.

The stove was big enough to fit more than one tea light, so the obvious next step was to put six of them in a 3x2 grid. This heated significantly better: at the 36-minute mark my water measured a respectable 78C.

I figured I was on the right track, but needed to burn more wax per unit time. Some rough calculations suggested that a brick of paraffin wax the size of the stove and about as thick as a tea light contained 1.5 kWh of energy, and would output about 35 W of heat per wick. Assuming 25% energy transfer efficiency, which seemed reasonable based on the temperature data I had measured earlier, I needed to put out around 675 W to bring my pot to a boil in ten minutes. This came out to approximately 20 candle wicks.
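The wick-count estimate can be sanity-checked in a few lines (a rough sketch; the 250 ml water load is my assumption, taken from the earlier boil tests, while the efficiency, per-wick output, and ten-minute target come from the text):

```python
import math

# Sanity check of the candle-stove sizing. The 250 ml water load is an
# assumption (the volume used in the earlier boil tests); the efficiency,
# per-wick output, and ten-minute target are from the post.
water_kg = 0.25
c_water = 4186          # J/(kg*K), specific heat of water
delta_t = 90            # 10 C to boiling
boil_time_s = 600       # ten-minute target
efficiency = 0.25       # assumed heat-transfer efficiency

delivered_w = water_kg * c_water * delta_t / boil_time_s   # heat into the pot
required_output_w = delivered_w / efficiency               # heat at the flame
print(f"~{delivered_w:.0f} W delivered, ~{required_output_w:.0f} W output")

# The post rounds the requirement up to ~675 W at ~35 W per wick:
wicks = math.ceil(675 / 35)
print(f"-> about {wicks} wicks")
```

The physics-derived ~630 W lines up with the ~675 W figure above once a little margin is allowed.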

I started out by folding a tray out of heavy duty aluminum foil, and reinforcing it on the outside with aluminum foil duct tape. I then bought a pack of tea light wicks on Amazon and attached them to the tray with double-sided tape.

Giant 20-wicked candle before adding wax

I made a water bath on my hot plate and melted a bunch of tea lights in a beaker. I wasn't in the mood to get spattered with hot wax so I wore long-sleeved clothes and a face shield. I was pretty sure that the water bath wouldn't get anywhere near the ignition point of the wax but did the work outside on a concrete patio and had a CO2 fire extinguisher on standby just in case.

Melting wax. Safety first, everyone!

The resulting behemoth of a candle actually looked pretty nice!

20-wick, 700W thermal output candle with tea lights for scale

After I was done and the wax had solidified I put the candle in my stove and lit it off. It took a while to get started (a light breeze kept blowing out one wick or another and I used quite a few matches to get them all lit), but after a while I had a solid flame going. At the six-minute mark my water had reached 37C.

A few minutes later, disaster struck! The pool of molten wax reached the flash point and ignited across the whole surface. At this point I had a massive flame - my pot went from 48 to 82C in two minutes! This translates to 2.6 kW assuming 100% energy transfer efficiency, so actual power output was probably upwards of 5 kW.

I removed the pot (using welding gloves since the flames were licking up the handle) and grabbed a photo of the fireball before thinking about how to extinguish the fire.

Pretty sure this isn't what a stove is supposed to look like

Since I was outside on a non-flammable surface the fire wasn't an immediate safety hazard, but I wanted to put it out non-destructively to preserve evidence for failure analysis. I opted to smother it with a giant candle snuffer that I rapidly folded out of heavy-duty aluminum foil.

The carnage after the fire was extinguished. Note the discolored wax!

It took me a while to clean up the mess - the giant candle had turned tan from incomplete combustion. It had also sprung a leak at some point, spilling a bit of wax out onto my patio.

On top of that, my pot was coal-black from all of the soot the super-rich flame was putting out. My wife wouldn't let it anywhere near the sink so I scrubbed it as best I could in the bathtub, then spent probably 20 minutes scrubbing all of the gray stains off the tub itself.

In order to avoid the time-consuming casting of wax, my next test used a slug of wax from a tea light that I drilled holes in, then inserted four wicks. I covered the top of the candle with aluminum foil tape to reflect heat back up at the pot, in a bid to increase efficiency and keep the melt puddle below the flash point.

Quad-wick tea light

This performed pretty well in my test. It got my pot up to 35C at the 12-minute mark, which was right about where I expected based on the x1 and x6 candle tests, and didn't flash over.

The obvious next step was to make five of them and see if this would work any better. It ignited more easily than the "brick" candle, and reached 83C at the 6-minute mark. Before T+ 7 minutes, however, the glue on the tape had failed from the heat, and the wax flashed. By the time I got the pot out of harm's way the water was boiling and it was covered in soot (again).

This time, it was a little bit breezier and my snuffer failed to exclude enough air to extinguish the flames. I ended up having to blast it with the CO2 extinguisher I had ready for just this situation. It wasn't hard to put out and I only used about two of the ten pounds of gas. (Ironically, I had planned to take the extinguisher in to get serviced the next morning because it was almost due for annual preventive maintenance. I ended up needing a recharge too...)

After cleaning off my pot and stove, and scraping some of the spilled wax off my driveway, it was back to the drawing board. I thought about other potential fuels I had lying around, and several obvious options came to mind.

Testing booze for flammability

I'm not a big drinker, but houseguests have left me with a few bottles of liquor, so I tested them out. Jack Daniel's didn't burn at all; Captain Morgan white rum burned fitfully and left a sugary residue without putting out much heat; 100-proof vodka left a bit of starchy residue and was tricky to light.

A tea light cup full of 99% isopropyl alcohol brought my pot to 75C in five minutes before burning out, but was filthy and left soot everywhere. Hand sanitizer (about 60% ethanol) burned cleanly, but slower and cooler due to the water content - peak temperature of 54C and 12 minute burn time.

Ethanol seemed like a viable fuel if I could get it up to a higher concentration. I wanted to avoid liquid fuels due to difficulty of handling and the risk of spills, but a thick gel that didn't spill easily looked like a good option.

After a bit of research I discovered that calcium acetate (a salt of acetic acid) was very soluble in water, but not in alcohols. When a saturated solution of it in water is added to an alcohol it forms a stiff gel, commonly referred to as a "California snowball" because it burns and has a consistency like wet snow. I don't have any photos of my test handy, but here's a video from somebody else that shows it off nicely.

Two tea light cups full of the stuff brought my pot of water to a boil in 8 minutes, and held it there until burning out just before the 13-minute mark. I also tried boiling a FSR sandwich packet in a half-inch or so of water, and it was deliciously warm by the end. This seemed like a pretty good fuel!

Testing the calcium acetate fuel. I put a lid on the pot after taking this pic.

I filled two film-canister type containers with the calcium acetate + ethanol gel fuel and left them in my SAR pack. As luck would have it, I spent the next day looking for a missing hiker, so the fuel spent quite a while bouncing around while driving on dirt roads and hiking.

When I got home I was disappointed to see clear liquid inside the bag that my stove and fuel were stored in. I opened the canisters only to find a thin whitish liquid instead of a stiff gel.

It seemed that the calcium acetate gel was not very stable, and over time the calcium acetate particles would precipitate out and the solution would revert to a liquid state. This clearly would not do.

Hand sanitizer seemed like a pretty good fuel other than being underpowered and perfumed, so I went to the grocery store and started looking at ingredient lists. They all seemed pretty similar - ethanol, water, aloe and other moisturizers, perfumes, maybe colorants, and a thickener. The thickener was typically either hydroxyethyl cellulose or a carbomer.

A few minutes on Amazon turned up a bag of Carbomer 940, a carboxy vinyl polymer cross-linked with esters of pentaerythritol. It's supposed to produce a viscosity of 45,000 to 70,000 CPS when added to water at 0.5% by weight. I also ordered a second bottle of Reagent Alcohol (90% ethanol / 5% methanol / 5% isopropanol with no bittering agents, ketones, or non-volatile ingredients) since my other one was pretty low after the calcium acetate failure.

Carbomer 940 is fairly acidic (pH 2.7 - 3.3 at 0.5% concentration) in its pure form and only gels when neutral or alkaline, so it needs to be neutralized. The recommended base for alcohol-based gels was triethanolamine, so I picked up a bottle of that too.

Preparing to make carbomer-alcohol fuel gel

I made a 50% alcohol-water solution and added 0.5% carbomer by mass. It didn't seem to fully dissolve, leaving a bunch of goopy chunks in the beaker.

Incompletely dissolved Carbomer 940 in 50/50 water/alcohol

I left it overnight to dissolve, blended it more, and then filtered off any big clumps with a coffee filter. I then added a few drops of triethanolamine, at which point the solution immediately turned cloudy. Upon blending, a rubbery white substance precipitated out of solution and stuck to my stick blender and the sidewalls of the beaker. This was not supposed to happen!

Rubbery goop on the blender head

Precipitate at the bottom of the beaker

I tried everything I could think of - diluting the triethanolamine and adding it slowly to reduce sudden pH changes, lowering the alcohol concentration, and even letting the carbomer sit in solution for a few days before adding the triethanolamine. Nothing worked.

I went back to square one and started reading more papers and watching process demonstration videos from the manufacturer. Eventually I noticed one source that suggested increasing the pH of the water to about 8 *before* adding the carbomer. This worked and gave a beautiful clear gel!

After a bit of tinkering I found a good process: starting with 100 ml of water, titrate to pH 8 with triethanolamine. Add 1 g of carbomer powder and blend until fully gelled. Add 300 ml of reagent alcohol a bit at a time, mixing thoroughly after each addition. About halfway through adding the alcohol the gel started to get pretty runny, so I mixed in a few more drops of triethanolamine and another 500 mg of carbomer powder before mixing in the rest of the alcohol. I had only a little more alcohol left in the bottle (maybe 50 ml) so I stirred that in without bothering to measure.

The resulting gel was quite stiff and held its shape for a little while after pouring, but could still be transferred between containers without much difficulty.
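As a rough sanity check on the strength of the result (a sketch using the volumes from the recipe above; it lines up with the ~78% estimate given later in the post):

```python
# Rough alcohol fraction of the finished gel. Volumes are from the recipe;
# the ~50 ml is the unmeasured remainder of the bottle, and the small
# carbomer and triethanolamine additions are ignored.
water_ml = 100
alcohol_ml = 300 + 50
concentration = alcohol_ml / (water_ml + alcohol_ml)
print(f"~{concentration:.0%} alcohol by volume")
```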

Tea light can full of my final fuel

I left the beaker of fuel in my garage for several days and shook it around a bit, but saw no evidence of degradation. Since it's basically just turbo-strength hand sanitizer (~78% instead of the usual 30-60%) without all of the perfumes and moisturizers, it should be pretty stable. I had no trouble igniting it down to 10C ambient temperatures, but may find it necessary to mix in some acetone or other low-flash-point fuel to light it reliably in the winter.

The final batch of fuel filled two polypropylene specimen jars perfectly, with just a little bit left over for a cooking test.

One of my two fuel jars

One tea light canister held 10.7 g / 0.38 oz of fuel, and I typically use two at a time, so 21.4 g / 0.76 oz per cook. One jar thus holds enough fuel for about five cook sessions, which is more than I'd ever need for a SAR mission or weekend camping trip. The final weight of my entire cooking system (stove, one fuel jar, tea light cans, and pot) comes out to 408 g / 14.41 oz, or a bit less than an empty Whisperlite stove (not counting the pot, fuel tank, or fuel)!

The only thing left was to try cooking on it. I squeezed a bacon-cheddar FSR sandwich into my pot, added a bit of water, and put it on top of the stove with two candle cups of fuel.

Nice clean blue flame, barely visible

By the six-minute mark the water was boiling away merrily and a cloud of steam was coming up around the edge of the lid. I took the pot off around 8 minutes and removed my snack.

Munching on my sandwich. You can't tell in this lighting, but the stove is still burning.

For those of you who haven't eaten First Strike Rations, the sandwiches in them are kind of like Hot Pockets or Toaster Strudels, except with a very thick and dense bread rather than a fluffy, flaky one. The fats in the bread are solid at room temperature and liquefy once it gets warm. This significantly softens the texture of the bread and makes it taste a lot better, so reaching this point is generally the primary goal when cooking one.

My sandwich was firmly over that line and tasted very good (for Army food baked two years ago). The bacon could have been a bit warmer, but the stove kept on burning until a bit after the ten-minute mark, so I could easily have left it in the boiling water for another two minutes and made it even hotter.

Once I was done eating it was time to clean up. The stove had no visible dirt (beyond what was there from my previous experiments), and the tea light canisters were clean and fairly free of soot except in one or two spots around the edges. Almost no goopy residue was left behind.

Stove after the cook test

The pot was quite clean as well, with no black soot and only a very thin film of discoloration that was thin enough to leave colored interference fringes. Some of this was left over from previous testing, so if this test had been run on a virgin pot there'd be even less residue.

Bottom of the pot after the cook test

Overall, it was a long journey with many false steps, but I now have the ability to cook for myself over a weekend trip in less than a pound of weight, so I'm pretty happy.

EDIT: A few people have asked to see the raw data from my temperature-vs-time cook tests, so here it is.

The ware for March 2017 seems to be a Schneider ATV61 industrial variable speed drive controller. As rasz_pl pointed out, I left the sticker unredacted. I had misgivings about hiding it, fearing the ware would be unguessable, but leaving it in made it perhaps a bit too easy. The prize goes to rasz_pl for being the first to guess; email me for your prize!

April 25, 2017

TeleMini V3.0 Dual-deploy altimeter with telemetry now available

TeleMini v3.0 is an update to our original TeleMini v1.0 flight
computer. It is a miniature (1/2 inch by 1.7 inch) dual-deploy flight
computer with data logging and radio telemetry. Small enough to fit
comfortably in an 18mm tube, this powerful package does everything you
need on a single board:

I don't have anything in these images to show just how tiny this board
is—but the spacing between the screw terminals is 2.54mm (0.1in), and the
whole board is only 13mm wide (1/2in).

This was a fun board to design. As you might guess from the version
number, we made a couple prototypes of a version 2 using the
same CC1111 SoC/radio part as version 1 but in the EasyMini form
factor (0.8 by 1.5 inches). Feedback from existing users indicated
that bigger wasn't better in this case, so we shelved that design.

With the availability of the STM32F042 ARM Cortex-M0 part in a 4mm
square package, I was able to pack that, the higher power CC1200 radio
part, a 512kB memory part and a beeper into the same space as the
original TeleMini version 1 board. There is USB on the board, but it's
only on some tiny holes, along with the Cortex SWD debugging
connection. I may make some kind of jig to gain access to that for
configuration, data download and reprogramming.

For those interested in an even smaller option, you could remove
the screw terminals and battery connector and directly wire to the
board, and replace the beeper with a shorter version. You could even
cut the rear mounting holes off to make the board shorter; there are
no components in that part of the board.

April 18, 2017

The White Rabbit team at CERN organised a short course about fibre-optic cleaning and inspection.

A special fibre inspection microscope that automatically analyses the image to decide whether a cable or SFP passes or fails the norms was demonstrated. The images of some of the often-used cables and SFP modules that we picked from the development lab clearly showed traces of grease and dust.

The course showed undoubtedly that fibres should always be inspected and that in almost all cases they should be cleaned before plugging in. One should not forget to inspect and clean the SFP side either!

Thanks to Amin Shoaie from CERN's EN-EL group for making this course available. Note that this course and the practical exercises will be repeated at CERN in the last week of April. Please contact us if you are interested.

April 16, 2017

Observations on SCTP and Linux

When I was still doing Linux kernel work with netfilter/iptables in the
early 2000's, I was somebody who actually regularly had a look at the
new RFCs that came out. So I saw the SCTP RFCs, SIGTRAN RFCs, SIP and
RTP, etc. all released during those years. I was quite happy to see
that for new protocols like SCTP and later DCCP, Linux quickly received
a mainline implementation.

Now most people won't have used SCTP so far, but it is a protocol used
as transport layer in a lot of telecom protocols for more than a decade
now. Virtually all protocols that have traditionally been spoken over
time-division multiplex E1/T1 links have been migrated over to SCTP
based protocol stackings.

Working on various Open Source telecom related projects, I of course
come into contact with SCTP every so often. Particularly some years
back when implementing the Erlang SIGTRAN code in erlang/osmo_ss7 and most recently
now with the introduction of libosmo-sigtran with its OsmoSTP, both part
of the libosmo-sccp repository.

I've also had to work with various proprietary telecom equipment over
the years. Whether that's some eNodeB hardware from a large brand
telecom supplier, or whether it's a MSC of some other vendor. And they
all had one thing in common: Nobody seemed to use the Linux kernel SCTP
code. They all used proprietary implementations in userspace, using RAW
sockets on the kernel interface.

I always found this quite odd, knowing that this is the route that you
have to take on proprietary OSs without native SCTP support, such as
Windows. But on Linux? Why? Rumor has it that people consider the
Linux SCTP implementation insufficiently mature, but hard evidence is
hard to come by.

Sure, software always has bugs and will have bugs. But we at Osmocom
are 10-15 years "late" with our implementations of higher-layer
protocols compared to what the mainstream telecom industry does. So if
we find something - and we find it already during R&D of some
userspace code, not even under load or in production - then that seems
a bit unsettling.

One has to wonder: with all their market power and plenty of
Linux-based devices in the telecom sphere, why did none of those large
telecom suppliers invest in improving the mainline Linux SCTP code? I
mean, they all use the kernel's UDP and TCP, so the kernel works for
most of the other network protocols - why not for SCTP? I guess it
comes back to the fundamental lack of understanding of how open
source development works: it is something that the given
industry/user base must invest in jointly.

The latest discovered bug

For quite some time I was seeing erratic behavior where at some
point the STP would not receive/process a given message sent by one of
the connected clients (ASPs). I tried to ignore the problem initially
while the code matured more and more, but the problem remained.

It became even more obvious when using Michael Tuexen's m3ua-testtool,
where sometimes even the most basic test cases consisting of sending +
receiving a single pair of messages like ASPUP -> ASPUP_ACK was failing.
And when the test case was re-tried, the problem often disappeared.

Also, whenever I tried to observe what was happening by means of
strace, the problem would disappear completely and would not re-appear
until strace was detached.

Of course, given that I've written several thousands of lines of new
code, it was clear to me that the bug must be in my code. Yesterday I
was finally prepared to accept that it might actually be a Linux SCTP
bug. Not being able to reproduce that problem on a FreeBSD VM also
pointed clearly into this direction.

Now I could simply have collected some information and filed a bug
report (which some kernel hackers at RedHat have thankfully invited me
to do!), but I thought my use case was too complex. You would have to
compile a dozen different Osmocom libraries, configure the STP, run
the scheme-language m3ua-testtool in guile, etc. - I guess nobody
would have bothered to go that far.

So today I tried to implement a test case that reproduced the problem in
plain C, without any external dependencies. And for many hours, I
couldn't make the bug show up. I tried to stay as close as possible to
what was happening in OsmoSTP: I used non-blocking mode on client and
server, used the SCTP_NODELAY socket option, and used the
sctp_recvmsg() library wrapper to receive events, but the bug was not
reproducible.

Some hours later, it became clear that there was one setsockopt() in
OsmoSTP (actually, libosmo-netif) which enabled all existing SCTP
events. I did this at the time to make sure OsmoSTP has the maximum
insight possible into what's happening on the SCTP transport layer, such
as address fail-overs and the like.

As it turned out, adding that setsockopt for SCTP_EVENTS to my test
code made the problem reproducible. After playing around with the
individual flags, it seems that enabling the SENDER_DRY_EVENT flag
makes the bug appear.

With that work-around in place, suddenly all the m3ua-testtool and sua-testtool test cases are reliably green
(PASSED) and OsmoSTP works more smoothly, too.

What do we learn from this?

Free Software in the telecom sphere is getting too little attention.
This is true even for those small portions of telecom-relevant
protocols that ended up in the kernel, like SCTP or, more recently,
the GTP module I co-authored. They are getting too little attention in
development, even less attention in maintenance, and people seem to
focus on avoiding them rather than fixing and maintaining what is
there.

It makes me really sad to see this. Telecoms is such a massive
industry, with billions upon billions of revenue for the classic telecom
equipment vendors. Surely they would be able to co-invest in some
basic infrastructure like proper and reliable testing / continuous
integration for SCTP. More recently, we have seen millions upon
millions of VC cash burned by buzzword-flinging companies doing "NFV"
and "SDN". But they would rather reimplement network stacks in
userspace than fix, complete and test those little telecom
infrastructure components which we have so far, like the SCTP
protocol :(

Where are the contributions to open source telecom parts from Ericsson,
Nokia (former NSN), Huawei and the like? I'm not even dreaming about
the actual applications / network elements, but merely the maintenance
of something as basic as SCTP. To be fair, Motorola was involved early
on in the Linux SCTP code, and Huawei contributed a long series of fixes
in 2013/2014. But that's not the kind of long-term maintenance
contribution that one would normally expect from the primary interest
group in SCTP.

Finally, let me thank the Linux SCTP maintainers. I'm not
complaining about them! They're doing a great job, given the arcane
code base and the fact that they are not working for a company that
has SCTP-based products as its core business. I'm sure they would
love more support and contributions from the telecom world, too.

April 09, 2017

As I mentioned in my blog post in February, I was working towards a more
fully-featured SIGTRAN stack in the Osmocom (C-language) universe.

The trigger for this is the support of 3GPP compliant AoIP (with a
BSSAP/SCCP/M3UA/SCTP protocol stacking), but it is of much more general
nature.

The code has finally matured in my development branch(es) and is now
ready for mainline inclusion. It's a series of about 77 (!) patches,
some of which already are the squashed results of many more incremental
development steps.

The result is as follows:

General SS7 core functions maintaining links, linksets and routes

xUA functionality for the various User Adaptations (currently SUA and M3UA supported)

SCCP and SUA share one implementation, where SCCP messages are
transcoded into SUA before processing, and re-encoded into SCCP after
processing, as needed.

I have already converted experimental versions of OsmoMSC and OsmoHNB-GW over to libosmo-sigtran.
They're now all just M3UA clients (ASPs) which connect to osmo-stp
to exchange SCCP messages back and forth between them.

What remains is to implement BSSAP / A-interface procedures in
OsmoMSC, on top of the SCCP-User SAP.

Once those steps are complete, we will have a single OsmoMSC that can talk
both IuCS to the HNB-GW (or RNCs) for 3G/3.5G as well as AoIP towards
OsmoBSC. We will then have fully SIGTRAN-enabled the full Osmocom
stack, and are all on track to bury the OsmoNITB that was devoid of such
interfaces.

If any reader is interested in interoperability testing with other
implementations, either on M3UA or on SCCP or even on A or Iu interface
level, please contact me by e-mail.

April 08, 2017

In my previous post, I characterized the STARSHIPRAIDER I/O circuit for high voltage fault transient performance, but was unable to adequately characterize the high speed data performance because my DSO (Rigol DS1102D) only has 100 MHz of bandwidth.

Although I did have some ideas on how to improve the performance of the current I/O circuit, it was already faster than I could measure so I had no way to know if my improvements were actually making it any better. Ideally I'd just buy an oscilloscope with several GHz of bandwidth, but I'm not made of money and those scopes tend to be in the "request a quote" price range.

The obvious solution was to build one. I already had a proven high-speed sampling architecture from my TDR project so all I had to do was repackage it as an oscilloscope and make it faster still.

The circuit was beautifully simple: an output from the FPGA drives a 50 ohm trace to an SMA connector, then a second SMA connector drives the positive input of an ADCMP572 through a 3 dB attenuator (to keep my signal within range). The negative input is driven by a cheap 12-bit I2C DAC. The comparator output is then converted from CML to LVDS and fed to the host FPGA board. Finally, a 3.3V CML output from the FPGA drives the latch enable input on the comparator.

The "ADC" algorithm is essentially the same as on my TDR. I like to think of it as an equivalent-time version of a flash ADC: rather than 256 comparators digitizing the signal once, I digitize the signal 256 times with one comparator (and of course 256 different reference voltages). The post-processing to turn the comparator outputs into 8-bit ADC codes is the same.
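The reconstruction step can be sketched as follows (my own illustration with a synthetic waveform; the function and variable names are mine, not the actual firmware's post-processing code):

```python
import numpy as np

# Sketch of the equivalent-time flash ADC idea: one comparator digitizes
# the repetitive signal 256 times, once per reference level, and each
# sample's 8-bit code is the number of reference levels the signal
# exceeded at that point.

def reconstruct_codes(hits):
    # hits: (256, n_samples) bool array; row i is the comparator output
    # with the reference DAC set to level i.
    return np.clip(hits.sum(axis=0), 0, 255).astype(np.uint8)

# Demo with a synthetic ramp standing in for the repetitive waveform:
waveform = np.arange(256)                 # "true" signal values per sample
refs = np.arange(256)[:, None]            # the 256 reference levels
hits = waveform[None, :] > refs           # one comparator sweep per level
codes = reconstruct_codes(hits)
assert np.array_equal(codes, waveform.astype(np.uint8))
```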

Unlike the TDR, however, I also do equivalent-time sampling in the time domain. The FPGA generates the sampling and PRBS clocks with different PLL outputs (at 250 MHz / 4 ns period), and sweeps the relative phase in 100 ps steps to produce an effective resolution of 10 Gsps / 100 ps timebase.
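The timebase arithmetic works out like this (figures from the text; the 40 interleaved phases also match the "40 clock phases" mentioned later):

```python
# Timebase arithmetic for the equivalent-time sweep (figures from the post).
sample_clock_hz = 250e6
period_ps = 1e12 / sample_clock_hz     # 4 ns capture clock period = 4000 ps
step_ps = 100                          # phase-sweep step
phases = int(period_ps / step_ps)      # 40 interleaved phases per period
effective_gsps = 1e12 / step_ps / 1e9  # one sample every 100 ps = 10 Gsps
print(phases, effective_gsps)
```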

Without further ado here's a picture of the board. Total BOM cost including connectors and PCB was approximately $50.

Oscilloscope board (yes, it's PMOD form factor!)

After some initial firmware development I was able to get some preliminary eye renders off the board. They were, to say the least, not ideal.

250 Mbps: very bumpy rise

500 Mbps: significant eye closure even with increased drive strength

I spent quite a while tracking down other bugs before dealing with the signal integrity issues. For example, a low-frequency pulse train showed up with a very uneven duty cycle:

Duty cycle distortion

Someone suggested that I try a slow rise time pulse to show the distortion more clearly. Not having a proper arbitrary waveform generator, I made do with a square wave and an R-C lowpass filter.

It appeared that I had jump discontinuities in my waveform every two blocks (see the color coding)

I don't have an EE degree, but I can tell this looks wrong!

Interestingly enough, two blocks (of 32 samples each) were concatenated into a single JTAG transfer. These two were read in one clock cycle and looked fine, but the junction to the next transfer seemed to be skipping samples.

As it turned out, I had forgotten to clear a flag which led to me reading the waveform data before it was done capturing. Since the circular buffer was rotating in between packets, some samples never got sent.

The next bug required zooming into the waveform a bit to see. The samples captured on the first few (the number seemed to vary across bitstream builds) of my 40 clock phases were showing up shifted by 4 ns (one capture clock).

Horizontally offset samples

I traced this issue to a synchronizer between clock domains having variable latency depending on the phase offset of the source and destination clocks. This is an inherent issue in clock domain crossing, so I think I'm just going to have to calibrate it out somehow. For the short term I'm manually measuring the number of offset phases each time I recompile the FPGA image, and then correcting the data in post-processing.

The final issue was a hardware bug. I was terminating the incoming signal with a 50Ω resistor to ground. Although this had good AC performance, at DC the current drawn from a high-level input was quite significant (66 mA at 3.3V). Since my I/O pins can't drive this much, the line was dragged down.

I decided to rework the input termination to replace the 50Ω terminator with split 100Ω resistors to 3.3V and ground. This should have about half the DC current draw, and is Thevenin equivalent to a 50Ω terminator to 1.65V. As a bonus, the mid-level termination will also allow me to AC-couple the incoming signal if that becomes necessary.
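A quick check of the Thevenin claim (values from the text):

```python
# Thevenin check on the split-termination rework (values from the post).
v_supply, r_top, r_bot = 3.3, 100.0, 100.0

r_th = (r_top * r_bot) / (r_top + r_bot)    # equivalent source resistance
v_th = v_supply * r_bot / (r_top + r_bot)   # equivalent termination voltage

# DC current a 3.3 V driver must source into each termination scheme:
i_old_ma = v_supply / 50.0 * 1e3            # plain 50 ohm to ground
i_new_ma = (v_supply - v_th) / r_th * 1e3   # split pair, Thevenin equivalent
print(r_th, v_th, i_old_ma, i_new_ma)
```

As stated above, the split pair looks like 50Ω to 1.65V and cuts the DC draw from 66 mA to 33 mA.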

Add 10 nF high speed decoupling cap to help compensate for inductance of long feeder trace

I cleaned off all of the flux residue and ran a second set of eye loopback tests at 250 and 500 Mbps. The results were dramatically improved:

Post-rework at 250 Mbps

Post-rework at 500 Mbps

While not perfect, the new eye openings are a lot cleaner. I hope to tweak my input stage further to reduce probing artifacts, but for the time being I think I have sufficient performance to compare multiple STARSHIPRAIDER test circuits and see how they stack up at relatively high speeds.

Next step: collect some baseline data for the current STARSHIPRAIDER characterization board, then use that to inform my v0.2 I/O circuit!

A number of people guessed it was a datalogger of some type, but didn’t quite identify the manufacturer or model correctly. That being said, I found Josh Myer’s response an interesting read, so I’ll give the prize to him. Congrats, email me for your prize!

March 30, 2017

We intend to make this more of a "WR tutorial", and we think there will be something to learn and discuss for everybody: newcomers, casual users and even experts.

Online registration will open on April 17. Registration for the workshop is independent of registration to the conference. If you register, it will be a great pleasure to see you there. Also, please send me comments on the program if you have any. We still have a bit of freedom to change it if need be.

And of course, please forward this to any other people you think could be interested!

March 29, 2017

Netdev 2.1 is the fourth edition of the technical conference on Linux networking. This conference is driven by the community and focuses on both the kernel networking subsystems (device drivers, net stack, protocols) and their use in user-space.

This edition will be held in Montreal, Canada, April 6 to 8, and the schedule was posted recently, featuring among other things a talk giving an overview and current status of the Distributed Switch Architecture (DSA), and a workshop about how to enable drivers to cope with heavy workloads and improve performance.

At Free Electrons, we regularly work on networking-related topics, especially as part of our Linux kernel contributions for the support of Marvell or Annapurna Labs ARM SoCs. Therefore, we decided to attend our first Netdev conference to stay up to date with the network subsystem and network driver capabilities, and to learn from the community's latest developments.

Our engineer Antoine Ténart will be representing Free Electrons at this event. We’re looking forward to being there!

March 26, 2017

April 21st is approaching fast, so here are some updates. I'm particularly
happy that we now have travel grants available. So if travel
expenses were preventing you from attending so far: this excuse is no
longer valid!

Get your ticket now, before it is too late. There's a limited number of
seats available.

OsmoCon 2017 Schedule

As you can see, the day is fully packed with talks about Osmocom
cellular infrastructure projects. We had to cut some talk slots short
(30min instead of 45min), but I'm confident that it is good to cover a
wider range of topics, while at the same time avoiding fragmenting the
audience with multiple tracks.

OsmoCon 2017 Travel Grants

We are happy to announce that we have received donations that allow us
to provide travel grants!

This means that any attendee who is otherwise not able to cover their
travel to OsmoCon 2017 (e.g. because their interest in Osmocom is not
related to their work, or because their employer doesn't pay the travel
expenses) can now apply for such a travel grant.

OsmoCon 2017 Social Event

Tech Talks are nice and fine, but what many people enjoy even more at
conferences is the informal networking combined with good food. For
this, we have the social event at night, which is open to all attendees.

Instead of ordering more of the old (v2) design, I decided to do some
improvements in the next version:

add mounting holes so the PCB can be mounted via M3 screws

add U.FL and SMA sockets, so the modems are connected via a short U.FL
to U.FL cable, and external antennas or other RF components can be
attached via SMA. This provides strain relief for the external
antenna or cabling and avoids tearing off any of the current loose
U.FL to SMA pigtails

flip the SIM slot to the top side of the PCB, so it can be accessed
even after mounting the board to some base plate or enclosure via the
mounting holes

more meaningful labeling of the silk screen, including the purpose of
the jumpers and the input voltage.

A software rendering of the resulting v3 PCB design files that I just
sent for production looks like this:

March 21, 2017

As I just wrote in my post about TelcoSecDay, I sometimes
worry about the choices I made with Osmocom, particularly when I see
all the great stuff people are doing in fields that I previously worked
in, such as applied IT security as well as Linux kernel development.

History

When people like Dieter, Holger and I started to play with what later
became OpenBSC, it was just for fun. A challenge to master. A closed
world to break open and which to attack with the tools, the mindset and
the values that we brought with us.

Later, Holger and I started to do freelance development for commercial
users of Osmocom (initially basically only OpenBSC, but then OsmoSGSN,
OsmoBSC, OsmoBTS, OsmoPCU and all the other bits on the infrastructure
side). This lead to the creation of sysmocom in 2011, and ever since we
are trying to use revenue from hardware sales as well as development
contracts to subsidize and grow the Osmocom projects. We're investing
most of our earnings directly into more staff that in turn works on
Osmocom related projects.

NOTE

It's important to draw the distinction between the Osmocom cellular
infrastructure projects,
which are mostly driven by commercial users and sysmocom these days,
and the many other pure just-for-fun community projects under
the Osmocom umbrella, like OsmocomTETRA, OsmocomGMR, rtl-sdr, etc.
I'm focussing only on the cellular infrastructure projects, as they
have been at the center of my life for the past 6+ years.

In order to do this, I basically gave up my previous career[s] in IT
security and Linux kernel development (as well as put things like
gpl-violations.org on hold). This is a big price to pay for creating
more FOSS in the mobile communications world, and sometimes I'm a bit
melancholic about the "old days" before.

Financial wealth is clearly not my primary motivation, but let me be
honest: I could have easily earned a shitload of money continuing to do
freelance Linux kernel development, IT security or related consulting.
There's a lot of demand for related skills, particularly with some
experience and reputation attached. But I decided against it, and
worked several years without a salary (or almost none) on Osmocom
related stuff [as did Holger].

But then, even with all the sacrifices made, and the amount of revenue
we can direct from sysmocom into Osmocom development: given the
complexity of cellular infrastructure, the funding and resources
available are always only a fraction of what one would normally want to
do a proper implementation. So it's a constant resource shortage,
combined with lots of unpaid work in those areas that are not on the
immediate short-term feature list of customers, and that nobody else in
the community feels like working on. And that can be a bit frustrating
at times.

Is it worth it?

So after 7 years of OpenBSC, OsmocomBB and all the related projects, I'm
sometimes asking myself whether it has been worth the effort, and
whether it was the right choice.

It was right in the sense that cellular technology is still an area
that's obscure and unknown to many, and that has very little FOSS
(though improving!). At the same time, cellular networks are becoming
more and more essential to many users and applications. So on an
abstract level, I think that every step in the direction of FOSS for
cellular is as urgently needed as before, and we have had quite some
success in implementing many different protocols and network elements.
Unfortunately, in most cases incompletely, as the amount of funding
and/or resources were always extremely limited.

Satisfaction/Happiness

On the other hand, when it comes to metrics such as personal
satisfaction or professional pride, I'm not very happy or satisfied.
The community remains small, the commercial interest remains limited,
and as opposed to the Linux world, most players have a complete lack of
understanding that FOSS is not a one-way road, but that it is important
for all stakeholders to contribute to the development in terms of
development resources.

Project success?

I think a collaborative development project (which to me is what FOSS is
about) is truly successful only if its success is not tied to
a single individual, a single small group of individuals or a single
entity (company). And no matter how much I would like the above to be
the case, it is not true for the Osmocom cellular infrastructure
projects. Take away Holger and me, or take away sysmocom, and I think
it would be pretty much dead. And I don't think I'm exaggerating here.
This makes me sad, and after all these years, and after knowing quite a
number of commercial players using our software, I would have hoped that
the project rests on many more shoulders by now.

This is not to belittle the efforts of all the people contributing to
it, whether the team of developers at sysmocom, whether those in the
community that still work on it 'just for fun', or whether those
commercial users that contract sysmocom for some of the work we do.
Also, there are known and unknown donors/funders, like the NLnet
foundation for some parts of the work. Thanks to all of you, and
clearly we wouldn't be where we are now without all of that!

But I feel it's not sufficient for the overall scope, and it's not [yet]
sustainable at this point. We need more support from all sides,
particularly those not currently contributing. From vendors of BTSs and
related equipment that use Osmocom components. From operators that use
it. From individuals. From academia.

Yes, we're making progress. I'm happy about new developments like the
Iu and Iuh support, the OsmoHLR/VLR split and 2G/3G authentication that Neels just blogged about. And
there's progress on the SIMtrace2 firmware with card emulation and MITM,
just as well as there's progress on libosmo-sigtran (with a more
complete SUA, M3UA and connection-oriented SCCP stack), etc.

But there are too few people working on this, and those people are
mostly coming from one particular corner, while most of the [commercial]
users do not contribute the way you would expect them to contribute in
collaborative FOSS projects. You can argue that most people in the
Linux world also don't contribute, but there the large commercial
beneficiaries (like the chipset and hardware makers) mostly do, as do
the large commercial users.

All in all, I have the feeling that Osmocom is as important as it
ever was, but it's not grown up yet to really walk on its own feet. It
may be able to crawl, though ;)

So for now, don't panic. I'm not suffering from burn-out or a mid-life
crisis, and I don't plan on any big changes of where I put my energy: it
will continue to be Osmocom. But I also think we have to have a more
open discussion with everyone on how to move beyond the current
situation. There's no point in staying quiet about it, or in claiming that
everything is fine the way it is. We need more commitment. Not from
the people already actively involved, but from those who are not [yet].

If that doesn't happen in the next let's say 1-2 years, I think it's
fair that I might seriously re-consider in which field and in which way
I'd like to dedicate my [I would think considerable] productive energy and
focus.

I'm just on my way back from the Telecom Security Day 2017
<https://www.troopers.de/troopers17/telco-sec-day/>, which is an
invitation-only event about telecom security issues hosted by ERNW
back-to-back with their Troopers 2017 <https://www.troopers.de/troopers17/>
conference.

I've been presenting at TelcoSecDay in previous years and hence was
again invited to join (as attendee). The event has really gained quite
some traction. Where early on you could find lots of IT security /
hacker crowds, the number of participants from the operator (and to
smaller extent also equipment maker) industry has been growing.

The quality of talks was great, and I enjoyed meeting various familiar
faces. It's just a pity that it's only a single day - plus I had to
head back to Berlin the same day, so I had to skip the dinner and social
event.

When attending events like this, and seeing the interesting hacks that
people are working on, it pains me a bit that I haven't really been
doing much security work in recent years. netfilter/iptables was at
least somewhat security related. My work on OpenPCD / librfid was
clearly RFID security oriented, as was the work on airprobe,
OsmocomTETRA, or even the EasyCard payment system hack.

I have the same feeling when attending Linux kernel development related
events. I have very fond memories of working in both fields, and it was
a lot of fun. Also, to be honest, I believe that the work in Linux
kernel land and the general IT security research was/is appreciated much
more than the endless months and years I'm now spending my time with
improving and extending the Osmocom cellular infrastructure stack.

Beyond the appreciation, it's also the fact that both the IT security
and the Linux kernel communities are much larger. There are more
people to learn from and learn with, to engage in discussions and
ping-pong ideas. In Osmocom, the community is too small (and I have the
feeling, it's actually shrinking), and in many areas it rather seems
like I am the "ultimate resource" to ask, whether about 3GPP specs or
about Osmocom code structure. What I'm missing is the feeling of being
part of a bigger community. So in essence, my current role in the "Open
Source Cellular" corner can be a very lonely one.

But hey, I don't want to sound more depressed than I am, this was
supposed to be a post about TelcoSecDay. It just happens that attending
IT Security and/or Linux Kernel events makes me somewhat gloomy for the
above-mentioned reasons.

Meanwhile, if you have some interesting projects/ideas at the border
between cellular protocols/systems and security, I'd of course love to
hear if there's some way to get my hands dirty in that area again :)

March 16, 2017

At the Octave conference in Geneva the presentation Support of free software in public institutions: the KiCad case will be given by Javier Serrano and Tomasz Wlostowski from CERN.

KiCad is a tool to help electronics designers develop Printed Circuit Boards (PCB). CERN's BE-CO-HT section has been contributing to its development since 2011. These efforts are framed in the context of CERN's activities regarding Open Source Hardware (OSHW), and are meant to provide an environment where design files for electronics can be shared in an efficient way, without the hurdles imposed by the use of proprietary formats.

The talk will start by providing some context about OSHW and the importance of using Free Software tools for sharing design files. We will then move on to a short KiCad tutorial, and finish with some considerations about the role public institutions can play in developing and fostering the use of Free Software, and whether some of the KiCad experience can apply in other contexts.

I’ve learned a lot working with Tim. I also respect his work ethic and he is a steadfast contributor to the open source community. This would be an excellent summer opportunity for any student interested in system-level hardware hacking!

March 10, 2017

Last month, five engineers from Free Electrons participated in the Embedded Linux Conference in Portland, Oregon. It was once again a great conference to learn new things about embedded Linux and the Linux kernel, and to meet developers from the open-source community.

Free Electrons team at work at ELC 2017, with Maxime Ripard, Antoine Ténart, Mylène Josserand and Quentin Schulz

Free Electrons selection of talks

Of course, the slides from many other talks are progressively being uploaded, and the Linux Foundation published the video recordings in record time: they are all already available on Youtube!

Below, each Free Electrons engineer who attended the conference has selected one talk he/she liked, and gives a quick summary of it, hopefully to encourage you to watch the corresponding video recording.

Using SWupdate to Upgrade your system, Gabriel Huau

Gabriel Huau from Witekio gave a great talk at ELC about SWUpdate, a tool created by Denx to update your system. The talk gives an overview of this tool, how it works and how to use it. Updating your system is very important for embedded devices, to fix bugs, apply security fixes or add new features, but in an industrial context it is sometimes difficult to perform an update: devices not easily accessible, large numbers of devices and variants, etc. A tool that can update the system automatically or even Over The Air (OTA) can be very useful, and SWUpdate is one of them.

SWUpdate allows updating different parts of an embedded system such as the bootloader, the kernel, the device tree, the root file system and also the application data.
It handles different image types: UBI, MTD, raw, custom Lua, U-Boot environment and even your custom one. It includes a notifier to be able to receive feedback about the update process, which can be useful in some cases. SWUpdate supports different local and OTA/remote interfaces such as USB, SD card, HTTP, etc. It is based on a simple update image format describing which images must be updated.

Many customizations can be done with this tool as it is provided with the classic menuconfig configuration tool. One great thing is that this tool is supported by Yocto Project and Buildroot so it can be easily tested.

GCC/Clang Optimizations for embedded Linux, Khem Raj

Khem Raj from Comcast is a frequent speaker at the Embedded Linux Conference, and one of his strong fields of expertise is C compilers, especially LLVM/Clang and Gcc. His talk at this conference can interest anyone developing code in the C language, to know about optimizations that the compilers can use to improve the performance or size of generated binaries. See the video and slides.

One noteworthy optimization is Clang's -Oz (Gcc doesn't have it), which goes even beyond -Os by disabling loop vectorization. Note that Clang already performs better than Gcc in terms of code size (according to our own measurements). On the topic of bundled optimization levels such as -O2 or -Os, Khem added that specific optimizations can be disabled in both compilers by prefixing the name of a given optimization with -fno- on the command line. The name of each optimization enabled at a given level can be found through the -fverbose-asm command line option.

Another new optimization option is -Og, which is different from the traditional -g option. It still produces code that can be debugged, but in a way that preserves a reasonable level of runtime performance.

On the performance side, he also recalled the Feedback-Directed Optimizations (FDO), already covered in earlier Embedded Linux Conferences, which can be used to feed the compiler with profiler statistics about code branches. The compiler can use such information to optimize the branches which are the most frequent at run-time.

Khem’s last piece of advice was not to optimize too early: make sure you do your debugging and profiling work first, as heavily optimized code can be very difficult to debug. Therefore, optimizations are for well-proven code only.

Note that Khem also gave a similar talk in the IoT track of the conference, which was more focused on bare-metal code optimization and portability: “Optimizing C for microcontrollers” (slides, video).

This talk was about the journey of a newcomer in the mainline kernel community fixing DisplayPort support in the Intel i915 DRM driver. It first presented what happens from the moment we plug a cable into a monitor until we actually see an image, then where the driver sits in the kernel: in the DRM subsystem, between the hardware (an Intel integrated graphics device) and the libdrm userspace library on which userspace applications such as the X server rely.

The bug to fix was a case where the driver would fail after switching to the requested resolution for a DP link. The other existing drivers usually fail before updating the resolution, so Manasi had to add a way to tell userspace that the DP link failed after updating the resolution. Such an addition would be useless without applications using this new information, so she had to work with their developers to make the applications behave correctly when reading it.
With a working set of patches, she thought she had done most of the work, with only the upstreaming left, and didn’t know it would take her many versions to make it upstream. She wished she had sent a first version of the driver for review earlier, to save time over the whole development plus upstreaming process. She also had to make sure the changes in the userspace applications would be ready by the time the driver was upstreamed.

The talk was a good introduction to how DisplayPort works, and an excellent example of why involving the community even in the early stages of development may be a good idea: it quickens the overall driver development process by avoiding complete rewrites of some code parts once upstreaming is under way.

Timekeeping in the Linux Kernel, Stephen Boyd

Stephen gave a great talk about one thing that is often overlooked, and really shouldn’t be: timekeeping. He started by explaining the various timekeeping mechanisms, both in hardware and in how Linux uses them. That meant covering the counters, timers, the tick, the jiffies and the various POSIX clocks, and detailing the various frameworks using them. He also explained the various bugs that might be encountered when having a too-naive counter implementation, for example, or when using the wrong POSIX clock from an application.
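As a small illustration of that last point (a hypothetical example, not taken from the talk): picking the wrong POSIX clock for interval measurement is a classic bug, because the real-time clock can jump when NTP or an operator adjusts it, while the monotonic clock cannot.

```python
import time

# time.time() reads CLOCK_REALTIME, which can jump backwards or forwards
# when the system clock is adjusted; time.monotonic() reads a clock that
# only ever moves forward, so timeouts and intervals should use it.
start = time.monotonic()
time.sleep(0.05)
elapsed = time.monotonic() - start

# Monotonic time never moves backwards, and sleep() waits at least
# the requested duration, so this always holds:
assert elapsed >= 0.04
print(f"slept for {elapsed:.3f} s")
```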

Android Things, Karim Yaghmour

Karim did a very good introduction to Android Things. His talk was a great overview of what this new OS from Google targeting embedded devices is, and where it comes from. He started by showing the history of Android, and he explained what this system brought to the embedded market. He then switched to the birth of Android Things; a reboot of Google’s strategy for connected devices. He finally gave an in depth explanation of the internals of this new OS, by comparing Android Things and Android, with lots of examples and demos.

Android Things replaces Brillo / Weave, and unlike its predecessor is built reusing available tools and services. It’s in fact a lightweight version of Android, with many services removed and a few additions like the PIO API to drive GPIO, I2C, PWM or UART controllers. A few services were replaced as well, most notably the launcher. The result is a not so big, but not so small, system that can run on headless devices to control various sensors; with an Android API for application developers.

One thing to keep in mind: The Linux Foundation is an industry
association, it exists to act in the joint interest of its paying
members. It is not a charity, and it does not act for the public good.
I know and respect that, while some people sometimes appear to be
confused about its function.

I wouldn't have any issue if VMware would (prior to joining LF) have
said: Ok, we had some bad policies in the past, but now we fully comply
with the license of the Linux kernel, and we release all
derivative/collective works in source code. This would be a positive
spin: Acknowledge past issues, resolve the issues, become clean and then
publicly underlining your support of Linux by (among other things)
joining the Linux Foundation. I'm not one to hold grudges against
people who accept their past mistakes, fix the present and then move
on. But no, they haven't fixed any issues.

They have had one of the worst track records in terms of intentional
GPL compliance issues for many years, showing outright disrespect for Linux,
the GPL and ultimately the rights of the Linux developers. Not resolving
those issues while at the same time joining the Linux Foundation? What
kind of message does that send?

It sends the following messages:

you can abuse Linux, the GPL and copyleft while still being accepted
amidst the Linux Foundation Members

it means the Linux Foundation has no ethical concerns whatsoever
about accepting such entities without previously asking them to become
clean

it also means that VMware has still not understood that Linux and FOSS
are about your actions, particularly the choices you make about how to
technically work with the community, and not against it.

So all in all, I think this move has seriously damaged the image of both
entities involved. I wouldn't have expected different of VMware, but I
would have hoped the Linux Foundation had some form of standards as to
which entities they permit amongst their ranks. I guess I was being
overly naive :(

It's a slap in the face of every developer who writes code not because
he gets paid, but because it is rewarding to know that copyleft will
continue to ensure the freedom of related code.

UPDATE (March 8, 2017):

I was mistaken in my original post in that VMware didn't just join,
but was a Linux Foundation member already before, it is "just" their
upgrade from silver to gold that made the news recently. I stand
corrected. Still doesn't make it any better that they are involved
inside LF while engaging in stepping over the lines of license
compliance.

UPDATE2 (March 8, 2017):

As some people pointed out, there is no verdict against VMware. Yes,
that's true. But the mere fact that they rather distribute derivative
works of GPL licensed software and take this to court with an armada
of lawyers (instead of simply complying with the license like everyone
else) is sad enough. By the time there will be a final verdict, the
product is EOL. That's probably their strategy to begin with :/

I always thought I understood UMTS AKA (authentication and key
agreement), including the re-synchronization procedure. It's been years
since I wrote tools like osmo-sim-auth which you can use to
perform UMTS AKA with a SIM card inserted into a PC reader, i.e.
simulate what happens between the AUC (authentication center) in a
network and the USIM card.

However, it is only now as the sysmocom team works on 3G support of the
dedicated OsmoHLR (outside of
OsmoNITB!), that I seem to understand all the nasty little details.

I always thought that for re-synchronization it is sufficient to simply
increment the SQN (sequence number). It turns out it isn't, as there is
an MSB portion called SEQ and a lower-bit portion called IND, used for
a fancy array indexing scheme that tracks the highest-used SEQ within
each IND bucket.
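A minimal Python sketch of that SQN = SEQ || IND split (bit lengths as in 3GPP TS 33.102; the bucket-update policy here is deliberately simplified, the spec's Annex C describes the full scheme):

```python
# Sketch of the SQN = SEQ || IND split from 3GPP TS 33.102 (Annex C).
# SQN is 48 bits; the low IND_LEN bits index an array of per-bucket
# highest-used SEQ values, and the upper bits carry the SEQ counter.
IND_LEN = 5  # a common choice; the spec allows other lengths

def split_sqn(sqn):
    """Return (SEQ, IND) for a given 48-bit SQN."""
    return sqn >> IND_LEN, sqn & ((1 << IND_LEN) - 1)

def next_sqn(seq_by_ind, ind):
    """Allocate the next SQN for bucket `ind`: bump that bucket's SEQ.
    (Simplified: the spec derives SEQ from a global counter instead.)"""
    seq_by_ind[ind] += 1
    return (seq_by_ind[ind] << IND_LEN) | ind

seq_by_ind = [0] * (1 << IND_LEN)  # highest-used SEQ per IND bucket
sqn = next_sqn(seq_by_ind, ind=3)
seq, ind = split_sqn(sqn)
print(seq, ind)  # 1 3
```

This also shows why blindly incrementing the whole SQN is wrong: adding 1 changes IND, which lands the value in a different bucket.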

If you're interested in all the dirty details and associated spec
references (they always hide the important parts in some Annex), see the
discussion between Neels and me in Osmocom redmine issue 1965.

March 05, 2017

For those of you who don't know what the tinkerphones/OpenPhoenux GTA04 is: It is a
'professional hobbyist' hardware project (with at least public
schematics, even if not open hardware in the sense that editable
schematics and PCB design files are published) creating updated
mainboards that can be used to upgrade Openmoko phones. They fit into
the same enclosure and can use the same display/speaker/microphone.

What the GTA04 guys have been doing for many years is close to a miracle
anyway: Trying to build a modern-day smartphone in low quantities,
using off-the-shelf components available in those low quantities, and
without a large company with its associated financial backing.

Smartphones are complex because they are highly integrated devices. A
seemingly unlimited number of components is squeezed into the tiniest
form-factors. This leads to complex circuit boards with many layers
that take a lot of effort to design, and are expensive to build in low
quantities. The fine-pitch components mandated by the integration
density are another issue.

Building the original GTA01 (Neo1973) and GTA02 (FreeRunner) devices at
Openmoko, Inc. must seem like a piece of cake compared to what the GTA04
guys are up to. We had a team of engineers that were at least familiar
with feature phone design, and we had the backing of a consumer
electronics company with all its manufacturing resources and expertise.

Nevertheless, a small group of people around Dr. Nikolaus Schaller has
been pushing the limits of what you can do in a small just-for-fun
project, and they have my utmost respect. Well done!

According to the mailing list posts, it seems to be incredibly difficult
to solder the PoP stack due to the way TI has designed the packaging of
the DM3730. If you want more gory details, see
this post
and yet another post.

It is very sad to see that what appears to be bad design choices at TI
are going to bring the GTA04 project to a halt. The financial hit by
having only 33% yield is already more than the small community can take,
let alone unused parts that are now in stock or even thinking about
further experiments related to the manufacturability of those chips.

If there's anyone with hands-on manufacturing experience on the DM3730
(or similar) TI PoP reading this: Please reach out to the GTA04 guys and
see if there's anything that can be done to help them.

UPDATE (March 8, 2017):

In an earlier post I was asserting that the GTA04 is open hardware
(which I actually believed up to that point) until some readers have
pointed out to me that it isn't. It's sad it isn't, but still it has
my sympathies.

March 04, 2017

The 2017.02 version of Buildroot has been released recently, and as usual Free Electrons has been a significant contributor to this release. A total of 1369 commits have gone into this release, contributed by 110 different developers.

Before looking in more details at the contributions from Free Electrons, let’s have a look at the main improvements provided by this release:

The big announcement is that 2017.02 is going to be a long term support release, maintained with security and other important fixes for one year. This will allow companies, users and projects that cannot upgrade at each Buildroot release to have a stable Buildroot version to work with, coming with regular updates for security and bug fixes. A few fixes have already been collected in the 2017.02.x branch, and regular point releases will be published.

Several improvements have been made to support reproducible builds, i.e. the capability of having two builds of the same configuration produce the exact same bit-for-bit output. These are not enough to provide reproducible builds yet, but they are a piece of the puzzle, and more patches are pending for the next releases to move forward on this topic.

A package infrastructure for packages using the waf build system has been added. Seven packages in Buildroot are using this infrastructure currently.

Support for the OpenRISC architecture has been added, as well as improvements to the support of ARM64 (selection of ARM64 cores, possibility of building an ARM 32-bit system optimized for an ARM64 core).

The external toolchain infrastructure, which was all implemented in a single very complicated package, has been split into one package per supported toolchain and a common infrastructure. This makes it much easier to maintain.

A number of updates have been made to the toolchain components and capabilities: uClibc-ng bumped to 1.0.22 and enabled for ARM64, mips32r6 and mips64r6; gdb 7.12.1 added, with gdb 7.11 becoming the default; Linaro toolchains updated to 2016.11; ARC toolchain components updated to arc-2016.09; MIPS Codescape toolchains bumped to 2016.05-06; CodeSourcery AMD64 and NIOS2 toolchains bumped.

Eight new defconfigs for various hardware platforms have been added, including defconfigs for the NIOSII and OpenRISC Qemu emulation.

Sixty new packages have been added, and countless other packages have been updated or fixed.
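The reproducible-builds goal mentioned above has a simple acceptance test: two builds of the same configuration must produce bit-for-bit identical images. Conceptually it boils down to comparing hashes of the build outputs (a minimal sketch, not Buildroot's actual tooling):

```python
import hashlib

def file_digest(path):
    """SHA-256 of a file, read in chunks so large images stay cheap."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def builds_reproducible(image_a, image_b):
    """Two builds are reproducible if their output images hash identically."""
    return file_digest(image_a) == file_digest(image_b)
```

In practice the hard part is not the comparison but eliminating every source of variation (timestamps, build paths, file ordering) so that the hashes actually match.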

Buildroot developers at work during the Buildroot Developers meeting in February 2017, after the FOSDEM conference in Brussels.

More specifically, the contributions from Free Electrons have been:

Thomas Petazzoni has handled the release of the first release candidate, 2017.02-rc1, and merged 742 patches out of the 1369 commits merged in this release.

Thomas contributed the initial work for the external toolchain infrastructure rework, which has been taken over by Romain Naour and finally merged thanks to Romain’s work.

Thomas contributed the rework of the ARM64 architecture description, to allow building an ARM 32-bit system optimized for a 64-bit core, and to allow selecting specific ARM64 cores.

Thomas contributed the raspberrypi-usbboot package, which packages a host tool that allows booting a Raspberry Pi system over USB.

Thomas fixed a large number of build issues found by the project autobuilders, contributing 41 patches to this effect.

Mylène Josserand contributed a patch to the X.org server package, fixing an issue with the i.MX6 OpenGL acceleration.

Gustavo Zacarias contributed a few fixes on various packages.

In addition, Free Electrons sponsored the participation of Thomas to the Buildroot Developers meeting that took place after the FOSDEM conference in Brussels, early February. A report of this meeting is available on the eLinux Wiki.

The Ware for January 2017 is a Philips Norelco shaver, which recently died, so I thought I’d take it apart and see what’s inside. It’s pretty similar to the previous generation shaver I was using. Hard to pick a winner — Jimmyjo got the thread on the right track, Adrian got the reference to the prior blog post…from 8 years ago. I think I’ll run with Jimmyjo as the winner though, since it looks from the timestamps that he was the first to push the thread into the general category of electric shaver. Congrats, email me to claim your prize (again)!

February 24, 2017

After 8 release candidates, Linus Torvalds released the final 4.10 Linux kernel last Sunday. A total of 13029 commits were made between 4.9 and 4.10. As usual, LWN had a very nice coverage of the major new features added during the 4.10 merge window: part 1, part 2 and part 3. The KernelNewbies Wiki has an updated page about 4.10 as well.

Out of the total of 13029 commits, 116 were made by Free Electrons engineers, which, interestingly, is exactly the same number of commits we made for the 4.9 kernel release!
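These numbers can be recomputed from a mainline kernel tree; a sketch (the author filter on the free-electrons.com e-mail domain is an assumption about how the count was made):

```shell
$ git log --oneline v4.9..v4.10 | wc -l
$ git log --oneline --author=@free-electrons.com v4.9..v4.10 | wc -l
```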

Our main contributions for this release have been:

For Atmel platforms, Alexandre Belloni added support for the securam block of the SAMA5D2, which is needed to implement backup mode, a deep suspend-to-RAM state for which we will be pushing patches over the next kernel releases. Alexandre also fixed some bugs in the Atmel dmaengine and USB gadget drivers.

For Allwinner platforms

Antoine Ténart enabled the 1-wire controller on the CHIP platform.

Boris Brezillon fixed an issue in the NAND controller driver that prevented the use of 512-byte ECC chunks.

Maxime Ripard added support for the CHIP Pro platform from NextThing, together with many feature additions for the underlying SoC, NextThing's GR8.

Boris Brezillon added support for the Video Encoder IP, which provides composite output. See also our recent blog post about our RaspberryPi work.

Boris Brezillon made a number of improvements to clock support on the RaspberryPi, which were needed for the Video Encoder IP support.

For the Marvell ARM platform

Grégory Clement enabled networking support on the Marvell Armada 3700 SoC, a Cortex-A53 based processor.

Grégory Clement did a large number of cleanups in the Device Tree files of Marvell platforms, fixing DTC warnings, and using node labels where possible.

Romain Perier contributed a brand new driver for the SPI controller of the Marvell Armada 3700, and therefore enabled SPI support on this platform.

Romain Perier extended the existing i2c-pxa driver to support the Marvell Armada 3700 I2C controller, and enabled I2C support on this platform.

Romain Perier extended the existing hardware random number generator driver for OMAP to also support the SafeXcel EIP76 from Inside Secure. This allows the driver to be used on the Marvell Armada 7K/8K SoC.

Romain Perier contributed support for the Globalscale EspressoBin board, a low-cost development board based on the Marvell Armada 3700.

Romain Perier did a number of fixes to the CESA driver, used for the cryptographic engine found on 32-bit Marvell SoCs, such as Armada 370, XP or 38x.

Thomas Petazzoni fixed a bug in the mvpp2 network driver, currently only used on Marvell Armada 375, but in the process of being extended to be used on Marvell Armada 7K/8K as well.

As the maintainer of the MTD NAND subsystem, Boris Brezillon did a few cleanups in the Tango NAND controller driver, added support for the TC58NVG2S0H NAND chip, and improved the core NAND support to accommodate controllers that have some special timing requirements.

As the maintainer of the RTC subsystem, Alexandre Belloni did a number of small cleanups and improvements, especially to the jz4740 driver.

During the second half of 2016, the code basically stayed untouched. In
early 2017, several patch series of (at least) three authors have been
published on the netdev mailing list for review and merge.

This raises the very valid question of how we test those (sometimes
quite intrusive) changes. Setting up a complete cellular network with
either GPRS/EGPRS or even UMTS/HSPA is possible using OsmoSGSN and
related Osmocom components. But it's of course a luxury that not many
Linux kernel networking hackers have, as it involves the availability of
a supported GSM BTS or UMTS hNodeB. And even if that is available,
there's still the issue of having a spectrum license, or a wired setup
with coaxial cable.

So as part of the recent discussions on netdev, I tested and described a
minimal test setup using libgtpnl, OpenGGSN and sgsnemu.

This setup will start a mobile station + SGSN emulator inside a Linux
network namespace, which talks GTP-C to OpenGGSN on the host, as well as
GTP-U to the Linux kernel GTP-U implementation.
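In outline, such a setup can be plumbed with iproute2 along these lines (a hypothetical sketch; the addresses and sgsnemu options are illustrative, the authoritative steps being in the description posted to netdev):

```shell
# Hypothetical sketch: a veth pair links the host (GGSN side) to a
# namespace running the MS+SGSN emulator. Addresses are illustrative.
$ ip netns add sgsn-test
$ ip link add veth-ggsn type veth peer name veth-sgsn
$ ip link set veth-sgsn netns sgsn-test
$ ip addr add 10.0.0.1/24 dev veth-ggsn && ip link set veth-ggsn up
$ ip netns exec sgsn-test ip addr add 10.0.0.2/24 dev veth-sgsn
$ ip netns exec sgsn-test ip link set veth-sgsn up
# ggsn runs on the host; sgsnemu speaks GTP-C/GTP-U to it from the namespace
$ ip netns exec sgsn-test sgsnemu --listen 10.0.0.2 --remote 10.0.0.1 ...
```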

This is of course just for manual testing, and for functional (not
performance) testing only. It would be great if somebody would pick up
on my recent mail containing some suggestions about an automatic
regression testing setup for the kernel GTP-U code. I have way
too many spare-time projects in desperate need of some attention to work
on this myself. And unfortunately, none of the telecom operators (who
are the ones benefiting most from a Free Software accelerated GTP-U
implementation) seems to be interested in at least co-funding or
otherwise contributing to this effort :/

February 20, 2017

For a few months, Free Electrons has been helping the Raspberry Pi Foundation upstream a number of display-related features for the Raspberry Pi platform to the Linux kernel.

The main goal behind this upstreaming process is to get rid of the closed-source firmware that is used on non-upstream kernels every time you need to enable/access a specific hardware feature, and to replace it with something that is both open-source and compliant with upstream Linux standards.

Eric Anholt has been working hard to upstream display related features. His biggest contribution has certainly been the open-source driver for the VC4 GPU, but he also worked on the display controller side, and we were contracted to help him with this task.

Our first objective was to add support for SDTV (composite) output, which appeared to be much easier than we imagined. As some of you might already know, the display controller of the Raspberry Pi already has a driver in the DRM subsystem. Our job was to add support for the SDTV encoder (also called VEC, for Video EnCoder). The driver has been submitted just before the 4.10 merge window and surprisingly made it into 4.10 (see also the patches). Eric Anholt explained on his blog:

The Raspberry Pi Foundation recently started contracting with Free Electrons to give me some support on the display side of the stack. Last week I got to review and release their first big piece of work: Boris Brezillon’s code for SDTV support. I had suggested that we use this as the first project because it should have been small and self contained. It ended up that we had some clock bugs Boris had to fix, and a bug in my core VC4 CRTC code, but he got a working patch series together shockingly quickly. He did one respin for a couple more fixes once I had tested it, and it’s now out on the list waiting for devicetree maintainer review. If nothing goes wrong, we should have composite out support in 4.11 (we’re probably a week late for 4.10).

Our second objective was to help Eric with HDMI audio support. The code was submitted on the mailing list 2 weeks ago and will hopefully be queued for 4.12. This time, we didn’t write much code, since Eric had already done the bulk of the work. What we did, though, was debug the implementation to make it work. Eric also explained on his blog:

Probably the biggest news of the last two weeks is that Boris’s native HDMI audio driver is now on the mailing list for review. I’m hoping that we can get this merged for 4.12 (4.10 is about to be released, so we’re too late for 4.11). We’ve tested stereo audio so far, no compressed audio (though I think it should Just Work), and >2 channel audio should be relatively small amounts of work from here. The next step on HDMI audio is to write the alsalib configuration snippets necessary to hide the weird details of HDMI audio (stereo IEC958 frames required) so that sound playback works normally for all existing userspace, which Boris should have a bit of time to work on still.

On our side, it has been a great experience to work on such topics with Eric, and you should expect more contributions from Free Electrons for the Raspberry Pi platform in the next months, so stay tuned!

February 15, 2017

I've recently attended a seminar that (among other topics) also covered
RF interference hunting. The speaker was talking about various
real-world cases of RF interference and illustrating them in detail.

Of course everyone who has any interest in RF or cellular will know
about the fundamental issues of radio frequency interference. For the
most part, you have:

cells of the same operator interfering with each other due to too
frequent frequency re-use, adjacent channel interference, etc.

cells of different operators interfering with each other due to
intermodulation products and the like

cells interfering with cable TV, terrestrial TV

DECT interfering with cells

cells or microwave links interfering with SAT-TV reception

all types of general EMC problems

But what the speaker of this seminar covered was actually a cellular
base-station being re-broadcast all over Europe via a commercial
satellite (!).

It is a well-known fact that most satellites in the sky are basically
just "bent pipes", i.e. they consist of an RF receiver on one frequency,
a mixer to shift the frequency, and a power amplifier. So basically
whatever is sent up to the satellite on one frequency gets
re-transmitted back down to earth on another frequency. This is abused
in "satellite hijacking" or "transponder hijacking", which has been
covered for decades in various publications.

Ok, but how does cellular relate to this? Well, apparently some people
are running VSAT terminals (bi-directional satellite terminals) with
improperly shielded or broken cables/connectors. In that case, the RF
emitted from a nearby cellular base station leaks into that cable, and
will get amplified + up-converted by the block up-converter of that VSAT
terminal.

The bent-pipe satellite subsequently picks this signal up and
re-transmits it all over its coverage area!

I've tried to find some public documents about this, and there's
surprisingly little public information about the phenomenon.

One report I did find describes a surprisingly manual and low-tech
approach to hunting down the source of the interference: using an old
Nokia net-monitor phone to display the MCC/MNC/LAC/CID of the cell.
Even in 2011 there were
already open source projects such as airprobe that could have done the
job based on sampled IF data. And I'm not even starting to consider
proprietary tools.

It should be relatively simple to have a SDR that you can tune to a
given satellite transponder, and which then would look for any
GSM/UMTS/LTE carrier within its spectrum and dump their identities in a
fully automatic way.

But then, maybe it really doesn't happen often enough after all to
justify such a development...

February 13, 2017

When working on optimizing the power consumption of a board we need a way to measure its consumption. We recently bought an ACME from BayLibre to do that.

Overview of the ACME

The ACME is an extension board for the BeagleBone Black, providing multi-channel power and temperature measurement capabilities. The cape itself has eight probe connectors, allowing multi-channel measurements. Probes for USB, Jack or HE10 can be bought separately, depending on the boards you want to monitor.

Last but not least, the ACME is fully open source, from the hardware to the software.

First setup

Ready-to-use pre-built images are available and can be flashed on an SD card. There are two different images: one acting as a standalone device and one providing an IIO capture daemon. While the latter can be used in automated farms, we chose the standalone image, which provides user-space tools to control the probes and is more suited to power consumption development topics.

Using the ACME

The Sigrok software is used to control the probes and retrieve the measured values. There is currently no support for sending data over the network. Because of this limitation we need to access the BeagleBone Black shell through SSH and run our commands there.

The driver has four parameters (continuous sampling, sample limit, time limit and sample rate) and one probe attached, with three channels (PWR, CURR and VOL). The acquisition parameters help configure data acquisition by setting sampling limits or rates. The rates are given in Hertz, and should be between 1 and 500 Hz when using an ACME.

For example, to sample at 20 Hz and display the power consumption measured by our probe P1:
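A plausible invocation looks like this (the driver and channel names are assumptions and depend on your Sigrok build):

```shell
$ sigrok-cli --driver=baylibre-acme --channels P1_PWR \
             --continuous --config samplerate=20
```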

Beta image

A new image is being developed and will change the way the ACME is used. As it’s already available in beta, we tested it (and didn’t come back to the stable image). This new version aims to use only IIO to provide the probe data, instead of having a custom Sigrok driver. The main advantage is that many software packages are IIO-aware, or will be, as it’s the standard way to use this kind of sensor with the Linux kernel. Last but not least, IIO provides ways to communicate over the network.

A new webpage is available with information on how to use the beta image, at https://baylibre-acme.github.io. This image isn’t compatible with the current stable one, which we previously described.

The first nice thing to notice when using the beta image is the Bonjour support, which lets us communicate with the board in an effortless way:

$ ping baylibre-acme.local

A new tool, acme-cli, is provided to switch the probes on or off as needed. To switch the first probe on or off:

$ ./acme-cli switch_on 1
$ ./acme-cli switch_off 1

We do not need any additional custom software to use the board, as the sensor data is available through the IIO interface. This means we should be able to use any IIO-aware tool to gather the power consumption values:

Sigrok, on the laptop/machine this time as IIO is able to communicate over the network;

iio-capture, which is a fork of iio-readdev designed by BayLibre for integration into LAVA (automated tests);

and many more.

Conclusion

We haven’t used all the possibilities offered by the ACME cape yet, but so far it has helped us a lot when working on power consumption related topics. The ACME cape is simple to use and comes with a working pre-built image. The beta image offers IIO support, which improves the usability of the device, and even though it’s a beta version we would recommend using it.

February 12, 2017

Ever since the late 1980s - and to a surprising extent even today -
telecom signaling traffic has been carried over circuit-switched SS7,
with TDM lines as the physical layer, rather than an IP/Ethernet based
transport.

When Holger first created OsmoBSC, the BSC-only version of OpenBSC some
7-8 years ago, he needed to implement a minimal subset of SCCP wrapped
in TCP called SCCPlite. This was due to the simple fact that the MSC
it had to interoperate with implemented this non-standard protocol
stacking, which was developed and deployed before M3UA or SUA, as
specified by the IETF SIGTRAN WG, came around. But even after those were specified
in 2004, the 3GPP didn't specify how to carry A over IP in a standard
way until the end of 2008, when a first A interface over IP study
was released.

As time passes, more modern MSCs of course still implement classic
circuit-switched SS7, but appear to have dropped SCCPlite in favor of
the real AoIP specified by 3GPP in the meantime. So it's time to add
this to the Osmocom universe and OsmoBSC.

A couple of years ago (2010-2013) I implemented both classic SS7
(MTP2/MTP3/SCCP) as well as SIGTRAN stackings (M2PA/M2UA/M3UA/SUA) in
Erlang. The result has been used in some production deployments, but
only with a relatively limited feature set. Unfortunately, this code
has not received any contributions in the time since, and I have to say
that as an open source community project, it has failed. Also, while
Erlang might be fine for core network equipment, running it on a BSC
really is overkill. Keep in mind that we often run OpenBSC on
really small ARM926EJ-S based embedded systems, much more resource
constrained than any smartphone of the last decade.

In the meantime (2015/2016) we also implemented some minimal SUA support
for interfacing with UMTS femto/small cells via Iuh (see OsmoHNBGW).

So in order to proceed to implement the required
SCCP-over-M3UA-over-SCTP stacking, I originally thought: well, take
Holger's old SCCP code, remove it from the IPA multiplex below, and
stack it on top of a new M3UA codebase partially copied from SUA.

However, this falls short of the goals in several ways:

The application shouldn't care whether it runs on top of SUA or SCCP;
it should use a unified interface towards the SCCP provider.
OsmoHNBGW and the SUA code already introduce such an interface based on
the SCCP-User-SAP implemented using Osmocom primitives (osmo_prim).
However, the old OsmoBSC/SCCPlite code doesn't have such an abstraction.

The code should be modular and reusable for other SIGTRAN stackings
as required in the future.

So I found myself sketching out what needs to be done and I ended up
pretty much with a re-implementation of large parts. Not quite fun, but
definitely worth it.

And then finally stack all those bits on top of each other, rendering a
fairly clean and modern implementation that can be used with the IuCS of
the virtually unmodified OsmoHNBGW, OsmoCSCN and OsmoSGSN for testing.

Next steps in the direction of the AoIP are:

Implementation of the MTP-SAP based on the IPA transport

Binding the new SCCP code on top of that

Converting OsmoBSC code base to use the SCCP-User-SAP for its
signaling connection

From that point onwards, OsmoBSC doesn't care anymore whether it
transports the BSSAP/BSSMAP messages of the A interface over
SCCP/IPA/TCP/IP (SCCPlite), SCCP/M3UA/SCTP/IP (3GPP AoIP), or even
something like SUA/SCTP/IP.

However, the 3GPP AoIP specs (unlike SCCPlite) actually modify the
BSSAP/BSSMAP payload. Rather than using Circuit Identifier Codes and
then mapping the CICs to UDP ports based on some secret conventions,
they actually encapsulate the IP address and UDP port information for
the RTP streams. This is of course the cleaner and more flexible
approach, but it means we'll have to do some further changes inside the
actual BSC code to accommodate this.

February 11, 2017

When implementing any kind of communication protocol, one always dreams
of some existing test suite that one can simply run against the
implementation to check if it performs correctly in at least those use
cases that matter to the given application.

Of course in the real world, there rarely are protocols where this is
true. If test specifications exist at all, they are often just very
abstract texts for human consumption that you as the reader should
implement yourself.

For some (by far not all) of the protocols found in cellular networks,
every so often I have seen some formal/abstract machine-parseable test
specifications. Sometimes it was TTCN-2, and sometimes TTCN-3.

If you haven't heard about TTCN-3, it is basically a way to create
functional tests in an abstract description (textual + graphical), and
then compile that into an actual executable tests suite that you can run
against the implementation under test.

However, when I last did some research into this several years ago, I
couldn't find any Free / Open Source tools to actually use those
formally specified test suites. This is not a big surprise, as even
much more fundamental tools for many telecom protocols are missing, such
as good/complete ASN.1 compilers, or even CSN.1 compilers.

To my big surprise I now discovered that Ericsson had released their
(formerly internal) TITAN TTCN3 Toolset
as Free / Open Source Software under EPL 1.0. The project is even part
of the Eclipse Foundation. Now I'm certainly not a friend of Java or
Eclipse by any means, but well, for running tests I'd certainly not
complain.

I haven't yet had time to play with it, but it definitely is rather high
on my TODO list to try.

ETSI provides a couple of test suites in TTCN-3 for protocols like
DIAMETER, GTP2-C, DMR, IPv6, S1AP, LTE-NAS, 6LoWPAN, SIP, and others at
http://forge.etsi.org/websvn/. (It's also the first time I've seen that
ETSI has an SVN server. Everyone else is using git these days, but yes,
a revision control system rather than periodic ZIP files is definitely
big progress. They should do that for their reference codecs and ASN.1
files, too.)

I'm not sure when I'll get around to it. Sadly, there is no TTCN-3 for
SCCP, SUA, M3UA or any SIGTRAN related stuff, otherwise I would want to
try it right away. But it definitely seems like a very interesting
technology (and tool).

I attended, but was not so excited by, Georg Greve's OpenPOWER talk. It
was a great talk, and it is an important topic, but the engineer in me
would have hoped for some actual beefy technical stuff. But well, I was
just not the right audience. I had heard about OpenPOWER quite some time
ago and have been following it from a distance.

February 05, 2017

Working as an embedded systems pentester is a lot of fun, but it comes with some annoying problems. There are so many tools that I can never seem to find the right one. Need to talk to a 3.3V UART? I almost invariably have an FTDI cable configured for 5 or 1.8V on my desk instead. Need to dump a 1.8V flash chip? Most of our flash dumpers won't run below 3.3V. Need to sniff a high-speed bus? Most of the Saleae Logic analyzers floating around the lab are too slow to keep up with fast signals, and the nice oscilloscopes don't have a lot of channels. And everyone's favorite jack-of-all-trades tool, the Bus Pirate, is infamous for being slow.

As someone with no shortage of virtual razors, I decided that this yak needed to be shaved! The result was an ongoing project I call STARSHIPRAIDER. There will be more posts on the project in the coming months so stay tuned!

The first step was to decide on a series of requirements for the project:

32 bidirectional I/O ports split into four 8-pin banks. This is enough to sniff any commonly encountered embedded bus other than DRAM. Multiple banks are needed to support multiple voltage levels in the same target.

Full support for 1.2 to 5V logic levels. This is supposed to be a "Swiss Army knife" embedded systems debug/testing tool. This voltage range encompasses pretty much any signalling voltage commonly encountered in embedded devices.

Tolerance to +/- 12V DC levels. Test equipment needs to handle some level of abuse. When you're reverse engineering a board it's easy to hook up ground to the wrong signal, probe a power rail, or even do both at once. The device doesn't have to function in this state (shutting down for protection is OK) but needs to not suffer permanent damage. It's also OK if the protection doesn't handle AC sources - the odds of accidentally connecting a piece of digital test equipment to a big RF power amplifier are low enough that I'm not worried.

500 Mbps input/output rate for each pin. This was a somewhat arbitrary choice, but preliminary math indicated it was feasible. I wanted something significantly faster than existing tools in this class.

Ethernet-based interface to host PC. I've become a huge fan of Ethernet and IPv6 as communications interface for my projects. It doesn't require any royalties or license fees, scales from 10 Mbps to >10 Gbps and supports bridging between different link speeds, supports multi-master topologies, and can be bridged over a WAN or VPN. USB and PCIe, the two main alternatives, can do few if any of these.

Large data buffer. Most USB logic analyzers have very high peak capture rates, but the back-haul interface to the host PC can't keep up with extended captures at high speed. Commodity DRAM is so cheap that there's no reason to not stick a whole SODIMM of DDR3 in the instrument to provide an extremely deep capture buffer.

Multiple virtual instruments connected to a crossbar. Any nontrivial embedded device contains multiple buses of interest to a reverse engineer. STARSHIPRAIDER needs to be able to connect to several at once (on arbitrary pins), bridge them out to separate TCP ports, and allow multiple testers to send test vectors to them independently.
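The deep-buffer requirement can be sanity-checked with some quick arithmetic (a sketch assuming all 32 pins captured at the full 500 Mbps, with no compression):

```python
# Rough capture-depth estimate for a 4 GiB DDR3 buffer (illustrative numbers).
PINS = 32
RATE_BPS = 500e6              # per-pin sample rate, bits per second
BUFFER_BYTES = 4 * 2**30      # one 4 GiB SODIMM

aggregate_bytes_per_s = PINS * RATE_BPS / 8   # 2.0e9 bytes/s across all pins
capture_seconds = BUFFER_BYTES / aggregate_bytes_per_s
print(f"~{capture_seconds:.1f} s of full-rate capture")
```

That works out to roughly two seconds of full-rate capture across every pin at once, far deeper than the buffers of typical USB analyzers.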

The brain of the system will be fairly straightforward high-speed digital design. It will be a 6-8 layer PCB with an Artix-7 FPGA in an FGG484 package, a SODIMM socket for 4GB of DDR3 800, a KSZ9031 Gigabit Ethernet PHY, a TLK10232 10 Gbit Ethernet PHY, and an SFP+ cage, plus some sort of connector (most likely a Samtec Q-strip) for talking to the I/O subsystem on a separate board.

The challenging part of the design, from an architectural perspective, seemed to be the I/O buffer and input protection circuit, so I decided to prototype it first.

STARSHIPRAIDER v0.1 I/O buffer design

A block diagram of the initial buffer design is shown above. The output buffer will be discussed in a separate post once I've had a chance to test it; today we'll be focusing on the input stage (the top half of the diagram).

During normal operation, the protection relay is closed. The series resistor has insignificant resistance compared to the input impedance of the comparator (an ADCMP607), so it can be largely ignored. The comparator checks the input signal against a threshold (chosen appropriately for the I/O standard in use) and sends a differential signal to the host board for processing. But what if something goes wrong?

If the user accidentally connects the probe to a signal outside the acceptable voltage range, a Schottky diode connected to the +5V or ground rail will conduct and shunt the excess voltage safely into the power rails. The series resistor limits fault current to a safe level (below the diode's peak power rating). After a short time (about 150 µs with my current relay driver), the protection relay opens and breaks the circuit.

All well and good in theory... but does it work? I built a characterization board containing a single I/O buffer and loaded with test points and probe connectors. You can grab the KiCAD files for this on Github as well. Here's a quick pic after assembly:

STARSHIPRAIDER I/O characterization board

Initial test results were not encouraging. Positive overvoltage spikes were clamped to +8V and negative spikes were clamped to -1V - well outside the -0.5 to +6V absolute max range of my comparator.

Positive transient response

Negative transient response

After a bit of review of the schematics, I found two errors. The "5V" ESD diode I was using to protect the high side had a poorly controlled Zener voltage and could clamp as high as 8V or 9V. The Schottky on the low side was able to survive my fault current but the forward voltage increased massively beyond the nominal value.

I reworked the board to replace the series resistor with a larger one (39 ohms) to reduce the maximum fault current, replaced the low-side Schottky with one that could handle more current, and replaced the Zener with an identical Schottky clamping to the +5V rail.
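With the reworked values, the worst-case fault current is easy to bound (a back-of-the-envelope sketch; the ~0.4 V Schottky forward drop is an assumption):

```python
# Worst-case fault current through the reworked protection stage.
V_FAULT = 12.0        # worst-case DC fault per the design requirements
V_CLAMP = 5.0 + 0.4   # +5V rail plus an assumed ~0.4 V Schottky drop
R_SERIES = 39.0       # reworked series resistor, ohms

i_fault = (V_FAULT - V_CLAMP) / R_SERIES   # roughly 0.17 A
p_resistor = i_fault**2 * R_SERIES         # ~1.1 W, but only until the relay opens
print(f"fault current ≈ {i_fault * 1000:.0f} mA")
```

The resistor only has to survive that dissipation for the ~150 µs it takes the protection relay to open.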

Testing this version gave much better results. There was still a small amount of ringing (less than five nanoseconds) a few hundred mV past the limit, but the comparator's ESD diodes should be able to safely dissipate this brief pulse.

Positive transient response, after rework

Negative transient response, after rework

Now it was time to test the actual signal path. My first iteration of the test involved cobbling together a signal path from an FPGA board through the test platform and to the oscilloscope without any termination. The source of the signal was a BNC-to-minigrabber flying lead test clip! Needless to say, results were less than stellar.

PRBS31 eye at 80 Mbps through protection circuit with flying leads and no terminator

After ordering some proper RF test supplies (like an inline 50 ohm BNC terminator), I got much better signal quality. The eye was very sharp and clear at 100 Mbps. It was visibly rounded at 200 Mbps, but rendering a squarewave at that rate requires bandwidth much higher than the 100 MHz of my oscilloscope, so results were inconclusive.

PRBS31 eye at 100 Mbps through protection circuit with proper cabling

PRBS31 eye at 200 Mbps, limited by oscilloscope bandwidth

I then hooked the protection circuit up to the comparator to test the entire inbound signal chain. While the eye looked pretty good at 100 Mbps (plotting one leg of the differential since my scope was out of channels), at 200 Mbps horrible jitter appeared.

PRBS31 eye at 100 Mbps through full input buffer

PRBS31 eye at 200 Mbps through full input buffer

After quite a bit of scratching my head and fumbling with datasheets, I realized my oscilloscope was the problem by plotting the clock reference I was triggering on. The jitter was visible in this clock as well, suggesting that it was inherent in the oscilloscope's trigger circuit. This isn't too surprising considering I'm really pushing the limits of this scope - I need a better one to do this kind of testing properly.

PRBS31 eye at 200 Mbps plus 200 MHz sync clock

At this point I've done about all of the input stage testing I can do with this oscilloscope. I'm going to try and rig up a BER tester on the FPGA so I can do PRBS loopback through the protection stage and comparator at higher speeds, then repeat for the output buffer and the protection run in the opposite direction.
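For reference when building that BER tester: the PRBS31 pattern used in the eye diagrams comes from a 31-bit LFSR, which can be modeled in a few lines of software (a sketch, assuming the standard x^31 + x^28 + 1 polynomial and an arbitrary all-ones seed):

```python
def prbs31(n, state=0x7FFFFFFF):
    """Generate n bits of a PRBS31 sequence (LFSR taps at bits 31 and 28).

    Any non-zero 31-bit seed works; the all-ones default is arbitrary.
    """
    out = []
    for _ in range(n):
        newbit = ((state >> 30) ^ (state >> 27)) & 1   # taps: x^31, x^28
        state = ((state << 1) | newbit) & 0x7FFFFFFF
        out.append(newbit)
    return out
```

The FPGA side implements the same feedback in fabric; the BER tester just compares received bits against this sequence and counts mismatches.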

I still have more work to do on the protection circuit as well... while it's fine at 100 Mbps, the 2x 10pF Schottky diode parasitic capacitance is seriously degrading my rise times (I calculated an RC filter -3dB point of around 200 MHz, so higher harmonics are being chopped off). I have some ideas on how to cut this capacitance down significantly, but that will require a board respin and another blog post!
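That -3 dB figure checks out against the stated component values (assuming the 39 ohm series resistor and the two ~10 pF diode capacitances in parallel):

```python
import math

R = 39.0            # series protection resistor, ohms
C = 2 * 10e-12      # two Schottky diodes at ~10 pF parasitic capacitance each

f_3db = 1 / (2 * math.pi * R * C)   # first-order RC low-pass corner
print(f"-3 dB point ≈ {f_3db / 1e6:.0f} MHz")
```

That lands at about 204 MHz, right where the post says the higher harmonics of a fast edge start getting chopped off.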

February 03, 2017

Linux Conf Australia took place two weeks ago in Hobart, Tasmania. For the second time, a Free Electrons engineer gave a talk at this conference: for this edition, Free Electrons CTO Thomas Petazzoni gave a talk titled A tour of the ARM architecture and its Linux support. This introduction-level talk explained what ARM is, the concepts behind the ARM architecture and ARM System-on-Chips, the bootloaders typically used on ARM, and the Linux support for ARM with the concept of the Device Tree.

The slides of the talk are available in PDF format, and the video is available on Youtube. We got some nice feedback afterwards, which is a good indication a number of attendees found it informative.

All the videos from the different talks are also available on Youtube.

We once again found LCA to be a really great event, and want to thank the LCA organization for accepting our talk proposal and funding the travel expenses. Next year, LCA 2018 will take place in Sydney, on the Australian mainland.

February 02, 2017

Like every year, a number of Free Electrons engineers will be attending the FOSDEM conference next week-end, on February 4 and 5, in Brussels. This year, Mylène Josserand and Thomas Petazzoni are going to FOSDEM. Being the biggest European open-source conference, FOSDEM is a great opportunity to meet a large number of open-source developers and learn about new projects.

In addition, Free Electrons is sponsoring the participation of Thomas Petazzoni to the Buildroot Developers meeting, which takes place during two days right after the FOSDEM conference. During this event, the Buildroot developers community gathers to make progress on the project by having discussions on the current topics, and working on the patches that have been submitted and need to be reviewed and merged.

The next Embedded Linux Conference will take place later this month in Portland (US), from February 21 to 23, with a great schedule of talks. As usual, a number of Free Electrons engineers will attend this event, and we will also be giving a few talks.

January 31, 2017

For many years, we have had a small, invitation-only event by Osmocom
developers for Osmocom developers called OsmoDevCon. This was fine for
the early years of Osmocom, but during the last few years it became
apparent that we also need a public event for our many users. Those
range from commercial cellular operators to community based efforts like
Rhizomatica, and of course include the many
research/lab type users with whom we started.

So now we'll have the public OsmoCon on April 21st, back-to-back with
the invitation-only OsmoDevCon from April 22nd through 23rd.

I'm hoping we can bring together a representative sample of our user
base at OsmoCon 2017 in April. Looking forward to meeting you all. I hope
you're also curious to hear more from other users, and of course from the
development team.

January 30, 2017

This close-up view shows about a third of the circuit board. If it turns out to be too difficult to guess from the clues shown here, I’ll update this post with a full-board photo; but I have a feeling long-time players of Name that Ware might have too easy a time with this one.

The ware for December 2016 is a diaper making machine. The same machine can be configured for making sanitary napkins or diapers by swapping out the die cut rollers and base material; in fact, the line next to the one pictured was producing sanitary napkins at the time this photo was taken. Congrats to Stuart for the first correct guess, email me for your prize!

January 22, 2017

A few days ago, Autodesk announced
that the popular EAGLE electronic design automation (EDA) software is
moving to a subscription-based model.

Where previously you paid once for a license and could use that
version/license as long as you wanted, there is now a monthly
subscription fee. Once you stop paying, you lose the right to use the
software. Welcome to the brave new world.

I have remotely observed this subscription model as a general trend in
the proprietary software universe. So far it hasn't affected me at all,
as the only two proprietary applications I use on a regular basis
during the last decade are IDA and EAGLE.

I already have ethical issues with using non-free software, but those
two cases have been the exceptions, in order to reach the productivity
required by the job. While I can somehow convince my conscience that
those two cases are OK, using software under a subscription model is
completely out of the question, period. Not only would I end up paying
for the rest of my professional career in order to be able to open and
maintain old design files, but I would also have to accept software that
"calls home" and has "remote kill" features. This is clearly not
something I would ever want to use on any of my computers. Also, I
don't want software to be associated with any account, and it's not the
bloody business of the software maker to know when and where I use my
software.

For me - and I hope for many, many other EAGLE users - this move is
utterly unacceptable and certainly marks the end of any business between
the EAGLE makers and myself and/or my companies. I will happily use
my current "old-style" EAGLE 7.x licenses for the near future, and then
see what kind of improvements I would need to contribute to KiCAD or
other FOSS EDA software in order to eventually migrate to those.

As expected, this doesn't only upset me, but many other customers, some
of whom have been loyal EAGLE users for many years if not decades,
back to the DOS version. This is reflected by some media reports (like
this one at hackaday) and user posts at element14.com or eaglecentral.ca
which are similarly critical of this move.

Rest in Peace, EAGLE. I hope Autodesk gets what they deserve: A new
influx of migrations away from EAGLE into the direction of Open Source
EDA software like KiCAD.

In fact, the more I think about it, the more I'm inclined to
work on good FOSS migration tools / converters - not only for my own
use, but to help more people move away from EAGLE. It's not that I
don't have enough projects on my hands already, but at least I'm
motivated to do something about this betrayal by Autodesk. Let's see
what (if anything) will come out of this.

So let's look at it this way: what Autodesk is doing is raising the level
of pain of using EAGLE so high that more people will use and contribute to
FOSS EDA software. And that is actually a good thing!

January 20, 2017

Results of the processing of the color image

The previous blog post “Lens aberration correction with the lapped MDCT” described our experiments with the lapped MDCT[1] for optical aberration correction of a single color channel and separation of the asymmetrical kernel into a small asymmetrical part for direct convolution and a larger symmetrical one to be applied in the frequency domain of the MDCT. We supplemented this processing chain with additional image conditioning steps to evaluate the overall quality of the results and the feasibility of the MDCT approach for processing in the camera FPGA.

The image comparator in Fig.1 shows the difference between the images generated at several stages of the processing. It makes it possible to compare any two of the image layers, either by sliding the image separator or just by clicking on the image, which alternates the right/left images. Zoom is controlled by the scroll wheel (clicking on the zoom indicator fits the image), pan by dragging.

The original image was acquired with an Elphel model 393 camera with a 5 Mpix MT9P006 image sensor and a Sunex DSL227 fisheye lens, saved in JP4 format as raw Bayer data at 98% compression quality. Calibration was performed with a Java program using a calibration pattern visible in the image itself. The program is designed to work with low-distortion lenses, so the fisheye was a stretch, and the calibration kernels near the edges are just replicated from ones closer to the center; aberration correction is therefore only partial in those areas.

The first two layers differ only in the added annotations; both show the output of simple bilinear demosaic processing, the same as the camera generates when running in JPEG mode. The next layers show different stages of the processing; details are provided later in this blog post.

Linear part of the image conditioning: convolution and color conversion

Correction of the optical aberrations in the image can be viewed as convolution of the raw image array with space-variant kernels derived from the optical point spread functions (PSF). In the general case of truly space-variant kernels (different for each pixel) it is not possible to use DFT-based convolution, but when the kernels change slowly and the image tiles can be considered isoplanatic (areas where the PSF remains the same to the specified precision) it is possible to apply the same kernel to the whole image tile that is processed with DFT (or combined convolution/MDCT in our case). This approach is studied in depth in astronomy [2],[3] (where they almost always have plenty of δ-function light sources to measure the PSF in the field of view :-)).

The procedure described below combines sparse kernel convolution in the space domain with lapped MDCT processing, making use of the MDCT's perfect reconstruction property (only approximate with the variant kernels); it still implements the same convolution with the variant kernels.

Signal flow is presented in Fig.2. Input signal is the raw image data from the sensor sampled through the color filter array organized as a standard Bayer mosaic: each 2×2 pixel tile includes one of the red and blue samples, and 2 of the green ones.

In addition to the image data the process depends on the calibration data – pairs of asymmetrical and symmetrical kernels calculated during camera calibration as described in the previous blog post.

Fig.2. Signal flow of the linear part of MDCT-based image conditioning

Image data is processed in the following sequence of the linear operations, resulting in intensity (Y) and two color difference components:

Each channel's data is directly convolved with a small (we used just four non-zero elements) asymmetrical kernel AK, resulting in a sequence of 16×16 pixel tiles overlapping by 8 pixels (input pixels are not limited to the 16×16 tiles).

Each tile is multiplied by a window function, folded and converted with 8×8 pixel DCT-IV[4] – equivalent of the 16×16->8×8 MDCT.

8×8 result tiles are multiplied by symmetrical kernels (SK) – equivalent of convolving the pre-MDCT signal.

Each channel is subject to a low-pass filter, implemented by multiplication in the frequency domain as these filters are indeed symmetrical. The cutoff frequency is different for the green (LPF1) and other (LPF2) colors, as there are more source samples for the former. That was the last step before the inverse transformation presented in the previous blog post; now we continue with a few more.

Natural images have strong correlation between the color channels, so most image processing (and compression) algorithms convert the pure color channels into intensity (Y) and two color difference signals that have lower bandwidth than the intensity. There are different standards for the color conversion coefficients, and here we are free to use any of them, as this process is not part of a matched encoder/decoder pair. All such conversions can be represented as multiplication of the (r,g,b) vector by a 3×3 matrix.

Two of the output signals – the color differences – are subject to additional bandwidth limiting by LPF3.

IMDCT includes the 8×8 DCT-IV, unfolding of the 8×8 into 16×16 tiles, a second multiplication by the window function, and accumulation of the overlapping tiles in the pixel domain.
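The color conversion step above can be sketched as a single matrix multiply. The BT.601-style coefficients below are purely illustrative – as noted, any consistent set works here since this is not part of a matched encoder/decoder pair:

```python
import numpy as np

# One possible 3x3 RGB -> (Y, color differences) matrix.
# BT.601-style coefficients, purely illustrative.
M = np.array([
    [ 0.299,     0.587,     0.114    ],  # Y (intensity)
    [-0.168736, -0.331264,  0.5      ],  # blue color difference
    [ 0.5,      -0.418688, -0.081312 ],  # red color difference
])

def rgb_to_ycc(rgb):
    """rgb: (..., 3) array of linear color samples."""
    return rgb @ M.T

# A neutral white pixel maps to full intensity and zero color differences.
y, cb, cr = rgb_to_ycc(np.array([1.0, 1.0, 1.0]))
```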

Nonlinear image enhancement: edge emphasis, noise reduction

For some applications the output data is already useful – ideally it has all the optical aberrations compensated so the remaining far-reaching inter-pixel correlation caused by a camera system is removed. Next steps (such as stereo matching) can be done on- (or off-) line, and the algorithms do not have to deal with the lens specifics. Other applications may benefit from additional processing that improves image quality – at least the perceived one.

Such processing may target the following goals:

Reduce the remaining signal modulation caused by the Bayer pattern (each source pixel carries data for a single color component, not all 3); trying to remove it with a LPF would blur the image itself.

Detect and enhance edges, as the most useful high-frequency elements represent locally linear features.

Reduce visible noise in the uniform areas (such as blue sky) where significant (especially for the small-pixel sensors) noise originates from the shot noise of the pixels. This noise is amplified by the aberration correction that effectively increases the high frequency gain of the system.

Some of these three goals overlap and can be addressed simultaneously – edge detection can improve de-mosaic quality and reduce related colored artifacts on the sharp edges if the signal is blurred along the edges and simultaneously sharpened in the orthogonal direction. Areas that do not have pronounced linear features are likely to be uniform and so can be low-pass filtered.

The non-linear processing produces a modified pixel value using the 3×3 pixel array centered on the current pixel. This is a two-step process:

First the 3×3 center-symmetric matrices (one for Y, another for color) of coefficients are calculated using the Y channel data, then

they are applied to the Y and color components by replacing the pixel value with the inner product of the calculated coefficients and the original data.

Signal flow for one channel is presented in Fig.3:

Fig.3 Non-linear image processing: edge emphasis and noise reduction

Four inner products are calculated for the same 9-sample Y data and the shown matrices (corresponding to second derivatives along vertical, horizontal and the two diagonal directions).

Each of these values is squared and

the following four 3×3 matrices are multiplied by these values. Matrices are symmetrical around the center, so gray-colored cells do not need to be calculated.

The four matrices are then added together and scaled by a variable parameter K1. The first two matrices are opposite to each other, and so are the second two. So if the absolute values of two orthogonal second derivatives are equal (no linear feature detected), the corresponding matrices annihilate each other.

The sum of the positive values is compared to a specified threshold, and if it exceeds it, the whole matrix is proportionally scaled down; this makes different line directions “compete” against each other and against the blurring kernel.

The sum of all 9 elements of the calculated array is zero, so the default unity kernel is added; when the correction coefficients are zero, the result pixels are the same as the input ones.

The inner product of the calculated 9-element array and the input data is used as the new pixel value. Two of the arrays are created from the same Y channel data – one for Y and the other for the two color differences; the configurable parameters (K1, K2, threshold and the smoothing matrix) are independent in these two cases.
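A minimal numpy sketch of this scheme follows. The actual correction matrices come from Fig.3, so the combination below only illustrates the structure (detect second derivatives, square, combine so that opposite directions annihilate, limit by the threshold, add the unity kernel) – the coefficients are not the real ones:

```python
import numpy as np

# Second-derivative detectors along vertical, horizontal and the two
# diagonal directions. The remaining matrices of the real pipeline come
# from Fig.3; here the detectors themselves are reused as a stand-in.
D = {
    "vert":  np.array([[0, 1, 0], [0, -2, 0], [0, 1, 0]], float),
    "horiz": np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]], float),
    "diag1": np.array([[1, 0, 0], [0, -2, 0], [0, 0, 1]], float),
    "diag2": np.array([[0, 0, 1], [0, -2, 0], [1, 0, 0]], float),
}

def enhance_pixel(patch, K1=0.5, threshold=4.0):
    """patch: 3x3 Y neighborhood; returns the modified center value."""
    # steps 1-2: inner products with the four detectors, squared
    s = {d: float(np.sum(D[d] * patch)) ** 2 for d in D}
    # steps 3-4: combine so that orthogonal directions annihilate
    corr = K1 * ((s["vert"] - s["horiz"]) * D["horiz"]
                 + (s["diag1"] - s["diag2"]) * D["diag2"])
    # step 5: limit the correction strength (directions "compete")
    pos = corr[corr > 0].sum()
    if pos > threshold:
        corr *= threshold / pos
    # step 6: correction elements sum to zero, so add the unity kernel
    kernel = corr.copy()
    kernel[1, 1] += 1.0
    return float(np.sum(kernel * patch))
```

On a uniform patch all detectors are zero, so the pixel passes through unchanged, as the description requires.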

Next steps

How much is it possible to warp?

The described method of optical aberration correction is tested with a software implementation that uses only operations that can be ported to the FPGA, so we are almost ready to get back to Verilog programming. One more thing to try first is to see if it is possible to merge this correction with a minor distortion correction. DFT and DCT transforms are not good at scaling images (when using the same pixel grid). It is definitely not possible to rectify large areas of the fisheye images, but maybe small (fraction of a pixel per tile) stretching can still be absorbed in the same step with the shifting? This may have several implications.

Single-step image rectification

It would definitely be attractive to eliminate an additional processing step and save FPGA resources and/or decrease the processing time. But there is more to it than that – re-sampling degrades image resolution. For that reason we use a half-pixel grid for the offline processing, but that increases the amount of data 4 times, and the processing resources at least 4 times as well.

When working with the whole-pixel grid (as we plan to implement in the camera FPGA) we already deal with partial pixel shifts during convolution for aberration correction, so it would be very attractive to combine these two fractional pixel shifts into one (the calibration process uses a half-pixel grid) and thus avoid double re-sampling and the related image degradation.

Using analytical lens distortion model with the precision of the pixel mapping

Another goal that seems achievable is to absorb at least the table-based pixel mapping. Real lenses can be described by an analytical formula of a radial distortion model only to some precision. Each element can have errors, and the multi-lens assembly inevitably has some misalignments – all of which makes the lenses differ and deviate from the perfect symmetry of the radial model. When we were working with (rather low distortion) wide angle Evetar N125B04530W lenses we were able to get to 0.2-0.3 pix root mean square reprojection error in a 26-lens camera system using a radial distortion model with up to the 8-th power of the radial polynomial (with insignificant improvement when going from the 6-th to the 8-th power). That error was reduced to 0.05..0.07 pixels when we implemented table-based pixel mapping for the distortions remaining after the radial model. The difference between one of the standard lens models – polynomial for the low-distortion lenses, f-theta for fisheye and “f-theta” lenses (where the angle from the optical axis depends approximately linearly on the distance from the center in the focal plane) – and the real lens is rather small, so it is a good candidate to be absorbed by the convolution step. While this will not eliminate re-sampling when the image is rectified, the distortion correction will have a simple analytical formula (already supported by many programs) and will not require a full pixel mapping table.

Image rectification is an important precondition to perform correlation-based stereo matching of two or more images. When finding the correlation between the images of a relatively large and detailed object it is easy to get resolution of a small fraction of a pixel. And this proportionally increases the distance measurement precision for the same base (distance between the individual cameras). Among other things (such as mechanical and thermal stability of the system) this requires precise measurement of the sub-camera distortions over the overlapping field of view.

When correlating multiple images, the far objects (the most challenging to get precise distance information for) have low disparity values (maybe just a few pixels), so instead of the complete rectification of the individual images it may be sufficient to have a good “mutual rectification”, so that the processed images of an object at infinity match on each of the individual images with the same sub-pixel resolution we achieved for off-line processing. This requires mechanically orienting each sub-camera sensor parallel to the others, pointing them in the same direction and preselecting lenses for matching focal length. After that (when the mechanical match is in the reasonable few percent range) – performing calibration and calculating the convolution kernels that accommodate the remaining distortion variations of the sub-cameras. In this case application of the described correction procedure in the camera will result in precisely matched images ready for correlation.

These images will not be perfectly rectified, and measured disparity (in pixels) as well as the two (vertical and horizontal) angles to the object will require additional correction. But this X/Y resolution is much less critical than the resolution required for the Z-measurements and can easily tolerate some re-sampling errors. For example, if a car at a distance of 20 meters is viewed by a stereo camera with 100 mm base, then the same pixel error that corresponds to a (practically negligible) 10 mm horizontal shift will lead to a 2 meter error (10%) in the distance measurement.
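The numbers in that example follow from first-order propagation of a disparity error into a depth error:

```python
# First-order propagation of a lateral pixel error into a depth error
# for the example above: object at 20 m, 100 mm stereo base.
Z = 20.0            # object distance, m
B = 0.1             # stereo base, m
dx = 0.010          # lateral position error at the object, m (10 mm)

ang = dx / Z        # equivalent angular error, rad (small-angle approx.)
dZ = Z ** 2 / B * ang   # depth error: dZ = (Z^2 / B) * dtheta

print(dZ, dZ / Z)   # ~2.0 m, i.e. ~10 % of the distance
```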

January 08, 2017

Finding a Libc for tiny embedded ARM systems

You'd think this problem would have been solved a long time ago. All I
wanted was a C library to use in small embedded systems -- those with
a few kB of flash and even fewer kB of RAM.

Small system requirements

A small embedded system has a different balance of needs:

Stack space is limited. Each thread needs a separate stack, and
it's pretty hard to move them around. I'd like to be able to
reliably run with less than 512 bytes of stack.

Dynamic memory allocation should be optional. I don't like using
malloc on a small device because failure is likely and usually hard
to recover from. Just make the linker tell me if the program is
going to fit or not.

Stdio doesn't have to be awesomely fast. Most of our devices
communicate over full-speed USB, which maxes out at about
1MB/sec. A stdio setup designed to write to the page cache at
memory speeds is over-designed, and likely involves lots of
buffering and fancy code.

Everything else should be fast. A small CPU may run at only
20-100MHz, so it's reasonable to ask for optimized code. They also
have very fast RAM, so cycle counts through the library matter.

Available small C libraries

I've looked at:

μClibc. This targets embedded Linux systems,
and also appears dead at this time.

Current AltOS C library

We've been using pdclib for a couple of years. It was easy to get
running, but it really doesn't match what we need. In particular, it
uses a lot of stack space in the stdio implementation as there's an
additional layer of abstraction that isn't necessary. In addition,
pdclib doesn't include a math library, so I've had to 'borrow' code
from other places where necessary. I've wanted to switch for a while,
but there didn't seem to be a great alternative.

What's wrong with newlib?

The "obvious" embedded C library is
newlib. Designed for embedded
systems with a nice way to avoid needing a 'real' kernel underneath,
newlib has a lot going for it. Most of the functions have a good
balance between speed and size, and many of them even offer two
implementations depending on what trade-off you need. Plus, the build
system 'just works' on multi-lib targets like the family of cortex-m
parts.

The big problem with newlib is the stdio code. It absolutely requires
dynamic memory allocation and the amount of code necessary for
'printf' is larger than the flash space on many of our devices. I was
able to get a cortex-m3 application compiled in 41kB of code, and that
used a smattering of string/memory functions and printf.

How about avr libc?

The Atmel world has it pretty good --
avr-libc is small and highly
optimized for Atmel's 8-bit AVR processors. I've used this library
with success in a number of projects, although nothing we've ever sold
through Altus Metrum.

In particular, the stdio implementation is quite nice -- a 'FILE' is
effectively a struct containing pointers to putc/getc functions. The
library does no buffering at all. And it's tiny -- the printf code
lacks a lot of the fancy new stuff, which saves a pile of space.

However, many of the places where performance is critical are written
in assembly language, making it pretty darn hard to port to another
processor.

Mixing code together for fun and profit!

Today, I decided to try an experiment to see what would happen if I
used the avr-libc stdio bits within the newlib environment. There were
only three functions written in assembly language: two of them were
just stubs, while the third was a simple ultoa function with a weird
interface. With those coded up in C, I managed to get them wedged into
newlib.

Figuring out the newlib build system was the only real challenge; it's
pretty awful having generated files in the repository and a mix of
autoconf 2.64 and 2.68 version dependencies.

The result is pretty usable though; my STM32L Discovery board demo
application is only 14kB of flash while the original newlib stdio bits
needed 42kB and that was still missing all of the 'syscalls', like
read, write and sbrk.

'master' remains a plain upstream tree, although I do have a fix on
that branch. The new code is all on the tiny-stdio branch.

I'll post a note on the newlib mailing list once I've managed to
subscribe and see if there is interest in making this option available
in the upstream newlib releases. If so, I'll see what might make sense
for the Debian libnewlib-arm-none-eabi packages.

Modern small-pixel image sensors exceed the resolution of the lenses, so it is the optics of the camera, not the raw sensor “megapixels”, that define how sharp the images are, especially in the off-center areas. Multi-sensor camera systems that depend on tiled images do not have any center areas, so the overall system resolution may be as low as that of its worst part.

De-mosaic processing and chromatic aberrations

Our current cameras' role is to preserve the raw sensor data while providing some moderate compression; all the image correction is applied during post-processing. Handling the lens aberrations has to be done before color conversion (de-mosaicing). When converting Bayer data to color images, most cameras start with the calculation of the “missing” colors in the RG/GB pattern using 3×3 or 5×5 kernels; this procedure relies on the specific arrangement of the color filters.

Each of the red and blue pixels has 4 green ones at the same distance (pixel pitch) and 4 of the opposite (R for B and B for R) color at the equidistant diagonal locations. Fig.1. shows how lateral chromatic aberration disturbs these relations.

Fig.1a is the point-spread function (PSF) of the green channel of the sensor. The resolution of the PSF measurement is twice that of the pixel pitch, so the lens is not that bad – the horizontal distance between the 2 greens in Fig.1c corresponds to 4 pixels of Fig.1a. It is also clearly visible that the PSF is elongated and the radial resolution in this part of the image is better than the tangential one (the lens center is down and to the left).

Fig.1b shows the superposition of the 3 color channels: the blue center is shifted up-and-right by approximately 2 PSF pixels (one actual pixel period of the sensor) and the red one half a pixel left-and-down from the green center. So the point light of a star centered on some green pixel will not spread uniformly to the two “R”s and two “B”s shown connected with lines in Fig.1c, but to other pixels, and in a different order. Fig.1d illustrates the effective positions of the sensor pixels that match the lens aberration.

Aberrations correction at post-processing stage

When we perform off-line image correction we start with separating each color channel and re-sampling it at twice the pixel pitch frequency (adding a zero sample between each measured one) – this allows shifting the image by a fraction of a pixel while preserving resolution and without introducing the phase errors that may look visually OK but hurt when relying on sub-pixel resolution during correlation of images.

Next is the conversion of the full image, as overlapping square tiles, to the frequency domain using a 2-d DFT, then multiplication by the inverted PSF kernels – individual for each color channel and each part of the whole image (the calibration procedure provides a 2-d array of PSF kernels). Such multiplication in the frequency domain is equivalent to the (much more computationally expensive) image convolution (or deconvolution, as the desired result is to reduce the effect of the convolution of the ideal image with the PSF of the actual lens). This is possible because of the famous convolution-multiplication property of the Fourier transform and its discrete versions.
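The core of this step – multiplication by an inverted PSF in the frequency domain – can be sketched with numpy. The Wiener-style regularization term nsr below is my assumption for illustration; the real pipeline uses per-tile inverted kernels prepared during calibration:

```python
import numpy as np

def invert_psf(psf, nsr=1e-6):
    """Invert a PSF in the frequency domain; nsr is a small
    noise-to-signal regularization term (assumed value)."""
    H = np.fft.fft2(psf)
    return np.conj(H) / (np.abs(H) ** 2 + nsr)

def correct_tile(tile, inv_kernel):
    """Multiplication in the frequency domain == deconvolution."""
    return np.real(np.fft.ifft2(np.fft.fft2(tile) * inv_kernel))

# Demo: blur a point source with a small circular PSF, then restore it.
N = 16
img = np.zeros((N, N)); img[8, 8] = 1.0
psf = np.zeros((N, N))
psf[0, 0] = 0.4
psf[0, 1] = psf[1, 0] = psf[0, -1] = psf[-1, 0] = 0.15   # 5-tap blur
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)))
restored = correct_tile(blurred, invert_psf(psf))
```

The restored tile concentrates the energy back into the original pixel, which is exactly what the aberration correction aims for.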

After each color channel tile is corrected and the phases of the color components match (lateral chromatic aberration is compensated), the data may be subject to non-linear processing that relies on the properties of the images (like detection of lines and edges) to combine the color channels, trying to achieve the highest spatial resolution without introducing color artifacts. Our current software performs this while the data is in the frequency domain, before the inverse Fourier transform and merging of the lapped tiles into the restored image.

Fig.2. Histogram of difference between original image and after direct and inverse MDCT (with 8×8 pixels DCT-IV)

MDCT of an image – there and back again

It would be very appealing to use DCT-based MDCT instead of DFT for aberration correction. With just an 8×8 point DCT-IV it may be possible to calculate a direct 16×16 -> 8×8 MDCT and an 8×8 -> 16×16 IMDCT providing perfect reconstruction of the image. An 8×8 pixel DCT should be able to handle convolution kernels with an 8 pixel radius – the same would require a 16×16 pixel DFT. I knew there would be a challenge handling non-symmetrical kernels, but first I tried a 2-d MDCT to convert and reconstruct a camera image that way. I was not able to find an efficient Java implementation of the DCT-IV, so I had to write some code following the algorithms presented in [1].

That worked nicely – the histogram of the difference between the original image (pixel values in the range of 0 to 255) and the restored one, IMDCT(MDCT(original)), demonstrated negligible error. Of course I had to discard the 8 pixel border of the image added by replication before the procedure – these border pixels do not belong to 4 overlapping tiles as all internal ones do, and so cannot be reconstructed.
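The IMDCT(MDCT(x)) round trip is easy to verify in 1-D with direct (matrix) transforms and a sine window; this is only a sketch of the lapping scheme – the camera uses the 2-D 16×16 -> 8×8 version via DCT-IV:

```python
import numpy as np

def sine_window(N):
    # satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1
    n = np.arange(2 * N)
    return np.sin(np.pi / (2 * N) * (n + 0.5))

def mdct(x, N):      # 2N samples -> N coefficients
    n, k = np.arange(2 * N), np.arange(N)
    C = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    return C @ x

def imdct(X, N):     # N coefficients -> 2N time-aliased samples
    n, k = np.arange(2 * N), np.arange(N)
    C = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, k + 0.5))
    return (2.0 / N) * (C @ X)

N = 8
w = sine_window(N)
rng = np.random.default_rng(0)
# pad with N zeros on each side: border samples cannot be reconstructed
x = np.concatenate([np.zeros(N), rng.standard_normal(6 * N), np.zeros(N)])

y = np.zeros_like(x)
for s in range(0, len(x) - 2 * N + 1, N):    # 50%-overlapping tiles
    y[s:s + 2 * N] += w * imdct(mdct(w * x[s:s + 2 * N], N), N)

err = np.max(np.abs(y[N:-N] - x[N:-N]))      # interior samples: exact
```

The time-domain aliasing introduced by each tile cancels exactly on overlap-add, which is the perfect reconstruction property relied upon above.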

When this is done in the camera FPGA the error will be higher – the DCT implementation there uses just an integer DSP, not capable of the double-precision calculations of the Java code. But for the small 8×8 transforms it should be rather easy to manage the calculation precision to the required level.

Convolution with MDCT

It was also easy to implement a low-pass symmetrical filter by multiplying the 8×8 pixel MDCT output tiles by the DCT-III transform of the desired convolution kernel. To convolve f ☼ g you need to multiply DCT_IV(f) by DCT_III(g) in the transform domain [2], but that does not mean that DCT-III also has to be implemented in the FPGA – the de-convolution kernels can be prepared during off-line calibration and provided to the camera in the required form.

But not much more can be done for convolution with asymmetric kernels – they require either an additional DST (so both DCT and DST) of the image and/or padding the data with extra zeros [3],[4] – all of which reduces the advantage of DCT compared to DFT. Asymmetric kernels are required for the lens aberration corrections, and Fig.1 shows two cases not easily suitable for MDCT:

lateral chromatic aberrations (or just shift in the image domain) – Fig.1b and

“diagonal” kernels (Fig.1a) – not an even function of each of the vertical and horizontal axes.

Symmetric kernels are like what you can do with a twice-folded piece of paper, cut to some shape and unfolded, with the folds oriented strictly vertically and horizontally.

Factorization of the convolution

Another way to handle convolution with non-symmetrical kernels is to split it in two – first convolve directly with an asymmetrical kernel, then use MDCT and a symmetrical kernel. The input data for the combined convolution is split Bayer data, so each color channel receives a sparse sequence – the green one has only half non-zero elements, and the red and blue only 1/4 such pixels. In the case of the half-pixel grid (to handle fractional-pixel shifts) the relative amount of non-zero pixels is four times smaller, so the total number of multiplications is the same as for the whole-pixel grid.

The goal of such factorization is to minimize the number of non-zero elements in the asymmetrical kernel, imposing no restrictions on the symmetrical one. The factorization does not have to be absolutely precise – the effect of deconvolution is limited by several factors, the most important being amplification of the sensor noise (such as shot noise). The required number of non-zero pixels may vary with the type of distortion; for the lens we experimented with (Sunex DSL227 fisheye) just 4 pixels were sufficient to achieve 2-4% error for each of the kernel tiles. Four pixel kernels make it 1 multiplication per each of the red and blue pixels and 2 multiplications per green. As the kernels are calculated during off-line camera calibration it should be possible to simultaneously generate scheduling of the DSP and buffer memories to further reduce the required run-time FPGA resources.
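The validity of such a split rests on the associativity of convolution: convolving with the two kernels in sequence equals convolving once with their combined kernel. A quick numpy check (circular convolution via FFT, with random stand-ins for the image and the larger kernel):

```python
import numpy as np

def cconv2(a, b, shape):
    """Circular 2-D convolution of a and b on a common grid via FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(a, shape) * np.fft.fft2(b, shape)))

shape = (16, 16)
rng = np.random.default_rng(1)
img = rng.standard_normal(shape)

# sparse asymmetric kernel: a few non-zero taps, as after factorization
asym = np.zeros((4, 4))
asym[0, 1], asym[1, 0], asym[2, 3] = 0.7, 0.2, 0.1

# larger kernel destined for multiplication in the transform domain
# (random stand-in here; the real one is symmetric and calibrated)
sym = rng.standard_normal((8, 8))

combined = cconv2(asym, sym, shape)                     # one-step kernel
one_step = cconv2(img, combined, shape)
two_step = cconv2(cconv2(img, asym, shape), sym, shape)
err = np.max(np.abs(one_step - two_step))               # numerically ~0
```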

Fig.3 illustrates how the deconvolution kernel for the aberration correction is split into two for the consecutive convolutions. Fig.3a shows the required deconvolution kernel determined during the existing calibration procedure. This kernel is shown far off-center even for the green channel – it appeared near the edge of the fisheye lens field of view, as the current lens model is based on a radial polynomial and is not efficient for fisheye (f-theta) lenses, so the aberration correction by deconvolution had to absorb that extra shift. As the convolution kernel has a fixed number of non-zero elements, the computation complexity does not depend on the maximal kernel dimensions. Fig.3b shows the determined asymmetric convolution kernel of 4 pixels, and Fig.3c the kernel for symmetric convolution with MDCT; the unique 8×8 pixel part of it (inside the red square) is replicated to the other 3 quadrants by mirroring along row 0 and column 0 because of the whole-pixel even symmetry – the right boundary condition for DCT-III. Fig.3d contains the result of DCT-III applied to the data shown in Fig.3c.

Fig.4. Symmetric convolution kernel tiles in the MDCT domain. The full image (click to open) has peripheral kernels replicated, as there was no calibration data outside of the fisheye lens field of view

There should be more efficient ways to find optimal combinations of the two kernels; currently I used a combination of the Levenberg-Marquardt Algorithm (LMA), which minimizes the approximation error (the root mean square of the differences between the given kernel and the convolution of the two calculated ones), with adding/replacing pixels in the asymmetrical kernel and sorting the variants for the best LMA fit. Experimental code (FactorConvKernel.java) for the kernel calculation is in the same git repository.
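The fit criterion used by this search can be sketched in a few lines. The following is a hypothetical numpy illustration of the objective only (not the actual FactorConvKernel.java code): the error is the RMS difference between the target kernel and the convolution of the asymmetric and symmetric factors.

```python
import numpy as np

def conv2d_full(a, b):
    # Full 2-D convolution of two small kernels, numpy only
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1))
    for (i, j), v in np.ndenumerate(a):
        out[i:i + b.shape[0], j:j + b.shape[1]] += v * b
    return out

def fit_error(target, asym, sym):
    # RMS difference between the target kernel and the
    # convolution of the asymmetric and symmetric factors
    approx = conv2d_full(asym, sym)
    return np.sqrt(np.mean((approx - target) ** 2))

# Toy check: a sparse 4-pixel asymmetric factor and a small kernel
# that is even in both axes reproduce their own convolution exactly
asym = np.zeros((3, 3))
asym[0, 1] = 0.5; asym[1, 0] = 0.2; asym[1, 1] = 1.0; asym[2, 2] = 0.3
sym = np.outer([0.25, 1.0, 0.25], [0.25, 1.0, 0.25])
target = conv2d_full(asym, sym)
assert fit_error(target, asym, sym) < 1e-12
```

In the real search this error is minimized by LMA over the factor values while a greedy step adds or replaces the few non-zero pixels of the asymmetric factor.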

Each kernel tile is processed independently of its neighbors, so while the aberration deconvolution kernels change smoothly between adjacent tiles, the individual asymmetrical (for direct convolution with the Bayer signal data) and symmetrical (for convolution by multiplication in the MDCT space) kernels may change dramatically (see Fig.4). But when the direct convolution is applied, before the window multiplication, to the source pixels that contribute to a 16×16 pixel overlapping MDCT tile, the result (after IMDCT) depends on the convolution of the two kernels, not on the individual ones.

Deconvolving the test image

The next step was to apply the convolution to the test image and see whether there were any visible blocking (or other) artifacts and whether the image sharpness improved. Only a single (green) channel was tested, as there is no DCT-based color conversion code in this program yet. The program was tested with the whole-pixel grid (not half-pixel), so some reduction of sharpness caused by fractional pixel shifts was expected. For the "before/after" aberration correction comparison I used two pairs – one with the raw Bayer data (half of the pixels are black in a checkerboard pattern) and the other with the Bayer pattern after a 0.4 pix low-pass filter to reduce the checkerboard pattern. Without this filtering the image would be either half as bright or (as in these pictures) saturated at lower levels (checkerboard 0/255 alternating pixels average to only half of the full range).

Fig.5 shows animated GIF of a fraction of the whole image, clicking the image shows comparison to the raw Bayer (with the limited gray level), caption links the full size images for these 3 modes.

No de-noise code is used, so amplification of the pixel shot noise is clearly visible, especially on uniform surfaces, but aliasing cancellation remained functional even with abrupt changes of the convolution kernels such as the ones shown in Fig.4.

Conclusions

Algorithms suitable for FPGA implementation were tested with the simulation code. Processing of images subject to the typical optical aberrations of the fisheye lens DSL227 does not add significantly to the computational complexity compared to pure symmetric convolution using lapped MDCT based on the 8×8 pixel two-dimensional DCT-IV.

This solution can be used as the first stage of real-time image correction and rectification, capable of sub-pixel resolution in multiple application areas, such as 3-d reconstruction and autonomous navigation.

December 30, 2016

I've just had the pleasure of attending all four days of 33C3 and have returned
home with somewhat mixed feelings.

I've been a regular visitor and speaker at CCC events since 15C3 in
1998, which among other things
means I'm an old man now. But I digress ;)

The event has come extremely far in those years. And to be honest, I
struggle with the size. Back then, it was a meeting of like-minded
hackers. You had the feeling that you knew a significant portion of the
attendees, and it was easy to connect to fellow hackers.

These days, both the number of attendees and the size of the event make
you feel that you're among the general public rather than at a
meeting of fellow hackers. Yes, it is good to see that more people are
interested in what the CCC (and the selected speakers) have to say, but
somehow it comes at the price that I (and, I suspect, other old-timers)
feel less at home. It feels too much like various other technology
related events.

One aspect creating a certain feeling of estrangement is also the venue
itself. There are an incredible number of rooms, with a labyrinth of
hallways, stairs, lobbies, etc. The size of the venue simply makes it
impossible to _accidentally_ run into all of your fellow
hackers and friends. If I want to meet somebody, I have to make an
explicit appointment. That is an option that exists most of the rest of
the year, too.

The range of topics covered at the event has also become wider, or at
least I feel that way. Topics like IT security, data protection, privacy,
intelligence/espionage and learning about technology have always been
present during all those years. But these days we have bloggers sitting
on stage and talking about bottles of wine (seriously?).

Contrary to many, I also really don't get the excitement about shows
like 'Methodisch Inkorrekt'. Seems to me like mainstream
compatible entertainment in the spirit of the 1990s Knoff Hoff Show without much
potential to make the audience want to dig deeper into (information)
technology.

This presentation covers some of our recent explorations into a specific
type of 3G/4G cellular modem which, in addition to the regular modem/baseband
processor, also contains a Cortex-A5 core that (unexpectedly) runs Linux.

We want to use such modems for building self-contained M2M devices that
run the entire application inside the modem itself, without any external
needs except electrical power, SIM card and antenna.

Beyond that, they also pose an ideal platform for testing the Osmocom
network-side projects for running GSM, GPRS, EDGE, UMTS and HSPA
cellular networks.

As with all the many projects that I happen to end up doing, it would be
great to get more people contributing to them. If you're interested in
cellular technology and want to help out, feel free to register at the
osmocom.org site and start adding/updating/correcting information to the
wiki.

You can e.g. help by

playing with the modem and documenting your findings

reviewing the source code released by Qualcomm + Quectel and
documenting your findings

helping us create a working OE build with our own kernel and rootfs
images, as well as opkg package feeds for the modems

helping to reverse engineer the DIAG and QMI protocols, and improving the
open source programs that interact with them

December 29, 2016

In 2016, Osmocom gained initial 3.5G support with osmo-iuh and the Iu
interface extensions of our libmsc and OsmoSGSN code. This means you can run
your own small open source 3.5G cellular network for SMS, Voice and Data
services.

However, the project needs more contributors: Become an active
member in the Osmocom development community and get your nano3G
femtocell for free.

I'm happy to announce that my company sysmocom hereby issues a call for
proposals to the general public. Please describe in a short proposal
how you would help us improve the Osmocom project if you were to
receive one of those free femtocells.

December 23, 2016

Sometimes we need to test disks connected to a camera and find out whether a particular model is a good candidate for the in-camera stream recording application. Such disks should not only be fast enough in terms of write speed, they should also have a short 'response time' to write commands – basically the time between a command being sent to the disk and the disk's response that the command has finished. This time is related to the total write speed, but it can vary due to processes going on in the internal disk controller. The fluctuations in disk response time can be an important parameter for high-bandwidth streaming applications in embedded systems, as this value allows estimating the data buffer size needed during recording; it may be a less critical parameter for typical PC applications, as modern computers are equipped with large amounts of RAM. We did not find any suitable parameter in the disk specifications we had that would give us a hint for the buffer size estimation, so we developed a small test program for this purpose.

This program basically resembles camogm (the in-camera recording program) in its operation and allows us to write repeating blocks of data containing a counter value and then check the consistency of the data written. It works directly with the disk driver and collects statistics during operation. The driver, among other things, measures the time between two events: when a write command is issued and when the command completion interrupt from the controller is received. This time can be used to measure disk write speed, as the amount of data sent to the disk with each command is also known. In general, this time floats slightly around its average value, given that the amount of data written with each command is almost the same. But long-run tests have shown that sometimes the interrupt return time after a write command can be much longer than the average.
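The same kind of measurement can be approximated from user space, without the driver instrumentation. Below is a simplified, hypothetical Python sketch (not the actual camogm_test code): it times each synchronous block write at the syscall level, which plays the role of the command-to-interrupt interval, and flags writes that took much longer than the running average.

```python
import os
import time

def measure_write_latency(path, block_size=3 * 1024 * 1024,
                          count=32, delay_factor=5.0):
    """Write fixed-size blocks synchronously, time each write and
    report the writes that took much longer than the average."""
    block = b"\0" * block_size
    times = []
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            t0 = time.monotonic()
            f.write(block)
            os.fsync(f.fileno())  # wait for the device, like the IRQ wait
            times.append(time.monotonic() - t0)
    avg = sum(times) / len(times)
    delayed = [(i, t) for i, t in enumerate(times) if t > delay_factor * avg]
    return avg, delayed
```

The recording buffer then needs to hold roughly the write bit-rate multiplied by the worst observed delay.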

We decided to investigate this situation in a little more detail and tested two SSDs with our test program. The disks used for the tests were a SanDisk SD8SMAT128G1122 and a Crucial CT250MX200SSD6, both connected to the eSATA camera port over an M.2 SSD adapter. We had used these disks before and they demonstrated different performance during recording. We ran camogm_test to write 3 MB blocks of data in cyclic mode. The program collected the delayed interrupt times reported by the driver as well as the amount of data written since the last delay event. The processed results of the test:

The actual points of interest on these charts are circled in red; they show the delays that are noticeably different from the average values. Below is the same data in table form:

Disk                     | Average IRQ reception time, ms | Std. dev., ms | Average IRQ delay time, ms | Std. dev., ms | Data recorded since last IRQ delay, GB | Std. dev., GB
CT250MX200SSD6 (250 GB)  | 11.9                           | 1.1           | 804                        | 12.7          | 499.7                                  | 111.7
SD8SMAT128G1122 (128 GB) | 19.3                           | 4.8           | 113                        | 6.5           | 231.5                                  | 11.5

The delayed interrupt times of these disks are considerably different, although the difference in the average interrupt times, which reflect disk write speeds, is not that big. It is interesting to notice that the amount of data written to disk between two consecutive interrupt delays is almost twice the total disk size. smartctl reported an increase of the Runtime_Bad_Block attribute for CT250MX200SSD6 after each delay, but the delays occurred each time on different LBAs. Unfortunately, SD8SMAT128G1122 does not have such a parameter among its smartctl attributes, so it is difficult to compare the two disks by this parameter.

December 18, 2016

You can see why this giant package is nearly obsolete these days (it's been around since 1955) – the tiny crystal on a large steel case is largely limited by the steel's thermal conduction. Modern packages with a copper base can do better, in much smaller packages.

As we finished with the basic camera functionality and tested the first Eyesis4π built with the new 10393 system boards (it is smaller, requires less power and is faster), we are moving forward with the in-camera image processing. We plan to combine our current camera calibration methods, which require off-line post-processing, with real-time image correction using the camera's own FPGA resources. This project will require switching between actual FPGA coding and software implementation of the same algorithms before going to the next step – software is still easier to design. The first part was in the FPGA realm – implementing the fundamental image processing block that we already know we'll be using and seeing how many resources it needs.

DCT type IV as a building block for in-camera image processing

We consider a small (8×8 pixel) DCT-IV to be a universal block for conditioning the raw acquired images. Operations such as lens optical aberration correction, color conversion (de-mosaic) in the presence of lateral chromatic aberration, and image rectification (de-warping) are easier to perform in the frequency domain, using the convolution-multiplication property and other algorithms.

In post-processing we use the DFT (Discrete Fourier Transform) over rather large (64×64 to 512×512) tiles, but that would be too much for the in-camera processing. The first reason is the tile size – for good lenses we do not need such large convolution kernels. Additionally, we plan to combine several processing steps into one (based on our off-line post-processing experience), so we do not need to sub-sample images – in our current software we double the resolution of the raw images at the beginning and scale the final result back to reduce image degradation caused by re-sampling.

The second area where we plan to reduce computations is the replacement of the DFT with the DCT, which is designed to be fed with purely real data and so requires fewer arithmetic operations than the DFT, which processes complex input values.

Why “type IV” of the DCT?

Fig.1. Signal flow graph for DCT-IV

We already have DCT type II implemented for JPEG/JP4 compression, and we still needed another one. Type IV is used in audio compression because it can be converted to a modified discrete cosine transform (MDCT) – a procedure where multiple overlapped windows are processed one at a time and the results are seamlessly combined without any of the block artifacts familiar from JPEG at low compression quality settings. We too need a lapped transform to process large images with relatively small (much smaller than the image itself) convolution kernels, and DCT-IV is a perfect fit. The 8-point DCT-IV makes it possible to transform 16-point segments with 8-point overlap in a reversible manner – the inverse transformation produces 16-point overlapping segments which, added together, restore the original data.
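The lapped-transform property is easy to check numerically. The sketch below is a simplified numpy model (not the FPGA implementation): it computes the MDCT of 16-sample windowed segments with 8-sample overlap and reconstructs the signal by windowed overlap-add of the inverse transforms.

```python
import numpy as np

N = 8  # transform size; each MDCT block covers 2*N = 16 samples

def mdct(x):
    n, k = np.arange(2 * N), np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return C @ x            # 16 samples -> 8 coefficients

def imdct(X):
    n, k = np.arange(2 * N), np.arange(N)
    C = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (C @ X)  # 8 coefficients -> 16 samples (with aliasing)

# Sine window satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

x = np.random.default_rng(0).standard_normal(4 * N)
coeffs = [mdct(w * x[i:i + 2 * N]) for i in range(0, len(x) - N, N)]

y = np.zeros_like(x)        # windowed overlap-add cancels the aliasing
for j, X in enumerate(coeffs):
    y[j * N:j * N + 2 * N] += w * imdct(X)

# Perfect reconstruction except at the unmatched first/last half-windows
assert np.allclose(x[N:-N], y[N:-N])
```

Each 16-sample segment is reduced to 8 coefficients, yet adding the overlapped inverse transforms restores the original samples exactly.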

There is a price to pay, though, for switching from DFT to DCT – the convolution-multiplication property, so straightforward for the FFT, gets complicated for the DCT[1]. While convolving with symmetrical kernels is still simple (the kernel just has to be transformed differently, which is anyway done off-line in our case), arbitrary-kernel convolution (or just a shift in image space, needed to compensate the lateral chromatic aberration) requires both DCT-IV and DST-IV transformed data. DST-IV can be calculated with the same DCT-IV modules (just by reversing the direction of the input data and alternating the signs of the output samples), but it still requires additional hardware resources and/or more processing time. Luckily it is only needed for the direct (image domain to frequency domain) transform; the inverse transform IDCT-IV (frequency to image) does not require DST. And IDCT-IV is actually the same as the direct DCT-IV, so we can again instantiate the same module.
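Both claims – that IDCT-IV is the same transform as DCT-IV (up to a scale factor), and that DST-IV can be computed with a DCT-IV module by reversing the input order and alternating the output signs – can be verified numerically (a numpy sketch, not the FPGA code):

```python
import numpy as np

N = 8
n = np.arange(N)
C4 = np.cos(np.pi / N * (n[:, None] + 0.5) * (n[None, :] + 0.5))  # DCT-IV matrix
S4 = np.sin(np.pi / N * (n[:, None] + 0.5) * (n[None, :] + 0.5))  # DST-IV matrix

# DCT-IV is its own inverse up to a scale factor: C4 @ C4 == (N/2) * I
assert np.allclose(C4 @ C4, (N / 2) * np.eye(N))

x = np.random.default_rng(1).standard_normal(N)
assert np.allclose(x, (2 / N) * C4 @ (C4 @ x))  # IDCT-IV reuses the same module

# DST-IV via DCT-IV: reverse the input, alternate the output signs
assert np.allclose(S4 @ x, (-1.0) ** n * (C4 @ x[::-1]))
```

This is why one hardware DCT-IV module covers the forward DCT-IV, the forward DST-IV and the inverse transform.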

Most two-dimensional transforms combine 1-d transform modules (because the DCT is a separable transform), so we too started with just an 8-point DCT. There are multiple known factorizations for such an algorithm[2] and we used one of them (based on BinDCT-IV), shown in Fig.1.

DSP primitive in Xilinx Zynq

This algorithm is implemented with a pair of DSP48E1[3] primitives shown in Fig.2. This primitive is flexible and can be configured for different functionality; the diagram contains only the blocks and connections used in the current project. The central part is the multiplier (signed 18 bits by signed 25 bits) with inputs from a pair of multiplexed B registers (B1 and B2, 18 bits wide) and the pre-adder AD register (25 bits). The AD register stores the sum/difference of the 25-bit D register and a multiplexed pair of 25-bit A1 and A2 registers. Any of the inputs can be replaced by zero, so AD can receive D, A1, A2, -A1, -A2, D+A1, D-A1, D+A2 and D-A2 values. The result of the multiplier (43 bits) is stored in the M register, and the data from M is combined with the 48-bit output accumulator register P. The final adder can add or subtract M to/from one of P, the 48-bit C register or just 0, so the output P register can receive +/-M, P+/-M and C+/-M. The wrapper module dsp_ma_preadd_c.v reduces the number of DSP48E1 signals and parameters to those required for the project and, in addition to the primitive instance, has a simple model of the DSP slice to allow simulation without the DSP48E1 source code.
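The subset of the DSP48E1 datapath used here can be modeled behaviorally in a few lines. This is a simplified, hypothetical sketch of the configuration described above (not the dsp_ma_preadd_c.v wrapper), ignoring register widths and pipeline stages:

```python
def dsp_step(a, d, b, c=0, p_prev=0,
             pre_op="+", base="P", post_op="+"):
    """One pass through the DSP48E1 path used in this design:
    pre-adder AD = D +/- A, multiplier M = AD * B,
    final adder P = (P_prev | C | 0) +/- M."""
    ad = d + a if pre_op == "+" else d - a
    m = ad * b
    acc = {"P": p_prev, "C": c, "0": 0}[base]
    return acc + m if post_op == "+" else acc - m

# A multiply-accumulate step: P += (D + A) * B
p = dsp_step(a=3, d=5, b=2, p_prev=10)   # 10 + (5 + 3) * 2 = 26
assert p == 26
```

The rotators of Fig.1 map onto exactly these pre-add/multiply/accumulate passes, two per clock pair.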

Fig.3. One-dimensional 8-point DCT-IV implementation

8-point DCT-IV transform

The DCT-IV implementation module (Fig.3) operates in 16 clock cycles (2 clock periods per data item); the input/output permutations are not included – they can be absorbed in the data source and destination memories. The current implementation does not implement correct rounding and saturation, to save resources – such processing can be added to the outputs after analysis for particular application data widths. This module is not in the coder/decoder signal chain, so bit-accuracy is not required.

Data is output every other cycle (so two such modules can easily be used to increase bandwidth), while the input data is scrambled more: some of the items have to appear twice in a 16-cycle period. This implementation uses two of the DSP48E1 primitives connected in series. The first one implements the left half of the Fig.1 graph – 3 rotators (marked R8 and two of R4), four adders, and four subtracters. The second one corresponds to the right half with R1, R5, R9, R13, four adders, and four subtracters. Two small memories (register files) – 2 locations before the first DSP and 4 locations before the second – effectively increase the number of the DSP internal D registers. The B inputs of the DSPs receive cosine coefficients; the same ROM provides values for both DSP stages.

The diagram shows just the data paths; all the DSP control signals, as well as the memory write and read addresses, are generated at defined times decoded from the 16-cycle period. The decoder is based on the spreadsheet draft of the design.

Fig.4. Two-dimensional 8×8 DCT-IV

Two-dimensional 8×8-point DCT-IV

The next diagram (Fig.4) shows a two-dimensional DCT type IV implementation using four of the 1-d 8-point DCT-IV modules described above. Input data arrives continuously in line-scan order; the next 64-item block may follow either immediately or after a delay of at least 16 cycles so that the pipeline phases are correctly restarted. Two of the input 8×25 memories (the width can be reduced to match the input data; 25 is the width of the DSP48E1 inputs) are used to re-order the input data. As each of the 1-d DCT modules requires input data during more than half of the cycles (see the bottom of Fig.3), interleaving with a common memory for both channels is not possible, so each channel has to have a dedicated one. The first of the two DCT modules converts even lines of 8 points, the other one odd lines. The latency of the data output from the RAM in the second channel is made 1 cycle longer, so the output data from the two channels arrive in odd/even time slots and can be multiplexed to a common transpose buffer memory. The minimal size of the buffer is 2 of the 64-item pages (the width can be reduced to match application requirements), but having just a two-page buffer increases the minimal pause time between blocks (if they are not immediate); with a four-page buffer (BRAM primitives are larger even if just halves are used) the minimal non-immediate delay of 16 cycles of a 1-d module still holds.
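Since the 2-d transform is separable, the whole Fig.4 pipeline is equivalent to applying the 8-point DCT-IV first along one axis of the tile and then along the other. A numpy sketch of this reference behavior (not the FPGA code):

```python
import numpy as np

N = 8
n = np.arange(N)
C4 = np.cos(np.pi / N * (n[:, None] + 0.5) * (n[None, :] + 0.5))  # 1-d DCT-IV

tile = np.random.default_rng(2).standard_normal((N, N))
X = C4 @ tile @ C4.T                    # horizontal pass, then vertical pass
back = (2 / N) ** 2 * (C4 @ X @ C4.T)   # the same matrix inverts both passes
assert np.allclose(tile, back)
```

The hardware transpose buffer between the two passes corresponds to the change of the axis the matrix is applied along.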

The second (vertical) pass is similar to the first (horizontal) one; it also has individual small memories for input data reordering and 2 output de-scrambler memories. It would be possible to use a single stage, but the memory would have to hold at least 17 items (>16) while the register-file primitives are 16 deep, and I believe that splitting it in series makes it easier for the placer/router tools to implement the design.

Next steps

Now that the 8×8-point DCT-IV is designed and simulated, the next step is to switch to Java coding (adding to our ImageJ plugin for camera calibration and image post-processing), convert the calibration data to a form suitable for the future migration to FPGA, and try the processing based on the chosen 8×8 DCT-IV. When satisfied with the results, we will continue with the FPGA coding.

December 16, 2016

When you work with GSM/cellular systems, the definitive resource is the
specifications. They were originally released by ETSI, later by 3GPP.

The problems start with the fact that there are separate numbering
schemes. Everyone in the cellular industry I know always uses the
GSM/3GPP TS numbering scheme, i.e. something like 3GPP TS 44.008.
However, ETSI assigns its own numbers to the specs, like ETSI TS
144008. In most cases, it is as simple as removing the '.' and
prefixing a '1'. However, that's not always true and
there are exceptions such as 3GPP TS 01.01 mapping to ETSI TS
101855. To make things harder, there doesn't seem to be a
machine-readable translation table between the spec numbers, but there's
a website for spec number conversion at http://webapp.etsi.org/key/queryform.asp
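The common-case mapping is trivial to script, but as noted above it is only a heuristic. A hypothetical helper (not an official tool):

```python
def gsm_to_etsi(spec: str) -> str:
    """Map a 3GPP TS number to the usual ETSI TS number by dropping
    the '.' and prefixing '1'. Does NOT handle exceptions such as
    3GPP TS 01.01, which actually maps to ETSI TS 101855."""
    return "1" + spec.replace(".", "")

assert gsm_to_etsi("44.008") == "144008"  # 3GPP TS 44.008 -> ETSI TS 144008
```

Anything beyond the common case still needs the ETSI conversion web form.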

When I started to work on GSM related topics somewhere between my work
at Openmoko and the start of the OpenBSC project, I manually downloaded
the PDF files of GSM specifications from the ETSI website. This was a
cumbersome process, as you had to enter the spec number (e.g. TS 04.08)
in a search window, look for the latest version in the search results,
click on that and then click again for accessing the PDF file (rather
than a proprietary Microsoft Word file).

At some point a poor girlfriend of mine was kind enough to do this
manual process for each and every 3GPP spec, and then create a
corresponding symbolic link so that you could type something like evince
/spae/openmoko/gsm-specs/by_chapter/44.008.pdf into your command line
and get instant access to the respective spec.

However, of course, this gets out of date over time, and by now almost a
decade has passed without a systematic update of that archive.

To the rescue, 3GPP started some time ago to not only provide
the obnoxious M$ Word DOC files, but also deep links to ETSI. So you
could go to http://www.3gpp.org/DynaReport/44-series.htm, click
on 44.008, and one further click gave you the desired PDF, served by
ETSI (3GPP apparently never provided PDF files).

However, in their infinite wisdom, at some point in 2016 the 3GPP
webmaster decided to remove those deep links. Rather than a nice long
list of released versions of a given spec,
http://www.3gpp.org/DynaReport/44008.htm now points to some crappy
JavaScript tabbed page, where you can click on the version number and
then get a ZIP file with a single Word DOC file inside. You can hardly
make it any more inconvenient and cumbersome. The PDF links would open
immediately in a modern browser's built-in JavaScript PDF viewer or your
favorite PDF viewer – a single click to the information you want. But no,
the PDF links had to go, replaced with ZIP file downloads that you
first need to extract and then open in something like LibreOffice,
which takes ages to load the document and renders it improperly in a word
processor. I don't want to edit the spec, I want to read it, sigh.

So since the usability of this 3GPP specification resource had been
artificially crippled, I was sufficiently annoyed to come up with a
solution:

then use a shell script that utilizes pdfgrep and awk to determine the
3GPP specification number (it is written in the title on the first
page of the document) and create a symlink. Now I have something
like 44.008-4.0.0.pdf -> ts_144008v040000p.pdf

It's such a waste of resources to have to download all those files and
then write a script using pdfgrep+awk to regain the usability that
3GPP chose to remove from their website. Now we can only wait for ETSI
to disable indexing/recursion on their server, and easy and quick spec
access will be gone forever :/

December 15, 2016

The LM1813, an anti-skid chip, was the largest analog die National Semiconductor had built to date as of 1974. It was built as a custom part for a brake system vendor supplying Ford Motor Company, for use in their pickup trucks.

Die size 2234x1826 µm.

Test chips on the wafer:

Thanks for the wafers goes to Bob Miller, one of designers of this chip.

As usual, we take this opportunity to look at the contributions Free Electrons made to this kernel release. In total, we contributed 116 non-merge commits. Our most significant contributions this time have been:

Contribution of an input ADC resistor ladder driver, written by Alexandre Belloni. As explained in the commit log: a common way of multiplexing buttons on a single input in cheap devices is to use a resistor ladder on an ADC. This driver supports that configuration by polling an ADC channel provided by IIO.

Several bug fixes and improvements to the Marvell CESA driver, for the crypto engine found in most Marvell EBU processors. By Romain Perier and Thomas Petazzoni.

Support for the PIC interrupt controller, used on the Marvell Armada 7K/8K SoCs, currently used for the PMU (Performance Monitoring Unit). By Thomas Petazzoni.

Enabling of Armada 8K devices, with support for the slave CP110 and the first Armada 8040 development board. By Thomas Petazzoni.

On Allwinner platforms

Addition of GPIO support to the AXP209 driver, which is used to control the PMIC used on most Allwinner designs. Done by Maxime Ripard.

Initial support for the Nextthing GR8 SoC. By Mylène Josserand and Maxime Ripard (pinctrl driver and Device Tree)

The improved sunxi-ng clock code, introduced in Linux 4.8, is now used for Allwinner A23 and A33. Done by Maxime Ripard.

Add support for the Allwinner A33 display controller, by re-using and extending the existing sun4i DRM/KMS driver. Done by Maxime Ripard.

Addition of bridge support in the sun4i DRM/KMS driver, as well as the code for a RGB to VGA bridge, used by the C.H.I.P VGA expansion board. By Maxime Ripard.

Numerous cleanups and improvements commits in the UBI subsystem, in preparation for merging the support for Multi-Level Cells NAND, from Boris Brezillon.

Improvements in the MTD subsystem, by Boris Brezillon:

Addition of mtd_pairing_scheme, a mechanism which allows to express the pairing of NAND pages in Multi-Level Cells NANDs.

Improvements in the selection of NAND timings.

In addition, a number of Free Electrons engineers are also maintainers in the Linux kernel, so they review and merge patches from other developers, and send pull requests to other maintainers to get those patches integrated. This led to the following activity:

Maxime Ripard, as the Allwinner co-maintainer, merged 78 patches from other developers.

Grégory Clement, as the Marvell EBU co-maintainer, merged 43 patches from other developers.

Alexandre Belloni, as the RTC maintainer and Atmel co-maintainer, merged 26 patches from other developers.

Boris Brezillon, as the MTD NAND maintainer, merged 24 patches from other developers.

Continuous integration in Linux kernel

Because of Linux’s well-known ability to run on numerous platforms and the obvious impossibility for developers to test changes on all these platforms, continuous integration has a big role to play in Linux kernel development and maintenance.

More generally, continuous integration is made up of three different steps:

building the software which in our case is the Linux kernel,

testing the software,

reporting the test results.

KernelCI complete process

KernelCI checks hourly whether one of the Git repositories it tracks has been updated. If so, it builds the kernel from the latest commit for ARM, ARM64 and x86 platforms in many configurations, and then stores all these builds in publicly available storage.

Once the kernel images have been built, KernelCI itself is not in charge of testing it on hardware. Instead, it delegates this work to various labs, maintained by individuals or organizations. In the following section, we will discuss the software architecture needed to create such a lab, and receive testing requests from KernelCI.

Core software component: LAVA

At the moment, LAVA is the only software supported by KernelCI, but note that KernelCI offers an API, so if LAVA does not meet your needs, go ahead and make your own!

What is LAVA?

LAVA is self-hosted software, organized in a server-dispatcher model, for controlling boards in order to automate boot, bootloader and user-space testing. The server receives jobs specifying what to test, how, and on which boards to run those tests, and transmits those jobs to the dispatcher linked to the specified board. The dispatcher applies all the modifications to the kernel image needed to make it boot on the said board and then fully interacts with it over the serial connection.

Since LAVA has to fully and autonomously control boards, it needs to:

interact with the board through serial connection,

control the power supply to reset the board in case of a frozen kernel,

know the commands needed to boot the kernel from the bootloader,

serve files (kernel, DTB, rootfs) to the board.

The first three requirements are fulfilled by LAVA thanks to per-board configuration files. The latter is done by the LAVA dispatcher in charge of the board, which downloads files specified in the job and copies them to a directory accessible by the board through TFTP.

LAVA organizes the lab in devices and device types. All identical devices are from the same device type and share the same device type configuration file. It contains the set of bootloader instructions to boot the kernel (e.g.: how and where to load files) and the bootloader configuration (e.g.: can it boot zImages or only uImages). A device configuration file stores the commands run by a dispatcher to interact with the device: how to connect to serial, how to power it on and off. LAVA interacts with devices via external tools: it has support for conmux or telnet to communicate via serial and power commands can be executed by custom scripts (pdudaemon for example).

Control power supply

Some labs use expensive Switched PDUs to control the power supply of each board but, as discussed in our previous blog post we went for several Devantech ETH008 Ethernet-controlled relay boards instead.

Connect to serial

As advised in LAVA's installation guide, we went with telnet and ser2net to connect to the serial ports of our boards. Ser2net basically opens a Linux device and allows interacting with it through a TCP socket on a defined port. A LAVA dispatcher then launches a telnet client to connect to a board's serial port. Because Linux device names might change between reboots, we had to use udev rules to guarantee that the serial port we connect to is the one we want.

Actual testing

Now that LAVA knows how to handle the devices, it has to run jobs on them. A LAVA job specifies which images to boot (kernel, DTB, rootfs), what kind of tests to run once in user space, and where to find them. A job is strongly linked to a device type, since it contains the kernel and DTB built specifically for that device type.

Those jobs are submitted to the different labs by the KernelCI project. To do so, KernelCI uses a tool called lava-ci. Among other things, this tool contains a big table of the supported platforms, associating each Device Tree name with the corresponding hardware platform name. This way, when a new kernel gets built by KernelCI and produces a number of Device Tree Blobs (.dtb files), lava-ci knows which hardware platforms to run the kernel on. It submits the jobs to all the labs, which then only run the tests for which they have the necessary hardware. We have contributed a number of patches to lava-ci, adding support for the new platforms in our lab.
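The heart of that table can be pictured as a simple mapping from DTB file name to platform name. The sketch below is illustrative only (the entries and function name are ours, not lava-ci's actual code):

```python
# Sketch of a DTB-name -> hardware platform lookup, in the spirit of
# lava-ci's platform table. Entries and names are illustrative.
DTB_TO_PLATFORM = {
    "armada-370-rd.dtb": "marvell-rd-370",
    "sama5d34ek.dtb": "atmel-sama5d34ek",
}

def platforms_for_build(dtb_files):
    """Given the .dtb files a kernel build produced, return the known
    hardware platforms that kernel can be tested on."""
    return [DTB_TO_PLATFORM[dtb] for dtb in dtb_files if dtb in DTB_TO_PLATFORM]
```

Each lab then simply ignores submitted jobs for platforms it does not physically host.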

LAVA overall architecture

Reporting test results

After KernelCI has built the kernel and sent the jobs to contributing labs, and LAVA has run them, KernelCI gathers the test results from the labs, aggregates them on its website, and notifies maintainers of errors via a mailing list.

Challenges encountered

As in any project, we stumbled on some difficulties, the biggest of which were board-specific problems.

Some boards, like the Marvell RD-370, need a rising edge on a pin to boot, meaning we could not avoid pressing the reset button between boots. To work around this, we had to customize the hardware (swapping resistors) to bypass the limitation.

Some other boards lose their serial connection. Some lose it when their power is reset but recover it after a few seconds, a problem we found acceptable to solve by reconnecting to the serial port indefinitely. However, we still have a problem with a few boards which randomly close their serial connection for no apparent reason. Afterwards we can connect to the serial port again, but it no longer transmits any characters; the only way to recover is to physically re-plug the serial cable. Unfortunately, we have not yet found a way to solve this bug.
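The "reconnect indefinitely" approach for the first class of boards can be sketched as a small retry helper (a sketch under our own naming, not LAVA code):

```python
import time

def connect_with_retry(connect, delay=5.0, max_attempts=None):
    """Keep calling `connect` (e.g. a function opening a telnet session
    to the board's serial port) until it succeeds. With max_attempts=None
    it retries forever, which is what we settled on for boards that drop
    their serial connection across a power reset."""
    attempts = 0
    while True:
        try:
            return connect()
        except OSError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            time.sleep(delay)
```

For the boards whose serial port goes silent until the cable is re-plugged, no amount of retrying helps, which is why that problem remains open.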

When it was time to build a second drawer of boards, the Linux kernel on our server refused to bind more than 13 USB devices. After some research, we found out the culprit was the xHCI driver. On modern computers it is possible to disable xHCI support in the BIOS, but this option was not present in our server’s BIOS. The solution was to rebuild and install a kernel for the server without the xHCI driver. Since then, the number of USB devices is limited to 127, as per the USB specification.

Conclusion

We now have 35 boards in our lab, some of which are the only ones of their kind represented in KernelCI. We encourage anyone, hobbyists and companies alike, to contribute to the effort of continuously integrating the Linux kernel by building your own lab and adding as many boards as you can.

December 07, 2016

Many years ago, in the aftermath of Openmoko shutting down, fellow
former Linux kernel hacker Werner Almesberger
was working on an IEEE 802.15.4 (WPAN) adapter for the
Ben Nanonote.

As a spin-off to that, the ATUSB device was
designed: A general-purpose open hardware (and FOSS firmware + driver)
IEEE 802.15.4 adapter that can be plugged into any USB port.

This adapter has received a mainline Linux kernel driver written by
Werner Almesberger and Stefan Schmidt, which was eventually merged into
mainline Linux in May 2015 (kernel v4.2 and later).

Earlier in 2016, Stefan Schmidt (the current ATUSB Linux driver
maintainer) approached me about the situation that ATUSB hardware was
frequently asked for, but currently unavailable in its
physical/manufactured form. As we run a shop with smaller electronics
items for the wider Osmocom community at sysmocom, and we also
frequently deal with contract manufacturers for low-volume electronics
like the SIMtrace device anyway, it was easy to say "yes, we'll do it".

As a result, ready-built, programmed and tested ATUSB devices are now
finally available from the sysmocom webshop.

Note: I was never involved with the development of the ATUSB hardware,
firmware or driver software at any point in time. All credits go to
Werner, Stefan and other contributors around ATUSB.

December 06, 2016

In a previous life I used to do a lot of IT security work, probably even
at a time when most people had no idea what IT security actually is. I
grew up with the Chaos Computer Club, as it was a great place to meet
people with common interests, skills and ethics. People were hacking
(aka 'doing security research') for fun, to grow their skills, to
advance society, to point out corporate stupidities and to raise
awareness about issues.

I've always shared any results worth noting with the general public.
Whether it was in RFID security, on GSM security, TETRA security, etc.

Even more so, I always shared the tools, creating free software
implementations of systems that - at that time - were very difficult or
impossible to access unless you worked for the vendors of the related
devices, who obviously had a different agenda than disclosing security
concerns to the general public.

Publishing security related findings at related conferences can be
interpreted in two ways:

On the one hand, presenting at a major event will add to your
credibility and reputation. That's a nice byproduct, but it shouldn't
be the primary reason, unless you're some kind of egocentric stage
addict.

On the other hand, presenting findings or giving any kind of
presentation or lecture at an event is a statement of support for that
event. When I submit a presentation to a given event, I think carefully
about whether the topic actually matches the event.

The reason I didn't submit any talks at CCC events in recent years is
not that I wasn't doing technically exciting work I could talk about -
or that I lacked the reputation that would make the programme committee
consider my submission. I just thought there was nothing in my work
relevant enough to bother the CCC attendees with.

So when Holger 'zecke' Freyther and I chose to present about our recent
journeys into exploring modern cellular modems at the annual Chaos
Communication Congress, we did so because the CCC Congress is the right
audience for this talk. We did so because we think the people there
are the kind of community of like-minded spirits that we would like to
contribute to, and to whom we would like to give something back, for the
many years of excellent presentations and conversations.

So far so good.

However, in 2016, something happened that I hadn't yet seen in my 17
years of speaking at Free Software, Linux, IT Security and other
conferences: a select industry group (in this case the GSMA) asking me
out of the blue to give them the talk one month in advance, at a private
industry event.

I could hardly believe it. How could they? Who am I? Do I spend
sleepless nights and non-existent spare time on security research of
cellular modems in order to give a free presentation to corporate guys
at a closed industry meeting? The same kind of industry that creates
the problems in the first place, and that doesn't get its act together
in building secure devices that respect people's privacy? Certainly not.
I spend sleepless nights hacking because I want to share the results
with my friends - with people who have the same passion, whom I respect
and trust - and to help my fellow hackers understand technology one step
further.

If that kind of request to undermine the researcher's/author's initial
publication among friends is happening to me, I'm quite sure it must be
happening to other speakers at 33C3 or other events, too. And that
makes me very sad. I think the initial publication is something that
connects the speaker/author with his audience.

Let's hope the researchers/hackers/speakers have sufficiently strong
ethics to refuse such requests. If certain findings are initially
published at a certain conference, then that is the initial publication.
Period. Sure, you can ask afterwards if an author wants to repeat the
presentation (or a similar one) at other events. But pre-empting the
initial publication? Certainly not with me.

I offered the GSMA a talk on the importance of having FOSS
implementations of cellular protocol stacks as an enabler for security
research, but apparently this was of no interest to them. It seems all
they wanted was an exclusive heads-up on work they neither commissioned
nor supported in any other way.

And by the way, I don't think what Holger and I will present is all that
exciting in the first place - more or less the standard kind of security
nightmares. By now we are all so numbed by nobody considering
security and/or privacy in the design of IT systems that it is hardly
news. IoT as it is done so far might very well be the doom of
mankind: an unstoppable tsunami of insecure and privacy-invading
devices, built on ever more complex technology with way too many
security issues. We shall henceforth call IoT the Industry of
Thoughtlessness.

December 03, 2016

On the CD4049 you can see 6 independent inverters, each built from 3 inverter stages connected in series, with increasing gate width at each stage - this helps achieve higher speed and lower input capacitance. The gate length is 6µm, making it probably the slowest CMOS circuit one will ever see. The gates are metal (i.e. not self-aligned silicon), which was again the slower option at the time.

A complex system like NeTV2 consists of several layers of design. About a month ago, we pushed out the PCB design. But a PCB design alone does not a product make: there’s an FPGA design, firmware for the on-board MCU, host drivers, host application code, and ultimately layers in the cloud and beyond. We’re slowly working our way from the bottom up, assembling and validating the full system stack. In this post, we’ll talk briefly about the FPGA design.

This design targets an Artix-7 XC7A50TCSG325-2 FPGA. As such, I opted to use Xilinx’s native Vivado design flow, which is free to download and use, but not open source. One of Vivado’s more interesting features is a hybrid schematic/TCL design flow. The designs themselves are stored as an XML file, and dynamically rendered into a schematic. The schematic itself can then be updated and modified by using either the GUI or TCL commands. This hybrid flow strikes a unique balance between the simplicity and intuitiveness of designing with a schematic, and the power of text-based scripting.
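As a small illustration of this hybrid flow, adding an IP block to a block design from the TCL console looks roughly like the following. The command names are Vivado's, but the design and cell names, and the IP version number, are illustrative and may differ between Vivado releases:

```tcl
# Sketch of Vivado block-design TCL; names are made up for illustration.
create_bd_design "netv2_top"
create_bd_cell -type ip -vlnv xilinx.com:ip:axi_gpio:2.0 axi_gpio_0
set_property CONFIG.C_GPIO_WIDTH 8 [get_bd_cells axi_gpio_0]
save_bd_design
```

The same edits can be made by dragging blocks around in the schematic view; Vivado keeps the two representations in sync.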

Above: top-level schematic diagram of the NeTV2 FPGA reference design as rendered by the Vivado tools

However, the main motivation to use Vivado is not the design entry methodology per se. Rather, it is Vivado’s tight integration with the AXI IP bus standard. Vivado can infer AXI bus widths, address space mappings, and interconnect fabric topology based on the types of blocks that are being strung together. The GUI provides some mechanisms to tune parameters such as performance vs. area, but it’s largely automatic and does the right thing. Being able to mix and match IP blocks with such ease can save months of design effort. However, the main downside of using Vivado’s native IP blocks is they are area-inefficient; for example, the memory-mapped PCI express block includes an area-intensive slave interface which is synthesized, placed, and routed — even if the interface is totally unused. Fortunately many of the IP blocks compile into editable verilog or VHDL, and in the case of the PCI express block the slave interface can be manually excised after block generation, but prior to synthesis, reclaiming the logic area of that unused interface.

Using Vivado, I’m able to integrate a PCI-express interface, AXI memory crossbar, and DDR3 memory controller with just a few minutes of effort. With similar ease, I’ve added in some internal AXI-mapped GPIO pins to provide memory-mapped I/O within the FPGA, along with a video DMA master which can format data from the DDR3 memory and stream it out as raster-synchronous RGB pixel data. All told, after about fifteen minutes of schematic design effort I’m positioned to focus on coding my application, e.g. the HDMI decode/encode, HDCP encipher, key extraction, and chroma key blender.

Below is the “hierarchical” view of this NeTV2 FPGA design. About 75% of the resources are devoted to the Vivado IP blocks, and about 25% to the custom NeTV application logic; altogether, the design uses about 72% of the XC7A50T FPGA’s LUT resources. A full-custom implementation of the Vivado IP blocks would save a significant amount of area, as well as be more FOSS-friendly, but it would also take months to implement an equivalent level of functionality.

Significantly, the FPGA reference design shared here implements only the “basic” NeTV chroma-key based blending functionality, as previously disclosed here. Although we would like to deploy more advanced features such as alpha blending, I’m unable to share any progress because this operation is generally prohibited under Section 1201 of the DMCA. With the help of the EFF, I’m suing the US government for the right to disclose and share these developments with the general public, but until then, my right to express these ideas is chilled by Section 1201.

The 2016.11 release of Buildroot was published on November 30th. The release announcement by Buildroot maintainer Peter Korsgaard gives numerous details about the new features and updates brought by this release: support for using multiple BR2_EXTERNAL directories, some important updates to the toolchain support, default configurations for 9 new hardware platforms, and 38 new packages.

Out of a total of 1423 commits made for this release, Free Electrons contributed 253:

Romain Perier contributed a package for the AMD Catalyst proprietary driver. Such drivers are usually not trivial to integrate, so a ready-to-use package in Buildroot will really make things easier for Buildroot users with an AMD/ATI graphics controller. The package provides both the X.org driver and the OpenGL implementation. This work was sponsored by one of Free Electrons’ customers.

Gustavo Zacarias mainly contributed a large set of patches making small updates to numerous packages, to make sure the proper environment variables are passed. This is a preparatory change for bringing top-level parallel build to Buildroot. This work was also sponsored by a Free Electrons customer.

Thomas Petazzoni did contributions in various areas:

Added a DEVELOPERS file to the tree, to record which developers are interested in which architectures and packages. Not only does it allow developers to be Cc’ed when patches are sent on the mailing list (like the kernel’s get_maintainer script does), it is also used by the Buildroot autobuilder infrastructure: if a package fails to build, the corresponding developer is notified by e-mail.

Misc updates to the toolchain support: switch to gcc 5.x by default, addition of gcc patches needed to fix various issues, etc.

Numerous fixes for build issues detected by Buildroot autobuilders

In addition to contributing 104 commits, Thomas Petazzoni also merged 1095 patches from other developers during this cycle, in order to help Buildroot maintainer Peter Korsgaard.

Finally, Free Electrons also sponsored the Buildroot project, by funding the meeting location for the previous Buildroot Developers meeting, which took place in October in Berlin, after the Embedded Linux Conference. See the Buildroot sponsors page, and also the report from this meeting. The next Buildroot meeting will take place after the FOSDEM conference in Brussels.

Linux.conf.au, which takes place every year in January in Australia or New Zealand, is a major event of the Linux community. Free Electrons participated in this event three years ago, and will participate again in this year’s edition, which takes place from January 16 to January 20, 2017 in Hobart, Tasmania.

This time, Free Electrons CTO Thomas Petazzoni will give a talk titled A tour of the ARM architecture and its Linux support, in which he will walk LCA attendees through what the ARM architecture is, how its Linux support works, what the numerous variants of ARM processors and boards mean, what the Device Tree is, ARM-specific bootloaders, and more.

November 29, 2016

The Ware for October 2016 is a hard drive read head, from a 3.5″ Toshiba hard drive that I picked out of a trash heap. The drive was missing the cover which bore the model number, but based on the chips on its logic board, it was probably made between 2011 and 2012. The photo was taken at about 40x magnification. Congrats to Jeff Epler for nailing the ware as the first guesser; email me for your prize!

As stated in a previous blog post, we officially launched our lab on April 25th, 2016, and it has been contributing to KernelCI since then. In a series of blog posts, we’d like to present in detail how our lab works, starting with this first post detailing the hardware infrastructure of the lab.

Introduction

In a lab built for continuous integration, everything has to be fully automated from the serial connections to power supplies and network connections.

To gather as much information as possible to establish the specifications of the lab, our engineers filled in a spreadsheet with all the boards they wanted to have in the lab and their specificities in terms of connectors used for serial port communication and power supply. We reached around 50 boards to put into our lab. Among those boards, we could distinguish two different types:

boards which are powered by an ATX power supply,

boards which are powered by different power adapters, providing either 5V or 12V.

Another design criterion was that we wanted our engineers to be able to easily take a board out of the lab or add a new one. The easier that process is, the better the lab.

Home made cabinet

To meet the size constraints of Free Electrons office, we had to make the lab fit in a 100cm wide, 75cm deep and 200cm high space. In order to achieve this, we decided to build the lab as a large home made cabinet, with a number of drawers to easily access, change or replace the boards hosted in the lab. As some of our boards provide PCIe connectors, we needed to provide enough height for each drawer, and after doing a few measurements, decided that a 25cm height for our drawers would be fine. With a total height of 200cm, this gives a maximum of 8 drawers.

In addition, it turns out that most of our boards powered by ATX power supplies are rather large, while the ones powered by regular power adapters are usually much smaller. To simplify the overall design, we decided that all large boards would be grouped together on a given set of drawers, and all small boards on another set: i.e. we would not mix large and small boards in the same drawer. With the 100cm x 75cm size limitation, this meant a drawer for small boards could host up to 8 boards, while a drawer for large boards could host up to 4. From the spreadsheet listing all the boards destined for the lab, we eventually decided there would be 3 large drawers for up to 12 large boards and 5 small drawers for up to 40 small or medium-sized boards.

Furthermore, since the lab will host a server and a lot of boards and power supplies, potentially producing a lot of heat, we had to keep the cabinet as open as possible while making sure it was strong enough to hold the drawers. We ended up building our own cabinet, made of wood bought from the local hardware store.

We also wanted the server to be part of the lab. A small piece of wood already strengthening the cabinet between the fourth and sixth drawers could be used to mount it. We decided to give a mini-PC (NUC-like) a try because, after all, it only needs to communicate with each board’s serial port and serve files to them. Thus, everything related to the server is fixed and wired behind the lab.

Make the lab autonomous

Continuous integration for the Linux kernel typically requires control of:

the power for each board

serial port connection

a way to send files to test, typically the kernel image and associated files

In Free Electrons lab, these different tasks are handled by a dedicated server, itself hosted in the lab.

Serial port control

Serial connections are mostly handled via USB on the server side, but there are many different connectors on the target side (in our lab, we have 6 different connectors: DE9, micro-USB, mini-USB, 2.54mm male pins, 2.54mm female pins and USB-B). Therefore, our server has to have a physical connection to each of the 50 boards present in the lab, making the need for USB hubs obvious.

Since we want as few cables as possible between the server and the drawers, we decided to have one USB hub per drawer, be it a large or a small drawer. In a small drawer, up to 8 boards can be present, so the hub needs at least 8 USB ports. In a large drawer, at most 4 serial connections are needed, so smaller and more common USB hubs can do the job. Since a serial connection may draw some current on the USB port, we wanted all of our USB hubs to be powered by a dedicated power supply.

All USB hubs are then connected to a main USB hub which in turn is connected to our server.

Power supply control

Our server needs to control each board’s power, so that it can automatically power a board on when it needs to test a new kernel and power it off at the end of the test, or when the kernel has frozen or failed to boot.

In terms of power supplies, we initially investigated using Ethernet-controlled multi-sockets (also called Switched PDU), such as this device. Unfortunately, these devices are quite expensive, and also often don’t provide the most appropriate connector to plug the cheap 5V/12V power adapters used by most boards.

So instead, following a suggestion from Kevin Hilman (one of KernelCI’s founders and maintainers), we decided to use regular ATX power supplies. They have the advantage of being inexpensive while providing enough power for multiple boards and all their peripherals, potentially including hard drives or other power-hungry devices. ATX power supplies also have a pin, called PS_ON#, which powers the supply up when tied to ground. This makes it easy to turn an ATX power supply on or off.

In conjunction with the ATX power supplies, we have selected an Ethernet-controlled relay board, the Devantech ETH008, which contains 8 relays that can be remotely controlled over the network.

This gives us the following architecture:

For the drawers with large boards powered directly by ATX, we have one ATX power supply per board. The PS_ON# pin of each power supply is cut and rewired through the Ethernet-controlled relay, so the relay controls whether PS_ON# is tied to ground: when it is, the board boots; when it is untied, the board is powered off.

For the drawers with small boards, we have a single ATX power supply per drawer. The 12V and 5V rails of the power supply are dispatched through the 8-relay board and then connected to the appropriate boards, through DC barrel or mini-USB/micro-USB cables depending on the board. PS_ON# is permanently tied to ground, so these ATX power supplies are always on.
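Switching a relay from the server then amounts to sending a short command over TCP. The sketch below assumes the ETH008's documented binary protocol (3-byte commands on port 17494, opcode 0x20 to switch a relay on and 0x21 to switch it off); double-check these values against the Devantech datasheet before relying on them:

```python
import socket

# Devantech ETH008 TCP protocol (assumed from its documentation):
# a 3-byte command [opcode, relay_number, pulse_time], where pulse_time 0
# means "switch permanently", and the board replies with one byte
# (0 on success).
ETH008_PORT = 17494

def relay_command(relay, on, pulse=0):
    """Build the 3-byte command switching `relay` (1-8) on or off."""
    if not 1 <= relay <= 8:
        raise ValueError("relay number must be 1..8")
    opcode = 0x20 if on else 0x21
    return bytes([opcode, relay, pulse])

def set_relay(host, relay, on):
    """Send the command to the relay board; True if the board acknowledged."""
    with socket.create_connection((host, ETH008_PORT), timeout=5) as s:
        s.sendall(relay_command(relay, on))
        return s.recv(1) == b"\x00"
```

In the large-board drawers, switching a relay on ties the corresponding PS_ON# to ground and boots the board; in the small-board drawers, it connects the 5V or 12V rail to the board.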

In addition, we have added a bit of over-voltage protection: transient-voltage-suppression diodes on each voltage output in each drawer. These diodes are connected in parallel with the circuit to protect, and clamp the voltage when it exceeds the maximum allowed value, absorbing the surge (and possibly destroying themselves in the process).

Network connectivity

As part of the continuous integration process, most of our boards will have to fetch the Linux kernel to test (and potentially other related files) over the network through TFTP. So we need all boards to be connected to the server running the continuous integration software.

Since a single 52-port switch is both fairly expensive and not very convenient to wire in our situation, we instead opted for an 8-port Gigabit switch in each drawer, all of them connected to a central 16-port Gigabit switch located at the back of the home made cabinet. This central switch connects not only the per-drawer switches, but also the server running the continuous integration software, and the wider Internet.

In-drawer architecture: large boards

A drawer designed for large boards, powered by an ATX power supply contains the following components:

Up to four boards

Four ATX power-supplies, with their PS_ON# connected to an 8-port relay controller. Only 4 of the 8 ports are used on the relay.

One 8-port Ethernet-controlled relay board.

One 4-port USB hub, connecting to the serial ports of the four boards.

One 8-port Ethernet switch, with 4 ports used to connect to the boards, one port used to connect to the relay board, and one port used for the upstream link.

One power strip to power the different components.

Large drawer example scheme

Large drawer in the lab

In-drawer architecture: small boards

A drawer designed for small boards contains the following components:

Up to eight boards

One ATX power-supply, with its 5V and 12V rails going through the 8-port relay controller. All ports in the relay are used when 8 boards are present.

One 8-port Ethernet-controlled relay board.

One 10-port USB hub, connecting to the serial ports of the eight boards.

Two 8-port Ethernet switches, connecting the 8 boards, the relay board and an upstream link.

One power strip to power the different components.

Small drawer example scheme

Small drawer in the lab

Server

At the back of the home made cabinet, a mini PC runs the continuous integration software, which we will discuss in a future blog post. This mini PC is connected to:

A main 16-port Gigabit switch, itself connected to all the Gigabit switches in the different drawers

A main USB hub, itself connected to all the USB hubs in the different drawers

As expected, this allows the server to control the power of the different boards, access their serial port, and provide network connectivity.

Detailed component list

If you’re interested in the specific components we’ve used for our lab, here is the complete list, with the relevant links:

Conclusion

Hopefully, sharing these details about the hardware architecture of our board farm will help others create a similar automated testing infrastructure. We of course welcome feedback on this hardware architecture!

Stay tuned for our next blog post about the software architecture of our board farm.

November 27, 2016

In 2006 I first visited Taiwan. The reason back then was Sean Moss-Pultz
contacting me about a new Linux and Free Software based Phone that he
wanted to do at FIC in Taiwan. This later became the Neo1973 and
the Openmoko project and finally became part
of both Free Software as well as smartphone history.

Ten years later, it might be worth sharing a bit of a retrospective.

It was about building a smartphone before Android or the iPhone existed
or even were announced. It was about doing things "right" from a Free
Software point of view, with FOSS requirements going all the way down to
component selection of each part of the electrical design.

Of course it was quite crazy in many ways. First of all, it was a
bunch of white, long-nosed western guys in Taiwan, starting a company
around Linux and Free Software, at a time where that was not really
well-perceived in the embedded and consumer electronics world yet.

It was also crazy in terms of the many cultural 'impedance mismatches',
and I think at some point it might even be worth writing a book about
the many stories we experienced. The biggest problem, of course, is
that I wouldn't want to expose any of the companies or people involved
in the many instances where something went wrong. So it will probably
remain a secret to those present at the time :/

In any case, it was a great project and definitely one of the most
exciting (albeit busy) times in my professional career so far. It was
also great that I could involve many friends and FOSS-compatriots from
other projects in Openmoko, such as Holger Freyther, Mickey Lauer,
Stefan Schmidt, Daniel Willmann, Joachim Steiger, Werner Almesberger,
Milosch Meriac and others. I am happy to still work on a daily basis
with some of that group, while others have moved on to other areas.

I think we all had a lot of fun, learned a lot (not only about Taiwan),
and were working really hard to get the hardware and software into
shape. However, the constantly growing scope, the [for western terms]
quite unclear and constantly changing funding/budget situation and the
many changes in direction ultimately led to missing the market
opportunity. At the time the iPhone and later Android entered the
market, it was too late for a small crazy Taiwanese group of
FOSS-enthusiastic hackers to still have a major impact on the landscape
of Smartphones. We tried our best, but in the end, after a lot of hype
and publicity, it never was a commercial success.

What's sadder to me than the lack of commercial success is the
lack of successful free software that resulted. Sure, some
u-boot and Linux kernel drivers got merged mainline, but none of
the three generations of UI stacks (GTK, Qt or EFL based), the GSM
modem abstraction gsmd/libgsmd, or the middleware (freesmartphone.org)
managed to survive the end of the Openmoko company, despite deserving
to.

Probably the most important part that survived Openmoko was the
pioneering spirit of building free software based phones. This spirit
has inspired pure volunteer based projects like
GTA04/Openphoenux/Tinkerphone, who have achieved extraordinary results -
but who are in a very small niche.

What does this mean in practice? We're stuck with a smartphone world in
which we can hardly escape vendor lock-in. It's virtually
impossible in the non-free-software iPhone world, and it's difficult in
the Android world. In 2016, we have more Linux based smartphones than
ever - yet we have less freedom on them than ever before. Why?

the amount of hardware documentation for today's processors and
chipsets is typically less than 10 years ago. Back then, you could
still get the full manual for the S3C2410/S3C2440/S3C6410 SoCs. Today,
this is not possible for the application processors of any vendor

the tighter integration of application processor and baseband
processor means that it is no longer possible on most phone designs to
have the 'non-free baseband + free application processor' approach
that we had at Openmoko. It might still be possible if you designed
your own hardware, but it's impossible with any actually existing
hardware in the market.

Google blurring the line between FOSS and proprietary code in the
Android OS. Yes, there's AOSP - but how many features are lacking?
And on how many real-world phones can you install it? Particularly
with the Google Nexus line being EOL'd? One of the popular exceptions
is probably the
Fairphone 2 with its alternative AOSP operating system,
even though that's not the default of what they ship.

The many binary-only drivers / blobs, from the graphics stack to wifi
to the cellular modem drivers. It's a nightmare and really scary if
you look at it all, e.g. at the binary blob downloads for the
Fairphone 2,
to get an idea of all the binary-only blobs on a relatively current
Qualcomm SoC based design. That's 70 megabytes compressed, probably
as large as all of the software we had on the Openmoko devices back
then...

So yes, the smartphone world is much more restricted, locked-down and
proprietary than it was back in the Openmoko days. If we had been more
successful then, that world might be quite different today. It was a
lost opportunity to make the world embrace more freedom in terms of
software and hardware, without single-vendor lock-in and proprietary
obstacles everywhere.

November 24, 2016

During the past 16 years I have been playing a lot with a variety of
embedded devices.

One of the most important tasks for debugging or analyzing embedded
devices is usually to get access to the serial console on the UART of
the device. That UART is often exposed at whatever logic level the main
CPU/SoC/µC is running on. For 5V and 3.3V that is easy, but for the ever
more unusual voltages found today I always had to build a custom cable or
a custom level shifter.

In 2016, I finally couldn't resist any longer and built a multi-voltage
USB UART adapter.

This board exposes two UARTs at a user-selectable voltage of 1.8, 2.3,
2.5, 2.8, 3.0 or 3.3V. It can also use whatever other logic voltage
between 1.8 and 3.3V, if it can source a reference of that voltage from
the target embedded board.

Rather than just building one for myself, I released the design as open
hardware under CC-BY-SA license terms. Full schematics + PCB layout
design files are available. For more information see
http://osmocom.org/projects/mv-uart/wiki

There are plenty of cellular modems on the market in the mPCIe form
factor.

Playing with such modems is reasonably easy: you can simply insert them
into the mPCIe slot of a laptop or an embedded device (Soekris, PC Engines
or the like).

However, many of those modems actually export interesting signals like
digital PCM audio or UART ports on some of the mPCIe pins, both in
standard and in non-standard ways. Those signals are inaccessible in
those embedded devices or in your laptop.

So I built a small break-out board which performs the basic function of
exposing the mPCIe USB signals on a USB mini-B socket, providing power
supply to the mPCIe modem, offering a SIM card slot at the bottom, and
exposing all additional pins of the mPCIe header on a standard 2.54mm
pitch header for further experimentation.

November 08, 2016

Last month, the entire Free Electrons engineering team attended the Embedded Linux Conference Europe in Berlin. The slides and videos of the talks have been posted, including the ones from the seven talks given by Free Electrons engineers:

November 05, 2016

This is an unidentified photo-sensor from a DVD-RW drive. Most of the work is done by the middle quad – it can receive the signal, track focus (via astigmatic focusing) and follow the track. The additional quads are probably there to improve tracking; they are not used as full quads – there are fewer outputs for the left and right quads.

November 04, 2016

The MP2 AWD “All Wheel Drive” edition is now available for order. The MP2 AWD represents a big step forward for the Mesh Potato. It is based on the same core as the MP2 Phone and is packaged in an outdoor enclosure with additional features and capabilities, most notably a second radio capable of 2T2R (MIMO) operation on 2.4 and 5GHz bands. It also has an internal USB port as well as an SD card slot. This opens up the possibilities for innovation. The SD slot can host cached content such as World Possible’s Rachel Offline project or any locally important content. The USB port is available for a variety of uses such as 3G/4G modem for backhaul or backup.

The MP2 AWD is also easier to deploy than previous models, as power, data, and telephony have been integrated into a single Ethernet connection thanks to the PoE/TL adaptor that is shipped with the device. Now phone, data, and power are all served via a single cable.

The default user setup for the MP2 AWD is to use the 2.4GHz radio for local hotspot access and the 5GHz radio to create the backbone network on the mesh but it can be configured to suit a variety of scenarios.

The MP2 AWD has the following features:

Everything already included in MP2 Phone including:

Atheros AR9331 SoC with a 2.4GHz 802.11n 1×1 router in a single chip

Internal antenna for 2.4GHz operation

FXS port based on Silicon Labs Si3217x chipset

16 MB flash / 64 MB RAM memory configuration

Two 100Base-T Ethernet ports

High-speed UART for console support

A second radio module based on the MediaTek/Ralink RT5572 chipset which supports IEEE 802.11bgn 2T2R (2×2 MIMO) operation on 2.4 and 5 GHz bands.

Internal USB port which can be used for a memory device, 3G/4G GSM dongle or other USB devices.

PoE/TL adaptor which carries voice/data/power via a single Cat5/6 cable to the MP2 AWD. Similar to a passive PoE connector, but it also carries the voice telephone line, allowing a phone to be plugged in remotely from the MP2 AWD.

I adopted an add-in card format to allow end users to pick the cost/performance trade-off that suited their application the best. Some users require only a text overlay (NeTV’s original design scenario); but others wanted to blend HD video and 3D graphics, which would require a substantially more powerful and expensive CPU. An add-in card allows users to plug into anything from an economical $60 all-in-one, to a fully loaded gaming machine. The kosagi forum has an open thread for NeTV2 discussion.

October 30, 2016

I like this one because not only is it exquisitely engineered, it’s also aesthetically pleasing.

Sorry for the relative radio silence on the blog — been very heads down the past couple months grinding through several major projects, including my latest book, “The Hardware Hacker”, which is on-track to hit shelves in a couple of months!

October 29, 2016

K140UD2B is an old Soviet opamp without internal frequency compensation, similar to the RCA CA3047T. ICs manufactured around 1982 have a bare die in a metal can; ones manufactured in 1988 have a protective overcoat inside the metal can (which is quite unusual). Die size is 1621x1615 µm.

October 28, 2016

My intel SSD failed. Hard. As in: its content got wiped. But before getting way too theatrical, let’s stick to the facts first.

I upgraded my Lenovo ThinkPad X1 Carbon with a bigger SSD in the late summer this year — a 1TB intel 540s (M.2).

The BIOS of ThinkPads (and probably other brands as well) offers to secure your drive with an ATA password. This feature is part of the ATA specification and was already implemented and used back in the old IDE days (remember the X-BOX 1?).

With such an ATA password set, all read/write commands to the drive will be ignored until the drive gets unlocked. There’s some discussion about whether ATA passwords should or shouldn’t be used — personally I like the idea of $person not being able to just pull out my drive, modify its unencrypted boot record and put it back into my computer without me noticing.

With current SSDs, the ATA password doesn’t just lock access to the drive but also plays a part in the FDE (full disk encryption) featured by modern SSDs — but back to what actually happened…

As people say, it’s good practice to frequently(TM) change passwords. So I did with my ATA password.

And then it happened. My data was gone. All of it. I could still access the SSD with the newly set password but it only contained random data. Even the first couple of KB, which were supposed to contain the partition table as well as unencrypted boot code, magically seem to have been replaced with random data. Perfectly random data.

So, what happened? Back to the FDE of recent SSDs: they perform encryption on data written to the drive (and decryption on reads, respectively) — whether you want it or not.
The data is encrypted with a key stored on the device, with no easy way of reading it out (hence no backup). This happens totally transparently; the computer the device is connected to doesn’t have to care about it at all.

And the ATA password is used to encrypt the key the actual data on the drive is encrypted with. Password encrypts key encrypts data.
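The "password encrypts key encrypts data" chain can be sketched in a few lines of Python. This is purely illustrative: XOR stands in for whatever real key-wrapping cipher the drive uses, and the KDF, salt handling and key sizes here are my assumptions, not intel's actual scheme.

```python
import hashlib
import secrets

def derive_kek(password: bytes, salt: bytes) -> bytes:
    # Key-encryption key derived from the ATA password (illustrative KDF;
    # a real drive uses its own, undocumented derivation).
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

def wrap(key: bytes, kek: bytes) -> bytes:
    # Stand-in for real key wrapping: XOR is NOT secure, it just shows that
    # the password encrypts the key, never the data itself.
    return bytes(a ^ b for a, b in zip(key, kek))

salt = secrets.token_bytes(16)
dek = secrets.token_bytes(32)  # data-encryption key, generated on the drive

# A password change should only re-wrap the same DEK...
wrapped = wrap(dek, derive_kek(b"old password", salt))
dek_again = wrap(wrapped, derive_kek(b"old password", salt))  # unwrap
rewrapped = wrap(dek_again, derive_kek(b"new password", salt))

# ...so the data key before and after the change must be identical.
assert wrap(rewrapped, derive_kek(b"new password", salt)) == dek
```

The point of the sketch is the invariant at the end: a correct password change re-wraps the same data key, so the data on the drive stays readable.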

Back to my case: no data, just garbage. Perfectly random garbage. My first idea of what happened was as obvious as it was devastating: the data on the drive gets read and decrypted with a different key than the one it was initially written and encrypted with. If that’s indeed the case, my data is gone.

This behaviour is actually advertised as a feature. intel calls it “Secure Erase“. No need to overwrite your drive dozens of times like in the old days, thereby ensuring the data has irreversibly vanished in the end. No, just wipe the key your data is encrypted with and be done. And exactly this seems to have happened to me. I am done.

Fortunately I made backups. Some time ago. Quite some time ago. Of a few directories. Very few. Swearing. Tears. I know, I know, I don’t deserve your sympathy (but I’d still appreciate it!).

Anger! Whose fault is it?! Who to blame?!

Let’s check the docs on ATA passwords, which appear to be very clear — from the official Lenovo FAQ:

“Will changing the Master or User hard drive password change the FDE key?”
– “No. The hard drive passwords have no effect on the encryption key. The passwords can safely be changed without risking loss of data.”

Not my fault! Yes! Wait, another FAQ entry says:

“Can the encryption key be changed?”
– “The encryption key can be regenerated within the BIOS, however, doing so will make all data inaccessible, effectively wiping the drive. To generate a new key, use the option listed under Security -> Disk Encryption HDD in the system BIOS.”

I double-checked the BIOS to see whether I had unintentionally told it to change the FDE key. No, I wasn’t even able to find such a setting.

The first link points to an ISO file. Works for me! Until it crashes. Reproducibly. This ISO reproducibly crashes my Lenovo X1 Carbon 3rd generation. Booting from USB thumb-drive (officially supported it says), as well as from CD. Hm.

For now I seem to have to conclude with the following questions:

Why can’t I find a damn thing about this bug in the media?

Why did intel delete its tweets referencing this bug?

Why does the firmware-updater do nothing but crash my computer?

Why didn’t I do proper backups?!

How do I get my data back?!?1ß11

PS: Before I clicked the Publish button I again set up a few search queries. Found my tweets.

October 24, 2016

Most CMOS image sensors have an Electronic Rolling Shutter – the images are acquired by scanning line by line. Their strengths and weaknesses are well known, and extremely wide usage has made the technology somewhat perfect – Andrey might have already said this somewhere before.

There are CMOS sensors with a Global Shutter BUT (if we take the same optical formats):

because of more elements per pixel – they have lower full well capacity and quantum efficiency

because analog memory is used – they have higher dark current and higher shutter ratio

GRR Snapshot was available in the 10353 cameras, but we never tried it ourselves – one had to write directly to the sensor’s register to turn it on. Now it is tested and working in the 10393s, available through the TRIG (0x14) parameter.

MT9P001 sensor

Below, I will be writing about ON Semi’s MT9P001 image sensor, focusing on its snapshot modes. The operation modes are described in the sensor’s datasheet. In short:

In ERS Snapshot mode (Fig. 1, 3), the exposure time is constant across all rows, but each row’s exposure start is delayed by tROW (the row readout time) from the previous one (and so is its exposure end).

In GRR Snapshot mode (Fig. 2, 4), the exposure of all rows starts at the same moment, but each row is exposed tROW longer than the previous one. This mode is useful when a flash is needed.

The difference between ERS Snapshot and Continuous mode is that in the latter the sensor doesn’t wait for a trigger and starts a new image while still reading out the previous one. It provides the highest frame rate (Fig. 5).

Fig.1 Electronic Rolling Shutter (ERS) Snapshot mode

Fig.2 Global Reset Release (GRR) Snapshot mode

Fig.3 ERS mode, whole frame

Fig.4 GRR mode, whole frame

Fig.5 Sensor operation modes, frame sequence

Here are some of the actual parameters of the MT9P001:

  Parameter                            Value
  Active pixels                        2592h x 1944v
  tROW                                 33.5 μs
  Frame readout time (Nrows x tROW)    1944 x 33.5 μs ≈ 65 ms
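The timing relations described above can be checked with a short Python sketch, using the tROW and row-count values quoted from the datasheet (the variable and function names are mine, not the datasheet's):

```python
T_ROW_US = 33.5  # row readout time, microseconds (from the datasheet)
N_ROWS = 1944    # active rows

# Frame readout time = Nrows * tROW
frame_readout_ms = N_ROWS * T_ROW_US / 1000.0
print(f"frame readout: {frame_readout_ms:.1f} ms")

def ers_snapshot_window(row: int, exposure_us: float):
    # ERS Snapshot: every row gets the same exposure, but row i starts
    # (and ends) i * tROW later than row 0.
    start = row * T_ROW_US
    return start, start + exposure_us

def grr_snapshot_window(row: int, exposure_us: float):
    # GRR Snapshot: all rows start together; row i is exposed i * tROW longer.
    return 0.0, exposure_us + row * T_ROW_US
```

Running it confirms the ~65 ms frame readout from the table, and makes the ERS/GRR difference explicit: ERS shifts each row's whole exposure window, while GRR stretches it.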

Test setup

The LEDs were powered & controlled by the camera’s external trigger output, the delay and duration of which are programmable.

The flash duration was set to 20 μs to catch the fan’s blades (marked with stickers) without motion blur – at 5500-8000 RPM that is 0.66-0.96° of rotation per 20 μs. There was not enough light from the LEDs, so the setup was placed in a dark environment and the camera color gains were set to 8 (ISO ~800-1000) – the images are a bit noisy.
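The rotation-per-flash figure is simple arithmetic; here is a quick Python sanity check (my own helper, not from the original article):

```python
def degrees_per_flash(rpm: float, flash_us: float) -> float:
    # Convert RPM to degrees per second (RPM * 360 / 60),
    # then multiply by the flash duration in seconds.
    return rpm * 360.0 / 60.0 * flash_us * 1e-6

for rpm in (5500, 8000):
    print(f"{rpm} RPM -> {degrees_per_flash(rpm, 20.0):.2f} deg per 20 us flash")
```

Well under one degree of blade travel per flash, which is why the stickers come out sharp.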

The trigger period was set to 250 ms – and the synced LEDs were blinking for each frame.

The information on how to program the NC393 camera to generate trigger signal, fps, change sensor’s