Welcome to panthema.net

This website is a diverse collection of interesting ideas, thus it is panthematic. It contains free open-source software and projects (FOSS), computer science research results, blog articles and more, all created by myself, Timo Bingmann. Over the years, the amount of information, source code and other content has grown rather large. All entries are ordered chronologically in the weblog, with some special projects highlighted in the following summary:

BlinkenSort with Sound - The Sound of LED Sorting Algorithms with Raspberry Pi 3 and APA102 or SK9822 LEDs

Once BlinkenSort successfully showed fascinatingly complex sorting algorithms on an LED strip using an ESP8266 (see other article), I naturally ventured to add the sound output from the Sound of Sorting program. This turned out much harder than initially thought, because the sound generation software required more processing power than available in the cheap standard microcontrollers. After much trail and error, I found out that a sufficiently new Raspberry Pi (I happened to have a Pi 3 model B) has the rare combination of enough compute power, can drive an LED strip directly via SPI, and has an audio line-out.

The Raspberry Pi, however, cannot drive the same type of LED strip reliably as the ESP8266 can, because the Raspberry Pi runs a time-shared Linux system instead of a real-time program. But it can drive the more expensive APA102 or SK9822 LEDs which have a separate clock line. These are 4-pin LED strips with clock and data, which can easily be attached to the Pi's SPI output pins. Furthermore, the APA102 LEDs can be driven at a much higher refresh rate than the WS2812B and SK6812, due to the extra clock signal. This ultimately makes the sorting animations even smoother than with the ESP8266 (where the frame rate is already unnoticeable). I could not find any off-the-shelf library to drive the APA102 with a C++ program on the Pi, but it was trivial to write a frame buffer class and access the /dev/spi0.0 devices directly.

Then there was the question of adding a display. And after more experimentation, I found the MAX7219 dot LED matrix modules work well. These can also be driven by SPI from the Pi, which actually has two SPI outputs on the models with 40 pins. And as a bonus these dot LED matrix models can pull off an amazing frame rate. That means that besides showing the algorithm name, the super-fast refresh rate enables displaying of (nearly) every comparison counter increment, despite the human viewer only being able to see around 25 changes per second. Simply adding a HDMI display to the Pi would have also worked, but the LED matrix is cooler and has a retro feeling to it.

As you can see in the following YouTube video, each algorithm does something quite different which makes this a very interesting art installation with deep connections to informatics. BlinkenSort currently contains the following eighteen sorting algorithms (listed in the same order as in the video): MergeSort, Insertion Sort, QuickSort (LR) Hoare, QuickSort (LL) Lomoto, QuickSort Dual Pivot, ShellSort, HeapSort, CycleSort, RadixSort-MSD (High First), RadixSort-LSD (Low First), std::sort, std::stable_sort, WikiSort, TimSort, Selection Sort, Bubble Sort, Cocktail-Shaker Sort, and BozoSort. Besides sorting algorithms the collection also contains four hash table implementations: Linear Probing Hash Table, Quadratic Probing Hash Table, Cuckoo-Hashing with two places, and Cuckoo-Hashing with three places.

BlinkenSort - Sorting Algorithms on LEDs with ESP8266 and SK6812 or WS2812B

About two years ago I first got my hands on one of the fancy addressable "Neopixel" LED strips, which can be programmed to show colorful animations. I quickly saw there was one project I just had to do with these strips: use them to display sorting algorithms as animations. Since my Sound of Sorting project already contained the source code for many algorithms, the step to using an addressable LED strip as a "display" was not a large one. The strips can be animated using microcontrollers such as the Arduino or Espressif ESP chips, which have way enough compute power for running sorting algorithms on a few hundred items. Since all sorting algorithms run on random input data and may make random decisions themselves, the shown animations are near infinitely varying and fascinatingly complex.

The outcome of my project of displaying sorting algorithms on LED strips are two pieces of art:

"BlinkenSort", which are sorting algorithms animated on SK6812 RGBW LEDs using an ESP8266 microcontroller, is described in this article.

Today was the ceremonial graduation day on which new bachelors, masters, and also Dr.s (PhDs in the Anglo-Saxon world) were celebrated who received their degrees from the department of informatics at the Karlsruhe Institute of Technology (KIT) within the last year. This year's graduation day coincided with the 50th anniversary of the introduction of computer science as a field of study and as a diploma degree.

I was among those honoured for completing the Dr. last year, (see my dissertation page). In total 44 freshly minted Dr.s, 292 masters, and 261 bachelors where celebrated today.

Furthermore, I was awarded the Uniserv Research-Prize "Algorithms for Efficient Data-Processing" for the best dissertation in the field of fast algorithms in the academic year 2017/2018 at the KIT department of informatics by the talent committee of the department.

I would like to thank Uniserv for endowing the department and my dissertation with this prize and especially the talent committee for selecting my work. It was a great and pleasant surprise to receive such a notification in the morning email inbox, which as usual contained many "prize and distant inheritance" emails. But on that day one of them was real, and I nearly overlooked it.

My dissertation was the central purpose in my life for many years, and receiving the prize helps me feel the time was well spent and consider it as confirmation of having furthered the field of informatics a tiny bit. It is rare to receive such honours and I am truly grateful.

YouTube Video: "Animation of the US Treasury Yield Curve with Inversions from 1962-01-01 to 2019-04-01"

Today I published the first animation of market data by my new charting tool on Youtube. In light of recent events this was an video of inversions of the US Treasury Yield Curve. The video is created with QCustomPlot and a large Qt program to process data.

The dancing green line plots the yields of all constant maturity treasury notes. The trailing blue line shows the efficient 30 day Federal Funds rate. The orange line is the broad stock market SPX index. The background of each day is painted red if an inversion of the 1y/5y, 1y/10y, or 2y/10y yields occurs. The darker the red color, the greater the inversion.

The Federal Funds rate data was taken from FRED (Federal Reserve Bank of St. Louis), series DFF: https://fred.stlouisfed.org/series/DFF. The US Treasury yields data was also taken from FRED, series DGS1MO, DGS3MO, DGS1, DGS2, DGS5, DGS7, DGS10, DGS20, DGS30, e.g. https://fred.stlouisfed.org/series/DGS30. Across the decades the various durations were sometimes not emitted, which is visible by points being added or removed from the green curve. SPX data was taken from Stooq.com, series ^SPX.

NVMe "Disk" Bandwidth and Latency for Batched Block Requests

Last week I had the pleasure of being invited to the Dagstuhl seminar 19111 on Theoretical Models of Storage Systems. I gave a talk on the history of STXXL and Thrill, but also wanted to include some current developments. Most interesting I found is the gap closing between RAM and disk bandwidth due to the (relatively) new Non-Volatile Memory Express (NVMe) storage devices.

Since I am involved in many projects using external memory, I decided to perform a simple set of fundamental experiments to compare rotational disks and newer solid-state devices (SSDs). The results were interesting enough to write this blog article about.

Among the tools of STXXL/FOXXLL there are two benchmarks which perform two distinct access patterns: Scan (benchmark_disks) and Random (benchmark_disks_random).

In Scan a batch of k sequential blocks of size B are read or written in order.

In Random a batch of k randomly selected blocks of size B from a span of size N are read or written.

The Scan experiment is probably the fastest access method as it reads or writes the disk (actually: storage device) sequentially. The Random experiment is good to determine the access latency of the disk as it first has to seek to the block and then transfer the data. Notice that the Random experiment does batched block accesses like one would perform in a query/answering system where the next set of random blocks depends on calculations performed with the preceding blocks (like in a B-Tree). This is a different experiment than done by most "throughput" measurement tools which issue a continuous stream of random block accesses.

Abstract

The suffix array is the key to efficient solutions for myriads of string processing problems in different application domains, like data compression, data mining, or bioinformatics. With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing. In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which allows scalable processing of input sizes on distributed systems in orders of magnitude that have not been considered before.

Having just finished my PhD thesis in computer science (see the corresponding dissertation), I ventured to actually print it as a proper book. In this article I want to share some of my experience with three print-on-demand book publishers:

Disclaimer: these are my experiences, yours will probably be different. And I hope print-on-demand quality will improve further in the future.

Below are macro photographs of the various print proof and other copies I received from the publishers. The photographs were taken with a Samsung Galaxy S9 smartphone with a (cheap) macro lens. Hence the colors and blur in the photos should be considered with caution, but the sharpness and detail level is sufficient for some discussion.

TL;DR: Amazon's print proof from the USA has the nicest print and color, but they only produce paperback covers. IngramSpark's prints are second best, have a lower resolution, but they produce hardcovers and consistent quality.

Amazon Print Proof from the USA

Uploading the PDF to Amazon CreateSpace is straight-forward due to the convenient web interface. The Amazon CreateSpace print proof was manufactured in Lexington, KY, USA, it was shipped three days after ordering, and arrived eleven days after ordering. For an international shipment from the USA to Germany that is a very acceptable delivery time.

The print quality of the Amazon Print Proof was in my option the best. It has the highest resolution, bright colors, and solid black. The paper and entire book has the feel of high-quality color laser printer output. Sadly, they the only produce paperback softcover books and I preferred a hardcover. On the plus side, publishing via Amazon gives you a free CreateSpace-ISBN and it is immediately listed in the world-wide Amazon catalog.

Original PDF

As a comparison to the printed books, the pictures above show the same excepts rendered as a bitmap image. Note that the layout of the words in each cover image may differ, because each cover needed to by typeset individually to the individual specifications of the publisher.

Tutorial on Boost.Spirit at C++ User Group Karlsruhe

On September 12th, 2018, I gave another 90min talk with live-coding examples in German at the C++ User Group Karlsruhe in rooms of the Karlsruhe Institute of Technology (KIT).

This time I was asked to present a more advanced topic around C++ and libraries and I chose to present a tutorial on Boost.Spirit.

Boost.Spirit is a parser and generator template meta-programming framework and maybe one of the most crazy and advanced uses of C++. It enables one to write context-free grammars inline as C++ code, which are translated into recursive descent parsers and fully optimized by the compiler.

This powerful framework is however not easy to get started with. I hope my tutorial helps more people to skip the steep learning curve and use Boost.Spirit for securely parsing user input and other structure data.

The tutorial consisted of a set of introduction slides: slides-2018-09-12-Cpp-Meetup.pdf . Followed by a live-coding session in German which was recorded by the KIT (see below for the youtube video).

The road to a Dr. title (PhD in the Anglo-Saxon world) is often long, rough, and twisted. First you have to do original research, produce novel results, publish articles, and then write and ultimately publish a dissertation. Defending your research in the dissertation in front of a panel of professors is one of the final milestones on that journey.

On July 3rd, I successfully defended my dissertation at the Karlsruhe Institute of Technology and the dissertation text has now been published as a book.

The LaTeX source code for my dissertation is available for download: dissertation-source.zip (791 KiB). The complete text is in one .tex file, all figures (except the creative commons logo) are generated from the LaTeX code.

Dissertation Defense Presentation

The presentation was only part of the whole defense. My actual slide set also had almost 200 more backup slides, which however were collected from all the other talks already available on this homepage. These backup slides helped greatly in the examination question after the presentation.

Note about the new tlx library of Advanced C++ Data Structures and Algorithms

Last year on February 19th, I started a new github repository called tlx with the goal of de-duplicating code from three projects: Thrill, STXXL, and a private project. The idea came up after a STXXL code workshop in Frankfurt (fashionably called hackathons nowadays).

simple but vital std::string manipulation functions missing from the STL.

The initial reason for tlx to come about was to consolidate all the bug fixes to the loser tree implementations that I had scattered across the three projects. Efficient multiway merging is such a fundamental task and there was no universally available C++ library that implements the tournament tree well.

A long search for an appropriate vacant user account with three letters on github lead to "tlx". This is definitely a good C++ namespace name, but to this day, it is unclear what the letters stand for. Template Libraries for CXX? The missing Library for CXX? Template Library and more eXtensions. Have your pick, someday someone will find a good official expansion.

Since its inception, tlx has grown a lot. Its goal is to consolidate algorithms and data structures from multiple projects. In a sense tlx maybe aims to be the Boost for advanced algorithms. The goals and constraints of tlx are:

To have a library of well implemented and tested advanced algorithms and things missing from the C++ STL.

Target high modularity with as little dependencies between modules as possible.