In the previous post we've seen the problems with the current source at the core of liquidsoap. The point of doing so was to highlight the interest of a more principled design.

We could start with a mathematical notion of stream, but we haven't reached that point yet. As I've illustrated before, sources are inherently interactive objects, which is hard to account for.

Instead, I'll keep the same practical notion of source but do one simplification: we'll work sample by sample. We'll have a #is_ready method that tells whether the source can produce a sample for the current instant. A source is always ready when it is in the middle of a track. And we have the #get method which returns at most one sample, none when the current track ends, and should never be called unless the source #is_ready.

As before, a source is passive by default (it doesn't produce data when not asked to) and caching mechanisms ensure that a source gives consistent answers to several observers. Depending on how/when it is used, a source will generate a stream, possibly undefined at some instants, but otherwise containing samples, metadata and end of track markers. Note that we need more precision than just time to denote a point in such a stream, in order to be able to tell the position relative to end of tracks -- it is possible to have several consecutive end of tracks exactly at the same point in time.

Let us detail caching, which isn't totally trivial. What do we cache: a number N of end of tracks, corresponding to partial #get, possibly followed by a sample. How do we use that cache: for an operator which already had M end of tracks and performs a #get, we return an end of track if M<N, the sample otherwise.

In the current liquidsoap, this M is passed as part of the frame, but there is no way to track which source has produced which end of track. Hence it is possible that M>N, for example in case of a switch which has seen many empty tracks before switching to that source. Even when M<N, it makes it unclear how many end of tracks we should add. I'm not sure whether it is crucial that we keep precise track of who produces which end of track.

What matters is (1) if you listen continuously to a stream, you get exactly all its content and (2) we don't have the first problem with sharing. This is because various operators of a source can only get data at one instant. If I pump it, the source will generate (or find in its cache) an end of track or a sample. It's as simple as this, there's no bad side-effect here. We don't even need to worry about sharing detection, because we can perform caching all the time without changing the stream content -- efficiency isn't an issue for now. (At some point I was worried that sub-instantaneous data is in the cache, i.e. end of tracks, could lead to the same problem as before, but it does not make sense because this data can't be refered to without first generating it: you can't pump a source "after" it has generated end of tracks, you can only pump it, ignore end of tracks, and pump it again.)

For our second problem, we would simply cache the result of #is_ready (rather, rely on the same cache). As with #get, we need to know the number of end of tracks already obtained by the observer to tell him if the source is ready at that precise point.

Some minor remarks:

We can't distinguish a source that is in the middle or just before the beginning of a track. If you don't listen to a source for a while, and the source is ready when you come back to it, there's no way to know if it is about to start something new or if you missed the beginning of the current track. It's never been a problem, I'll assume we can keep living with it.

The implementation of end-of-track notification is still a "partial" #get which may be too ad-hoc. The alternative is to attach an end-of-track tag to the last sample. Looks like a meaningless choice to me.

Note that metadata and end-of-tracks are treated very differently: we never issue a partial #get because of metadata. It corresponds to a view where, end of tracks are "after" the last sample, while metadata is "before" or maybe "inside" samples. We should try to generalize this into a nice notion of event, some being interruptions, some not.

With Samuel, we have explored other design choices. What I present here isn't his favorite, but it's the most convincing to me. It also has the advantage of being fully defined (at least in my mind).

This new model is not revolutionary. But we gained more than it seems. For example, we can now write an operator that takes two sources and sums them, only producing data when both of them are ready, and not pumping any of them when none is ready. The reason is that in the old frame-based design, we can only get data until either the end of a frame or the end of a track. But if one source ends a track, the other will still fill a frame.

Lifting to frames

The real challenge now is to derive an efficient frame-based model that faithfully reflects the simply sample-based model.

I'll start with the biggest problem: sharing. We can't approximate it anymore, or we'll obtain the same problem. Hence I'm going to propose a two-phase pumping: In a first abstract phase we do a dry run of the pumping methods to know who pumps who at what instant. Then we have a precise picture of the sharing and we can do a real run, computing the actual stream content.

This method forbids some source behaviors: for example if a source acts differently based on the actual values of its input samples, it cannot meaningfully take part to the first phase. That would be a problem with a blank detection source. I believe we can relax this enough to get what liquidsoap currently does. The key is to declare some values as "sparse" or "slow", allowing the change of the value (and hence the reaction of the source) to only take place at the end of the frame. For example blank detection would set a flag that only takes effect for the next frame.

For most sources I want to automatically generate the #simulated_get method from a simple #get. This is why compilation is needed now.

Now, I believe I can run that simulation thing. The nice thing is that once I bite the bullet and decide we need this, there's no question about being more modest in other places. In particular, the new #is_ready will be asked not only if a source is ready at a particular point but for how long. This enables a nice frame-based implementation of the precise sum operator described above. Of course, it goes with a #get that takes not only a start but also a stop position.

Plans

There are several things we can do now:

Compile frame-level sources from sample-level sources written in OCaml. This is only about analysing the source, and it does not need be user-friendly. There are a couple tricky analysis to run. For example, a sine source will have a counter decremented at each sample and reset after an end of track, and we'll have to turn this into the natural code for the frame-level sine.

Extend liquidsoap scripts with a notation for sources and compile it down to OCaml objects. We'll have to ensure that the user never sees a type error coming from OCaml, but only errors from us, which should be more readable. The main extra challenge is to run liquidsoap code fast. Some of it can be compiled directly to OCaml but function calls won't match exactly. We'll have to inline or leave some interpreter calls.

Do cross-source optimizations on top of the previous work. If the user writes a sum of sine sources, inline it into a source that computes directly the sums.

Compile to C or even lower level code.

I gave a rather precise proposal of what Step 1 could look like. Still we can discuss some points, try to generalize a few things. My proposition doesn't look as clean as I would like in some places, but it has one great advantage: I'm sure we can program liquidsoap sources with that model, and even do some things currently impossible.

Step 2 presents even more design choices: should users write sources in an OO style like we do? should they work with an explicit notion of time? should time be discrete or continuous?

In this post, I will give a high level picture of the current internals of liquidsoap, show a few problems with them. The interested reader can read further details on the design of liquidsoap in our papers. In the next post, I'll finally describe a better future for liquidsoap.

The source protocol

In order to get a minimum understanding of liquidsoap's streaming model, let us consider the following example:

output(fallback([queue,playlist]))

What we want here is to produce a stream either from the playlist or the queue of requests, and output it somehow. The fallback between our two sources only switches from one to the other for new tracks. For example, if the queue becomes available while we are in the middle of a playlist track, we'll keep playing the playlist track, and only switch to the queue for the next track. In order to obtain the expected behavior this means that (when possible) unused sources are (almost) frozen in time: the queue doesn't start playing its request until the fallback asks it to do so. The stream of a source isn't defined in itself, but computed in interaction with the source's environment. Simply put, it's on-demand streaming.

Now what happens concretely. Streaming is clocked, and at each cycle of the clock we produce a frame, which is simply a buffer of samples (audio, video or MIDI) of fixed size/duration. The filling of the frame starts from the outputs: Our output passes the frame to the fallback, which passes it to the source that is currently playing a track, the frame is filled, passed back to fallback, which passes it back to output, which outputs the frame to the world.

Sometimes, it'll be a little different: When the child source of fallback ends its track, it won't completely fill the frame, which gives the opportunity for the fallback to select another source. The newly selected source isn't used right away. Instead, the partial frame is returned to the above operator, in our case the output, which passes it back because all it wants is a complete frame -- the story would be different with nested fallbacks. When the partial frame comes back, its filling is completed by the newly selected source.

We perform that kind of computation by having sources communicate with each other using just a few methods: we mainly have #get for filling a frame and #is_ready for knowing if a source has a ready track (ongoing or not).

All this works smoothly when sources are organized along a tree, i.e., when there is no sharing. But we really want sharing, as seen in the following example:

source = fallback([queue,playlist])
output_1(source)
output_2(source)

This means that at each cycle of the clock, the source will be asked twice to fill a frame with a segment of its stream. Of course, the source should give the same data twice: we don't want to split our stream among the two outputs, one frame each. To deal with this, we have some caching mechanisms in #get. In more details, we create new sources by inheriting from a base class which defines #get as a public method wrapping the virtual private method #get_frame which can be written without thinking about sharing at all.

Problem #1

Caching everything is really costly, so we have to detect when sharing is needed. The problem is that sometimes, it's not easy to tell statically:

Here it looks like the queue is shared, but in fact the first occurrence can only be used in the morning (am), the second only in the afternoon (pm). While we could try to detect some of those cases, it is hopeless to catch them all. So we're bound to over-approximate sharing. This has been done, and isn't too complicated, although it does introduce some activation/deactivation methods that must be used carefully, especially when using transitions. This is okay regarding performance, but it has one nasty side effect: the approximation can have an impact on the stream!

The scenario is as follows: Take a source S which looks like it could be used by operators A and B. Now A asks S to start filling a frame from the middle (it happens, cf. the fallback example above) but the source sees that it could be also be used by B, perhaps to fill a frame from the beginning. So S fills in its cache from the beginning, and copies the part of it that starts from the middle in A's frame. But if B never uses S, we'll have modified our behavior for nothing.

Problem #2

An even worse problem is that we implemented some caching mechanism for #get but not #is_ready. Now, it is possible that a source fills its cache, has no new track to produce, and hence declares itself as not ready. Which might not be correct from the viewpoint of an operator that wants to access the data from the beginning of the cached frame. As a result of this weakness, I recently had to allow what should be illegal behavior in some operators that rely quite precisely on the readiness of their sources. Of course, this is a slippery slope.

Next time, I'll describe a simple way to construct a better source protocol from first principles, avoiding those problems, and even getting more expressiveness by the way. On the other hand, it'll have to be compiled. Stay tuned.

I recently found a good series of articles on LtU, on the Ghosts of UNIX past. The author reflects on some successes and failures of Unix, trying to find patterns. I enjoyed reading the articles, they teach you a few things, and are generally good food for thought.

The first article describes the "full exploitation" pattern, successfully achieved in UNIX with files. The next articles describe various failures of UNIX design, classifying them in "conflated", "unfixable" and "high-maintenance" designs. While I don't have the culture to do the proposed exercises, I tried to relate this to programming language design -- which suprisingly hasn't been done on LtU.

Full exploitation is when you manage to re-use a concept a lot. This is good because it avoids inventing new things: it's hard enough to get a few concepts right. In programming languages, a good first-class notion of functions can be fully exploited to model control, event handlers, lazy computation, etc.

Conflated designs mix up several concepts. This is bad only if the two concepts cannot be separated. Otherwise, it's just syntactic sugar. Object oriented programming comes to mind: Methods are a conflated design, mixing functions and record members. Also, inheritance mixes record extension and dynamic binding -- which makes it hard for people to think about the latter aspect, a very important source of bugs.

Unfixable designs are unfixable designs. This is a bad definition, but sometimes it is hard to say why something is unfixable. Often, this has to do with the wide adoption of the wrong design. Note that conflated designs may often be fixable, essentially by creating ways to access conflated notions in isolation. I'm not sure what's a good programming language example here. Perhaps dynamic binding, probably stuck forever in emacs lisp. (By the way, a fun fact according to Olivier Danvy: dynamic binding was born from exagerated exploitation of a single stack in the implementation of Lisp -- in those old days, people were psyched about stacks and got a bit carried away.)

High-maintenance designs have nothing wrong in themselves but interfere with other things in such a way that requires a lot of care and work in those areas. With programming, various difficult things come to mind: state, concurrency, dynamic binding... Those aspects make it very hard to reason locally about a piece of code. In themselves, state and concurrency are cool, but their interaction leads to many problems.

More than programming languages, liquidsoap was in the back of my mind. But there does not seem to be interesting examples of those patterns in liquidsoap. In fact, it is not an exceptionally good design. It's just a first shot at making an end-user streaming application that is not just one object/tool but rather a system/language for building tools. Liquidsoap introduces an abstract notion of source, which is good, but does not fully exploit it. For instance, blank detection operators are black boxes that encapsulate the computation of the average audio volume, but a more flexible and efficient design would be achieved by isolating the computation of the volume stream as a source, then write operators that act depending on the characteristics of the volume source. Liquidsoap does not have unfixable designs because we give ourselves the liberty to change things when needed. It does have high-maintenance designs: the source protocol is complex, and we've had manybugs because it wasn't properly followed.

Beyond those not-so-good design examples, liquidsoap is broken in several ways. But I'm comfortable to admit it because it works good enough for many people and we know how to fix it. The code name for that exciting new line of work is, quite straightforwardly, Liquidsoap 2.0. Until now, we've been writing sources by hand in OCaml. Liquidsoap users write scripts in our own scripting language, and can only compose sources, never write a new one. In other words, Liquidsoap is an interpreter for a dedicated script language that manipulates an abstract/opaque notion of source. But liquidsoap 2.0 will be a compiler that lets you write sources, and not only compose them, and will produce optimized code. It will be as simple as it is today for beginners, but will enable much more for power users. Importantly, it will allow us developers to not write as much bug-prone code. There is a lot to say about what could be in liquidsoap 2.0, but I'll start with a simple down-to-earth story... next post!