Present Perfect

2013-5-7 12:53 pm

I’ve never been a fan of voting for talks, because it tends to be poorly implemented under the guise of democracy. Of course it’s easy for me to talk – I’ve never organized anything at that scale.

I’ll give two examples of why I feel this way, one of which triggered today’s blog post.

First off, my colleague Marek submitted a talk to DjangoCon. The talk was about how to use feat (a toolkit we wrote for live transcoding) to serve Django pages, but in such a way that they can use Deferreds to remove the “1 request at a time per process” concurrency bottleneck of Django.

To me, this is one of the most irritating design choices in Django: it was built synchronously from the ground up (which is fine in most places). But the fact that, when you get a request, you always have to respond to it synchronously (blocking every other request to that process in the meantime) is a design choice that could easily have been avoided.

In our particular use case, it was really painful. If our website has to do an API request to some other service we don’t control, one that can easily take 30 seconds, our per-process throughput suddenly drops to 2 pages per minute. All the while, the server is just sitting there waiting.

Yes, you can throw RAM at the problem and start 30 times more processes; or thread out API requests; or farm it out to Celery, and do some back-and-forthing to see when the call’s done. Or do any other number of workarounds for a fundamental design choice.
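The “thread out API requests” workaround from the list above can be sketched with the standard library alone. This is an illustration, not how Flumotion or Django actually does it: the 30-second upstream call is scaled down to 50 ms, and the page names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

API_DELAY = 0.05  # stand-in for the 30-second third-party API call


def slow_api_call(page):
    # Pretend this is the external service we don't control.
    time.sleep(API_DELAY)
    return f"rendered {page}"


def handle_blocking(pages):
    # One synchronous process: every request waits for the previous one.
    return [slow_api_call(p) for p in pages]


def handle_threaded(pages, workers=10):
    # Thread out the API requests: the slow calls run concurrently,
    # so the process isn't pinned at "1 request at a time".
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_api_call, pages))


pages = [f"page-{i}" for i in range(10)]

start = time.time()
blocking = handle_blocking(pages)
blocking_time = time.time() - start

start = time.time()
threaded = handle_threaded(pages)
threaded_time = time.time() - start

print(blocking_time, threaded_time)
```

With the real 30-second call, the blocking version is exactly the 2 pages per minute per process mentioned above; the threaded version scales with the number of workers instead.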

Since we like Twisted, we preferred to throw Twisted at the problem, and ended up with something that worked.

Anyway, that’s a lot of setup to explain what the talk was about. Marek submitted the talk to DjangoCon, and honestly I didn’t expect it to get much traction because, when you’re inside Django, you think like Django, and you don’t really realize that this is a real problem. Most people who do realize it switch away to something else.

But to my surprise, Marek’s talk was the most-voted talk! I wish I could link to the results, but of course that vote site is no longer online.

I guess I expected that would mean he’d be presenting at DjangoCon this year. So I asked him today when his talk was, and he said “Oh that’s right. I did not get accepted.”

Well, that was a surprise. Of course, the organizing committee reserves the right to decide on its own – maybe they just didn’t like the talk. But if you ask your potential visitors to vote, you’d expect the most-voted talk to make it onto the schedule, no?

The feedback Marek got from them was surprising too, though. Their first response was that this talk was too similar to another talk, titled “How to combine JavaScript & Django in a smart way”. Now, I’m not a JavaScript expert, but from the title alone I can already tell that it’s very unlikely that these two talks have many similarities beyond the word ‘Django’.

After Marek refuted that point, their second reason was that they wanted more experienced speakers (though they never asked Marek about his experience). Their third reason was that the talk had already been given at previous editions of DjangoCon US/EU. It’s unclear whether they meant his talk or the JavaScript one, but Marek’s definitely hadn’t been, and we couldn’t find any mention of the other talk at previous conferences. I’m also not sure why that would matter either way. (The email thread was in Polish, so I have to rely on Marek’s interpretation of it.)

Personally, my reaction would have been to complain to the organizers or the Django maintainers. Marek’s phlegmatic attitude was much better, though – after such an exchange, he simply doesn’t want anything to do with the conference.

He’s probably right – it’s hard to argue with someone who doesn’t want to invite you and is lying about the reasons.

The second example is BCNDevCon, a great conference here in Barcelona, organized by a guy who used to work for Flumotion and for whom I have enormous respect. I’ve never seen anyone build such a big conference in so little time.

He believes strongly in the democratic aspect, and as far as I can tell constructs the schedule solely based on the votes.

Sadly I didn’t go to the last one, simply because I felt the talks that made it were too obviously corporate. A lot of talks were about Microsoft products, and you could tell they won votes because people’s coworkers voted for them. I’m not saying that’s necessarily wrong – given that the organizer worked at our company and has friends here, I’m sure people from here presenting at his conference have also canvassed votes from colleagues. It’s natural to do so. But there should be a way to balance that out.

I think the idea of voting is good, but the implementation matters too. Ideally, you would want only people who are actually going to show up to vote. I have no idea how to ensure that, though. Do you ask people to pre-pay? Do you ask them to commit to paying if at least 50% of the talks they voted for make it into the final schedule, Kickstarter-style?

These two examples sit at opposite extremes of voting. One conference completely disregards what people vote for. If I had voted or bought a ticket, I would feel lied to – why waste the time of so many people? The other conference puts so much stock in the vote that I feel the final result was strongly skewed. I seriously doubt all those Windows 8 voters actually showed up.

Does anyone have good experiences with conference voting that did work? Feel free to share!

2012-1-26 11:16 am

I’m in the quiet town of Malaga these three days to attend the GStreamer hackfest. The goal is to port applications over to the 0.11 API, which will eventually become 1.0. There are about 18 people here, which is a good number for a hackfest.

The goal for me is to figure out everything that needs to be done to have Flumotion working with GStreamer 0.11. It looks like there is more work than expected, since some of the things we rely on haven’t been ported successfully.

Luckily, back in the day we spent quite a bit of time layering the parts as cleanly as possible so they don’t depend too much on each other. Essentially, Flumotion adds a layer on top of GStreamer in which GStreamer pipelines can run in different processes and on different machines, connected to each other over the network. To that end, the essential communication between elements is abstracted and wrapped inside a data protocol, so that raw bytes can be transferred from one process to another, with the other end receiving those same GStreamer buffers and events.

First up, there is the GStreamer Data protocol. Its job is to serialize buffers and events into a byte stream.
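The idea behind such a protocol can be sketched in a few lines of Python. To be clear, this is a toy framing invented for illustration – it is not the actual GDP wire format, and the header fields here are made up:

```python
import struct

# Toy serializer in the spirit of GDP: each "buffer" becomes a
# fixed-size header (payload length + timestamp) followed by the payload.
HEADER = struct.Struct(">IQ")  # 4-byte length, 8-byte timestamp


def serialize(payload: bytes, timestamp: int) -> bytes:
    return HEADER.pack(len(payload), timestamp) + payload


def deserialize(stream: bytes):
    length, timestamp = HEADER.unpack_from(stream)
    payload = stream[HEADER.size:HEADER.size + length]
    return payload, timestamp


frame = serialize(b"video-data", 12345)
payload, ts = deserialize(frame)
print(payload, ts)
```

The real GDP additionally serializes buffer flags, caps, and events, but the shape is the same: a fixed header that tells the receiver how to reconstruct the original GStreamer object from the bytes that follow.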

Second, there is the concept of streamheaders (related to the DELTA_UNIT flag in GStreamer). These are buffers that always need to be sent at the beginning of a new stream so that the buffers coming after them can be interpreted. In 0.10, that meant that at least a GDP version of the caps needed to be in the streamheader (because the other side cannot interpret a running stream without its caps), and in more recent versions a new-segment event as well. These streamheaders are analogous to the new sticky-event concept in 0.11 – some events, like CAPS, TAG, and SEGMENT, are now sticky to the pad, which means that a new element connected to that pad will always see those events and can make sense of the new data it’s getting.

Third, the actual network communication is done using the multifdsink element (and an fdsrc element on the other side). This element receives incoming buffers, keeps them on a global buffer list, and sends them to the various clients added to it by file descriptor. It understands streamheaders and makes sure clients get the right ones for wherever they end up in the buffer list. It manages the buffers, the speed of clients, the bursting behaviour, … It doesn’t require GDP at all to work – Flumotion uses this element to stream Ogg, mp3, asf, flv, webm, … to the outside world. But to send GStreamer buffers, it’s as simple as adding a gdppay before multifdsink and a gdpdepay after fdsrc. At the same level, there are also tcpserversink/tcpclientsrc and tcpclientsink/tcpserversrc elements that do the same thing over a plain TCP connection.
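As a sketch, the sender/receiver pair described above could look like this. These are gst-launch-style pipeline descriptions that you would pass to Gst.parse_launch() (or gst-launch) in a real setup; the encoder choice and the fd number are placeholders:

```python
# Sketch of a gdppay/multifdsink sender and its fdsrc/gdpdepay receiver.
# Building the description strings is plain Python; actually launching
# them requires GStreamer and its plugins.


def sender_pipeline() -> str:
    # Serialize buffers with gdppay, then fan them out to clients
    # (added by file descriptor) via multifdsink.
    return "videotestsrc ! theoraenc ! gdppay ! multifdsink name=sink"


def receiver_pipeline(fd: int) -> str:
    # Read from the handed-over file descriptor and strip the GDP
    # framing to recover the original buffers and events.
    return f"fdsrc fd={fd} ! gdpdepay ! theoradec ! fakesink"


print(sender_pipeline())
print(receiver_pipeline(7))
```

Because gdppay/gdpdepay sit outside the encoder and muxer, the same pair works for any stream the elements in between can produce.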

Fourth, there is an interface between multifdsink/fdsrc and Python. We let Twisted set up the connections, then steal the file descriptors and hand them off to multifdsink and fdsrc. This makes it very easy to set up all sorts of connections (say, over SSL, or just pipes) and do things to them before streaming (for example, authentication). But by passing the actual file descriptor, we don’t lose any performance – the low-level streaming is still done completely in C. This is a general design principle of Flumotion: use Python and Twisted for setup, teardown, and changes to the system, where we need a lot of functionality and can sacrifice performance; but use C and GStreamer for the lower-level, processor-intensive stuff, the things that happen in steady state, processing the signal.
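The fd hand-off can be sketched with plain sockets standing in for Twisted (the port is arbitrary, and the final emit is shown as a comment because it needs a live multifdsink element):

```python
import socket

# Sketch of the hand-off described above: let Python own the listening
# socket (where Twisted would do auth, SSL, etc.), then steal the raw
# file descriptor of the accepted connection for the C element.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))  # any free port
server.listen(1)

client = socket.create_connection(server.getsockname())
conn, _addr = server.accept()

fd = conn.fileno()  # the raw file descriptor the C element would stream to
print(fd)

# In Flumotion this is where the fd leaves Python entirely:
#   sink.emit("add", fd)   # multifdsink's "add" action signal
# From here on, the low-level streaming happens in C.

client.close()
conn.close()
server.close()
```

The same trick works in the other direction with fdsrc: accept or open the connection in Python, then set the fd property before starting the pipeline.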

So, there is work to do in GStreamer 0.11:

- The GStreamer Data Protocol has not really been ported. gdppay/gdpdepay are still there, but don’t entirely work.

- The streamheaders in those elements need adapting to handle sticky events.

- multifdsink was moved to -bad and left with broken unit tests. There is now multisocketsink, but sadly it looks like GSocket isn’t meant to handle raw file descriptors (which we use, for example, in our component that records streams to disk).

- 0.11 doesn’t have the traditional Python bindings; it uses gobject-introspection instead. That will need a lot of work on the Flumotion side, and ideally we’d keep the codebase working against both 0.10 and 0.11, as we did for the 0.8 -> 0.10 move. Apparently you can no longer mix gi-style bindings with old-style bindings, because they create separate class trees. I assume this also means we need to port the glib2/gtk2 reactors in Twisted to gobject-introspection.

So, it looks like there is a lot of work to be done. Luckily Andoni arrived today too, so we can share some of it.

After discussing with Wim, Tim, and Sebastien, my plan is:

- create a common base class, multihandlesink, and refactor multisocketsink and multifdsink as subclasses of it

- create g_value_transform functions to bytestreams for basic objects like buffers and events

- use these transform functions as the basis for a new version of GDP, which we’ll make typefindable this time around

- support sticky events

- ignore metadata for now, as it is not mandatory; although in the future we could let gdppay decide which metadata it wants to serialize, so the application can request it

- try multisocketsink as a transport inside Flumotion and/or for the streaming components; in the latter case, do some stress testing – on our platform we have pipelines with multifdsink running for months on end without crashing or leaking, sometimes with up to 10000 connections open

- make the Twisted reactors work with gobject-introspection

- prototype flumotion-launch against 0.11 code using gir

That’s probably not all going to be finished this week, but it’s a good start. Last night I started by fixing the unit tests for multifdsink, and now I have started refactoring multisocketsink and multifdsink. I’ll first try to write unit tests for multisocketsink, though, to verify that I’m refactoring properly.

2011-10-24 2:28 pm

I’m in Prague right now for the second GStreamer conference. Prague is as pretty as I remember it from eighteen years ago when I was still in high school and we had our yearly school trip.

It’s great to see a mix of familiar and new faces again. 11 years ago GStreamer was made public, and I joined a year later, around the 0.1.1 release if I recall correctly. And now it’s this huge living, breathing thing.

Tomorrow I will be giving a talk about Flumotion here, at 12.00 in the main room. If you’re interested in GStreamer beyond mere playback, this talk is for you. The only sad part is that my good friend Jan Schmidt will be talking about Blu-ray at the same time, but I’m relying on Ubicast to record it properly so I can watch it later!

2011-5-5 11:01 am

Well, the cat has been out of the bag for a few days and I have been too busy to blog about it.

But today as I wait for my team to do a final deploy fixing a bug with too-long URL names for Flash Media Encoder, I have some spare time to mention what’s going on and make some people an offer they cannot refuse.

So, for the past half year or so we’ve been hacking away at a new service to solve a very specific problem in streaming. From 2005 to 2010 the streaming world mostly settled on Flash as a common platform – an unstable equilibrium for everyone involved, but it seemed to work. However, with the number of codecs, devices, and platforms out there today, this equilibrium has been falling apart. The introduction of the iPhone, Microsoft’s heavy pushing of Silverlight (paying companies to stream in it – and funnily enough, those companies usually stop using Silverlight when the money faucet closes), GoogleTV, the introduction of WebM, the arrival of HTML5 (ironically pushed by Apple – yay – even though their HTML5 sites usually only work in Safari – boo)… all these movements upset the status quo once again.

To the eye of the casual observer, it would seem that all streaming has standardized on H264, and so transmuxing technologies are popping up – taking the same video encoding and just remuxing it for different technologies. In practice, however, H264 is a collection of many techniques and profiles at different levels of complexity, and not all devices support the same ones. If you want to reach all H264 devices with just one encoding, you’ll have to settle for the least common denominator in quality, and you’ll have to pick a resolution that is subpar for all of them.
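Transmuxing, as described, can be sketched as pipeline descriptions: the same H264 elementary stream wrapped in different containers, with no re-encoding anywhere. These are gst-launch-style strings with placeholder file names; running them needs GStreamer and the usual plugins:

```python
# The same H264 video, remuxed (not re-encoded) for different targets.
# Note: no encoder element appears anywhere in these pipelines.
SOURCE = "filesrc location=input.mp4 ! qtdemux ! h264parse"

MUXERS = {
    "ts":  "mpegtsmux",  # MPEG-TS, e.g. for iPhone/HLS-style delivery
    "flv": "flvmux",     # FLV for Flash players
    "mp4": "mp4mux",     # plain MP4 for progressive download
}


def transmux_pipeline(target: str) -> str:
    return f"{SOURCE} ! {MUXERS[target]} ! filesink location=out.{target}"


for t in MUXERS:
    print(transmux_pipeline(t))
```

The cheapness of this is exactly its limitation: every target gets the same profile, level, and resolution, which is the least-common-denominator problem described above.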

Now, content producers hate this sort of situation. They just want to get the signal out there, because that’s what matters. The codec and the streaming is just the technological means to get it across the internet. And now the market is asking them to put a bunch of machines in their facilities, learn a lot of technologies they’d rather not worry about, consume heaps of bandwidth to send each version online, and then have to do it all over again each time something changes out there – a new codec, a new device, a new favorite resolution, …

Our answer to this problem is simple: send us one encoding, we will do the rest. Our service will take your live stream, transcode it to as many different encodings as you want, and hand them off to a CDN. That’s basically it. Want full HTML5 coverage ? We’ll do it for you – H264 single and multibitrate, Theora, WebM, and a Flash fallback. Want Silverlight, Flash RTMP, Windows Media MMS ? All there.
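The one-input, many-outputs idea can be sketched the same way: one incoming stream fanned out with tee into several encoder branches. The source URI, codec choices, and sinks here are placeholders – in the real service the sinks would hand off to a CDN:

```python
# One incoming stream, fanned out into several encodings.
# gst-launch-style sketch; running it needs GStreamer.
BRANCHES = {
    "webm":   "vp8enc ! webmmux",
    "theora": "theoraenc ! oggmux",
    "h264":   "x264enc ! flvmux",
}


def fanout_pipeline(branches: dict) -> str:
    # A queue after each tee branch decouples the encoders from
    # each other, so one slow branch doesn't stall the rest.
    head = "uridecodebin uri=rtmp://example.invalid/live ! tee name=t"
    outs = [f"t. ! queue ! {desc} ! filesink location=out.{name}"
            for name, desc in branches.items()]
    return " ".join([head] + outs)


print(fanout_pipeline(BRANCHES))
```

In Flumotion the branches additionally run in separate processes (or machines), connected with the multifdsink/fdsrc machinery described in the hackfest post.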

Services like this already exist for on-demand – see Zencoder, encoding.com, and Panda. Live is just inherently harder – you don’t get to work with nice, single, finished files, and it has to happen right now. But this is exactly the sort of thing a framework like GStreamer is good at.

In reality we aren’t doing anything new here – Flumotion runs a CDN that already provides this service to customers. The difference is that this time, you will be able to set it up yourself online. A standard integration time with any CDN is around two weeks. This service will cut that time down to five minutes. We’re not quite there yet, but we’re close.

What’s that you say? Something about an offer? Oh, right. It has always pained me that, when we wanted to stream a conference for free, it was still quite a bit of setup work for our support team, and hence we didn’t stream as many conferences as I would have liked. Similarly, it pains me to see a lot of customers not even considering free formats.

So the offer is simple. If you are running an event or a conference that flies under a Free/Open banner, and you’re willing to stream only in free formats (meaning, Theora and WebM), and you’re willing to ride the rough wave of innovation as we shake out our last bugs, we want to help you out. Send us the signal, we’ll do the rest. Drop me a line and let’s see how we can set it up. Offer limited, standard handwavy disclaimers apply, you’ll have to take my word for it, etc…

If you’re in the streaming industry, I will be demoing this new service next week on Wednesday around 2.00 pm local time in New York City, at Streaming Media East. And after that our Beta program starts.

Feel free to follow our twitter feed and find us on Facebook somewhere, as the kids these days say…

2010-10-26 4:55 pm

I’m in Cambridge (the UK one, not the US one) for the first ever GStreamer conference!

Years ago, when I was in a cabin in the middle of the woods in Norway, four kilometers from the closest main road and cut off by waist-deep snow, I never imagined that today I would be attending a conference dedicated to GStreamer with 150 people. 150!

It seems the conference is a good mix between professionals and hobbyists. I’ve seen a few interesting applications built on top of GStreamer. Pretty cool stuff.

I took a quick look at the conference schedule, and out of 19 talks, seven are from people who have been employed by the Fluendo group in the past – for most of them, it was their first full-time GStreamer job. It’s great to see these people branch out to other companies, taking GStreamer with them and onwards!

All in all, looks like the GStreamer community is in good shape. Next year, we’ll need two days…

While we’re at it, we should be handing out ‘10 years of GStreamer’ badges to the people who have stuck by us for so long :)