GStreamer and OpenCV for image stabilisation

I am now back from Prague where I gave a talk on image stabilisation (and my holiday pictures). Hopefully a video of the talk will soon be online. In the meantime, I would like to explain my efforts a bit in written form, with some details slightly updated from the talk (the code has progressed a bit since then).

I got interested in the issues of image stabilisation through a helium balloon photography project in which I participated. I want to make a nice time lapse video from the pictures I have taken, but they were taken from a camera that was moving, which would make the result very shaky without some kind of postprocessing.

Thankfully, I work at Igalia, which means that on top of my personal time, I could spend on this project some company time (what we call internally our hackfest time, of up to 5 hours per week).

Original problem statement

I have around 4h30 of pictures taken from a balloon 100 metres high. The pictures were taken at a rate of one per minute, which makes around 270 pictures. I want to make a nice time lapse out of it. Simply using the frames as is to build a video does not work well. Partly because I would probably be legally required to include a warning to epileptic people at the beginning of the video, but mostly because people actually watching it would wish they were epileptic to have a good excuse not to watch it.

This is due to the huge differences occurring between two consecutive frames.

Here is an example of two consecutive frames in that series:

As you can see, a lot of pixels change from one frame to the next, and the result does not look good. It is also obvious that both are pictures of the same thing, and that they could be made very similar, mainly by rotating one of them, and maybe reprojecting it a bit so that things align properly even though the point of view changed slightly between frames.

Standing on the shoulders of giants

There was no question in my mind that I wanted to use GStreamer for the task, by writing an element or set of elements to do the stabilisation. The two big advantages of this approach are:

I can benefit from all the other elements of GStreamer, and I can easily do things like decode my pictures, turn them in a video, stabilise it and encode it in a format of my choice, all in one command.

Others could easily reuse my work, potentially in ways I could not think of. One idea would be to integrate that in PiTiVi in the future.
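To give a concrete idea of the first point, here is a sketch of such a one-shot command. It uses GStreamer 0.10 syntax and the tee/mux wiring with which my opticalflowfinder and opticalflowrevert elements are invoked in the comments further down; the Theora encoder and output filename are illustrative choices, not a tested recipe:

```shell
gst-launch filesrc location=shaky.webm ! decodebin ! tee name=t \
    t. ! ffmpegcolorspace ! opticalflowfinder ! opticalflowrevert name=mux \
    t. ! ffmpegcolorspace ! mux. \
    mux. ! ffmpegcolorspace ! theoraenc ! oggmux \
    ! filesink location=stabilised.ogg
```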

Then, after some research, I realised that OpenCV provides a lot of the tools needed for the task as well.

Since I am still in a prototyping/research stage, and I hate to write loads of boilerplate, I am using Python for this project, though a later rewrite in C or C++ is not impossible.

First things first

I will not present things exactly in the order I researched them, but rather in the order I should have researched them: starting with a simpler problem, then getting into the complications of my balloon problem.

As you can see, Joe almost looks like he's on a boat. He isn't, but the cameraman is, and the video was taken with a lot of zoom. The movement in that video stream has a particularity that can make things simpler: the position of a feature on the screen does not change much from one frame to the next, because very little time passes between them. We will see that some potentially very useful algorithms take advantage of that particularity.

The steps of image stabilisation

As I see it for the moment, there are two basic steps in image stabilisation: finding how the image moved from one frame to the next (the optical flow), and applying the opposite transformation to cancel that movement.

Optical flow

For our purposes, we can say that for each frame the optical flow is represented by two lists of point coordinates, origins and destinations, such that the feature at coordinate origins[i] in the previous frame is at coordinate destinations[i] in the current frame.
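For illustration, here is a tiny sketch of that representation, with made-up coordinates; the simplest thing one can do with such a flow is estimate a global translation as the mean displacement:

```python
import numpy as np

# Toy optical flow: three features that all moved by (+5, -3) pixels
# (the coordinates are illustrative, not from the balloon data).
origins = np.array([[10.0, 20.0], [40.0, 80.0], [200.0, 150.0]])
destinations = origins + np.array([5.0, -3.0])

# Estimate a global translation as the mean displacement of the features.
mean_shift = (destinations - origins).mean(axis=0)
print(mean_shift)  # close to [5, -3]
```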

Optical flow algorithms can be separated into two classes, depending on whether they provide the flow for all pixels (dense optical flow algorithms) or only for selected pixels (sparse optical flow algorithms). Both classes can in theory provide us with the right data (the origins and destinations point lists) to compute the opposite transformation we want to apply using findHomography().

I tried one algorithm of each class, choosing the ones that seemed popular to me after reading a bit of [Bradski2008]. Here is what I managed to do with them.

Dense optical flow

I tried to use OpenCV's implementation of the Horn-Schunck algorithm [Horn81]. I don't know if I used it incorrectly, or if the algorithm simply cannot be applied to that situation, but this is all I could do to Joe with that:

As you can see, this basically added flickering. I did not find the time to improve on this before realising that the algorithm is considered obsolete in OpenCV: the new Python bindings do not even include it.

Note that this does not mean that dense optical flow sucks: David Jordan, a Google Summer of Code student, does awesome things with a dense algorithm by Proesmans et al. [Proesmans94].

Sparse optical flow

I played with the Lucas-Kanade algorithm [Lucas81], with the implementation provided by OpenCV. Once I managed to find a good set of parameters (which are now the default in the opticalflowfinder element), I got pretty good results:

Joe enjoys the stability of the river bank, undisturbed by the movements of the water (video)

And it is quite fast too. On my laptop (with an i5 processor), I can stabilise Joe the hippo in real time [1] (it is only a 640x480 video, though).

For those who attended my talk at the GStreamer Conference 2011: yes, it now runs in proper real time, I optimised the code a bit.

The balloon problem

As we saw in the previous section, for a shaky hippo video, [Horn81] is not any help, but [Lucas81] is pretty efficient. But can they be of any use for my balloon problem?

Unsuccessful results

I won't show any video here, because there is not much to see. Instead, here is an explanation in pictures of how the algorithms fare on the balloon time lapse.

This is what Horn-Schunck can do:

The picture shows two consecutive frames in the time lapse (the older is on the left). Each of the coloured lines goes from a point on the first image to the corresponding point on the second one, according to the algorithm (click on the image to see a larger version where the lines are more visible). Since Horn-Schunck is a dense algorithm, the coloured lines are only displayed for a random subset of points to avoid clutter.

Obviously, these lines show that the algorithm is completely wrong, and could not follow the big rotation happening between the two frames.

Does Lucas-Kanade rate better? Let's see:

This is the same kind of visualisation, except that there is no need to choose a subset, since the algorithm already does that.

As for the result, it might be slightly less wrong than Horn-Schunck, but Lucas-Kanade does not seem to be of any help to us either.

The issue here, as said earlier, is that these two algorithms, like most optical flow algorithms, assume that a given feature will not move more than a few pixels from one frame to the next (for some value of "a few pixels"). This assumption is perfectly sensible for typical video streams taken at 25 or 30 frames per second. Unfortunately, it is obviously wrong in the case of our stream, where the camera has time to move a lot between two frames (which are captured one minute apart).

Is all hope lost? Of course not!

Feature recognition

I found salvation in feature recognition. OpenCV provides a lot of feature recognition algorithms. I have tried only one of them so far, but I hope to find the time to compare it with others in the future.

The one I tried is SURF (for "Speeded Up Robust Features", [Bay06]). It finds "interesting" features in an image and descriptors associated with them. The descriptors it provides are invariant to rotation and scaling, which means that it is in theory possible to find the same descriptors from frame to frame.

To be able to efficiently compare the sets of frame descriptors I get for two consecutive frames, I use FLANN, which is well integrated in OpenCV.

Here is a visualisation of how this method performs:

As you can see, this is obviously much better! There might be a few outliers, but OpenCV's findHomography() can handle them perfectly, and here's a proof video (I am not including it in the article since it is quite high resolution).

Obviously, the result is not perfect yet (especially at the end), but it is quite promising, and I hope to be able to fix the remaining glitches sooner rather than later.

Show me the code!

The code as well as a quick introduction on how to use it is available on github. Bugs and patches should be posted here.

@glantrobot: I didn't really deal with them, they are still there in the video.
@jeroen: the balloon was tethered at a constant altitude, and I did not include the "take off" and "landing" phases in the video; so, yeah, fixed altitude for the video.

In the weather balloon case, can it help with the wobbling effect if you undo the camera lens distortions before the stabilisation phase? If you know the lens parameters it should be possible.
I am not sure about the result though, it was just a naive observation, I don't know very much about the theory of photography.
Ciao, Antonio

@ao2: with my element, the movement correction worked OK without any lens corrections, at least for the centre of the images. The result did indeed look more wobbly on the sides, where lens distortion is bigger. In other words: the movement was recognised correctly, but it was different on the edges, and my algorithm assumes it is constant on the whole image, so, yes, wobbly. Since this blog article, I made a variant of the video where I first run a step of "undistortion": http://emont.org/tmp/undistorted.ogg and it does indeed look better.

Do you take into account the lens parameters in your "undistortion" step? I am just curious.
And I see you also do some background accumulation? Maybe some color normalization is still needed, I can see the borders of one frame over another even if they look aligned, but the result is good. Congrats!

Yes, I used some OpenCV example (don't remember which one right now) and printed a checker board to get the parameters of my camera+lens combo. The way both are fixed together is quite MacGyver though, so they might not have been in the exact same set-up as when the pictures from the balloon were taken, but it seems to look "good enough".
As for background accumulation, it's done rather lazily for now, just keeping the old (transformed) frame and putting the new (transformed) frame on top of it.
There is definitely a lot of room for improvement. I would also like to erase the orange rope. I have no idea when I will find time to do all of that though...

This is odd. numpy.float128 is definitely there in numpy 1.6.1, its disappearance in 1.6.2 would be... surprising. Are you sure numpy is correctly installed on your machine?

From: sense.luo, 2012-06-11 09:42:00

I tried to import float32, float64... from numpy and they all worked well. But numpy.float128 caused "ImportError: cannot import name float128". Then I found someone saying to use numpy.longdouble instead of numpy.float128, so I tried that and the error was gone.
Now I tried the example again:

gst-launch filesrc location=/data/shaky-hippo.webm ! decodebin ! tee name=tee tee. ! ffmpegcolorspace ! opticalflowfinder ! opticalflowrevert name=mux tee. ! ffmpegcolorspace ! mux. mux. ! ffmpegcolorspace ! xvimagesink

and I got:

Setting pipeline to PAUSED ...
ERROR: Pipeline doesn't want to pause.
ERROR: from element /GstPipeline:pipeline0/GstXvImageSink:xvimagesink0: Could not initialise Xv output
Additional debug info:
xvimagesink.c(1804): gst_xvimagesink_xcontext_get (): /GstPipeline:pipeline0/GstXvImageSink:xvimagesink0: Could not open display
Setting pipeline to NULL ...
Freeing pipeline ...
I don't know whether this is normal or not, or how I can get an output file.
Thanks~!

From: sense.luo, 2012-06-11 11:27:51

oh........ I found this is a problem with [xvimagesink], so I used [autovideosink] instead. Then I tried the command; only the first frame of the video showed on the screen, and then it exited immediately. The messages left in the console were:

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
50 features found, 50 matched ; errors min/max/avg: (25.464687, 109.17892, 49.937993316650392)
50 features found, 50 matched ; errors min/max/avg: (25.201805, 110.41348, 50.330241699218753)
50 features found, 49 matched ; errors min/max/avg: (25.302744, 105.21174, 50.002107659164743)
All going wrong!
ERROR: from element /GstPipeline:pipeline0/GstDecodeBin:decodebin0/GstMatroskaDemux:matroskademux0: GStreamer encountered a general stream error.
Additional debug info:
matroska-demux.c(4492): gst_matroska_demux_loop (): /GstPipeline:pipeline0/GstDecodeBin:decodebin0/GstMatroskaDemux:matroskademux0: stream stopped, reason error
Execution ended after 179662777 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Then I tried to delete [! mux. mux.] and run the command, and all the frames played normally. So I don't know what the problem is. Maybe it is also caused by numpy.float128?
My system is 32-bit Ubuntu; numpy has no float128 on 32-bit platforms... :(

Deleting this mux part means that my elements have no effect on the stream, so, yeah, of course it works, but there is no stabilisation ;)
Your issue is interesting, I have never tested this on a 32-bit system... Replacing every float128 by a float64 _should_ work, but you lose precision, which might be why things go bad. Unfortunately, I don't have much time to investigate these days, I'm über busy until mid July...

From: sense.luo, 2012-06-11 17:05:14

Thank you so much for replying~
I will try it via some other method later.
Best wishes to you！ Good Luck!╮(￣▽￣)╭

I'm afraid it is too late for it to be included in GStreamer 1.0, as it just got released ;).
Also, I am mentoring a master's student porting this work to C and GStreamer 1.0 as his final project; the goal is to have it included in the opencv plugin in -bad. I will try to post an update to this post, or a new post on the blog, when he has something working.

Thank you so much for publishing this work, this image stabilisation is exactly what I want to do.
I have some experience with OpenCV but am new to Gstreamer which I have just installed on Ubuntu with apt-get. However, whenever I run your command, I get the following error:
WARNING: erroneous pipeline: no element "decodebin"
What am I doing wrong?
Any help you can give will be much appreciated!

@Joe: thanks for your interest. It is very surprising that you don't have the decodebin element installed. Check that you have installed gstreamer-plugins-base, of which it is part. For practical GStreamer usage, you probably want gstreamer-plugins-good, gstreamer-plugins-bad and gstreamer-plugins-ugly as well. There are plenty of explanations, way better than any I could give, here: http://gstreamer.freedesktop.org/documentation/

From: Joe, 2012-12-18 00:17:48

Thanks very much for your response, you were right, the base plugins weren't installed. These are the extra things I have installed on Ubuntu with apt-get, in case it is useful to anyone else:
python-gst0.10 gstreamer0.10-plugins-base gstreamer0.10-plugins-good
I couldn't get the bad and ugly plugins to install as it says they are broken packages, but I don't think I need them for this example.
The problem I now get is the following:
WARNING: erroneous pipeline: no element "opticalflowfinder"
Would you mind explaining how to make use of the opticalflowfinder element please? I see there is a class with that name in flow_finder.py but I'm not sure how to use it. I'm sure it has something to do with this instruction:
For the elements to be recognized, you need to include in GST_PLUGIN_PATH the directory where you checked out GstStabilizer
But I'm not sure how to set the GST_PLUGIN_PATH variable.
Thanks for your help :)

You should do something like: export GST_PLUGIN_PATH=/path/to/GstStabilizer

From: Joe, 2012-12-18 20:34:34

Fantastic, thanks very much! :)
I am running this on a remote server and this is the output I am now getting:

Setting pipeline to PAUSED ...
ERROR: Pipeline doesn't want to pause.
ERROR: from element /GstPipeline:pipeline0/GstXvImageSink:xvimagesink0: Could not initialise Xv output
Additional debug info:
xvimagesink.c(1784): gst_xvimagesink_xcontext_get (): /GstPipeline:pipeline0/GstXvImageSink:xvimagesink0: Could not open display
Setting pipeline to NULL ...
Freeing pipeline ...

Is this command trying to open the result in a display? If so, the error would make sense, as there is no display attached to the server. Is it easy to save the result to a file instead?
Thanks so much for your help, sorry for all the questions.