Sunday, May 20, 2012

Graphcuts for Python: pygco (slight update)

I have been using the excellent gco library for energy minimization with graph cuts for quite some time. Finally I got around to clean up / rewrite some of my Python wrappers so that maybe someone else can use them, too.

So what does this library do?
It does discrete energy minimization on loopy graphs. This is an important topic in computer vision, since it can be used for segmentation, stereo vision and optical flows. When you read "CRF" in a computer vision paper and there is no learning involved, it just means they minimized an energy function (probably using gco). Often this is just some form of "smoothing" but a lot more can be done with it.

Let's talk about pairwise energy functions. These functions take the form

E = \sum v_i(y_i) + \sum w_ij(y_i, w_j)

with v_i and w_ij are some fixed parameters and y_i are the multinomial variables over which we want to minimize this energy. Typically the i run over all pixels in an image, and w_ij is only nonzero for adjacent pixels.
In general, think of v and w as tables (of size n_labels or n_labels x n_labels) for each node resp. edge in the graph. In simple models, these are the same tables for all nodes / edges, in more complicated ones they vary.

In general this is NP hard, but there are many interesting cases, in which this problem can (somewhat surprisingly) be solved efficiently and exactly.
If y_i are binary and the energy is submodular, the exact solution can be found efficiently.

This is where gco comes in: it implements two move-making algorithms, alpha-expansion and alpha-beta-swaps, that solve a series of binary problems to obtain a solution to a multi-label problem.

Going back to the applications I mentioned above, these labels could be different depths in stereo vision, different directions of movements in optical flow, or different object classes in semantic segmentation.

There are several ways to specify the energy for gco and I don't want to go into to much detail here. Have a look at the README if you are interested.

For the moment, I have ported two ways to Python (but the rest should come soon). One is for general graphs, specified by listing all edges, the other one is for 2D grid graphs (images).
In all cases, the unary potentials (the v_i above) are just given as one number for each vertex (pixel in the grid graph case) and state.
In the simple case that I wrapped for Python, the pairwise potentials (the w_ij)
are fixed over the whole graph and only depend on the combination of labels y_i and y_j.
This contains the case of the simple Pott's potential, where w_ij is one if y_i = y_j and zero otherwise.

If you want to use the code, you need to download the original gco library and my wrappers. There is one thing you have to keep in mind when working with gco, though: all potentials are expected to be integers, so you need to round them!

This is a trivial example of smoothing. We have a binary image, add some noise and want to recover the original image. The example contains a "grid graph" version and a version where I explicitly construct the edges.

The same works with multinomial input. Here I used two different kind of pairwise potentials, the second from the right used Pott's potentials, while the one on the very right knows that blue-green and green-red edges are more likely than blue-red.

And finally, a not-as-trivial-looking example:

The first two images on the top are consecutive frames from a video sequence from the middlebury benchmark. This example tries to find depth layers in the image by using paralax.
The second image is translated in x direction and the best fit for each pixel is found. The image on the top right gives the best translation for each individual pixel. The bottom images show the results obtained with alpha-expansion and Pott's potentials (right) and a 1d topology (left).

This is not really how you would do stereo vision (the potentials I used are very naive) but I feel it is a nice example (about 10 lines in Python).

I hope some people find this helpful an there will be more computer vision code in Python soon :)

Hi Kostia.In writing the wrapper I was a bit lazy. You have to compile gco yourself. Unfortunately it doesn't come with a makefile iirc.I'm not on my own box at the moment so I can't look up what I used but I think it should be a single gcc line (something along the lines of gcc --shared GCOptimization.cpp -O2 -o gco.so).

gco.so didn't compile...I got a lot of them...for instance,the beginning of it is

GCoptimization.h:389:16: error: ‘ptrdiff_t’ does not name a typeGCoptimization.h:414:16: error: ‘ptrdiff_t’ does not name a typeGCoptimization.cpp:31:23: warning: ‘GCO_CLOCKS_PER_SEC’ initialized and declared ‘extern’ [enabled by default]GCoptimization.cpp: In member function ‘GCoptimization::SiteID GCoptimization::GreedyIter::feasibleSites() const’:GCoptimization.cpp:555:73: error: no match for ‘operator-’ in ‘((const GCoptimization::GreedyIter*)this)->GCoptimization::GreedyIter::m_siteend - ((const GCoptimization::GreedyIter*)this)->GCoptimization::GreedyIter::m_site’GCoptimization.cpp: In member function ‘GCoptimization::EnergyTermType GCoptimization::DataCostFnSparse::search(GCoptimization::DataCostFnSparse::DataCostBucket&, GCoptimization::SiteID)’:GCoptimization.cpp:1823:19: error: ‘cLinearSearchSize’ was not declared in this scopeIn file included from GCoptimization.h:108:0, from GCoptimization.cpp:4:energy.h: In instantiation of ‘int Energy::get_var(Energy::Var) [with captype = int; tcaptype = int; flowtype = long long int; Energy::Var = int]’:GCoptimization.cpp:197:20: required from hereenergy.h:328:91: error: ‘what_segment’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]energy.h:328:91: note: declarations in dependent base ‘Graph’ are not found by unqualified lookupenergy.h:328:91: note: use ‘this->what_segment’ insteadenergy.h: In instantiation of ‘void Energy::add_term1(Energy::Var, Energy::Value, Energy::Value) [with captype = int; tcaptype = int; flowtype = long long int; Energy::Var = int; Energy::Value = int]’:GCoptimization.cpp:280:22: required from hereenergy.h:207:2: error: ‘add_tweights’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]energy.h:207:2: note: declarations in dependent base ‘Graph’ are not found by unqualified lookupenergy.h:207:2: note: use ‘this->add_tweights’ insteadenergy.h: In instantiation of ‘void Energy::add_term2(Energy::Var, Energy::Var, Energy::Value, Energy::Value, Energy::Value, Energy::Value) [with captype = int; tcaptype = int; flowtype = long long int; Energy::Var = int; Energy::Value = int]’:GCoptimization.cpp:304:42: required from hereenergy.h:220:2: error: ‘add_tweights’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]energy.h:220:2: note: declarations in dependent base ‘Graph’ are not found by unqualified lookupenergy.h:220:2: note: use ‘this->add_tweights’ insteadenergy.h:235:3: error: ‘add_tweights’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]energy.h:235:3: note: declarations in dependent base ‘Graph’ are not found by unqualified lookupenergy.h:235:3: note: use ‘this->add_tweights’ insteadenergy.h:236:3: error: ‘add_tweights’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]energy.h:236:3: note: declarations in dependent base ‘Graph’ are not found by unqualified lookup

Could you please let me know how GCO will help in segmenting the image and assigning labels. When I am looking at the code no where Image is being taken as input. If so how it will segment the image. Please let me know. Thanks.

this blog is very cool. I'm interested in this gco program, but I didn't find a good example in c++, like middlebury. So ff I have a grayscale image and lets say 4 labels, then how can I run this program, to segment the image into 4 region. I know, that you interested in python, but do you know some examples in the net? Thanks.

Thanks :)You can find code for middlebury that compares several approaches here: http://vision.middlebury.edu/MRF/I haven't looked at it in detail but it should be possible to reproduce the middlebury results with it.

Hi there. I'm trying to use your wrappers but after compiling and building them i'm having always the same error : "DLL load failed: The specified module could not be found". I used the depency walker tools on the pygco.pyd and says that At least one required implicit or forwarded dependency was not found: LIBSTDC++-6.DLL, LIBGCC_S_SJLJ-1.DLL. When i look it up on google most says that these are problems of compilers versions but i dont know how to resolve them. Can you help me?Thanks

Sorry, I have no idea. You are building using GCC under Windows? Cygwin? Btw, maybe you should give http://hci.iwr.uni-heidelberg.de/opengm2/ a shot. They have a branch containing python wrappers: https://github.com/DerThorsten/opengm/tree/RC2.1

With windows. I'm no exepert with C++ and compilers but it worked for me one time, now i don't know whats rong. I cmpiled the files using Windows Visual studio 2012, i Build the wraper and copied the pygco.pyd file to my python interpreter directory. Isn't that what you supose to do?

Thanks for your comment. The problem is that I can not maintain the source code of the actual implementation for license reasons. You can try to contact the people at western university that actually maintain the software to fix this. Fetching the source is complicated enough, I don't want to maintain a set of patches on their version.

Hi Thanapong.I would assume that is because the potentials encoded by your matrix are not submodular. Have you checked that? In that case I would recommend you try pyqpbo. If you are lucky you can just "pip install pyqpbo".Cheers,Andy

Hello, may I have question. Some how I tried the function cut_from_graph(*args, **kwargs) but in the end it returns only 2-class instead e.g 3 or more I asked... Any advice? ThanksBtw, I can provide the sample code to have look, but there is no room here to upload it or put images (illustration).