How to write efficient code in Matlab, and about image processing in Matlab.


I want to take a short break from the series on pentomino recognition from scanned paper. This week I would like to explain the Harris corner detector. Most of the explanations that I found on the web and in books tend to focus on the spatial difference between templates, the derivation of the Taylor polynomial, and the covariance matrix. While these explanations are technically correct, in my opinion they are hardly intuitive. Moreover, they do not show an important step of the algorithm, which can be used for more specific object detection. I got this intuition after seeing this video.

But before that, test yourself. Given the following image, which places do you think the Harris corner detector will find? Try to answer without actually running the detector on the image.

After reading the rest of this post, you will be able to answer this question quickly and with confidence. By the way, if you think that the answer is only the corners of the checkerboard, keep on reading; it might surprise you, but Harris will prefer many other points to these corners.

In order to understand the algorithm, consider the following pseudocode:

For some point (x,y)

Take a small rectangular region of interest around the point.

Find edges in the x and y directions. This is done by convolving with the appropriate templates. This yields two images, Ix and Iy.

Draw a scatter plot of Ix(x’,y’) and Iy(x’,y’), of all the points (x’,y’) in the region of interest.
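The steps above can be sketched in a few lines of Matlab. This is only an illustration: the file name, the chosen point, the ROI size and the derivative templates are all my assumptions, not the original code.

```matlab
% A minimal sketch of the steps above; 'board.png' is a hypothetical input
im = double(imread('board.png'));
x = 120; y = 80; half = 10;              % hypothetical point and ROI half-size
roi = im(y-half:y+half, x-half:x+half);  % small region of interest

% Edge images in the x and y directions, via convolution
Ix = conv2(roi, [-1 0 1], 'same');       % horizontal differences
Iy = conv2(roi, [-1 0 1]', 'same');      % vertical differences

% Scatter plot of (Ix, Iy) over all points in the ROI
scatter(Ix(:), Iy(:), '.');
xlabel('Ix'); ylabel('Iy'); axis equal;
```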

I wrote a small Matlab script that you can download: move the region of interest over the image (it is a draggable, movable rectangle) and see what happens. The code responds quite quickly to movements of a small region of interest. If you are interested in the graphical aspect of how to re-draw a scatter plot quickly, you can read this.

Here is an example of some regions of interest, and their scatter plots. The first image shows what happens in a region that is almost uniform. All of the points are concentrated in the middle of the plot. When thinking about it in polar coordinates, all of the points have small magnitude.

The next image is an example of an area with all sorts of edges. There are edges in all kinds of directions, and with high magnitudes.

If we take a look at a circle, we will see a similar behavior. On the circle arc, almost any angle exists, with high magnitude.

The image below has an edge in a specific direction. The scatter plot now shows interesting behavior: it has points only at the angle of the edge, or the opposite angle. That is because there are white->black edges and black->white edges. All sorts of magnitudes exist, because the image is noisy.

Taking a look at the checkerboard edge, we see only one specific direction. That is because the edges are all black->white.

At the corner of the checkerboard you will see a cross shape in the scatter plot.

Now, how would you separate the regions of interest that have edges in two directions from the ones that have either no edges or edges in only one direction? One possible way is to calculate the covariance matrix of the points and check that it has two large eigenvalues. The non-centered covariance matrix is defined as:

M = [ <Ix*Ix>   <Ix*Iy>
      <Ix*Iy>   <Iy*Iy> ]

where the <> brackets denote a sum over all the points in the region of interest.

(Usually the covariance matrix is defined in terms of central moments, that is, with the data's mean value subtracted; however, that is not the case here.) If you don’t know what the covariance matrix is, just take a look at the following image:

The arrows' directions are the eigenvectors of this matrix, and their lengths are the eigenvalues. Those of you who have heard about principal component analysis will have an easy time understanding what is going on here. (More information about it is available here.)

At this point, Shi-Tomasi and Harris differ a little. Shi-Tomasi finds the minimal eigenvalue, while Harris uses a heuristic to save computation time. Also, in the actual algorithm, the matrix is weighted according to the distance of each point from the center of the region of interest.
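As a sketch, the two responses can be computed from the entries of the non-centered covariance matrix. This version is unweighted, for simplicity; the gradient images here are random stand-ins, and k = 0.04 is a common choice for the Harris constant, not a value from this post.

```matlab
% Stand-ins for the gradient images over a region of interest
Ix = conv2(rand(21), [-1 0 1], 'same');
Iy = conv2(rand(21), [-1 0 1]', 'same');

% Entries of the (unweighted) non-centered covariance matrix
Sxx = sum(Ix(:).^2);
Syy = sum(Iy(:).^2);
Sxy = sum(Ix(:) .* Iy(:));
M = [Sxx Sxy; Sxy Syy];

% Shi-Tomasi: require the minimal eigenvalue to be large
shiTomasiResponse = min(eig(M));

% Harris: det(M) - k*trace(M)^2 approximates the same test without
% computing the eigenvalues explicitly; k = 0.04 is a common choice
k = 0.04;
harrisResponse = det(M) - k * trace(M)^2;
```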

After saying all of this, let's take a look at the Shi-Tomasi variation of the corner detector on the image above. We will plot only values with a large enough response. I did some thresholding, and I am plotting the centers of the blobs that are left afterwards.

Now, after understanding the intuition behind the algorithm, you should not be surprised to see that the center of the letter “O” is found, even though it does not look like a corner. The letter “O” simply has edges with high magnitude in at least two orthogonal directions.

In fact, Harris does not find corners! Well, at least not in the everyday meaning of the word. It finds “feature points”: places that have low self-similarity. The result depends on the size of the window, but don't expect to find only the corners of the checkerboard.

In order to get a better chance of detecting the corners of the checkerboard, we should increase Harris’s region of interest and decrease the threshold.

For each edge on the grid, decide whether it is “black edge” or “no edge” in part 3

The last step was far from perfect. When I developed the application, I decided that it was good enough to move on to the next steps. That is because it is often better to get a fully working prototype as soon as possible, rather than getting stuck at a particular point. This approach has several advantages:

If you run out of time, you still have some kind of working solution.

Often, when you complete everything, you will see that there are other places, far more important than this one, to concentrate on.

You might get really stuck in one of the next steps, in a way that you can't advance at all. Obviously, it would be better to invest your time in that impossible step, rather than in one which is working, though not perfectly.

Development time is often your most precious resource!

Nevertheless, breaking the chronological order, I still want to describe how I improved this step at the end, after completing all of the steps. Just to remind you, the idea I used at first was measuring the average of the pixels in the ROIs, marked as purple rectangles in the image below.

This kind of approach is not robust enough, because it does not take any geometrical information into account. I can take the pixels of an obvious black edge, re-arrange their intensities among all the pixels, and I will still get the same average intensity. Consider the two following images (side by side). They have the same mean intensity:

In order to distinguish between the cases mentioned above, we could use correlation with a perfect edge. The uniform gray patch would not give a high correlation. However, the edges can be curved, since they are drawn by a human. Thus, I decided to perform a convolution with a column edge element. (For vertical edges, I use a convolution with a row edge element.) This should find the center of the edge, per column. Since the edge is black on a white background, the element should have the following form: [+ + – – – – + +]. The negative part corresponds to the black edge. (This handles curved edges better, since the maximal value is found per column, and can be in a different row in each column.)

correlationImage = conv2(double(im), [1; 1; -1; -1; -1; -1; 1; 1], 'same');

I do this on the whole image, and then examine the convolution result inside the relevant ROIs. In each ROI, I find the maximal response of the convolution per column (or per row, for a vertical edge), and then return the mean of these maxima.

Here are two images: one of an edge, and one of its correlation result. I find the maximal value in each column and average them. That average is the probability measure that the ROI represents an edge.
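Putting the last two steps together, the per-ROI measure might look like the following sketch. The synthetic input image and the ROI coordinates are my stand-ins, not values from the post.

```matlab
% Stand-in input: a black horizontal edge drawn on a white background
im = ones(40);                  % white image, intensities in [0,1]
im(20:21, :) = 0;               % a dark, 2-pixel-thick horizontal edge

% Correlate the whole image with the column edge element [+ + - - - - + +]
correlationImage = conv2(double(im), [1; 1; -1; -1; -1; -1; 1; 1], 'same');

% Cut a ROI around the candidate edge (the coordinates are my stand-ins)
roiOfCorrelation = correlationImage(15:26, 5:35);

% Maximal response per column, then the mean of those maxima: robust to
% curved edges, since the peak may sit in a different row in each column
edgeProbability = mean(max(roiOfCorrelation, [], 1));
```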

The idea we used last time of finding the threshold automatically is still valid: Otsu's method can be used on these new numbers. Here is what the histogram of the new probability measure looks like now:

Otsu’s method will have an easy time with this one. The two distributions are separated very well.

After applying this new method here are the edges found after thresholding:

Now that is much better! It is not perfect, but we can now separate the problem into a set of smaller ones and handle each individually. I will show in the next post how to:

Break the image into clusters using a graph-traversal technique

For each cluster

Each piece can be used once and only once, so let's use this information!


This is the third part of the series on “How to recognize Pentomino from scanned paper“.

In the first part, I discussed the steps to solve the problem of recognition:

Rotate the image to be orthogonal to x-y axis

Find the grid

For each edge on the grid, decide whether it is “black edge” or “no edge”

Break the grid into several parts, and process each one individually.

Somehow, by using the prior knowledge on the 12 pentomino pieces, find where they are

In the second part, I discussed how to rotate the image and find the grid lines. The image below summarizes the steps so far: the grid is orthogonal to the image, and most of the lines are found. They are saved into two 1D arrays, “rows” and “cols”.

There is also a slight problem: some lines are not detected. I will skip this for now, as it is quite easy to fill them in once you know the mean distance between the lines.

The next problem is to determine whether there is a drawn edge at some location. Just to make sure that I am clear, the following image shows what I call an edge. I marked the edges in green, and the non-edges in red.

The simplest approach I could think of is to take a small ROI around a possible edge and collect some statistics. The simplest statistic is the mean value of the pixels. This makes sense, since edges should be black, and non-edges should be white. The following code section gets as input the processed image and two arrays with the row and column numbers of the lines on the grid. It loops through all rows and columns, and calls a procedure on a small ROI around each edge that attempts to measure the probability that it is an edge. The results are saved into an edge-probability matrix.
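The original code is not reproduced here; a minimal sketch of such a loop, for horizontal edges, might look like this. The synthetic inputs, the ROI half-height and the variable names are my choices, not the post's, and the sketch assumes the image is grayscale and scaled to [0,1].

```matlab
% Stand-ins for the real inputs: a white image with one dark segment
% drawn between the first two grid columns of the first grid row
im = ones(100);                 % grayscale image, scaled to [0,1]
im(17:23, 10:30) = 0;           % a drawn ("black") edge
rows = [20 40];                 % grid line rows
cols = [10 30 50];              % grid line columns

halfHeight = 3;                 % assumed ROI half-height, in pixels
edgeProb = zeros(numel(rows), numel(cols) - 1);
for i = 1:numel(rows)
    for j = 1:numel(cols) - 1
        % ROI around the candidate edge (assumed to stay inside the image)
        roi = im(rows(i)-halfHeight : rows(i)+halfHeight, cols(j):cols(j+1));
        % Mean intensity: a dark ROI means an edge was likely drawn here
        edgeProb(i, j) = 1 - mean(roi(:));
    end
end
```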

You might have noticed that I am using a custom function called plotc to debug the code. It draws a closed polygon. I have explained about it on stackoverflow.

I am also using another custom function, called putToLimits. It clips the input so that it doesn't go above or below some maximal and minimal values, i.e., get out of range. Its output is a clipped variant of the input.
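The function itself is not shown in the post; based on that description, it is probably equivalent to something like this one-liner (my reconstruction, not the original code):

```matlab
% Clip x into the range [lowLimit, highLimit]; works element-wise on arrays
putToLimits = @(x, lowLimit, highLimit) min(max(x, lowLimit), highLimit);
```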

I am using loops. There might be a vectorized solution, but in this case I prefer loops, due to the clarity of the code. Code that can be debugged is often worth more than a bunch of smart-ass one-liners. It depends on the case, but in general there is a tendency in the Matlab community to abuse vectorized solutions, as they are often considered smart, creative, etc. As Loren once said on her blog: “Vectorize, but sensibly“. I have touched here on an issue that I will discuss more thoroughly in the next posts.

I wrote two functions – one for detecting vertical edges, and one for horizontal ones. This is not good practice: duplicated code causes maintenance problems in the long term. I should fix it in one of the next coding iterations.

Now, back to the algorithm. First, here is a visualization of the ROIs being taken for the horizontal edges:

Here is a histogram of the mean values in the ROIs, both for the horizontal and the vertical edges.

That looks very good! You can spot two mixed distributions, with their mean values separated quite well. These kinds of histograms are called bi-modal histograms. The good thing about them is that there are several solutions in computer vision for separating them and finding a threshold automatically. One of them is Otsu's method, which is implemented in Matlab; the relevant function is graythresh. However, it assumes that the images are in the range [0,1], so I did some scaling. The following function detects the range of the input and scales it to [0,1]. Then it recalculates the threshold in terms of the original input.
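That function is not reproduced here; a sketch of the idea might look like this. The function name is mine; `graythresh` is the Image Processing Toolbox function mentioned above, and the sketch assumes the input is not constant.

```matlab
function thresh = otsuOnAnyRange(values)
    % Scale the input to [0,1], run Otsu's method via graythresh,
    % then map the resulting threshold back to the original range
    lo = min(values(:));
    hi = max(values(:));
    scaled = (values - lo) / (hi - lo);
    thresh = graythresh(scaled) * (hi - lo) + lo;
end
```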

Here is the result after thresholding with Otsu’s method. You can see that there are some non-edges detected as edges (false positives), and vice versa. This is not perfect, but it is a good start. I will explain in the next post how it can be improved.

In part 1, we discussed the steps to find the tiling from the scanned paper. Just to remind you, they were:

Rotate the image to be orthogonal to x-y axis

Find the grid

For each edge on the grid, decide whether it is “black edge” or “no edge”

Break the grid into several parts, and process each one individually.

Somehow, by using the prior knowledge on the 12 Pentomino pieces, find where they are

One small thing: I did show the code for detecting the angle, but did not write the actual rotation command. It is as simple as:

rotatedIm = imrotate(clippedIm,90-angle);

I am skipping a small part of my code that removes white background. We can go back to it later. Here are two images, before clipping and rotation, and after:

You can see that now the lines of the grid are truly vertical.

Now, let’s focus on the second task – “finding the grid”. To be clear, finding the grid means finding a vector of X and a vector of Y that represent the places where the lines in the image are. The most obvious choice for detecting periodic patterns in image processing is the DFT (Discrete Fourier Transform), which is called FFT in Matlab. I am talking about summing up the image in either the x or y direction and using `fft` – not `fft2`, which is not needed once we have found the angle and rotated.
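As a sketch, that initial guess might be obtained as follows. The synthetic input and all names are my assumptions, not code from the post.

```matlab
% Stand-in input: a white image with dark vertical grid lines,
% two pixels wide, spaced 20 pixels apart
im = ones(300, 1024);
im(:, 10:20:end) = 0;
im(:, 11:20:end) = 0;

profile = sum(im, 1);                   % collapse the image along y
profile = profile - mean(profile);      % remove the DC component
spectrum = abs(fft(profile));
n = numel(profile);

% The spectrum is symmetric, so look only at the first half;
% skipping bin 1 (frequency 0), bin k+1 means k cycles over the width
[~, k] = max(spectrum(2:floor(n/2)));
estimatedDistance = n / k;              % initial guess of the line spacing
```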

The FFT can only be used as an initial guess, since it is not exactly accurate. The reason is its quantization. For instance, imagine that the distance between the grid lines is about 20 pixels, and the size of the image in the x direction is 1024. Can you detect it? Well, the FFT decomposes the signal into discrete frequencies. We need to find the frequency of 1024/20 = 51.2, which is between 51 and 52. So how accurate is our estimate? We will find the maximum correlation between the signal and the corresponding sine at a frequency of either 51 or 52. Simple math shows that 1024/51 = 20.0784 and 1024/52 = 19.6923. That means we will have an error of either 0.0784 or 0.3077 pixels. That doesn't sound like much, right? After all, anything less than a pixel must be small. Wrong! Don't forget that a small error in the distance is multiplied by the number of lines, which is about 50. That can easily become as much as 0.3077*50 ≈ 15.4 pixels. Now that is a huge error.

The information from the phase allows us to find the first column exactly. Yet the frequency cannot be found precisely, and eventually the error starts to accumulate. To sum up, the FFT is valuable because it is very fast, robust and gives a good initial estimate. The main problem is that, by using it alone, we cannot find the frequency accurately enough.

The way to solve this issue is to look for each of the grid lines independently. If I sum the image row-wise, I will get something like this:

The valleys correspond to the grid lines. I am going to use an ad-hoc heuristic solution: first, choose a mid-level line and threshold by its value.

Then I get a square-like wave that consists of [0,1] values. I can find the center of each sequence of zeros; this gives me almost all of the lines.
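A sketch of that step (the stand-in profile and the names are mine):

```matlab
% Stand-in profile: row-wise sum of a grid image, with valleys at the lines
profile = [5 5 1 1 1 5 5 5 1 1 5 5];

% Threshold at a mid-level value to get a square-like 0/1 wave
level = (max(profile) + min(profile)) / 2;
wave = profile > level;                  % 0 at the valleys (grid lines)

% Find the center of each run of zeros
d = diff([1, wave, 1]);                  % pad with 1 so border runs are closed
runStarts = find(d == -1);               % 1 -> 0 transitions
runEnds   = find(d == 1) - 1;            % 0 -> 1 transitions
lineCenters = round((runStarts + runEnds) / 2);
```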

There are some missing lines! If you take a look again at the graph above, you will understand why. I will show you next time how to fill them.

Here is the code. It is the same for both row and column detection: if I want to find columns, the `dim` parameter should be 1, and for rows, 2. Can you figure out what the small trick that I am using in lines 3-5 does? I will leave it as something to think about.

[This is my first post! Please leave a comment, I would love to hear your feedback.]

A few months ago, when I visited my parents' house, I noticed that my dad had solved a lot of Pentomino puzzles. He had also written them down in a notebook.
Pentomino is a puzzle game in which you have to tile a 2D shape with pieces. The pieces resemble Tetris pieces; however, they have 5 squares instead of 4. There are 12 unique pieces, excluding rotations and flips:

A typical tiling puzzle solution looks like this:

Here is a typical page from his notebook, after I scanned it:

As someone who is excited about image processing, I immediately thought about writing a program that will scan the page, and recognize the puzzle solutions. In this post, and the next ones, I will show the steps that I used in order to accomplish this task. Also, I will try to share my thoughts about development in general.

Here is an overview of the algorithm:

Rotate the image to be orthogonal to x-y axis

Find the grid

For each edge on the grid, decide whether it is “black edge” or “no edge”

Break the grid into several parts, and process each one individually.

Somehow, by using the prior knowledge on the 12 pentomino pieces, find where they are.

Let’s discuss each step individually. First, the rotation. Why do we need to rotate? When I scanned the notebook, I was kind of sloppy, and did not align it precisely on the scanner.

Thus, some compensation for the angle is needed. It is probably possible to solve the entire problem by only knowing the angle, and not rotating the actual image. While that is possible, the fact that all of the edges become either perfectly horizontal or vertical makes the coding easier. Even more than that, the debugging (which usually takes more time than writing the code itself) becomes significantly easier. The actual rotation re-samples the image, which can degrade its quality. Nevertheless, as long as I am able to extract the relevant information, it is good enough.

How do we find the angle of rotation? Several options popped into my head. The Hough transform, to recognize the lines, sounds like a reasonable choice: I could find the two dominant angles in Hough space and use them. I decided instead to use the Radon transform, which is conceptually very close. The Radon transform scans the image at every angle and sums up the pixels into a 1D array. The next image gives a conceptual explanation of the transform:

I expect to see the highest contrast while scanning at the angle of the grid. Why? Because at every other angle, the projection will be more blurred. Think about it: if we hit the exact grid orientation, we expect to see something like a wave of peaks. Every time we hit a grid line, the array value will be very large, and when we don't, it will be small. I don't need to scan all possible angles; I need a small range around 0 and 90 degrees, with fairly fine quantization of these small ranges. In order to detect the angle with the maximum contrast, I calculate the standard deviation as an approximation of the contrast.

There is one inherent flaw in the Radon transform: different orientations result in different 1D array lengths. Thus, the transform is biased towards some angles – it gives them more weight. To overcome this problem, I divide by the Radon transform of an array of ones.
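A sketch of this angle search, using the Image Processing Toolbox's `radon` function; the angle range, the step, and the variable names are my choices, and the division guard is my addition.

```matlab
% Search a small range of angles around 0 degrees (similarly around 90)
thetas = -5:0.1:5;                       % assumed search range and step
R = radon(im, thetas);                   % one projection column per angle

% Normalize by the transform of an all-ones image to remove the bias
% caused by the different projection lengths at different angles
Rones = radon(ones(size(im)), thetas);
Rnorm = R ./ max(Rones, 1);              % guard against division by zero

% The angle whose projection has the highest contrast (std) wins
[~, best] = max(std(Rnorm, 0, 1));
bestAngle = thetas(best);
```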

Here is the result of the transform, on the image above. You can see that one of the columns has a high contrast – that is the angle! It is close to zero, but not exactly.