I'm considering two very specific approaches at the moment, but there's a general direction I'm trying to go in. First and foremost, I want something that works, and don't care too much about purity or elegance. Which means I'm perfectly willing to cheat by introducing easy-to-handle out-of-band information. One thing I'm already considering, but won't reach for until I need it, is color-coding lines/arrows differently from shapes. So, for instance, anything red would be interpreted as lines, while anything blue would be processed under the assumption that it represented shapes|2|.
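To make that triage concrete, here's an entirely hypothetical sketch, assuming RGB input; the names and thresholds are made up, and the purple/black cases are the ones described in the footnotes:

```haskell
import Data.Word (Word8)

data Layer = Lines | Shapes | Both | MaybeText
  deriving (Show, Eq)

-- Hypothetical color-to-layer rule for the out-of-band scheme:
-- red pixels feed the line pass, blue the shape pass, purple both,
-- and anything dark but uncolored is treated as candidate text.
classify :: (Word8, Word8, Word8) -> Maybe Layer
classify (r, g, b)
  | hot r && hot g && hot b    = Nothing         -- white-ish background
  | dark r && dark g && dark b = Just MaybeText  -- black: candidate text
  | hot r && hot b             = Just Both       -- purple: lines AND shapes
  | hot r                      = Just Lines      -- red: line pass
  | hot b                      = Just Shapes     -- blue: shape pass
  | otherwise                  = Nothing
  where
    hot  c = c > 128
    dark c = c < 64
```

The exact thresholds matter much less than the fact that the decision is local and per-pixel, which is what makes this kind of cheat cheap.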

Secondly, I'm working with a very tight set of constraints on the source images I'm considering. These aren't going to be photographs. I admit, it would be cool to be able to draw a program in the sand, take a picture, and have it compile correctly, but that sounds ... hard. What I'll be dealing with to start is going to be line drawings composed of primitive shapes, arrows and some sparse annotating text on a relatively uniform white background. Most of the work I've read on the subject of edge or feature detection has been aimed either at plain text under various transformations|3|, or at photographs|4|. The box-and-wire diagram area seems to be relatively unexplored.

The first approach I'm thinking about is thinning the image down to the minimal number of points it takes to represent it. Once that's done, it should be possible to connect the dots at some threshold of proximity and generate the appropriate line/shape facts that we need in order to proceed further. And honestly, that's about as much thought as I've put into this approach. If you want details, check the README section and Ping.hs. I might come back to it if the other direction proves fruitless, or hits some unseen roadblocks.

This is the approach I've been considering and prototyping most vigorously, mostly because it seems within striking distance of workable results in the very near future. In its most distilled form, it involves deciding the directional tendency of each pixel in the image. Each one might be one of the Cardinal directions (North/South or East/West), one of the Ordinal directions (NorthEast/SouthWest or NorthWest/SouthEast), or it might be contested|6|.

The algorithm for separating an image into these directional maps is relatively simple: for each pixel, count the contiguous filled space in each direction and sum the totals for equivalent directions.

I won't claim the refactored version is any more readable, but it is equivalent, and doesn't repeat itself quite as frequently. The point is, the scoring function takes a Grid and a Coord (which represents a pixel) and returns the score of that pixel in that Grid. findContiguous is a helper function used in the above.

It takes a Grid and a list of Coords, and takes only the first contiguous chunk; as soon as it fails to find a given coordinate, it terminates and returns the ones it's accumulated so far. Having defined all of the above, you can then define the directional maps themselves.
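For concreteness, here's my sketch of the scoring machinery just described. The Grid/Coord representation (a set of filled pixel coordinates) and all the names besides findContiguous are my guesses, not necessarily what the repo actually uses:

```haskell
import qualified Data.Set as Set

-- Assumed representation: a Grid is the set of filled pixels.
type Coord = (Int, Int)
type Grid  = Set.Set Coord

-- The helper described above: keep candidate coordinates only while
-- each one is actually filled; stop at the first miss.
findContiguous :: Grid -> [Coord] -> [Coord]
findContiguous g = takeWhile (`Set.member` g)

-- The ray of coordinates leaving (x, y) along step (dx, dy).
ray :: Coord -> (Int, Int) -> [Coord]
ray (x, y) (dx, dy) = [(x + dx * n, y + dy * n) | n <- [1 ..]]

-- Score a pixel: count contiguous filled space along each ray, then
-- sum opposite rays, so North/South collapse into one vertical total,
-- NE/SW into one diagonal total, and so on.
score :: Grid -> Coord -> [(String, Int)]
score g c =
  [ (name, run d1 + run d2)
  | (name, d1, d2) <-
      [ ("NS",    (0, -1), (0, 1))
      , ("EW",    (-1, 0), (1, 0))
      , ("NE/SW", (1, -1), (-1, 1))
      , ("NW/SE", (-1, -1), (1, 1)) ] ]
  where run = length . findContiguous g . ray c
```

On a five-pixel horizontal stroke, the middle pixel scores 4 for EW (two contiguous neighbours each way) and 0 everywhere else, which is exactly the kind of lopsided total the decision step below wants to see.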

That transformation alone gets you some traction, although not quite enough. What I'd really want is a decision function that would give me the result of trying to merge them, rather than just two separate maps. The easiest thing I can think of is a simple threshold comparison.

Which is to say, if either Cardinal score is above a certain threshold, and it beats the best Ordinal score by at least that threshold, this is a Cardinal pixel. Same deal with the Ordinal scores. If nothing breaks the threshold, or if the winning score doesn't beat the opposing side decisively, this is a Contested pixel. That then lets us redefine main accordingly.
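Here's one hedged reading of that rule as code; the type and function names are mine, and the threshold semantics (clear the bar, and win by at least the bar) are my interpretation of the description above:

```haskell
data Tendency = Cardinal | Ordinal | Contested
  deriving (Show, Eq)

-- Given the two cardinal scores (NS, EW) and the two ordinal scores
-- (NE/SW, NW/SE), a family wins only if its best score clears the
-- threshold AND beats the other family's best by at least that
-- same threshold; everything else is Contested.
decide :: Int -> (Int, Int) -> (Int, Int) -> Tendency
decide threshold (ns, ew) (neSw, nwSe)
  | bestC >= threshold, bestC - bestO >= threshold = Cardinal
  | bestO >= threshold, bestO - bestC >= threshold = Ordinal
  | otherwise                                      = Contested
  where
    bestC = max ns ew
    bestO = max neSw nwSe
```

Using one number for both the floor and the winning margin is just a simplification; nothing stops the two from being tuned separately later.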

From there, it would be a fairly simple matter to do the arrow/shape separation first, and then figure out how each shape breaks down in the Cardinal/Ordinal sense. What I'd be looking for at that point is to see which of the two maps yielded full coverage of a particular area with the fewest lines. That's how I could tell that a rectangle, say, is meant to be a Cardinal shape; it can be drawn with four cardinal lines|7|, whereas the ordinal map specifies between six and eight, depending on exactly where you set the threshold for recognizing a line|8|. This also has the side benefit of disambiguating curves/circles from squares/lines.

If we go in this shape-breaking direction, we suddenly have the problem of how to compose the arrow. Ideally, we'd want the large vertical from the Cardinal map, and the two side diagonals from the Ordinal map. Which means we're not really checking which map gives us better coverage; we're trying to find maximum coverage of some number of coordinates by several distinct, but connected, areas on a map. Which sounds like it lands us in difficult-problem territory. The good news is that approximation is very probably good enough for what we're doing here, and I can think of one or two halfway-decent ways of doing the needed comparisons quickly enough|9|.
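One such halfway-decent way is the textbook greedy set-cover approximation: repeatedly grab the candidate line/area that covers the most still-uncovered pixels. This is my suggestion, not anything from the repo, and it ignores the "connected areas" constraint entirely:

```haskell
import qualified Data.Set as Set
import Data.List (maximumBy)
import Data.Ord (comparing)

type Coord = (Int, Int)

-- Standard greedy set cover: at each step, pick the candidate area
-- covering the most still-uncovered pixels; stop when no candidate
-- adds anything. Not optimal, but a well-known approximation.
greedyCover :: Set.Set Coord -> [Set.Set Coord] -> [Set.Set Coord]
greedyCover uncovered candidates
  | Set.null uncovered || null candidates = []
  | Set.null gain = []
  | otherwise = best : greedyCover (uncovered Set.\\ gain) rest
  where
    best = maximumBy (comparing (Set.size . Set.intersection uncovered))
                     candidates
    gain = Set.intersection uncovered best
    rest = filter (/= best) candidates
```

For the arrow case, the candidate list would hold lines from both the Cardinal and Ordinal maps at once, so the big vertical and the two side diagonals can all win their rounds regardless of which map they came from.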

Anyhow, that's all I've got for now. If you want to see the up-to-the-minute details on how this progresses, keep an eye on the GitHub repo.

1 - |back| - In case you're wondering, "EAF" stands for "Edgy As Fuck", which is the first thing that popped into my head when I thought about what I should name an edge-detecting project. I guess that might say something about me.

2 - |back| - Anything purple can be processed as an overlapping area, and essentially added to both the lines and shapes corpora. Black could potentially be text, but once we have the lines and shapes sussed out, the remaining possibility space for text is relatively small, and it seems like it would be easy enough to take any occupied area within that space and pass it through some standard OCR software.

3 - |back| - Which is part of what I'm doing, but not the core, so I feel no shame whatsoever about punting it to something like tesseract.

6 - |back| - That is, it might be the case that no particular directional tendency exceeds a given threshold, or it might be the case that no single direction is the winner by a number exceeding that threshold.

8 - |back| - I could also get fancy and start doing shape comparisons here; we could take a look at the ordinal map and discount it on the basis that two of its "lines" are L-shaped. We could use this information either to impose a penalty, or to discount the ordinal breakdown entirely.

9 - |back| - Not that any approach would be slow on a corpus this size. The real test is going to happen when I try running this compiler over my first 300 dpi, 11x17 scan. I'm guessing performance-oriented tweakery will be necessary, but I've been wrong before.