Spell checking in code

Last year marc wrote about our spell checking in code feature for Oxygene. This feature lets you continue using code completion after you made a typo in your code. The compiler will report an error as usual, pick the most likely choice from a list of possible identifiers and pretend that item was picked in the first place. The end result is that this code:

var s: Srting;
for i: Integer :=0 to s.Length-1 do begin
s[i].
end;

will show code completion for the elements of the string, despite the wrongly spelled srting, and the compiler does the same thing, it shows an error for srting as usual, but using s will make it act like it’s a string.

There are a number of algorithms we could have picked to find the best matching item for this, so after some trial and error we picked the Damerau–Levenshtein distance and modified it a bit. Given two strings, this algorithm returns an integer indicating how many changes there are. Removal, adding or changes are counted as a single change, as are swapping two characters that follow each other (unlike the plain Levenshtein distance which only does removal, adding and changing, so swapping would count as 2). After some optimizations we came up with:

It’s not the exact Damerau–Levenshtein distance implementation as found on Wikipedia, the original is slower and takes a lot more memory, but it’s based on it, with some ideas gotten from StackOverflow posts. It might not give the same result 100% of the time, but it works really well for us. We run this against a list of possible identifiers in scope where normally an unknown identifier error would occur, and return the lowest item. The Fix-It dialog shows the 10 possible matches below a given threshold. Now this same code can be used for a lot of things. You can even make a regular spell checker with this method.