Follow Blog via Email

Category: Algorithms

To continue learning about F# and to practice programming functionally, I have written a simple Sudoku solver. I use the word simple, because some simple Sudoku problems can be solved only by going over the board and removing entries without having to make “guesses” and backtracking.

This guessing and backtracking usually involves implementing a depth-first search when writing a Sudoku algorithm.

In the interest of time (there’s another project which I have been delaying for months and need to be starting) I have only implemented a solution to the easy type of Sudokus.

The first part is writing code that checks whether a board is correctly solved. This part can be done first and it’s better to do so since we will need it to check whether our Sudoku solver actually works.

The process is pretty easy, in Board.fs I implement the types I need to represent the board (Value) and it’s state (BoardSolved).

Checking the board implies collecting all it’s rows, columns and boxes (which are nine 3×3 regions on the board) and then checking them for duplicates.

To solve a board we go over all rows, columns and boxes and for each item in these collections we remove from the list of possible values those values which have already been selected in another item. I call this “locking” the values.

This operation is applied successively until the board is solved. The most complicated part came because I used a 2D array to represent the board but I needed to convert the rows, columns and boxes values to lists. Converting the lists generated from the boxes was the single most complicated and longest part of the whole program.

I guess it pays to just use lists for everything and skip on using arrays. The language and it’s libraries are pretty much made for working with lists and sequences so when you deviate from that you end up having to write additional code.

To make this solver capable of solving any Sudoku, we would need to change the

| BoardUnlocked -> failwith “Board not yet solved”

line with calling a depth-first search which would select the first unlocked value, try and select the first possible value for this item and then continue recursively trying to solve the board until either it succeeds or it fails. When/if it fails backtrack and continue it’s search using another value.

It works by taking a sample file which contains names, the names should be thematically similar, and uses it to create chains of probabilities. That is, when we find the letter A in the sample, what are the possible letters that can follow this A and what probability is there for each of these letter to come up.

This table can be serialized to prevent recomputing it each time we call the algorithm.

The algorithm works with sub strings of size X where a small X will provide more random results (less close to the original result) but a larger X will provide results more closely aligned with the sample file.

Results more closely aligned with the sample file better reflect the sample but face a higher risk of ending up as a pastiche of 2 existing names or in some cases being one of the sample’s name as is.

Here’s how the sub strings work:

If we have the following name in our sample file:

Gimli

Using a sub string length of 2 would add all these sub strings in our probability table:

G i

i m

m l

l i

While using a sub string length of 3 would add these sub strings:

G im

i ml

m li

And so on as we increase the length of the sub strings.

After we have counted all the possible occurrences of each sub string over the whole file we assign a probability to each one.

For example, if for our whole file we would have the following possible sub strings for G

G im

G lo

G an

Each of these would be assigned a probability of 33.3%.

Generate name length info from the input file

The generate names of a length representative of our input sample we simply count the length of each name and derive a mean value and standard deviation. Using the mean and standard deviation will then easily allow us to draw a value from the normal distribution of word lengths.

Additionally it’s best to enforce a minimum word length. Even if our sample contains shorter names (2 or 3 letters long), from experience the algorithm doesn’t produce convincing results on these shorter lengths.

This is because it doesn’t differentiate sub strings for long and short names.

Generate a name with the name length and probability table

To generate a name we start by finding our desired name length using our name length info. Then we select our first character, the white space character.

We then generate a number between 0.0 and 1.0 (or 0 and 100) and using a prebuilt dictionary containing the probability table, find the next item.

Code

Sample and Examples

The larger the sample the better. Also the more thematically aligned the sample, the better. What I mean by thematically aligned is if you include the names of all Greek masculine mythological figures, you will get results that resemble the names of the Greek heroes and Gods.

For example:

Thedes
Kratla
Pourseus

On the other hand if you build your samples with names from the Lord of The Rings but include an equal part of Hobbits, Dwarf, Elven and Orcish names you will end up with a mishmash that does not make much sense.

Finally here are some results of the algorithm using the this sample file containing the names of some of the locations in the games Final Fantasy XI and Final Fantasy XIV:

The algorithm is pretty similar to diamond-square, in fact I have seen some people call it so, but it’s subtly different (in how to various sub-sections are divided) from the canon example, which is why I’m referring to it as midpoint displacement rather than diamond-square.

I’m pretty happy with the output of the results. It’s better than any map I have done before. Here is an example :

The code would need some optimization has it’s running out of memory fairly quick when generating larger maps.

You can find it as part of a larger repo on GitHub, that I have sadly abandoned.