Modeling a board

Storing a board in a text file

First, I changed the way that boards are stored. Rather than hard-coding the board or, worse, encoding its representation directly into the algorithm with no easy way to change it, I picked a very simple text-based format.

Blank spaces are represented by dots / periods (.). All other characters represent a unique Vertex.

While working on the alternate solutions for Day 2, I added functions to various modules as I wrote the graph structures. In some cases the final solution did not need those functions, but I left them in the module anyway. A good example is the opposite function in the Direction module.
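A Direction type with an opposite function might look roughly like the following. This is only a sketch, assuming Direction is a plain four-case union; the case names are hypothetical and not taken from the actual module.

```fsharp
/// The four moves allowed on the Day 2 keypad (no diagonals).
type Direction =
    | Up
    | Down
    | Left
    | Right

module Direction =
    /// Returns the direction that undoes a move. Handy for checking that the
    /// two edges between neighboring vertices point in opposite directions.
    let opposite dir =
        match dir with
        | Up -> Down
        | Down -> Up
        | Left -> Right
        | Right -> Left
```

Applying opposite twice always returns the original direction, which makes it a convenient building block for invariant checks even when the solver itself never calls it.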

Most of the functions in the Board module are internal and should not be invoked by an external client. The most important function in the module is parseBoard, which takes a string representing the board (e.g. the contents of the text files described above) and returns a Board that meets the following requirements.

Boards are two-dimensional.

Each character in the string representation that is not a blank space is turned into a unique vertex.

Each vertex is given a unique integer ID.

Boards do not contain diagonal edges.

Each pair of neighboring vertices has two edges between them.

Given two neighboring vertices, the two edges between them have opposite directions.
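Here is a minimal sketch of how a parseBoard-style function could satisfy these requirements. The record shapes for Vertex, Edge, and Board and their field names are assumptions for illustration, not the module's real types. IDs are handed out 0, 1, 2, ... in reading order, only right and down neighbors are inspected (so no diagonals and no pair is visited twice), and each neighboring pair gets two edges with opposite directions.

```fsharp
type Direction = Up | Down | Left | Right
type Vertex = { Id: int; Label: string }
type Edge = { From: Vertex; To: Vertex; Dir: Direction }
type Board = { Vertices: Vertex[]; Edges: Edge list }

/// Hypothetical parseBoard: '.' is a blank cell, any other character
/// becomes a Vertex with a unique, gap-free integer ID.
let parseBoard (text: string) : Board =
    let rows =
        text.Split([| '\n' |], System.StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun r -> r.TrimEnd('\r'))
    // (row, col, char) for every non-blank cell, in reading order.
    let cells =
        [| for r in 0 .. rows.Length - 1 do
             for c in 0 .. rows.[r].Length - 1 do
                 if rows.[r].[c] <> '.' then yield (r, c, rows.[r].[c]) |]
    let vertices =
        cells |> Array.mapi (fun i (_, _, ch) -> { Id = i; Label = string ch })
    // Grid position -> Vertex, used to find neighbors.
    let byPos =
        Array.zip cells vertices
        |> Array.map (fun ((r, c, _), v) -> (r, c), v)
        |> Map.ofArray
    // Look only right and down; each neighboring pair yields two opposite edges.
    let edges =
        [ for KeyValue((r, c), v) in byPos do
            for (dr, dc, fwd, back) in [ (0, 1, Right, Left); (1, 0, Down, Up) ] do
                match Map.tryFind (r + dr, c + dc) byPos with
                | Some w ->
                    yield { From = v; To = w; Dir = fwd }
                    yield { From = w; To = v; Dir = back }
                | None -> () ]
    { Vertices = vertices; Edges = edges }
```

With the Part 1 keypad written as "123\n456\n789", this produces 9 vertices and 24 edges (12 neighboring pairs, two edges each). The Part 2 keypad, "..1..\n.234.\n56789\n.ABC.\n..D..", produces 13 vertices and 32 edges.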

Generic solution framework

The solution performs the following steps to get to the final answer for the problem.

Create a Board from the boardFile.

Use that Board to create a graph using bCreateF.

Translate the instructions from the instrFile into Directions.

Run each line of instructions through the graph. Each line starts from the Vertex where the previous line ended, except for the first line, which starts from Vertex "5".

Convert the final Vertex of each line into a string, then concatenate the strings.

The solution framework has two functions, moveAll and day2part1. The former uses bGetNextF to move through a line of instructions. The latter is the connector that converts a board and lines of instructions into a final answer.
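A rough sketch of such a framework follows. The names moveAll, day2part1, bCreateF, and bGetNextF come from the text, but their signatures here are assumptions. parseInstructions and keypadNext are hypothetical helpers added only so the sketch runs; keypadNext computes moves from grid coordinates and stands in for a real graph implementation.

```fsharp
type Direction = Up | Down | Left | Right
type Vertex = { Id: int; Label: string }

/// Hypothetical instruction parser: one line per digit, one U/D/L/R per move.
let parseInstructions (text: string) : Direction list list =
    text.Split([| '\n' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun line ->
        line.Trim()
        |> Seq.map (function
            | 'U' -> Up
            | 'D' -> Down
            | 'L' -> Left
            | 'R' -> Right
            | c -> failwithf "Unexpected instruction %c" c)
        |> List.ofSeq)
    |> List.ofArray

/// Follows one line of instructions. A move with no edge (off the board)
/// leaves the current vertex unchanged.
let moveAll (bGetNextF: 'g -> Vertex -> Direction -> Vertex option)
            (graph: 'g) (start: Vertex) (line: Direction list) : Vertex =
    line
    |> List.fold (fun current dir -> defaultArg (bGetNextF graph current dir) current) start

/// Builds the graph once, chains each line's end vertex into the next
/// line's start, and concatenates the labels of the end vertices.
let day2part1 (bCreateF: 'b -> 'g) bGetNextF (board: 'b) (start: Vertex)
              (lines: Direction list list) : string =
    let graph = bCreateF board
    lines
    |> List.scan (moveAll bGetNextF graph) start
    |> List.tail // drop the initial start vertex
    |> List.map (fun v -> v.Label)
    |> String.concat ""

// Stand-in bGetNextF for the 3x3 keypad (graph and board are both unit here).
let keypadNext () (v: Vertex) dir =
    let r, c = v.Id / 3, v.Id % 3
    let r2, c2 =
        match dir with
        | Up -> r - 1, c
        | Down -> r + 1, c
        | Left -> r, c - 1
        | Right -> r, c + 1
    if r2 < 0 || r2 > 2 || c2 < 0 || c2 > 2 then None
    else Some { Id = r2 * 3 + c2; Label = string (r2 * 3 + c2 + 1) }

let five = { Id = 4; Label = "5" }
let answer = day2part1 id keypadNext () five (parseInstructions "ULL\nRRDDD\nLURDL\nUUUUD")
```

Run on the example instructions from the AOC problem description (ULL, RRDDD, LURDL, UUUUD), answer is "1985", the expected result given there. Passing the start Vertex explicitly keeps the framework independent of how each graph representation would look up Vertex "5".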

Adjacency Matrix

Per Wikipedia, an adjacency matrix is "a square matrix used to represent a finite graph".

Since the AOC Day 2 board is a directed graph whose edges carry a Direction, the adjacency matrix is not symmetric: if going Right takes you from Vertex 1 to Vertex 2, the entry for the reverse pair (from Vertex 2 to Vertex 1) holds Left instead. For my implementation, the label for each row represents the "from Vertex", the label for each column represents the "to Vertex", and the value at their intersection represents the Direction of movement.

For example, when looking at the Day 2, Part 1 board below, you can read the first entry as "When you start from Vertex 1 and go Right, you arrive at Vertex 2".

Here is the text representation of the Day 2, Part 1 board as an adjacency matrix.

Internally, an adjacency matrix is stored as a two-dimensional F# array ([,], created with the Array2D module). The scheme used to build it is somewhat fragile because it assumes that the first Vertex has the integer ID 0 and that each subsequent ID is exactly one greater than the previous one, with no gaps, so that a Vertex ID can double as a row or column index. Since we control Board and Vertex creation, we can rely on this assumption for now, but it would be a dangerous assumption to make in production code.

The getNext function takes the following steps to move through the board:

Use the 2D array to find the row representing the from Vertex.

Once found, scan the row for the desired Direction.

If found, use that column index to look up the destination Vertex in Board.vertices.
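A minimal sketch of the matrix construction and the three getNext steps follows, assuming cells of type Direction option (None meaning "no edge") and the hypothetical Vertex/Edge records shown. Like the real implementation, it relies on 0-based, gap-free IDs. The one-row board at the end is illustrative only.

```fsharp
type Direction = Up | Down | Left | Right
type Vertex = { Id: int; Label: string }
type Edge = { From: Vertex; To: Vertex; Dir: Direction }

/// Row = "from Vertex", column = "to Vertex", cell = Some direction of the
/// edge between them. Assumes IDs are exactly 0 .. n-1 so they double as indices.
let createMatrix (vertices: Vertex[]) (edges: Edge list) : Direction option[,] =
    let n = vertices.Length
    let matrix : Direction option[,] = Array2D.create n n None
    for e in edges do
        matrix.[e.From.Id, e.To.Id] <- Some e.Dir
    matrix

/// 1. use the source Vertex's ID as the row, 2. scan that row for the
/// desired Direction, 3. map the matching column index back to a Vertex.
let getNext (vertices: Vertex[]) (matrix: Direction option[,]) (source: Vertex) (dir: Direction) =
    seq { 0 .. Array2D.length2 matrix - 1 }
    |> Seq.tryFind (fun col -> matrix.[source.Id, col] = Some dir)
    |> Option.map (fun col -> vertices.[col])

// Usage on a hypothetical one-row board "123".
let v = [| for i in 0 .. 2 -> { Id = i; Label = string (i + 1) } |]
let edges =
    [ for i in 0 .. 1 do
        yield { From = v.[i]; To = v.[i + 1]; Dir = Right }
        yield { From = v.[i + 1]; To = v.[i]; Dir = Left } ]
let matrix = createMatrix v edges
let next = getNext v matrix v.[0] Right // Some { Id = 1; Label = "2" }
```

Note that matrix.[0, 1] holds Some Right while matrix.[1, 0] holds Some Left, which is exactly the asymmetry described above.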

Edge List

An edge list is an extremely simple graph implementation that does not transform the original Board's internal structures in any way. It is, as the name implies, an exhaustive list of all edges in the graph.
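Because the edge list keeps the Board's edges as they are, getNext reduces to a linear search. A sketch under the same hypothetical Vertex/Edge records (createEdgeList and getNext are illustrative names):

```fsharp
type Direction = Up | Down | Left | Right
type Vertex = { Id: int; Label: string }
type Edge = { From: Vertex; To: Vertex; Dir: Direction }

/// The "graph" is just the Board's edges, unchanged.
type EdgeList = Edge list

let createEdgeList (edges: Edge list) : EdgeList = edges

/// Walks the whole list until it finds an edge that starts at `source`
/// and carries `dir`.
let getNext (graph: EdgeList) (source: Vertex) (dir: Direction) : Vertex option =
    graph
    |> List.tryFind (fun e -> e.From = source && e.Dir = dir)
    |> Option.map (fun e -> e.To)

// Usage on a hypothetical one-row board "123".
let v = [| for i in 0 .. 2 -> { Id = i; Label = string (i + 1) } |]
let graph =
    createEdgeList
        [ for i in 0 .. 1 do
            yield { From = v.[i]; To = v.[i + 1]; Dir = Right }
            yield { From = v.[i + 1]; To = v.[i]; Dir = Left } ]
```

Each lookup may compare against every edge, so its cost grows with the total number of edges in the board rather than with the handful of edges leaving the current Vertex.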

Adjacency List

As opposed to an edge list, an adjacency list takes a Vertex-centric approach to representing graphs. Many implementations of adjacency lists internally use a data structure like a hashmap or hashtable, where the key is a Vertex and the value is a list of vertices (possibly with additional data) that the key can connect to.

For my implementation, I used two separate data structures to implement the adjacency list: dict (which builds a read-only IDictionary) and F#'s immutable Map. In both, the keys are Vertex instances and the values are lists of Edges.
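A sketch of both variants follows, assuming the hypothetical Vertex/Edge records below; createMapList, createDictList, getNextMap, and getNextDict are illustrative names. F# records get structural equality and comparison automatically, which is what lets a Vertex serve as a key in both Map and dict.

```fsharp
open System.Collections.Generic

type Direction = Up | Down | Left | Right
type Vertex = { Id: int; Label: string }
type Edge = { From: Vertex; To: Vertex; Dir: Direction }

/// Immutable F# Map from each Vertex to its outgoing edges.
let createMapList (edges: Edge list) : Map<Vertex, Edge list> =
    edges |> List.groupBy (fun e -> e.From) |> Map.ofList

/// Read-only IDictionary built with the `dict` function.
let createDictList (edges: Edge list) : IDictionary<Vertex, Edge list> =
    edges |> List.groupBy (fun e -> e.From) |> dict

/// Both lookups only search the current Vertex's own outgoing edges.
let getNextMap (graph: Map<Vertex, Edge list>) (source: Vertex) (dir: Direction) =
    Map.tryFind source graph
    |> Option.bind (List.tryFind (fun e -> e.Dir = dir))
    |> Option.map (fun e -> e.To)

let getNextDict (graph: IDictionary<Vertex, Edge list>) (source: Vertex) (dir: Direction) =
    match graph.TryGetValue source with
    | true, outgoing -> outgoing |> List.tryFind (fun e -> e.Dir = dir) |> Option.map (fun e -> e.To)
    | false, _ -> None

// Usage on a hypothetical one-row board "123".
let v = [| for i in 0 .. 2 -> { Id = i; Label = string (i + 1) } |]
let edges =
    [ for i in 0 .. 1 do
        yield { From = v.[i]; To = v.[i + 1]; Dir = Right }
        yield { From = v.[i + 1]; To = v.[i]; Dir = Left } ]
let mapGraph = createMapList edges
let dictGraph = createDictList edges
```

Since the boards have no diagonal edges, each value list holds at most four edges, so a lookup costs one key search plus a very short scan.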

Inductive Graph

Finally, I used an inductive graph library, Hekate, as my final representation of a graph in F#. As I noted in my last blog post, inductive graphs were introduced by Martin Erwig in his 2001 paper and first implemented as the Haskell fgl library.

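Inductive graphs are functional data structures that support the same style of operations as other inductive data structures, such as lists and trees. Just as a list is either empty or an element added to a smaller list, a graph is either empty or a context (a node together with its incoming and outgoing edges) added to a smaller graph, so it can be taken apart one node at a time by pattern matching.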

Although I used a library here instead of implementing the data structure myself, I enjoyed writing code with it. The authors have done a great job of using familiar syntax and patterns in Hekate. The only downside I could find is that the library is almost completely undocumented; I learned how to use it from the unit tests I found on GitHub and from reading Erwig's original paper.

Here is the Day 2, Part 1 board represented visually as an inductive graph.

The getNext algorithm is relatively simple because each Vertex stores information about both its predecessors and successors, i.e. what points to the Vertex and what the Vertex points to, respectively, so the next Vertex can be found among the current Vertex's successors.
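To make the inductive idea concrete, here is a small hand-rolled sketch. It is not Hekate's API or internal representation: contexts are simply stored in a Map keyed by node ID, decompose plays the role of Erwig's match operation (pull out one node's context and return the rest of the graph with edges to that node removed), and getNext reads the next node straight from the successors in the current node's context. All names are hypothetical.

```fsharp
type Direction = Up | Down | Left | Right

/// A context in Erwig's sense: a node, its label, and the labelled edges
/// pointing into it (predecessors) and out of it (successors).
type Context =
    { Preds: (Direction * int) list // (edge label, source node)
      Node: int
      Label: string
      Succs: (Direction * int) list } // (edge label, target node)

/// Simplified storage for the sketch: contexts keyed by node ID.
type Graph = Map<int, Context>

/// Extracts a node's context and returns it with the remaining graph,
/// dropping every edge that referred to the extracted node.
let decompose (node: int) (g: Graph) : (Context * Graph) option =
    match Map.tryFind node g with
    | None -> None
    | Some ctx ->
        let detach (edges: (Direction * int) list) =
            edges |> List.filter (fun (_, n) -> n <> node)
        let rest =
            g
            |> Map.remove node
            |> Map.map (fun _ c -> { c with Preds = detach c.Preds; Succs = detach c.Succs })
        Some (ctx, rest)

/// getNext only needs the successors stored in the node's own context.
let getNext (g: Graph) (node: int) (dir: Direction) : int option =
    Map.tryFind node g
    |> Option.bind (fun ctx ->
        ctx.Succs
        |> List.tryFind (fun (d, _) -> d = dir)
        |> Option.map snd)

// Usage on a hypothetical two-vertex board "12".
let g : Graph =
    Map.ofList
        [ 0, { Preds = [ Left, 1 ]; Node = 0; Label = "1"; Succs = [ Right, 1 ] }
          1, { Preds = [ Right, 0 ]; Node = 1; Label = "2"; Succs = [ Left, 0 ] } ]
```

Here getNext g 0 Right returns Some 1, and decompose 0 g returns the context of node 0 together with a one-node graph in which node 1 no longer has any edges.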

Testing

In the past, I have struggled quite a bit with properly implementing FsCheck tests. This time around, however, using FsCheck directly (rather than through a testing library like Fuchu or Xunit) and running it from F# script files (rather than compiled programs) made the experience much more pleasant, and I learned a great deal along the way.

I wrote tests using FsCheck to ensure that my implementations were correct - and I'm glad I did. I found a number of elementary mistakes by having automated tests that I could easily run after each change to validate my results. For each data structure, I wrote 6 tests:

A printout of the Day 2, Part 1 board (to be verified manually).

A printout of the Day 2, Part 2 board (to be verified manually).

The Day 2, Part 1 test provided in the AOC problem description.

The Day 2, Part 2 test provided in the AOC problem description.

The Day 2, Part 1 problem stated in the AOC problem description.

The Day 2, Part 2 problem stated in the AOC problem description.

I could not automate the first two tests because each data structure's printout was quite different from the others.

I was able to implement the last two tests because I had already solved Day 2's problems and knew the correct answers.

Performance

After writing the five graph data structure implementations, I decided to run some basic performance tests on them. Because every implementation ran through the same generic solution framework, I could attribute the differences in run time to the data structure being used.

To perform the tests in the most "unbiased" way possible, I ran each test 5 times and took the average of the results. For each run, I executed the last 4 tests listed in the Testing section.
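For readers who want to reproduce this kind of measurement outside F# Interactive, a minimal harness might look like the sketch below. It swaps #time for System.Diagnostics.Stopwatch and GC.CollectionCount; measure is a hypothetical helper, it does not track CPU time, and it assumes runs is at least 1.

```fsharp
open System
open System.Diagnostics

/// Runs `work` the given number of times and returns the average wall-clock
/// time in seconds and the average Gen0/Gen1/Gen2 collection counts per run.
let measure (runs: int) (work: unit -> 'a) =
    let samples =
        [ for _ in 1 .. runs do
            let gen0, gen1, gen2 = GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2)
            let sw = Stopwatch.StartNew()
            work () |> ignore
            sw.Stop()
            yield (sw.Elapsed.TotalSeconds,
                   float (GC.CollectionCount(0) - gen0),
                   float (GC.CollectionCount(1) - gen1),
                   float (GC.CollectionCount(2) - gen2)) ]
    let avg f = samples |> List.averageBy f
    (avg (fun (t, _, _, _) -> t),
     avg (fun (_, g0, _, _) -> g0),
     avg (fun (_, _, g1, _) -> g1),
     avg (fun (_, _, _, g2) -> g2))
```

Calling something like measure 5 (fun () -> solvePart1 ()), where solvePart1 stands for whatever runs the four tests, gives one row of averages per implementation.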

Without further ado, here are the results from F# Interactive's #time directive.

Average run-time and garbage collection performance of the algorithms, sorted by the Real average column in ascending order.

Algorithm | Real average (s) | CPU average (s) | Gen0 average (collections) | Gen1 average (collections) | Gen2 average (collections)
Adjacency List Dict | 0.8452 | 0.8326 | 104.2 | 25.2 | 0
Adjacency List Map | 0.8684 | 0.8576 | 105.8 | 21.6 | 0
Inductive Graph | 0.9894 | 0.9852 | 136.2 | 33 | 0
Adjacency Matrix | 4.4932 | 4.4798 | 530.6 | 78.8 | 0.6
Edge List | 359.2024 | 359.1448 | 68896.6 | 61.4 | 3.6

I then took the Adjacency List Dict implementation as the baseline (since it was the fastest) and translated the same table into percentages.
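The percentages are plain ratios to the baseline, rounded to the nearest whole percent. The sketch below recomputes them from the averages in the first table and also shows why the Gen2 column needs a different baseline: Adjacency List Dict's Gen2 average is 0, so dividing by it is undefined. The names results, percent, and table are hypothetical.

```fsharp
/// Averages from the first table: real, CPU, Gen0, Gen1, Gen2.
let results =
    [ "Adjacency List Dict", [ 0.8452; 0.8326; 104.2; 25.2; 0.0 ]
      "Adjacency List Map", [ 0.8684; 0.8576; 105.8; 21.6; 0.0 ]
      "Inductive Graph", [ 0.9894; 0.9852; 136.2; 33.0; 0.0 ]
      "Adjacency Matrix", [ 4.4932; 4.4798; 530.6; 78.8; 0.6 ]
      "Edge List", [ 359.2024; 359.1448; 68896.6; 61.4; 3.6 ] ]

/// value / baseline * 100, rounded; None when the baseline is zero.
let percent (baseline: float) (value: float) =
    if baseline = 0.0 then None
    else Some (System.Math.Round(value / baseline * 100.0))

let baselineRow = snd results.Head

/// Each row as percentages of the Adjacency List Dict row.
let table =
    results
    |> List.map (fun (name, row) -> name, List.map2 percent baselineRow row)
```

For example, the Adjacency List Map row comes out as 103, 103, 102, and 86 with no Gen2 value, and percent 0.6 3.6 gives 600, the Edge List's Gen2 figure relative to the Adjacency Matrix.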

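Average run-time and garbage collection performance as a percentage, with Adjacency List Dict as the baseline. Because Adjacency List Dict had no Gen2 collections, the Gen2 column uses Adjacency Matrix as its baseline instead.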

Algorithm | Real | CPU | Gen0 | Gen1 | Gen2
Adjacency List Dict | 100% | 100% | 100% | 100% | -
Adjacency List Map | 103% | 103% | 102% | 86% | -
Inductive Graph | 117% | 118% | 131% | 131% | -
Adjacency Matrix | 532% | 538% | 509% | 313% | 100%
Edge List | 42499% | 43135% | 66120% | 244% | 600%

Observations

I was not surprised to find that the Edge List was the worst performer. However, I was surprised by how much worse it was, even on a small board, than the next worst implementation, the Adjacency Matrix: its average real time (359.2) was roughly 80 times that of the Adjacency Matrix (4.49).

Along the same lines, I was surprised by how poorly the Adjacency Matrix performed, at roughly 5.3 times the baseline's real time. I suspect a more optimized implementation could do much better than my naive attempt. The adjacency matrix is, along with the adjacency list, one of the most popular ways to store graphs, which suggests the problem lies in my implementation rather than in the data structure itself.

The inductive graph performed very well for a relatively new data structure that was designed from the start as a new way of storing graphs in functional languages. Its run time was approximately 17% worse than the baseline, and its memory behavior, measured by garbage collections, was approximately 31% worse (31% more Gen0 and Gen1 collections). For someone who needs a graph data structure for more than just a getNext function, it could be a serious contender and warrants additional testing.

Next steps

I am satisfied with the results of this investigation into graph data structures in F#. I was able to implement all algorithms in a functional manner within a generic framework. I believe I have accomplished the goal that I set out with, which was to find a more robust and 'realistic' way of representing graph-like structures in F#.

However, here are some items that I could not (or did not) complete and may be worth looking into:

Change the base data structures to enforce more of the expected properties / invariants.

Make the types for Vertex, Edge, Direction, etc. private and move them within their respective modules.

I will now return to blogging about my solutions to the Advent of Code 2016 problems. I am greatly looking forward to this, because I have paused solving new problems until my blog posts catch up. Hopefully, I will be able to post my Day 3 solutions within the next week.