Introduction

Without giving a Computer Science 101 lecture, I'll provide you with 2 pictures that says a thousand words. In short, DFA stands for Deterministic Finite Automation (or Automata). It's cousin is NFA, non-deterministic FA. An example of a NFA is looping through a big list looking for a word. The word might be found quickley, or it might only find it in the last entry. This is said to be non-deterministic. So we can then say for a DFA we will know exactly how long a search for a word (as per the NFA example) will take.

DFA representation of Table A. A circle with an outer circle denotes an endstate.

DFA representation of Table B. A circle with an outer circle denotes an endstate.

You might ask what that System.Object is doing there, well nothing. UPDATE: I have now removed the root object from the generated graph. As you can see, it is not an endstate, so it will not affect anything. But it does serve a purpose. It acts as the very first node that is created regardless of the key type. I can remove it, but then I would have to check for null everytime and that would be too costly. So dont worry about it!

This base implementation for the DFA state machine. I have provided several implementations for typesafe Int32, String, Char, Boolean, Float, Object and Type. Feel free to add your own and see comments I have meda in the code. Arrays of Arrays have terrible casting issues. I have also included a Combination class that I did many moons back, and never had a place to put. Combinations are very good to build DFA's. Also included is a utiliy class to generate all these pretty pictures (yes, this is aimed directly at you, Marc Clifton ) you can see here (you will need GraphViz for this). UPDATE: The GraphViz utilities has been greatly improved, and from above you can see you can now specify rendering options on a State level. The implementation is similar to ASP.NET's custom control Rendering. In other words, if you need to change something, just override RenderXXX. See TypeState's implementation for a good example.

The class consists mostly of non-human comprehensable code (almost every function is either running in a while(true) loop or recusively). This is most mostly a generic port of a C project that was done for Computer Science. From the 2 count functions's code, you will see the "multi dimensional linked list" is rather easily "walkable" with recursion (I had some stack issues, but havent been able to replicate them).

Purpose

So you may ask yourself why or how do I use this? The way the machine has been setup, is to take a "path" at a time, so adding a sentence or a word, get automatically laid out in the machine. No other intervention is required. One of the most difficult things to do is changing a NFA to a DFA, this is all done for you. Then all you need to do is query it. Some functions like Match() can take longer with "wildcards" (set these up as null, see the implementations).

You might ask me now why I dont just use an ArrayList or a HashTable? Simple. An ArrayList is good for batches of data, but poor at searching. A HashTable is excellent for lookups, but fails to be useful for pattern matching or "path" finding. In fact you should be using either of the afore mentioned in most cases. Examples of what this is useful for:

Spellchecker

Word/code completion

Code parser

Path finding

Any many more things most people only dream of

Performance

As with any DFA performance is key. This implementation is comparable to a Hashtable. In fact, if boxing was removed, lookups would be even nearer (2x instead of 3x in my experiments). Here is a simple console output from a 500 000 wordlist loaded in a char[]:

As one can see, lookups are 3 times slower than a Hashtable, but remember that a HashTable is incredibly fast, but does not have the same characteristics. Note the slowness of extracting all the entries. Not bad if you think that the DFA is converted to raw data again.

Thanks to the person who wrote the HP timer class I used for the timing (sorry, cant remember your name now).

Walkthrough

Say for example you wanted a quick inheritance graph of an assembly. You would simple do this:

Now we have a string we can pass to Dot (the GraphViz executable) and we end up with:

Pretty isn't it? But hey, that isnt even the purpose! Luckily GraphViz fits like a glove and its making many more things possible. If any one wants to send some network trace logs, I'll put up some renderings. UPDATE: Here they are! I have cropped the graph somewhat.

Observations

A question might arise why things like "leppie can eat" and "john can eat" does flow back into each other. This is infact possible, but a problem arises when you add "leppie can eat meat". Does this mean john can eat meat too? It depends how accurate you want to be. In my case, I have opted for accuracy. Adding this facility would only increase load time and to make it 100% accurate will require an NFA pass or a major change to the algorhythm. Suggestions are welcome.

Conclusion

I hope you enjoy using this, and feel free to comment as usual. The test project, basically contains just some tests I have run, and will probably cause an exception on your system. Look at it however to see how its meant to be used.

Changelog

1.1

Totally redone GraphViz utility and added rendering on a State level. Both transition (edges) and state (node) attributes can be tweaked. I have kept the naming as near as possible to GraphViz's syntax for those that are fimiliar with the product.

Uploaded new version with working tests (as you see them here) and pictures.

1.2

Fix the performance bug.

Made the pictures prettier

Built a NET callable dll for Dot. This can be downloaded seperately. The compiling and linking and sorting out MC++ issues was a pain, but it finally works now. If you have problems I can send you a project file, but be warned, its a mess too.

Future Plans

Now, so all of you will come back and back again, I'm listing some plans I have for the future (viable suggestions will be added too):

Ability to add state and transistion data, for the ultimate rendering experience.

Perhaps, build a .NET wrapper for GraphViz (if someone can send me a solution file for the source, it would appreciated as the source is a mess). UPDATE : Built the dll, now for the wrapper. Has a small interface to call from .NET.

I thought perhaps at a later stage I could implement the renderning GDI. Perhaps, later.

License

This article, along with any associated source code and files, is licensed under The BSD License

About the Author

Comments and Discussions

I´m using QuickGraph and dot.dll for visualizing a directed graph on an asp.net webpage. Now my question is: Is there a possibility to write the graph into a bytestream instead of storing it in a file? To directly display it on the webpage?
Or does anyone know another graph visualization tool supporting this?