Thursday, September 29, 2011

A long time ago, I came across a youtube video describe some of the more interesting features in PostgreSQL. Since I am stuck using MySQL for most of my paid projects, I waited until a side project grabbed my interest enough to devote a small bit of time to it.

The Problem

Given a list of words, use each word as a node. An edge exists between words if the levenshtein distance between them is 1. Find paths between words and count how many words are connected to a root word.

The "Solution"

Doing the full word list takes quite some time, but things are fairly fast with just 10000 words. The words file has a lot of proper nouns, and words with punctuation, so I filter all of those out. I also filter out one letter words, as they are only one distance away from a blank character.

Here we create a list of edges based off of the levenshtein distance. You will need to install contrib/fuzzystrmatch.sql onto your db if you get the error "HINT: No function matches the given name and argument types. You might need to add explicit type casts"

This isn't really the efficient way to do things. It covers every possible route away from 'week', and lists them. It won't get caught in a loop, but it will list out every possible path between each word that is connected to 'week'