A blog about the world, my experiences, and the technology that surrounds us.

Tuesday, April 8, 2008

Research Through Open Source and Access

I'm on a few dozen mailing lists, and today one of them told me about a newly released book that provides an overview of genetic programming. Called A Field Guide to Genetic Programming, it's available to purchase for about $15 or download for free... And that's what I love about it.

In the last few years, I've become extremely focused on and active in social network analysis, and am now hanging out at IBM and learning about data mining as well. What amazes me is the number of open source tools available for doing this work at virtually no cost. This includes projects like R for statistics, Weka for machine learning, Egotistics for social networks (sorry for the shameless self-promotion - also check out Network Workbench), and many others.

Similarly, books and papers are becoming more freely accessible. One of my all-time favourites is Introduction to Information Retrieval, but many others exist. I wish I knew about a few more.

So now you have the theory and the software... Need data? That's not a problem either... I can think of three sources off the top of my head... And besides, crawling the World Wide Web for data isn't impossibly difficult, especially if you use one of the currently available open source offerings, like Nutch.