Project 5 - Data mining module, finding frequent network-itemsets

Project Overview:
The project will still apply data mining/machine learning techniques; however, the idea has evolved slightly after discussions with the project mentors, Mario and Jeff.
The main idea is to build a tool for statistical and data mining analysis of data from dionaea sensors. This may mean correlating source/destination addresses with ports and different attack IDs by applying the Apriori algorithm. The data will be stored in a database (a schema will be designed) so that other tools (not part of the GSoC work) will be able to visualize the results.
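To make the Apriori idea concrete, here is a minimal sketch in Python. It is an illustration only, not the project's implementation: the function name, the "key=value" item encoding, and the sample connection records are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count in how many transactions each candidate occurs, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    current = count({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical connection records; the addresses are documentation-range examples.
connections = [
    {"src=198.51.100.7", "dst=192.0.2.10", "dport=445"},
    {"src=198.51.100.7", "dst=192.0.2.10", "dport=445"},
    {"src=198.51.100.7", "dst=192.0.2.11", "dport=445"},
    {"src=203.0.113.5", "dst=192.0.2.10", "dport=80"},
]

frequent_itemsets = apriori(connections, min_support=2)
```

With min_support=2, the itemset {src=198.51.100.7, dport=445} is reported with a count of 3, which is the kind of address/port correlation described above.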
The application will, however, be built on a light framework that lets users plug in external modules and, by writing configuration files, create dynamic 'workflows'. This basically means that most of the loaded components can be reused many times in different configurations: channels that read and 'understand' data, processors that convert the input into a format the algorithm understands and then convert the results back, and loggers that write data to the proper places such as stdout, a database or a file. (This holds in the general case, since by design every adjacent component needs to 'understand' its neighbour.) I would also like to focus on making a couple of 'ready to use' components, for example channels reading data from hpfeeds channels and some generic loggers that write to stdout with coloring and standard formatting.
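The channel/processor/logger chain described above could look roughly like the following sketch. All class and function names here are hypothetical, and the "algorithm" is a trivial placeholder standing in for the real mining module; the point is only to show how adjacent components hand data to each other.

```python
class ListChannel:
    """Channel: yields raw records (here simply from an in-memory list)."""

    def __init__(self, records):
        self.records = records

    def read(self):
        yield from self.records


class ConnectionProcessor:
    """Processor: turns 'src dst dport' lines into item sets and results back into text."""

    def encode(self, line):
        src, dst, dport = line.split()
        return frozenset({f"src={src}", f"dst={dst}", f"dport={dport}"})

    def decode(self, itemset):
        return ", ".join(sorted(itemset))


class StdoutLogger:
    """Logger: prints each result line."""

    def log(self, message):
        print(message)


class MemoryLogger:
    """Logger: keeps result lines in a list (stand-in for a database or file logger)."""

    def __init__(self):
        self.lines = []

    def log(self, message):
        self.lines.append(message)


def common_items(transactions):
    """Placeholder algorithm: the items shared by every transaction."""
    return [frozenset.intersection(*transactions)] if transactions else []


def run_workflow(channel, processor, algorithm, loggers):
    # channel -> processor.encode -> algorithm -> processor.decode -> loggers
    transactions = [processor.encode(record) for record in channel.read()]
    for result in algorithm(transactions):
        message = processor.decode(result)
        for logger in loggers:
            logger.log(message)


memory = MemoryLogger()
run_workflow(
    ListChannel(["198.51.100.7 192.0.2.10 445", "198.51.100.7 192.0.2.11 445"]),
    ConnectionProcessor(),
    common_items,
    [StdoutLogger(), memory],
)
```

The same processor and algorithm are reused with two different loggers here; swapping ListChannel for a channel that reads from another source would not require touching the rest of the chain, as long as it yields records the processor understands.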
By design, the application should do its job periodically and be able to collect data 'online' and on demand.

Project Plan:

April 23rd - May 20th: Community Bonding Period
The student keeps in touch with his mentors, discussing a wide range of issues, from the high-level architecture and project functionality to the technology used for the project.

May 21st - July 1st
First version of the framework, allowing channels/modules/loggers to be (un)loaded dynamically and parameters to be read from the configuration file. This means the framework should be ready and robust enough to build the modules doing the real work on top of it.
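As an illustration of configuration-driven loading, the sketch below reads an INI-style configuration and resolves each section to a class using Python's standard configparser and importlib modules. The section naming ("kind:name"), the "class" option, and the use of standard-library classes as stand-ins for real channels and loggers are assumptions made for this example, not the project's actual configuration format.

```python
import configparser
import importlib

# Hypothetical configuration: each section names one component to load.
CONFIG_TEXT = """
[logger:stdout]
class = logging.StreamHandler

[channel:queue]
class = queue.Queue
maxsize = 100
"""


def load_components(config_text):
    """Return {section: (class, options)} for every section in the configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    components = {}
    for section in parser.sections():
        options = dict(parser[section])
        # Split "package.module.ClassName" into module path and class name.
        module_name, _, class_name = options.pop("class").rpartition(".")
        module = importlib.import_module(module_name)
        components[section] = (getattr(module, class_name), options)
    return components


components = load_components(CONFIG_TEXT)

# "Unloading" in this sketch just means dropping the component from the registry.
components.pop("logger:stdout", None)
```

Only the sections present in the configuration are loaded, so leaving a component out of the file simply leaves it out of the registry. The remaining options (such as maxsize) are kept as strings for the component to interpret.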

July 2nd - July 9th
First versions of channels reading various kinds of data. I would like to make one channel that is used in the final configuration of the project and a few others that serve only as extensions.

July 9th - July 13th: Mid Term Assessments

July 14th - August 10th:
Work on the Apriori data mining module. This also includes the processor algorithm.

Issues: I had some problems with creating classes from shared modules, but everything works fine now (it took a couple of hours, however, so not 100% of the plan was fulfilled). I will be offline for the weekend and hence cannot work on the GSoC project.

11 June 2012 and 18 June 2012

My involvement was much smaller, as I had to get my school tasks done.

Made a channel module that cooperates with the main libev loop, listens on a local port and passes data to the internal application structures.
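The idea of a channel that lives inside the main event loop can be sketched as follows. Note that this uses Python's asyncio loop as a stand-in for libev, and an asyncio.Queue as a stand-in for the internal application structures; the function names and the line-based record format are assumptions made for this example.

```python
import asyncio


async def serve_channel(queue, host="127.0.0.1", port=0):
    """Start a line-based TCP channel that puts every received line on the queue."""

    async def on_client(reader, writer):
        # readline() returns b"" at end of stream, which ends the loop.
        while line := await reader.readline():
            await queue.put(line.decode().rstrip("\r\n"))
        writer.close()

    return await asyncio.start_server(on_client, host, port)


async def demo():
    queue = asyncio.Queue()
    server = await serve_channel(queue)
    port = server.sockets[0].getsockname()[1]  # port 0 lets the OS pick a free port
    async with server:
        # Act as a sensor: connect to the channel and send one record.
        _, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"198.51.100.7 192.0.2.10 445\n")
        await writer.drain()
        record = await asyncio.wait_for(queue.get(), timeout=5)
        writer.close()
        await writer.wait_closed()
    return record


record = asyncio.run(demo())
```

Because the channel only registers a callback with the loop, the rest of the application keeps running in the same loop while data arrives, which mirrors the cooperation with the main loop described above.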

Implemented proper loading of channels/loggers based on the configuration file.

The code should now work properly when some components are missing (e.g. due to intentional configuration).