About:
An open-source C# market-basket synthetic data generator, capable of creating transactions, sequences and taxonomies, based on the IBM Quest generator.
It was written to address the maintainability and portability problems of the original. Feedback, fixes and extensions are encouraged!

About:
Epistatic miniarray profiles (E-MAPs) are a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others.
This project contains nearest-neighbor-based tools, implemented in Python, for the imputation and prediction of these missing values. Two variants are provided: a simple weighted nearest neighbors method and a local least squares regression.
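To make the weighted nearest neighbors idea concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names (`row_distance`, `estimate`, `impute_wnn`), the RMS-distance similarity, the `1 / (1 + d)` weighting and the averaging of the two symmetric estimates are all assumptions chosen for illustration. Missing scores are represented as `None`.

```python
import math

def row_distance(a, b):
    """RMS difference over positions observed in both rows; None if no overlap."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return None
    return math.sqrt(sum(diffs) / len(diffs))

def estimate(matrix, target, col, k):
    """Weighted average of column `col` over the k rows nearest to row `target`."""
    candidates = []
    for g, row in enumerate(matrix):
        # A neighbor is only useful if it has an observed score in this column.
        if g == target or row[col] is None:
            continue
        d = row_distance(matrix[target], row)
        if d is not None:
            candidates.append((d, row[col]))
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]
    # Assumed weighting: closer rows contribute more.
    weights = [1.0 / (1.0 + d) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

def impute_wnn(matrix, k=2):
    """Return a copy of a symmetric score matrix with off-diagonal gaps filled."""
    n = len(matrix)
    filled = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] is not None:
                continue
            # Estimate from both genes' profiles and average, so the
            # result stays symmetric (an assumption of this sketch).
            guesses = [e for e in (estimate(matrix, i, j, k),
                                   estimate(matrix, j, i, k)) if e is not None]
            if guesses:
                filled[i][j] = filled[j][i] = sum(guesses) / len(guesses)
    return filled
```

Distances are always computed from the original matrix, so values imputed earlier in the loop do not influence later estimates. The diagonal is left untouched.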

About:
OpenGM is a free C++ template library, a command line tool and a set of MATLAB functions for optimization in higher order graphical models. Graphical models of any order and structure can be built either in C++ or in MATLAB, using simple and intuitive commands. These models can be stored in HDF5 files and subjected to state-of-the-art optimization algorithms via the OpenGM command line optimizer. All library functions can also be called directly from C++ code. OpenGM realizes the Inference Algorithm Interface (IAI), a concept that makes it easy for programmers to use their own algorithms and factor classes with OpenGM.
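As a rough illustration of what optimization in a higher order graphical model means, the sketch below minimizes a sum of factor values by exhaustive search. It is written in Python and does not use or imitate the OpenGM API: the function `minimize_energy`, the representation of a factor as a `(variable_indices, function)` pair, and the choice of minimizing a sum are all assumptions made for this example. Exhaustive search is only feasible for tiny models.

```python
from itertools import product

def minimize_energy(num_labels, factors):
    """Brute-force search for the labeling with the lowest total factor value.

    num_labels: number of labels for each variable, e.g. [2, 2, 2]
    factors:    list of (variable_indices, function) pairs; a factor may
                depend on any number of variables (hence "higher order")
    """
    best_labeling, best_energy = None, float("inf")
    for labeling in product(*(range(n) for n in num_labels)):
        energy = sum(f(*(labeling[v] for v in variables))
                     for variables, f in factors)
        if energy < best_energy:
            best_labeling, best_energy = labeling, energy
    return best_labeling, best_energy

# Three binary variables with one first-order and one third-order factor.
example_factors = [
    ((0,), lambda a: 0 if a == 1 else 1),                 # variable 0 prefers label 1
    ((0, 1, 2), lambda a, b, c: 0 if a == b == c else 2), # all three should agree
]
```

Calling `minimize_energy([2, 2, 2], example_factors)` searches all eight labelings of the example model.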

About:
Pyriel is a Python system for learning classification rules from data. Unlike other rule learning systems, it is designed to learn rule lists that maximize the area under the ROC curve (AUC) instead of accuracy. Pyriel is mostly an experimental research tool, but it's robust and fast enough to be used for lightweight industrial data mining.
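The difference between optimizing AUC and optimizing accuracy can be seen in a small sketch. The code below is not Pyriel's API; the names `first_match_score` and `auc`, and the representation of a rule list as `(predicate, score)` pairs with a default score, are assumptions for illustration. Each instance is scored by the first rule that fires, and AUC is then computed as the fraction of positive/negative pairs that the scores rank correctly, counting ties as half.

```python
def first_match_score(rules, default_score, x):
    """Score an instance with the first rule in the list whose predicate fires."""
    for predicate, score in rules:
        if predicate(x):
            return score
    return default_score

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical rule list over a single numeric feature.
rules = [(lambda x: x > 5, 0.9), (lambda x: x > 2, 0.5)]
xs = [6, 3, 1, 4]
labels = [1, 1, 0, 0]
scores = [first_match_score(rules, 0.1, x) for x in xs]
```

Because AUC depends only on how instances are ranked, reordering or rescoring rules can change it even when thresholded accuracy stays the same.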