BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB's default assumptions when appropriate.

BayesDB assumes that each row of your table is a sample from some fixed population or generative process, and estimates the joint distribution over the columns. BQL then allows you to draw Bayesian inferences about individuals and about the overall population or process. The estimates are currently provided by CrossCat, a new nonparametric Bayesian method for analyzing high-dimensional data tables.

Fill in missing data with the INFER command. With traditional regression, you must train a separate supervised model for each column you want to predict. An INFER statement, by contrast, can predict any set of columns from a single model of the table.
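As an illustrative sketch, an INFER statement might look like the following. The table name mytable and the column salary are reused from the other examples on this page. The WITH CONFIDENCE clause and the 0.95 threshold are assumptions about the syntax, so check the BQL reference for your BayesDB version before relying on them.

```sql
-- Hypothetical example: fill in missing salary values, keeping only
-- predictions the model is at least 95% confident in (assumed clause).
INFER salary FROM mytable WITH CONFIDENCE 0.95;
```

Under this reading, cells that cannot be predicted at the requested confidence would stay empty rather than being filled with a low-confidence guess.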

SIMULATE salary FROM mytable WHERE age > 30;

Easily simulate new probable observations based on CrossCat's estimate of the joint density of the data.

ESTIMATE PAIRWISE DEPENDENCE PROBABILITIES FROM mytable;

With a single command, estimate a pairwise function of the columns, such as the probability that two columns are statistically dependent, their mutual information, or their correlation.
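The other pairwise functions mentioned here would presumably follow the same pattern. The statements below are a sketch that assumes the function names MUTUAL INFORMATION and CORRELATION are spelled this way in BQL; they are not confirmed by this page.

```sql
-- Assumed syntax, mirroring the dependence-probability example:
-- one value is estimated for every pair of columns in mytable.
ESTIMATE PAIRWISE MUTUAL INFORMATION FROM mytable;
ESTIMATE PAIRWISE CORRELATION FROM mytable;
```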