One of the challenges of analyzing data is having to move it from place to place. A common approach is to extract data from a data warehouse and analyze it on a local PC or laptop. Once the analysis or model build is complete, the logic has to be transferred back to the data warehouse so it can be put into production. This process can be slow and error-prone.

MADlib is an open-source library of analytic and machine learning functions that run in-database. Once it is installed on your database, you no longer need to move the data. Instead, you move your logic to the data.
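
To make the "move your logic to the data" idea concrete, here is a minimal sketch in Python. It does not use MADlib or Impala: an in-memory SQLite database stands in for the warehouse, and the table name `sales`, its columns, and both function names are made up for illustration. The first function pulls every row out and fits a simple linear regression locally; the second asks the database for five aggregate values and finishes the fit from those, so only one small row ever leaves the database.

```python
import sqlite3

# Hypothetical toy table standing in for a warehouse table (revenue = 2 * ad_spend + 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)


def fit_locally(conn):
    """Move the data to the logic: fetch all rows, then fit in Python."""
    rows = conn.execute("SELECT ad_spend, revenue FROM sales").fetchall()
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def fit_in_database(conn):
    """Move the logic to the data: the database computes the sums, one row comes back."""
    n, sx, sy, sxy, sxx = conn.execute(
        """
        SELECT COUNT(*),
               SUM(ad_spend),
               SUM(revenue),
               SUM(ad_spend * revenue),
               SUM(ad_spend * ad_spend)
        FROM sales
        """
    ).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n


local_slope, local_intercept = fit_locally(conn)
db_slope, db_intercept = fit_in_database(conn)
print(local_slope, local_intercept)
print(db_slope, db_intercept)
```

Both functions produce the same fit on this toy table. The difference is in what crosses the wire: the first transfers the whole table, the second a single row of aggregates. An in-database library packages this kind of logic as ready-made functions so you do not have to write the SQL by hand.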

For a walkthrough, check out this post from Cloudera: "How-to: Use MADlib Pre-built Analytic Functions with Impala."

As the open source software movement continues to strengthen, questions abound about where the opportunities lie to create commercially viable solutions. Red Hat did it with Linux. Can Cloudera do it with Hadoop? Read this GigaOm article.