The process of extracting patterns from data is called data mining. It is recognized as an essential tool by modern business since it is able to convert data into business intelligence thus giving an informational edge. At present, it is widely used in profiling practices, like surveillance, marketing, scientific discovery, and fraud detection.

There are four kinds of tasks that are normally involve in Data mining:

* Classification - the task of generalizing familiar structure to employ to new data
* Clustering - the task of finding groups and structures in the data that are in some way or another the same, without using noted structures in the data.
* Association rule learning - Looks for relationships between variables.
* Regression - Aims to find a function that models the data with the slightest error.

For those of you who are looking for some data mining tools, here are five of the best open-source data mining software that you could get for free:

OrangeOrange is a component-based data mining and machine learning software suite that features friendly yet powerful, fast and versatile visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting. It contains complete set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is written in C++ and Python, and its graphical user interface is based on cross-platform Qt framework.

RapidMinerRapidMiner, formerly called YALE (Yet Another Learning Environment), is an environment for machine learning and data mining experiments that is utilized for both research and real-world data mining tasks. It enables experiments to be made up of a huge number of arbitrarily nestable operators, which are detailed in XML files and are made with the graphical user interface of RapidMiner. RapidMiner provides more than 500 operators for all main machine learning procedures, and it also combines learning schemes and attribute evaluators of the Weka learning environment. It is available as a stand-alone tool for data analysis and as a data-mining engine that can be integrated into your own products.

WekaWritten in Java, Weka (Waikato Environment for Knowledge Analysis) is a well-known suite of machine learning software that supports several typical data mining tasks, particularly data preprocessing, clustering, classification, regression, visualization, and feature selection. Its techniques are based on the hypothesis that the data is available as a single flat file or relation, where each data point is labeled by a fixed number of attributes. Weka provides access to SQL databases utilizing Java Database Connectivity and can process the result returned by a database query. Its main user interface is the Explorer, but the same functionality can be accessed from the command line or through the component-based Knowledge Flow interface.

JHepWorkDesigned for scientists, engineers and students, jHepWork is a free and open-source data-analysis framework that is created as an attempt to make a data-analysis environment using open-source packages with a comprehensible user interface and to create a tool competitive to commercial programs. It is specially made for interactive scientific plots in 2D and 3D and contains numerical scientific libraries implemented in Java for mathematical functions, random numbers, and other data mining algorithms. jHepWork is based on a high-level programming language Jython, but Java coding can also be used to call jHepWork numerical and graphical libraries.

KNIMEKNIME (Konstanz Information Miner) is a user friendly, intelligible, and comprehensive open-source data integration, processing, analysis, and exploration platform. It gives users the ability to visually create data flows or pipelines, selectively execute some or all analysis steps, and later study the results, models, and interactive views. KNIME is written in Java, and it is based on Eclipse and makes use of its extension method to support plugins thus providing additional functionality. Through plugins, users can add modules for text, image, and time series processing and the integration of various other open source projects, such as R programming language, Weka, the Chemistry Development Kit, and LibSVM.

If you know of other free and open-source data mining software, please share them with us via comment.

Wednesday, November 17, 2010

Really this raised many things. The cost of the cloud computing vs private computing. Also this sheds important light at your security at the cloud. I do not want to force my opinions so will leave you with the article. Enjoy!!!

As of today, Amazon EC2 is providing what they call "Cluster GPU Instances": An instance in the Amazon cloud that provides you with the power of two NVIDIA Tesla “Fermi” M2050 GPUs. The exact specifications look like this:

This just shows one more time that SHA1 for password hashing is deprecated - You really don't want to use it anymore! Instead, use something like scrypt or PBKDF2! Just imagine a whole cluster of this machines (Which is now easy to do for anybody thanks to Amazon) cracking passwords for you, pretty comfortable Large scaling password cracking for everybody!

If I find the time, I'll write a tool which uses the AWS-API to launch on-demand password-cracking instances with a preconfigured AMI. Stay tuned either via RSS or via Twitter.

Installation Instructions:

I used the "Cluster Instances HVM CentOS 5.5 (AMI Id: ami-aa30c7c3)" machine image as provided by Amazon (I choosed the image because it was the only one with CUDA support built in.) and selected "Cluster GPU (cg1.4xlarge, 22GB)" as the instance type. After launching the instance and SSHing into it, you can continue by installing the cracker: