Saturday, May 30, 2015

Statistics, Data Mining, Artificial Intelligence and Machine Learning are all inter-related concepts. Statistics is a field of mathematics. Statistics is at the base of Machine Learning, and the application of Machine Learning enhances the field of Data Mining. Data Mining is one step of Knowledge Discovery from Data. Some people see Data Mining as a step of Machine Learning; others restrict Machine Learning to the algorithm/search part of Knowledge Discovery from Data. Artificial Intelligence is an interdisciplinary field of science: "it attempts not just to understand but also to build intelligent entities." Machine Learning is applied Artificial Intelligence (AI), and it is also an interdisciplinary subfield of AI. The target of Machine Learning is specifically to develop ways to make a computer "learn."

AI is not only about learning. AI is also about understanding language, planning, representing and reasoning with knowledge, etc. Statistics is a good tool, but it is still just one tool, useful in many situations, especially in data mining and thus in machine learning. Artificial Intelligence is a lot more: it is about building human-like intelligence. Surely no one would seriously think that all we humans do is statistics, or that our intelligence is entirely due to some statistical module in our brains. We do a lot more: we interpret, we deal with semantics and pragmatics, we reason with concepts and knowledge, we make plans and algorithms and execute them (planning is a huge area in AI), and we adjust our knowledge based on the implications of our actions (which touches on the frame problem in AI), etc.

Artificial Intelligence is a field whose purpose is to create computational models of natural intelligent systems (not necessarily human intelligence) and to apply these models to various real-world problems. Of course, the most appealing challenge is to re-create artificial human intelligence and consciousness. Empirical notions of intelligence, such as cognition and self-awareness, usually limit the perception of intelligence to humans or other living creatures. But let's not forget that the Universe itself is the greatest known natural example of storing information, adaptability and decision making. The purpose of the Universe is not to create life but to develop ways of storing more and more information; life, and the intelligence of living beings, is a collateral effect of all the processes in the Universe. There is no universally accepted definition of an intelligent natural system, but such a system should have at least three fundamental features: (1) to store information about the experiences it has been through; (2) to process this information in order to adapt itself; and (3) to make decisions based on its experience.

Machine Learning is a science that involves the development of self-learning algorithms in AI. It is a field of Artificial Intelligence dealing with methods that describe the three components of intelligence: Memory, Adaptation, Decision (MAD). Machine Learning methods address these components at different levels. Typically, the most important applications of Machine Learning are pattern recognition (supervised and unsupervised classification) and prediction.

Statistics is the oldest data science. It is now widely accepted that Statistics and Machine Learning often answer the same type of questions, but in very different ways. Statistics uses linear or non-linear parametric models to explain causality and to make predictions, while Machine Learning typically uses non-linear, non-parametric approaches that rarely explain causality but instead focus on predictive performance.
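To make the parametric vs. non-parametric contrast concrete, here is a minimal Python sketch using only the standard library. The data (noiseless y = x² on x = 0..10), the function names and the choice k = 3 are hypothetical, picked only for illustration. The least-squares line summarises the data in two interpretable coefficients, while the k-nearest-neighbours predictor keeps the whole training set and assumes no functional form.

```python
# Sketch: a parametric model (least-squares line) vs. a non-parametric one (k-NN).
# Data, function names and k=3 are hypothetical choices for illustration only.

def fit_linear(xs, ys):
    """Parametric: ordinary least squares for y = a + b*x (two parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_linear(params, x):
    """Evaluate the fitted line at x."""
    a, b = params
    return a + b * x

def predict_knn(xs, ys, x, k=3):
    """Non-parametric: average the targets of the k nearest training points.
    The 'model' is the stored training data itself."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

if __name__ == "__main__":
    # Hypothetical noiseless data from a non-linear relationship y = x^2.
    xs = list(range(11))
    ys = [x * x for x in xs]
    params = fit_linear(xs, ys)
    print("line: a = %.2f, b = %.2f" % params)
    for x in (2.0, 5.0, 8.0):
        print("x=%.0f true=%.2f linear=%.2f knn=%.2f"
              % (x, x * x, predict_linear(params, x), predict_knn(xs, ys, x)))
```

On this data the fit gives a = -15 and b = 10; at x = 5 the line predicts 35 while the 3-nearest-neighbour average is about 25.67 (true value 25). The line is easier to interpret, the neighbour average tracks the curve more closely; neither is meant as the canonical method of either field.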

The difference between classic Statistics and Data Mining: classic Statistics studies small and moderate volumes of data sampled from populations, relying on asymptotic convergence theory (hence distribution-based methods), while Data Mining uses moderate to large volumes of data with few or no parametric assumptions. One can see Data Mining as an extension of Statistics to large data sets. The most important thing is for the Data Analyst/Scientist/Researcher to know these levels of application and to be aware of the most suitable techniques for a specific problem and a specific data set. If this is true, it calls for a natural, fundamental and deterministic definition of intelligence, because otherwise nobody knows what to implement.

Artificial Intelligence can be viewed as the ability of a computer to learn and reason. Learning would be generating a hypothesis, or output, for a given input data set, while reasoning can be seen as deciding whether or not to act upon those learned hypotheses. Statistics addresses the study of uncertainty and is widely used both in machine learning and in other artificial intelligence fields such as communication, planning and so on. Software learning is "adaptive"; it becomes Artificial Intelligence only when the software embodies a partial model of human behavior that improves in accuracy with learning. Not many so-called AI systems meet this requirement. Artificial Intelligence is an interdisciplinary field of science, so it is important to clarify the relevant concepts accordingly.