Abdul Wahid
http://awahid.net
Expert in Web Technologies
Sun, 01 Mar 2015 23:16:42 +0000

Machine Learning is a new form of statistics
http://awahid.net/blog/machine-learning-is-a-new-form-of-statistics/
Sat, 28 Feb 2015 23:54:11 +0000

Statistics and machine learning are often thought of as two separate fields. But if you read good articles from highly reputed machine learning journals, you will realize that the two fields are merging. Not too long ago, the emergence of a new field, “statistical machine learning”, made it clear that these two fields have a great deal in common.

Coming from a computer science background, I sense that statistics will dominate future algorithms.

What is your opinion about the future of machine learning? Do you think it will find its own direction, or will it follow statistics?

Datascience explained in form of a poster
http://awahid.net/blog/datascience-explained-in-form-of-a-poster/
Sun, 25 Jan 2015 03:05:28 +0000

ICRIS (http://www.icris.nl) made a simple poster to describe the fundamentals of data science. Click on the following image to see the poster in high resolution.

Basics of Bigdata
http://awahid.net/blog/basics-of-bigdata/
Thu, 22 Jan 2015 09:45:35 +0000

Bigdata is often misunderstood to mean simply very large data; however, size is just one aspect of bigdata. The term Bigdata refers to data that is too complex for traditional approaches to handle. Bigdata has the following characteristics.

Volume – Large amount of the data.

Velocity – Rapid generation of the data.

Variability – Inconsistency of the data.

Veracity – Quality of the data.

Variety – Various forms of the data.

I would also like to point out that rich data with multiple views or representations should also be considered a characteristic of BigData. A good next step is to look at the Wikipedia article about BigData and explore further.

Weka or LingPipe for New Data Scientist
http://awahid.net/blog/weka-or-lingpipe-for-new-data-scientist/
Sun, 11 Jan 2015 10:40:18 +0000

I started working with Weka and LingPipe around two years ago. My task was to develop a better clustering algorithm for text data. I initially used Weka to familiarize myself with basic clustering algorithms; however, I found that Weka has more documentation for classification algorithms than for clustering algorithms. I came across the LingPipe framework on the internet and found that its blog provides tutorials about clustering and code walk-throughs of clustering algorithms. The tutorials were very well written, and they helped me understand the implementation of clustering algorithms. The LingPipe framework also provides tokenization, stemming and other text processing facilities, which saved me time on basic text processing.

I would recommend that new data scientists start with LingPipe, especially if they are interested in clustering algorithms, and switch to Weka later. I found LingPipe to be a good framework to begin with, but Weka was more reliable for complex text mining tasks, especially on large datasets. There are good video tutorials on text mining in Weka that are worth watching.

Whether you are a new data scientist or an experienced one, I would like to hear your story and your favourite tool or framework for text mining tasks.

Clustering Bigdata
http://awahid.net/blog/clustering-bigdata/
Sat, 03 Jan 2015 11:09:35 +0000

Clustering a large amount of data brings complexity and requires special clustering algorithms. Common clustering algorithms like k-means are not designed to handle such tasks. Anil K. Jain, a big name in the domain of clustering algorithms, explains this phenomenon in his video lecture (http://videolectures.net/single_jain_bigdata/). He presents a solution, the “approximate k-means algorithm”, which clusters large amounts of data (bigdata). Other researchers, such as Xiao Cai et al., propose another variant of k-means to cluster bigdata.

Since k-means is faster than many other algorithms, with a running time that grows linearly, O(n), in the number of data points n (for a fixed number of clusters and iterations), researchers prefer to base new algorithms on k-means. However, there are others, such as Vincent Granville, who used Hierarchical Agglomerative Clustering based algorithms with MapReduce and an indexing mechanism to cluster large amounts of data. Interestingly, his algorithm has a complexity of O(n log n), which is slightly higher, but I assume the quality of the results would be better than that of the k-means variants.
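
To make the linear-time claim concrete, here is a minimal sketch of plain (Lloyd-style) k-means in Python. It is not the approximate k-means from the lecture, nor any of the variants mentioned above; the function names, the parameters and the fixed iteration count are assumptions chosen only for illustration. Each iteration passes over the n points and compares every point with k centroids, so the work per iteration grows linearly with n when k and the number of dimensions are fixed.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd-style k-means on a list of tuples (illustrative sketch).

    Returns the final centroids and the labels from the last assignment
    step. Cost per iteration is linear in the number of points.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


if __name__ == "__main__":
    # Two well-separated groups of made-up 2D points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(centroids)
    print(labels)
```

Doubling the number of points roughly doubles the work in both steps, which is why k-means is a common starting point for bigdata variants.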

Clustering Bigdata was the problem of 2014, and now there are many algorithms that handle it easily.

Learning Curve of Datascience for Software Engineers
http://awahid.net/blog/learning-curve-of-datascience-for-software-engineers/
Wed, 24 Dec 2014 09:53:43 +0000

Coursera offers a free course, “Introduction to Datascience”, which provides basic knowledge and a specialization track for becoming a data scientist. If you come from a Computer Science background and have good programming skills, then becoming a data scientist will be a piece of cake for you. Derrick Harris has previously mentioned that it is very easy to become a data scientist, and I think the learning curve is much shorter for someone who has a four-year Bachelor's degree in Computer Science.

Linkedin Integration – Site updated
http://awahid.net/blog/linkedin-integration-site-updated/
Sun, 21 Dec 2014 14:02:56 +0000

Kudos to Claude Vedovini for developing an awesome plugin for WordPress, which allows users to integrate their LinkedIn information with WordPress.
So far I have customized the LinkedIn recommendations and created a page for my resume. My future plan is to create a portfolio page to showcase my previous projects.

First Blog Entry
http://awahid.net/blog/first-blog-entry/
Thu, 18 Dec 2014 21:28:47 +0000

I have started using Twitter again, and I promise myself to write at least one post a week. I will generally post about Datascience, Text Mining, Machine Learning, Artificial Intelligence and Clustering Algorithms.

Software Re-engineering
http://awahid.net/presentations/software-re-engineering/
Tue, 16 Dec 2014 09:19:57 +0000