The goal of this post is to help you use ridge regression with real understanding, rather than just calling whatever a library provides. So, what is ridge regression? The simplest answer is ‘a variation of linear regression’. The worst way to answer is to start with the following mathematical equations, which few people can understand at first glance.

In machine learning, classification is one of the most fundamentally exciting and yet challenging problems. The implications of a competent classification model are enormous: these models are leveraged for natural language processing text classification, image recognition, data prediction, reinforcement training, and countless further applications. However, the way classification algorithms are currently applied is terrible. During my time at Facebook, I found that the generic solution to any machine learning classification problem was to ‘throw a gradient descent boosting tree at it and hope for the best’. But this should not be the case: research is going into modern classification algorithms and improvements that allow significantly more accurate models with considerably less training data. Here, we explore some particularly interesting examples of modern classification algorithms. This article assumes some familiarity with machine learning; however, the majority of the post should still be accessible without it.

Time series analysis is an approach to analyzing time-series data in order to extract meaningful characteristics and generate other useful insights that can be applied in business situations. Generally, time-series data is a sequence of observations stored in time order. Time-series data often comes up when tracking business metrics, monitoring industrial processes, and so on.

We all do that. Before we make any big decision, we ask for other people’s opinions, such as those of our friends, our family members, even our dogs and cats, to keep ourselves from being biased or irrational. Models do that too. It is very common for an individual model to suffer from bias or variance, and that’s why we need ensemble learning. Ensemble learning, in general, builds a model that makes predictions based on a number of different models. By combining individual models, the ensemble model tends to be more flexible (less bias) and less data-sensitive (less variance).
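To make the idea of ‘asking several opinions’ concrete, here is a minimal sketch of one simple way to combine models: a majority vote over their predicted labels. The labels and the three models’ predictions are invented for illustration, and the models are constructed to make their mistakes on different samples, which is the favourable case for an ensemble.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists into one list by majority vote per sample."""
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_pred, y_true):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


# Toy labels and predictions, invented for illustration only
y_true = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]  # wrong on the third sample
model_b = [1, 1, 1, 1, 0, 1]  # wrong on the second sample
model_c = [1, 0, 1, 0, 0, 1]  # wrong on the fourth sample

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(accuracy(preds, y_true), 3))
```

Each individual model gets one sample wrong, but because no two models are wrong on the same sample, the vote outvotes every mistake. If the models tended to fail on the same samples, the vote would help much less.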

The beaten path of machine learning involves journeying to familiar landmarks and scenic locales. One set of familiar landmarks is the predefined loss functions that give you a suitable loss value for the problem you are trying to optimize. We’re familiar with the cross-entropy loss for classification and the mean squared error (MSE) or root-mean-square error (RMSE) for regression problems. Popular ML packages, including front-ends such as Keras and back-ends such as TensorFlow, include a set of basic loss functions for most classification and regression tasks. But off the beaten path there exist custom loss functions that you may need to solve a certain problem, and these are constrained only by valid tensor operations.
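As a rough illustration of what a custom loss might look like, here is a sketch of an asymmetric squared error that penalizes under-prediction more heavily than over-prediction. It is written in plain Python over lists so that it runs without any framework. The function name and the `under_weight` parameter are hypothetical, and in Keras or TensorFlow the same arithmetic would have to be expressed with tensor operations instead of a Python loop.

```python
def asymmetric_squared_error(y_true, y_pred, under_weight=3.0):
    """Mean squared error where under-predictions are weighted by under_weight.

    Hypothetical example loss: with under_weight=1.0 it reduces to the plain MSE.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = t - p
        # error > 0 means the prediction was too low (under-prediction)
        weight = under_weight if error > 0 else 1.0
        total += weight * error ** 2
    return total / len(y_true)


# Toy values, invented for illustration only
y_true = [3.0, 5.0]
y_pred = [2.0, 6.0]  # first prediction is too low, second is too high

loss = asymmetric_squared_error(y_true, y_pred)
plain_mse = asymmetric_squared_error(y_true, y_pred, under_weight=1.0)
print(loss, plain_mse)
```

Both predictions are off by the same amount, yet the weighted loss is larger than the plain MSE because the under-prediction counts three times. This is the kind of problem-specific preference a predefined loss cannot express.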

It’s a well-documented fact that knowing Python for Data Science is a must. Python has a wide range of libraries that allow us to perform core data analysis activities easily. Without wasting any more words in establishing how important it is for a data person to know Python, we will dive right into the cold sea.
This article is the first piece in the Python for Data Science series, which covers everything related to Python, from the basics to the ‘cool’ stuff. In this particular article, the following points will be covered:
1. Introduction To Python
2. Installing Python and Jupyter Notebook
3. The Python Basics

This tutorial is for beginners who want to learn Elasticsearch from scratch. In this tutorial I am going to cover all the basic and advanced topics related to Elasticsearch. So let’s get started.

Adversarial examples are an interesting topic in the world of deep neural networks. This post will try to address some basic questions on the topic including how to generate such examples and defend against them.

Clustering is a powerful unsupervised knowledge discovery tool that aims to segment your data points into groups with similar features. However, each algorithm is quite sensitive to its parameters. Similarity-based techniques (K-means, etc.) require you to specify how many clusters exist, while hierarchical techniques usually require manual intervention to decide where to cut off the final clusters. The most common density-based approach, DBSCAN, requires only two parameters, which define its ‘Core Points’, but finding good values for them can often be extremely difficult. It also cannot find clusters of differing densities. There is a relative of DBSCAN, called OPTICS (Ordering Points To Identify the Clustering Structure), that uses a different process. It creates a reachability plot that is then used to extract clusters, and although there is still an input, maximum epsilon, it is mostly introduced only if you would like to speed up computation. The other parameters don’t have as big an effect as their counterparts in other clustering algorithms, and it is much easier to use their defaults. First, I will explain a little about how this algorithm works and how it can include in-line outlier detection, and then how it was very useful for me in a recent application.
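The reachability plot mentioned above is built from two quantities, the core distance and the reachability distance. The sketch below computes only those two definitions on a handful of made-up points. It is not a full OPTICS implementation, since it leaves out the ordering and the cluster extraction. The function names are illustrative, and it assumes the convention that a point counts as its own nearest neighbour.

```python
import math


def core_distance(points, i, min_samples):
    """Distance from points[i] to its min_samples-th nearest neighbour.

    Assumed convention: the point itself counts as its first neighbour.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_samples - 1]


def reachability_distance(points, o, p, min_samples):
    """Reachability of points[p] from points[o]: never smaller than o's core distance."""
    return max(core_distance(points, o, min_samples), math.dist(points[o], points[p]))


# Toy data, invented for illustration: three close points and one far away
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
min_samples = 3

near = reachability_distance(points, 0, 1, min_samples)
far = reachability_distance(points, 0, 3, min_samples)
print(near, far)
```

Points inside a dense region get small reachability values, while the isolated point gets a large one. In a reachability plot those large values show up as peaks separating the valleys that correspond to clusters, which is also what makes outliers easy to spot.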

Support Vector Machine is one of the most popular supervised classifiers used in the domain of Machine Learning. Let us get to know the intuition behind the Support Vector Machine (SVM). Note that in all the coming sections, Support Vector Machine will be referred to as SVM.

In this article, I will introduce a couple of different techniques and applications of machine learning and statistical analysis, and then show how to apply these approaches to solve a specific use case for anomaly detection and condition monitoring.

PCA has always haunted me. It is one of those concepts that I have crammed so many times, yet could never get right in my subconscious mind. This was before I fully understood the nuances of Linear Algebra.

A hallmark of machine learning is dealing with massive amounts of data from various domains. Regardless of whether this data takes the form of images, video, text, speech, or pure numbers, it almost always exists in some high-dimensional space. In this article, I’ll show how data is represented in higher dimensions, and how we can interpolate between data points. Since it’s almost impossible to visualize these abstract spaces, I’ll also provide some helpful analogies to think about.
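As a small preview of what interpolating in a high-dimensional space can mean, here is a sketch of plain linear interpolation between two vectors. It assumes each data point is just a list of numbers. The four-dimensional vectors are invented for illustration, and linear interpolation is only one possible choice of path between two points.

```python
def lerp(a, b, t):
    """Linearly interpolate between vectors a and b; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Two made-up points in a four-dimensional space
start = [0.0, 0.0, 0.0, 0.0]
end = [4.0, 8.0, -2.0, 10.0]

# Five evenly spaced steps along the straight line from start to end
path = [lerp(start, end, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
midpoint = path[2]
print(midpoint)
```

The same function works unchanged for vectors with hundreds or thousands of dimensions, because every coordinate is interpolated independently.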