DataAspirant August 2015 newsletter

Welcome to the August newsletter! We rounded up the best blogs for anyone interested in learning more about data science. Whether you are experienced in data science or someone who’s just heard of the field, these blogs provide enough detail and context for you to understand what you’re reading. We also collected some videos. We hope you enjoy the August DataAspirant newsletter.

Blog Posts:

Oh god, another one of those subjective, pointedly opinionated click-bait headlines? Yes! Why did I bother writing this? Well, here is one of the most trivial yet life-changing pieces of worldly wisdom from my former professor, which has become my mantra ever since: “If you have to do this task more than 3 times just write a script and automate it.”

The takeaway here is that backpropagation doesn’t optimize! It moves the error information from the end of the network to all the weights inside the network so that a different algorithm can optimize those weights to fit our data. We actually have a plethora of different nonlinear optimization methods that we could use with backpropagation.
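To make the separation concrete, here is a minimal numpy sketch (my own toy example, not from the post): backpropagation only computes gradients for a tiny two-layer network, and a separate optimizer, plain gradient descent here, is what actually updates the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                      # 20 samples, 3 features
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1)
    return h, sigmoid(h @ W2)

def backprop(X, y):
    """Push the error backwards to get a gradient for every weight."""
    h, out = forward(X)
    d_out = (out - y) * out * (1 - out)           # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)            # error pushed back to the hidden layer
    return X.T @ d_h, h.T @ d_out                 # gradients for W1, W2

losses = []
for _ in range(500):
    gW1, gW2 = backprop(X, y)                     # backprop: gradients only
    W1 -= 0.1 * gW1                               # the "different algorithm":
    W2 -= 0.1 * gW2                               # vanilla gradient descent
    losses.append(float(np.mean((forward(X)[1] - y) ** 2)))
```

Swapping the two update lines for momentum, conjugate gradients, or L-BFGS would leave `backprop` untouched, which is exactly the point the post makes.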

Bokeh is a Python library for interactive visualization that targets web browsers for representation. This is the core difference between Bokeh and other visualization libraries. Look at the snapshot below, which explains the process flow of how Bokeh helps to present data to a web browser.

Artificial neural networks (ANNs) were originally devised in the mid-20th century as a computational model of the human brain. Their use waned because of the limited computational power available at the time, and some theoretical issues that weren’t solved until several decades later (which I will detail at the end of this post). However, they have experienced a resurgence with the recent interest and hype surrounding Deep Learning. One of the more famous examples of Deep Learning is the “Youtube Cat” paper by Andrew Ng et al.

So far we’ve covered using neural networks to perform linear regression. What if we want to perform classification using a single-layer network? In this post, I will cover two methods: the perceptron algorithm and using a sigmoid activation function to generate a likelihood.
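As a quick illustration of the first of those two methods, here is a hedged numpy sketch of the perceptron learning rule on toy data of my own (the post’s actual examples may differ): whenever a point is misclassified, its feature vector is added to or subtracted from the weights, nudging the decision boundary toward it.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X = X[np.abs(X.sum(axis=1)) > 0.5]            # keep a margin so training converges quickly
y = np.where(X.sum(axis=1) > 0, 1, -1)        # labels from the true line x1 + x2 = 0

w, b = np.zeros(2), 0.0
for _ in range(50):                           # passes over the data
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:            # misclassified: nudge the boundary
            w += yi * xi
            b += yi

accuracy = (np.where(X @ w + b > 0, 1, -1) == y).mean()
print(accuracy)
```

The sigmoid variant replaces the hard threshold with a smooth activation, so instead of a bare class label you get a number between 0 and 1 that can be read as a likelihood.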

Everyone has a different style of learning. Hence, there are multiple ways to become a data scientist. You can learn from tutorials, blogs, books, hackathons, videos and what not! I personally like self-paced learning aided by help from a community – it works best for me. What works best for you?

If your answer to the above question was classroom or instructor-led certifications, you should check out machine learning certifications and data science bootcamps. They offer a great way to learn and prepare you for the role and expectations of a data scientist.

An aspect that is important but often overlooked in applied machine learning is intervals for predictions, be it confidence or prediction intervals. For classification tasks, beginning practitioners quite often conflate probability with confidence: a probability of 0.5 is taken to mean that we are uncertain about the prediction, while a prediction of 1.0 means we are absolutely certain in the outcome. But there are two concepts being mixed up here. A prediction of 0.5 could mean that we have learned very little about a given instance, due to observing no or only a few data points about it. Or it could be that we have a lot of data, and the response is fundamentally uncertain, like flipping a coin.
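One simple way to see the distinction is a Bayesian toy calculation of my own (hypothetical, not from the post): under a uniform Beta(1, 1) prior, observing k successes in n trials gives a Beta(k + 1, n − k + 1) posterior over the true rate. An estimate of 0.5 from 1-of-2 and from 500-of-1000 is the same point prediction, but the posteriors have very different spreads.

```python
def beta_posterior_sd(k, n):
    """Posterior standard deviation of the success rate, uniform prior."""
    a, b = k + 1, n - k + 1                       # Beta posterior parameters
    var = a * b / ((a + b) ** 2 * (a + b + 1))    # closed-form Beta variance
    return var ** 0.5

print(beta_posterior_sd(1, 2))       # 2 observations: wide posterior (~0.22)
print(beta_posterior_sd(500, 1000))  # 1000 observations: narrow posterior (~0.016)
```

Both cases predict 0.5, yet only the second justifies confidence in that number, which is exactly the conflation the post warns about.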

We will be using Python a fair amount in this class. Python is a high-level scripting language that offers an interactive programming environment. We assume programming experience, so this lecture will focus on the unique properties of Python.

Programming languages generally have the following common ingredients: variables, operators, iterators, conditional statements, functions (built-in and user defined) and higher-order data structures. We will look at these in Python and highlight qualities unique to this language.
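As a toy walk-through (my own, not taken from the lecture notes), all of those ingredients fit in a few lines of Python:

```python
total = 0                                   # variable + operator
for n in range(1, 6):                       # iterator
    if n % 2 == 0:                          # conditional statement
        total += n                          # 2 + 4

def square(x):                              # user-defined function
    return x * x

squares = {n: square(n) for n in range(3)}  # higher-order data structure (dict)
print(total, len(squares))                  # built-in functions: print, len
```

The brevity is much of the point: Python keeps the common ingredients close to pseudocode, which is why the lecture can focus on what is unique to the language.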

This is the second post in a three-part series offering a deep dive into k-Means clustering. While k-Means is a simple and popular clustering solution, analysts must not be deceived by its simplicity and lose sight of the nuances of implementation. In the previous blog post, we discussed various approaches to selecting the number of clusters for k-Means clustering. This post will discuss aspects of data pre-processing before running the k-Means algorithm.
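One pre-processing step that almost always matters for k-Means is feature scaling, since the algorithm clusters on raw Euclidean distance. A small numpy sketch with hypothetical data of my own (the post may use different examples) shows why: income in dollars dwarfs age in years, so without scaling the distance, and hence the clustering, is driven almost entirely by income.

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.column_stack([
    rng.normal(40, 10, size=200),          # age (years)
    rng.normal(60000, 15000, size=200),    # income (dollars)
])

# Standardize: zero mean, unit variance per feature
scaled = (data - data.mean(axis=0)) / data.std(axis=0)

print(data.std(axis=0))     # wildly different scales before
print(scaled.std(axis=0))   # both features ~1 after
```

After this step, a unit move in age counts the same as a unit move in income, so k-Means weighs the features comparably.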

LinkedIn Posts:

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it has led to a complex ecosystem where new frameworks, libraries and tools are being released pretty much every day, creating confusion as technologists struggle and grapple with the deluge.

If you are a Big Data enthusiast or a technologist ramping up (or scratching your head), it is important to spend some serious time deeply understanding the architecture of key systems to appreciate its evolution. Understanding the architectural components and subtleties would also help you choose and apply the appropriate technology for your use case. In my journey over the last few years, some literature has helped me become a better educated data professional. My goal here is to not only share the literature but consequently also use the opportunity to put some sanity into the labyrinth of open source systems.

A while back, I was asked: “What is your favorite office activity?” Without a doubt, it is Data Science Tuesday, where members of the team (and anyone from the company) get together to discuss research papers on a variety of topics from Data Science, Computer Science, Software Engineering, Social Networks, Psychology, Sociology, Neuroscience and even personality assessment.

I believe most of the greatest ideas that nurture the R&D projects, the product and the vision of the Data Science team came during these collaborative times. We could enjoy a good lunch and brainstorm the heck out of cutting-edge research papers.

In my own experience as a Data Scientist, I have grown technically and professionally by discussing ideas from several different topics. In addition, I firmly believe it can boost the performance of any Data Science team.

It’s a beautiful dawn out there in the universe of Big Data & Analytics. With help from a nice weekend morning, good coffee, and the motivation from fellow bloggers on this forum, I am sharing what I see and experience in this space.

That’s all for the August 2015 newsletter. Please leave your suggestions on the newsletter in the comment box so that we can improve next month’s edition. To get all DataAspirant newsletters, you can visit the monthly newsletter page. Subscribe to our blog so that you get our newsletter in your inbox every month.