~ Broaden your Horizon

Distilled News

2017 has been a really exciting year for data science professionals. This is evident from the new technologies that have been emerging day by day, such as Face ID, which has changed the way we secure information on our mobile phones. Self-driving cars used to be a myth, but now they are very much a reality, and governments throughout the world are moving to adopt them.
Data science is a field in which ground-breaking research is happening at a much faster pace than in any earlier emerging technology. The time between contemplating a research idea and actually implementing it has come down significantly. This is also fueled by the immense amount of resources freely available to everyone, which enables even a non-specialist to contribute to research in their own way. For example, GitHub (a collaborative platform for software development) is now paving the way for research ideas to be shared as working implementations. As Andrew Ng said: ‘Data is the new oil; AI is the new electricity’.
Personalization and automation are the talk of the day, and more and more industries, such as financial services, healthcare, pharmaceuticals and automotive, are adapting to the developments brought about by better machine learning and deep learning models. This article focuses on the defining moments of data science in 2017. We kept a few criteria in mind when we curated the list, namely:
• As a data science professional, does the event affect you in any way?
• Does it influence your learning or your daily workflow?
• Is it an innovative startup, product release or recent development?
• Is it an industry collaboration which will impact the future of data science?
We have also shared our predictions for the data science industry in 2018, which we believe will be something to look forward to.

One of the most interesting effects of PCA (Principal Component Analysis) is to decorrelate the input covariance matrix C, by computing its eigenvectors and performing a change of basis using a matrix V. The eigenvectors are sorted in descending order of the corresponding eigenvalues, therefore Cpca is a diagonal matrix whose non-null elements are λ1 >= λ2 >= λ3 >= … >= λn. By selecting the top p eigenvalues, it’s possible to perform a dimensionality reduction by projecting the samples onto the new sub-space determined by the top p eigenvectors (it’s possible to use Gram-Schmidt orthonormalization if they don’t have unit length). The standard PCA procedure works with a bottom-up approach, obtaining the decorrelation of C as a final effect. However, it’s possible to employ neural networks, imposing this condition as an optimization step. One of the most effective models was proposed by Rubner and Tavan (and is named after them).
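
The decorrelation step can be written out compactly. This is only a sketch under stated assumptions: the samples x_i are taken to be zero-centered, and the symbols m (number of samples), Λ, V_p, y and z are introduced here for illustration and do not come from the original article.

```latex
% Covariance of m zero-centered samples x_i \in \mathbb{R}^n (assumption)
C = \frac{1}{m} \sum_{i=1}^{m} x_i x_i^{\top}

% Eigendecomposition: the columns of V are orthonormal eigenvectors of C
C = V \Lambda V^{\top}, \qquad V^{\top} V = I, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n

% Change of basis y = V^{\top} x gives a diagonal (decorrelated) covariance
C_{pca} = V^{\top} C V = V^{\top} V \Lambda V^{\top} V = \Lambda

% Dimensionality reduction: keep the first p columns of V
z = V_p^{\top} x \in \mathbb{R}^{p}, \qquad V_p = [\, v_1 \; \cdots \; v_p \,]
```

Because the off-diagonal entries of Cpca vanish, the components of y are uncorrelated, which is the condition a network such as the one mentioned above would have to reach through training rather than through an explicit eigendecomposition.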

Everybody is freaking out about the rise of Bitcoin and the potential of blockchain technologies: the advent of cryptocurrencies, game-changing use cases, disruption of established business models by disintermediation, etc. At the time of writing this article, there are more than 1300 cryptocurrencies listed on coinmarketcap, and a lot more are coming with the next ICOs (Initial Coin Offerings). Most certainly, the main enabler of Bitcoin and of many other currencies (although not all of them) is blockchain technology.

During the process of data analysis, one of the most crucial steps is to identify and account for outliers: observations that are essentially different in nature from most other observations. Their presence can lead to untrustworthy conclusions. The most complicated part of this task is to define the notion of an “outlier”. After that, it is straightforward to identify them in the given data. Many techniques have been developed for outlier detection, and the majority of them deal with numerical data. This post describes the most basic ones and their application using the dplyr and ruler packages.
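
To make the idea of “define the rule first, then flag the rows” concrete, here is a minimal sketch of two common numerical rules, the z-score rule and Tukey’s interquartile-range fences. It is written in plain Python rather than with dplyr and ruler, and it is an assumption that these two rules are among the basic techniques the post covers. The function names, thresholds and sample data are all hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]


def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: nine ordinary readings and one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

flagged_by_tukey = tukey_outliers(data)
# With only ten points the extreme value inflates the standard deviation,
# so a threshold of 3 flags nothing here; a lower cut-off is used for the demo.
flagged_by_zscore = zscore_outliers(data, threshold=2.5)
```

The two rules can disagree on small samples, which illustrates why choosing the definition of “outlier” is the hard part and applying it is the easy part.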

Reinforcement learning uses “reward” signals to determine how to navigate through a system in the most valuable way. (I’m particularly interested in the variant of reinforcement learning called “Q-Learning” because the goal is to create a “Quality Matrix” that can help you make the best sequence of decisions!) I found a toy robot navigation problem on the web that was solved using custom R code for reinforcement learning, and I wanted to reproduce the solution in different ways than the original author did. This post describes different ways that I solved the problem described at http://…/hopping-robots-and-reinforcement.html
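
As an illustration of how a Q matrix is built from reward signals, the following is a minimal tabular Q-learning sketch. It does not reproduce the robot problem from the linked post or its R code: the five-state corridor, the reward of 1 at the goal, and the values of alpha, gamma and the episode count are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic move, clipped at the walls; reward 1 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q[state][action] from random walks (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.randrange(2)  # purely random exploration
            next_state, reward = step(state, action)
            # Q-learning update: move Q towards reward + discounted best next value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned Q matrix."""
    return ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]


q_table = train()
policy = greedy_policy(q_table)
```

Reading the best action off each row of q_table gives the sequence of decisions; in this corridor the learned policy is to move right from every state.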

Time series and streaming databases are hot right now. That’s why my second wife left me. Just like a lot of people these days, everybody wants one or thinks that they want one. Given this, we are bringing back another season of database technical talks at Carnegie Mellon University in Fall 2017. The ‘Time Series Database Lectures’ is a semester-long seminar series featuring speakers from the leading developers of time series and streaming data management systems. Each speaker will present the implementation details of their respective systems and examples of the technical challenges that they faced when working with real-world customers. You need to attend these lectures. Trust me on this. Videos will be posted after each talk.