Issue #69

March 19, 2015

Editor Picks

DJ Patil Talks Nerdy To Us
DJ Patil was recently named the White House’s deputy chief technology officer for data policy and chief data scientist, making him the first-ever national data scientist...In a phone call Monday, I spoke with DJ about open data, his transition from the private sector to government, and the Obama administration’s data-focused initiatives and transparency record. Here is a lightly edited transcript of our conversation...

Data Science Done Well Looks Easy - And That Is A Big Problem For Data Scientists
The characteristics of the most successful data science projects I've evaluated or been a part of are: (a) a laser focus on solving the scientific problem, (b) careful and thoughtful consideration of whether the data is the right data and whether there are any lurking confounders or biases and (c) relatively simple statistical models applied and interpreted skeptically. It turns out doing those three things is actually surprisingly hard and very, very time consuming...

Advice To Graduate Students Interviewing For Industry Positions
A couple of weeks ago I saw a post in a LinkedIn group which went something like this: "I've just received a Ph.D. in physics and I know python and R. I've been applying for data scientist roles. However, I'm not getting much traction. Do you think that I need to learn a BI tool such as Tableau?"...I want to take the opportunity to give a few pieces of advice from a hiring manager's perspective...

Data Science Articles & Videos

PageRank Algorithm Reveals World's All-Time Top Soccer Team
Google’s PageRank algorithm...was originally designed to rank websites according to their importance by assuming that a site is important if it is linked to by other important sites...Verica Lazova and Lasko Basnarkov at Cyril and Methodius University in Macedonia have found another use for the PageRank algorithm. They have used it to create an all-time ranking of the world’s national football teams using results from the 20 World Cup tournaments that have taken place since 1930...
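
The "important if linked to by other important sites" idea carries over to match results if each losing team is treated as linking to the team that beat it. The following is a minimal sketch of PageRank by power iteration under that assumption. The team names, the results, and the loser-to-winner encoding are made up for illustration and are not the data or the exact method of the Lazova and Basnarkov study; the damping factor of 0.85 is the commonly used default.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it links to
    (here: each team to the teams that beat it).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over everyone.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical results: each key lost to the teams in its list.
example_links = {
    "Team A": ["Team B"],
    "Team B": ["Team C"],
    "Team C": ["Team B"],
    "Team D": ["Team B", "Team C"],
}

ranks = pagerank(example_links)
for team, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{team}: {score:.3f}")
```

In this toy data the team that beat the most opponents, including another strong team, ends up with the highest score, which is the intuition behind using the algorithm for sports rankings.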

Are Data Scientists Earning Their Salaries?
There have been murmurings that we are now in the “trough of disillusionment” of big data, the hype around it having surpassed the reality of what it can deliver. Gartner suggested that the “gravitational pull of big data is now so strong that even people who haven’t a clue as to what it’s all about report that they’re running big data projects.”...Can data scientists actually justify earning their salaries when brands seem to be struggling to realize the promise of big data?...

Life Lessons From Machine Learning
The accomplishments of Machine Learning...are certainly very technological in nature. But in truth, Machine Learning is equal parts Art and Philosophy, incorporating deep Epistemological insights in order to better make sense of the world...

Classifying Plankton With Deep Neural Networks
The National Data Science Bowl, a data science competition where the goal was to classify images of plankton, has just ended. I participated with six other members of my research lab, the Reservoir lab of prof. Joni Dambre at Ghent University in Belgium. Our team finished 1st! In this post, we’ll explain our approach...

Computer-based Personality Judgments Are More Accurate Than Those Made By Humans
Using several criteria, we show that computers’ judgments of people’s personalities based on their digital footprints are more accurate and valid than judgments made by their close others or acquaintances (friends, family, spouse, colleagues, etc.). Our findings highlight that people’s personalities can be predicted automatically and without involving human social-cognitive skills...

Machine Learning For Brain Imaging
In this talk, I would like to showcase a few examples of machine learning problems that arise when using brain imaging to understand brain function and its pathologies...

The Future of Machine Learning From The Inside Out
The second part of our conversation with Geoffrey Hinton (Google and University of Toronto), Yoshua Bengio (University of Montreal) and Yann LeCun (Facebook and NYU). They talk with us about the history (and future) of research on neural nets...

Text Understanding From Scratch (PDF)
This article demonstrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks... We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve astonishing performance without the knowledge of words, phrases, sentences and any other syntactic or semantic structures with regards to a human language. Evidence shows that our models can work for both English and Chinese...

Markov Models And Predictive Analytics With Cats
One of my favorite lectures focuses on the use of Markov Models for predictive analytics...the lecture can be used to demonstrate advanced concepts (like Bayesian inference and probabilistic reasoning) as well as basic concepts (like conditional probability and statistical dependence)...I start the lecture by telling the students that I will show them how to predict the future with a cat...
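
As a rough illustration of the kind of prediction a Markov model makes, the sketch below pushes a probability distribution over a cat's activities forward in time using conditional probabilities of the form P(next activity | current activity). The states and the transition probabilities are invented for this example and are not taken from the lecture.

```python
# Hypothetical transition probabilities: P(next activity | current activity).
# Each row sums to 1.
transitions = {
    "sleeping": {"sleeping": 0.7, "eating": 0.2, "playing": 0.1},
    "eating":   {"sleeping": 0.5, "eating": 0.1, "playing": 0.4},
    "playing":  {"sleeping": 0.6, "eating": 0.3, "playing": 0.1},
}


def step(distribution, transitions):
    """Advance a distribution over states by one time step."""
    new_distribution = {state: 0.0 for state in transitions}
    for current, p_current in distribution.items():
        for nxt, p_next in transitions[current].items():
            # Law of total probability: sum over all current states.
            new_distribution[nxt] += p_current * p_next
    return new_distribution


def predict(start, transitions, n_steps):
    """Distribution over states n_steps after starting in a known state."""
    distribution = {s: (1.0 if s == start else 0.0) for s in transitions}
    for _ in range(n_steps):
        distribution = step(distribution, transitions)
    return distribution


forecast = predict("sleeping", transitions, n_steps=3)
for state, probability in forecast.items():
    print(f"{state}: {probability:.3f}")
```

Because the next state depends only on the current one, the whole forecast reduces to repeatedly applying the same table of conditional probabilities.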

Jobs

Build the overall vision, strategy, and operation of the data and business intelligence practice. Understanding data is critical to the growth and success of our business, whether it's accurately forecasting demand for suppliers, partners, and farms, analyzing customer trends and behavior, or identifying opportunities for efficiency at our fulfillment centers. We're seeking someone who will become an integral part of the business and play a key role in building, managing and mentoring our growing data engineering and analytics team...

Training & Resources

Reflections On Julia
Julia is a new language that could become the go-to choice for scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing...Is it Ready for Production? Yes! We run Julia against massive volumes of data and process tens of thousands of transactions per second...

The Grammar Of Data Science
Python and R are popular programming languages used by data scientists...In this post, I will elaborate on my experience switching teams by comparing and contrasting R and Python solutions to some simple data exploration exercises...

Books

NEW RELEASE: Practical advice for each major area of development with Python...

"Effective Python is a time-efficient way to learn – or remind yourself – what the best practices are and why we use them. It’s a concise book of practical techniques to write maintainable, performant and robust code using practices widely accepted in the community..."