As I’ve noted elsewhere, academic credentials are important but not necessary for high-quality data science. The core aptitudes – curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature – that distinguish the best data scientists are widely distributed throughout the population.

You know the concept of data analytics has hit the big time when people start analyzing porn. Deep Inside, a fascinating visualization project, gave me a laugh, but its findings were also pretty revealing about the culture and science of sex.

These findings made me laugh and think at the same time, which doesn’t happen often:

Even though 32.7% of female porn stars have blonde hair, only 5% of Americans are naturally blonde.

A whopping 70.5% of actresses are Caucasian. What does this say about the audience?

Probably the only thing my $625 Data Mining I course through UCSD Extension was good for was the Discussion Board, where classmates shared their opinions of the class and offered valuable tips. One great lead was a set of poll results from KDnuggets on the most-used software tools in the world of data mining and big data.

This survey backs up James Kobielus’s claim in his blog that “open-source communities are where much of the fresh action in data science is happening”, as many of the tools preferred by those in the survey are indeed open-source. That’s great news because I don’t have that much money.

Data Science 101, by Ryan Swanstrom, a great resource for budding data scientists like me, recently posted a must-read list of all the concepts a data scientist should know. Here’s the list he came up with:

linear algebra

basic statistics

linear and logistic regression

data mining

predictive modeling

cluster analysis

association rules

market basket analysis

decision trees

time-series analysis

forecasting

machine learning

Bayesian and Monte Carlo statistics

matrix operations

sampling

text analytics

summarization

classification

principal components analysis

experimental design

unsupervised learning

constrained optimization

Although I’m familiar with some of these from my introductory statistics and data mining courses through UCSD Extension, there’s still a lot to learn – and master.