Skills you need to become a data scientist.

My goal is to create a plan that gets you to the level of an average industry practitioner.

Skills you need: the ability to take Excel/CSV data sets, pre-process and visualize them, build a model, and visualize the results.

Recommended steps:

1. Download one data set from Kaggle, UCI or anywhere else on the Internet. I am deliberately not giving a link as I want you to search through multiple sets. Create a deck of slides describing the business problem, ROI, current practices, their weaknesses, etc.

Milestone 1: Creating a business context for a problem is a crucial step in becoming a practitioner. Congrats, you have done that! Allow a week for this, assuming you put in 20 hours a week.

2. Look at the attributes given. Brainstorm whether you can create more attributes from them. If transactions are given, you can create the average number of transactions per day, the average value of transactions, etc. Think and create as many new attributes as you can.

3. Download R and Deducer (my preference). Both are open source.

4. From the resources provided by others, learn the techniques and intuition behind standard data pre-processing (I mean the ways in which you fill missing values, bin numeric variables, merge categorical variables, scale data, reduce dimensionality, etc.).

5. Use Excel/Deducer to create the new attributes and pre-process the data.
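As a rough illustration of deriving new attributes from transactions, here is a minimal sketch in Python using only the standard library. It is not the R/Deducer workflow recommended here, just the same idea in a few lines. The records, the field layout and the names `derive_features`, `avg_txn_per_active_day` and `avg_txn_value` are all hypothetical, and "per day" is assumed to mean per day on which the customer had at least one transaction.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, transaction_date, amount).
transactions = [
    ("c1", date(2024, 1, 1), 20.0),
    ("c1", date(2024, 1, 1), 30.0),
    ("c1", date(2024, 1, 3), 10.0),
    ("c2", date(2024, 1, 2), 100.0),
]


def derive_features(rows):
    """Return one record per customer with two derived attributes."""
    amounts = defaultdict(list)  # customer -> list of transaction amounts
    days = defaultdict(set)      # customer -> set of distinct active days
    for customer, day, amount in rows:
        amounts[customer].append(amount)
        days[customer].add(day)

    features = {}
    for customer, values in amounts.items():
        features[customer] = {
            # Assumption: average over days with at least one transaction.
            "avg_txn_per_active_day": len(values) / len(days[customer]),
            "avg_txn_value": sum(values) / len(values),
        }
    return features


features = derive_features(transactions)
for customer, row in features.items():
    print(customer, row)
```

Each customer becomes one row and each derived attribute one column, which is the table shape you are working towards.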

Milestone 2: Creating one big structured table, where independent attributes are columns and records are rows, is a huge step in solving the problem. You should be able to do this with 4 weeks of work. Don’t forget to add a few slides to your deck on data pre-processing.
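To make the pre-processing ideas mentioned above concrete, here is a small Python sketch of three of them: filling missing values, scaling, and binning a numeric variable. It assumes mean imputation, min-max scaling and equal-width bins, which are only one common choice for each technique. The function names and the `ages` column are hypothetical, and in practice you would do the same thing in Excel or Deducer.

```python
from statistics import mean


def fill_missing(values):
    """Replace None with the mean of the observed values (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bin(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last bin, so clamp it back in.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, None, 40, 60]  # hypothetical column with one missing value
filled = fill_missing(ages)
scaled = min_max_scale(filled)
bins = equal_width_bin(filled, 2)

print(filled)
print(scaled)
print(bins)
```

Mean imputation is the simplest option, not necessarily the best one for your data. Median imputation or a separate "missing" category are common alternatives.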

6. Learn descriptive statistics, histograms, box plots, scatter plots and bar charts. Learn to plot these in Deducer/ggplot.

7. Do detailed descriptive statistics and visualizations on the data. There are excellent resources on this all over the net. I created a few videos myself (http://beyond.insofe.edu.in/cate…).
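The numbers behind a box plot are worth computing by hand once. The sketch below, in standard-library Python rather than Deducer/ggplot, produces the usual summary statistics and flags outliers with the conventional 1.5 × IQR rule. The sample values and the `describe` helper are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles` (Python 3.8+); other tools may use slightly different quartile definitions.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Summary statistics plus the box-plot outlier fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical data with one extreme value
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

Notice how far the single extreme value pulls the mean away from the median. That gap is exactly what a histogram or box plot makes visible at a glance.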

Milestone 3: Visualization is considered the most important interfacing step, and you are done with it. Add these to your slide deck. Allocate two weeks for this.

8. Learn linear regression, logistic regression and clustering from any of the resources given in these threads.

9. Apply them to your data sets and do all the diagnostics. Deducer makes it easy to do this.
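For intuition about what a regression tool computes, here is a minimal Python sketch of simple (one-variable) linear regression by ordinary least squares, with two basic diagnostics: the residuals and R². The data points and the helper names `fit_line` and `r_squared` are hypothetical. Deducer or R's own regression routines will give you the same fit along with many more diagnostics.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
r2 = r_squared(xs, ys, slope, intercept)

print(slope, intercept, r2)
print(residuals)
```

Plotting the residuals against `xs` is a quick first diagnostic: a visible pattern suggests that a straight line is the wrong model.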

Milestone 4: Congrats! You have built your predictive models. I think you need 3 weeks for this step.

10. Brainstorm how you can simplify and present these results. The goal is to present to a non-data scientist. Use your visualization skills again. Add these slides to your deck.

Milestone 5: Take a week or two for this.

You have created a slide deck, some code and a knowledge base. More importantly, you have solved a problem end-to-end. Voilà, in approximately 12 weeks you are where 90% of data scientists are.

Now, to get to a higher level:

Add more algorithms (decision trees, neural nets, etc.). Learn more domains and problems. Study techniques for working with unstructured data. There are wonderful courses in the thread. Take them slowly.