New Year Resolutions for a Data Scientist

Introduction

New Year is not just about replacing your desk calendar with a new one or waking up the next morning rubbing your eyes. It’s about celebrating the joy of a new beginning. It gives you a perfect reason to build new habits. It is the arrival of new ‘Hope’.

If you are reading this, I’m sure data science excites you! You want 2016 to be a game-changing year for you, don’t you? You can make it possible if you commit to these resolutions today. You must understand that becoming a data scientist is a process, not an event. It’s not an overnight success. Hence, you must work patiently towards your goal.

I’ve shared a list of resolutions every data scientist should make, depending on where he / she is in the journey. This is of course a generic list, so you should adapt it to your needs. I have also provided a checklist below which can be downloaded to track these goals.

Note: These are generic resolutions meant for an aspiring / experienced data scientist. This article might not be useful for people from domains other than analytics.

New Year Resolutions for a Data Scientist

I’ve categorized these resolutions according to three levels in the life of a data scientist. Decide which level suits you best and work accordingly. You can move to the next level once you have satisfactorily completed your current one. I’ve also listed the best course available on each topic. For optimum benefit, I’d suggest you take these courses one by one. If you still find them hard, discuss it with me. I may have an alternative for you. For your convenience, I’ve also shared a checklist below which can be downloaded.

Beginner Level

Who’s a Beginner? – If you are completely new to analytics and data science, have no idea how this industry operates, and yet are curious to pursue a career in this field, you are a beginner. These should be your resolutions:

1. Start with a programming language: either R or Python

I’ve seen students try their hands at both R and Python at once. Eventually, they end up mastering neither. This is a deadly approach. You must promise yourself to learn either R or Python in depth. Both are open source tools, and hence widely used in companies too. Python is widely regarded as one of the easiest programming languages to learn. R remains a favorite statistical tool. The choice is yours. Both are equally good.

2. Learn Statistics & Mathematics

Statistics is all about assumptions and progressions. But you can’t progress in this industry without statistics and mathematics. They lie at the heart of a data scientist’s work. If you are weak at mathematics, it’s time to change that equation. Get comfortable with powerful statistical techniques, algebra and probability. There are many awesome courses available on statistics from Khan Academy, Udacity etc. You can get started right now by installing their apps.

3. Enroll in one MOOC at a time (Most Difficult)

Massive Open Online Courses, a.k.a. MOOCs, are mostly free to access and study. But this is the most difficult promise you can make to yourself. Students often tend to enroll in multiple courses at a time and complete none. Hence, you must focus on one course and finish it before proceeding to the next. You can check Coursera, edX and Udacity to find a course.

4. Engage, Discover and Socialize in Industry

You need to know what’s happening in the industry. We live in a dynamic world. Things change overnight. A technology prevalent today might become obsolete tomorrow. You must talk with experienced professionals and industry experts, and meet your future self. So start participating in discussions and meet-ups, follow blogs, join groups and read books. To keep up with all of this, you can follow us on Facebook for the latest updates.

Intermediate Level

Who’s an intermediate level data scientist? – If you have finished the previous level, have experimented with the basics of machine learning, and have gained enough knowledge to build predictive models, then you are at an intermediate level. Completing this level needs huge determination and hours of practice. Are you ready for this challenge?

1. Understand and Build your Machine Learning Skills

Machine Learning is the future of data science and technology. All the major companies have invested heavily in hiring candidates with this skill. No doubt, it’s in huge demand these days. And this is a chance for you to get the best out of this situation. This year, you should dig deeper into machine learning. Master Regression, Clustering and CART in depth. Here you’ll find free resources on machine learning.

2. Focus on Ensemble and Boosting Algorithms

Once you feel confident about machine learning, move on to the next set of models. Using boosting and ensemble methods, you can often achieve model accuracy higher than that of a single algorithm. This topic is covered in the free resources shared above. But promise yourself to conquer this topic with great understanding.
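To give a feel for why combining models helps, here is a minimal sketch in plain Python. It is only an illustration, not a real boosting implementation: the toy task, the three rules and all function names are made up for this example, and the combination method shown is simple majority voting.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most models (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(models, x):
    """Ask every model for a prediction and combine them by majority vote."""
    return majority_vote([model(x) for model in models])

def accuracy(predict, samples):
    """Fraction of (input, label) pairs that predict() gets right."""
    return sum(predict(x) == y for x, y in samples) / len(samples)

# Toy task (made up for illustration): is x greater than 5?
samples = [(x, x > 5) for x in range(11)]

# Three hypothetical weak rules, each wrong on a different input.
def rule_a(x): return x > 4            # wrong on x = 5
def rule_b(x): return x > 6            # wrong on x = 6
def rule_c(x): return x > 5 or x == 2  # wrong on x = 2

rules = [rule_a, rule_b, rule_c]

single_scores = [accuracy(rule, samples) for rule in rules]
ensemble_score = accuracy(lambda x: ensemble_predict(rules, x), samples)

print("single rules:", single_scores)
print("majority vote:", ensemble_score)
```

The vote beats every individual rule here only because the rules make their mistakes on different inputs. Real boosting algorithms go further and train each new model to focus on the errors of the previous ones, which this sketch does not attempt.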

3. Explore Spark, NoSQL and other Big Data Tools

This year, you can start your journey in big data. Considering that the demand for big data professionals is surging, you must learn Spark. It has recently gained popularity, and much of the future of big data lies in Spark. It is a widely used tool to handle and manipulate big data. Along with Spark, you can extend your expertise to NoSQL and Hadoop as well.

4. Educate Community Members

What could be better than sharing knowledge! This year, you should start sharing your knowledge with people who are struggling to learn data science. You can join active data science forums, answer their questions and educate them with useful tips and hacks. You could also lead meet-ups happening in your local circles.

5. Participate in Data Science Competitions

Time to test your knowledge. This year, you must participate in competitions. They will show you your weak and strong areas. Moreover, you’ll become confident in the knowledge you’ve acquired. I’d want you to rank among the top 500 data scientists on Kaggle. For now, you should aim to become the Last Man Standing.

Addition: Competitions can be a bit difficult at times. You can also check out these practice problems to test your skills and knowledge. They aren’t difficult but surely FUN!

Advanced Level

I don’t need to define the people falling in this category. These people know things about data science that most people are afraid to even try! They’ve reached a level where life is cozy and easy going. But still, they love challenges. They are experienced professionals. Here are some resolutions:

1. Build a Deep Learning Model

This year, you have to set an example for the people aspiring to become data scientists. You must promise yourself to try building a deep learning model this year. People around the world are already using it for making predictions. It’s an advanced form of machine learning. On problems such as image and speech recognition, its accuracy is often better than that of traditional machine learning models.

2. Give Back to Community

I believe knowledge is meant to be shared, not stored. The more you share, the more you’ll learn. It is said, ‘If you learn a new concept, explain it to 2 friends of yours. You are more likely to remember that concept for long’. This year, you must make a resolution to help people in the analytics community with your knowledge and experience. This will allow many struggling people to find a shore in this domain.

3. Explore Reinforcement Learning

Reinforcement Learning is one of the most powerful yet least explored aspects of machine learning. This year, promise yourself to research this field. It will surely be challenging but worth trying. Self-driving cars and autonomous drones are among the areas where reinforcement learning is being applied. Once you start with this, you’ll naturally get into artificial intelligence.

4. Rank in Top 50 on Kaggle

This year, you must promise yourself to attain ‘master’ status on Kaggle. More precisely, secure a rank among the top 50 data scientists on Kaggle. Participate in competitions which best suit your knowledge. Team up with other Kagglers. At this level of competition, you’ll end up learning concepts which you wouldn’t have learnt otherwise.

End Notes

I understand these resolutions can be challenging for you, but they are still worth trying. You are free to take up a resolution according to your current situation. I’ve simply listed the most important ones which an aspiring data scientist must take up.

Last week, I realized that people aren’t confident enough in deciding on a new year resolution. This was a concern for me. Hence, it led me to write this article. I hope that before 2016 ends, you will have finished the beginner level (assuming you are a fresher).

I hope this article has cleared up your confusion about making new year resolutions. I’ve already put a lot of things on your plate as an aspiring data scientist. Chew them one by one and proceed. If you find it difficult to complete your resolutions, feel free to share your thoughts with me in the comments section below.

I wish you all the best for 2016.
You are not alone this year. My plate is also full.
You can connect with me at Analytics Vidhya Discuss. Simply tag my name there if you face any difficulties in learning.

Thank you so much for writing this article. I subscribed to this analytics learning 2 months back and have read a lot of the articles that you have written, but to be honest, till date I have only watched the videos that you shared once, to see how to get started. I have not made much progress. But this article is really superb. It gives a step by step procedure to attempt and move further.

I just need one bit of help from you. As I go further in studying and understanding the concepts, if I have any concerns, can I mail you directly or text you via fb? Please let me know what you think. If yes, please do share your email id or facebook link.

Thanks a lot for whatever you are doing to guide us and have a great year ahead.

Hi Sangeeta
I am glad you found it helpful. Facebook is a bad idea actually. I am not very active there.
If you face any sort of difficulty in learning, you can discuss with me and fellow data scientists at Analytics Vidhya Discuss.

Recently I have started to get interested in the field of Data Science. I have registered for a PhD programme in the above topic, but was clueless as to where to start. This site is great for any kind of learner. I found your blog particularly motivating. Thank you for giving an idea of where to start and how to go about it. Looking forward to further discussions with you in my journey of data science that is about to start.

Good to know. I wish you all the best for 2016. Remember, there are millions of people trying to become data scientists. You’ll come across difficult concepts. Whenever you feel like giving up, discuss with us. Analytics Vidhya will help you to become successful.

Hi Manish, the content was very useful and gives me the courage to proceed further in this industry. As I am a beginner, I just wanted to know why SAS is not mentioned anywhere in your article. Is SAS becoming outdated, or is pursuing it not worthwhile in the future? Please do share your insight. Thank you.

Hi Bilwa
I saw it coming. SAS is not an open source language. And companies prefer candidates having knowledge of at least one open source tool. Hence, I suggested R or Python. You don’t need any subscription to master these languages, only determination.

It’s really great to know about analyticsvidhya.com. I had been looking for something like it for a long time.
I just started learning Machine Learning. I am currently following Andrew Ng’s machine learning course on Coursera.
[Sorry, I haven’t come across the YouTube channel before].

Can we go with Scala for Machine Learning Programming? What are drawbacks of Scala when compared to Python? Sorry my question may look silly or meaningless.

Ideally, one can learn both R and Python. Scala can be a good choice too. But I feel Python is sufficient for a data science aspirant. It has well-built libraries for machine learning and data manipulation, which are quick to use (I haven’t worked with Scala, so I can’t compare directly). Moreover, you can learn Python faster than Scala for 3 reasons:
1. User-friendly coding interface
2. Availability of an enormous number of Python tutorials
3. Active community support

You’d be surprised to know that Python is growing at a pace faster than Java. Companies are embracing Python at a larger scale. Due to this, candidates with Python skills are in huge demand today. Even master data scientists use Python for their work.

Thank you for the nice post. This checklist helps a lot to build data science skills. I am currently at the intermediate level, going through machine learning techniques. A lot to learn in 2016.
Very nice work. Wish you a very happy new year!

Thanks very much for sharing your thoughts. It’s really an excellent way to start with new year resolutions. It would have been nice if you had also added something about the visualization skills required (like d3.js or Tableau, etc.) to this list. That would really help, as I am lacking a lot in the visualization skills required of a data scientist when it comes to presenting the results to the audience. 🙁

Initially, I planned to add QlikView and Tableau to the resolutions above. But while drafting this article, I decided to focus on building predictive modeling skills, because once someone becomes comfortable with data manipulation & predictive modeling, visualization would be a cakewalk.
However, considering the sheer importance of visualization, one can also devote time to QlikView or Tableau to acquire visualization skills. I’d suggest you learn Tableau or QlikView instead of d3.js. Here is the learning path of QlikView and Tableau.

Thank you for all your thoughtful remarks and links. I think you are providing a great service to the readers.

Some thoughts I had are as follows:

I think there is some attraction to “data scientist”, and even those with no idea of what “science” is are keen to come under the umbrella of “data scientist”. Your second point, 2. Learn Statistics & Mathematics, is fundamental. The issue I see is that if you have someone who does not have a technical degree of at least the BS level, and you tell them “Learn Mathematics”, you might equally tell them “go to college”. From high school you need some calculus, then 4 years of math including statistics and probability, and then, after learning some specialized techniques (like R, Python, Spark, etc.), you could rightfully call yourself a data scientist. But taking a few Coursera math courses won’t do. Likewise, if you have not worked with data for some time, learning R will not make you a data scientist. Finally, you need some programming background to “Learn R”. In the beginning courses I’ve seen, if you have no technical background, no math background, and no programming background, you will be lost at step two of most R courses.

Regarding your point “3. Enroll in one MOOC at a time (Most Difficult)”, I cannot agree more. However, I would go further. Unfortunately, many Coursera offerings have evolved from learning for the masses to “quickly substitute for a 9 week or 18 week college course”. As such, they move very fast and are putting more and more into “homework”. They then compound the issue by putting strict time boundaries on the homework, i.e. you must complete the homework for module X by date Y plus 2 weeks. Otherwise, you get a 0. For working professionals this puts Coursera back in the same bin as University: you don’t have the bandwidth for brick and mortar University, and you likewise won’t have the bandwidth to keep within 2 weeks of regular, 19-year-old college students. So I would consider looking at many other non-MOOC courses and tutorials and being diligent in completing them, at your own pace.

I feel the majority of your intermediate and advanced resolutions are spot on. For me, I hope to get to advanced in the next 1-2 years to, for the nth time, re-invent my career!

You have raised a valid point on mathematics. If someone is from a non-mathematical background, his / her path to becoming a data scientist can be difficult, but not impossible. I agree, he / she might need to opt for a different learning path, just to get the numbers right. Khan Academy has done a great job in teaching elementary mathematics. Anyone can check out those tutorials.
However, I beg to differ on the programming part. If someone has a logical mind and structured thinking, R wouldn’t be that difficult to learn. I am from a non-programming background, and I might have taken more time than others, but I feel comfortable programming in R today.
In the end, it’s all about one’s determination and commitment. If someone is determined to achieve his / her goal, they would stop at nothing. And there are multiple ways of performing a task. Keep exploring.

Hi Manish, thank you for your kind remarks and good thoughts. I agree with your response on programming; perhaps I should say that if someone is not computer literate, then it will be very tough to jump into R, S, etc. My comments stem from the reality here in the US that many people near retirement age (like me) are finding themselves out of work, with no demand for their general skills. The “hot” areas are web programming, Java, html, and now data science: R, Python, etc. But many of these folks have had a career where they did little using computers other than to use basic apps and do some office things. A friend was a superior graphic artist. But the need for that was thin, and he had to re-educate and learn some web programming skills. It was a 12-week intensive effort of 8 hours every night (while still trying to work). It was very hard. I agree, logical thinking is the main life skill needed, though.

I wish you many blessings in the year to come, as I myself look forward to new opportunities!

Thanks a lot for taking the pains, doing your research and coming up with such a wonderful new year resolution list for everyone. To top it all, you have put it in a step-by-step approach, which will serve as a source of motivation once we complete a particular step and give us a big confidence boost in this journey of data science.

Thank you very much for your valuable time and update!
Sincere thanks for “In the end, it’s all about one’s determination and commitment. If someone is determined to achieve his / her goal, they would stop at nothing”. I do agree with you.

I am a complete beginner and first want to learn NoSQL (before R and Python).

Thank you for this valuable information. A few years ago I went back to grad school thinking that a degree in analytics would help me become a data scientist. While it helped lay the foundation, it in no way has helped me further my career. I am going to take your advice, beginning at the start but I must ask, how does one break into the field? What types of opportunities should I look for? I already work in IT as a System Engineer and am just struggling in finding the right course of action to take.

Howdy! I could have sworn I’ve visited this web site before, but after looking at many of the posts I realized it’s new to me. Anyhow, I’m certainly happy I stumbled upon it and I’ll be book-marking it and checking back frequently!