16 Options To Get Started and Make Progress in Machine Learning and Data Science

You might want a job or the opportunity to get a job in machine learning or data science. Alternatively, you might be a student or in a data role and looking to accelerate your learning in the area.

If you think your only options are to get a PhD or to read an academic textbook, think again. This post is for you.

You have a lot of options when it comes to training and educational material. So many that you should take your time, make a short list of some options and even try a few before settling in.

In this post you will discover the vast number of options available to you and have enough information to choose a direction (or two) and take that next step in your journey.

Where You Fit, Getting Ready

You are looking for educational training in machine learning or data science.

You may have specific subject areas you want to get better at or know more about. What are they? Write them down.

You may have a preference for a specific learning style, like in-person, audio, video, text tutorials or books. What are your preferences? Write them down.

You have a specific reason you want to learn machine learning or data science. It may be as described above: a desire or opportunity for a job, a desire to learn more or faster for an existing role, or general interest and opportunity. What is your one reason? Write it down.

Note all three points in a comment if you like. You’re not alone.

Short-List of Machine Learning Training Options

Let’s not dance around, here is a short list of your options to get started and make progress in machine learning.

University Degree
    PhD Degree (research)
    Masters Degree (by research)
    Masters Degree (by coursework)
    Undergraduate Degree

In-Person Course
    Training Courses and Workshops
    Bootcamp

Online Course
    MOOC
    Paid Courses
    Self-Study University Course Material

Books
    Academic (textbooks)
    Professional (O’Reilly)
    Practical Books (Packt)

Free Content Online
    Academic (papers, blogs)
    Industrial (blogs, youtube, communities)

Tools and Libraries

Competitions

You can see that the top of the list is heavy in supervised and structured academic options and that the bottom of the list is focused on less-structured self-study options. Another axis you could consider, one that is less stratified across the list, is the academic versus industrial focus of the materials.

I thought about these axes for a while, and I think they are a useful aid. I assigned scores to each option along these axes of Supervised to Unsupervised (self-study) and Academic to Industrial and created a little scatter plot. It is not a perfect breakdown: material can be self-study and unsupervised but still highly structured. A PhD is highly academic, but generally a lot less supervised than most other degrees (at least under the Australian/British system that I studied under). The supervised/unsupervised dichotomy does not capture enough, but it’s a starting point.
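If you want to try the same exercise on your own short list, here is a minimal sketch in Python. The scores below are made-up placeholders for illustration, not the values behind the chart in this post, and the names `OPTIONS`, `quadrant` and `group_by_quadrant` are hypothetical. It scores each option on the two axes and groups the options by quadrant rather than drawing a plot.

```python
# Hypothetical scores in [0, 1]; illustrative placeholders only, not the
# values used for the scatter plot described in the post.
# First value:  supervision (0 = fully self-study, 1 = heavily supervised)
# Second value: academic focus (0 = industrial, 1 = academic)
OPTIONS = {
    "Undergraduate Degree": (0.9, 0.8),
    "PhD Degree (research)": (0.5, 1.0),
    "Bootcamp": (0.8, 0.2),
    "MOOC": (0.4, 0.7),
    "Academic (textbooks)": (0.1, 0.9),
    "Competitions": (0.0, 0.1),
}


def quadrant(supervision, academic, threshold=0.5):
    """Label a point by which side of the threshold it falls on for each axis."""
    style = "supervised" if supervision >= threshold else "self-study"
    focus = "academic" if academic >= threshold else "industrial"
    return f"{style} / {focus}"


def group_by_quadrant(options):
    """Map each quadrant label to the list of option names that fall in it."""
    groups = {}
    for name, (supervision, academic) in options.items():
        groups.setdefault(quadrant(supervision, academic), []).append(name)
    return groups


if __name__ == "__main__":
    for label, names in sorted(group_by_quadrant(OPTIONS).items()):
        print(f"{label}: {', '.join(sorted(names))}")
```

Swap in your own options and scores. The grouping is only as good as the numbers you assign, so treat it as a thinking aid rather than a ranking.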

Let me know in the comments if this helped.

Options Available to You In Machine Learning

We’ll spend the rest of the post diving into each of these in turn, what they offer, who they’re suited to and specific examples you can follow-up with.

University Degree

A university degree provides a highly structured, mostly academic, mostly theoretical introduction to a topic. You probably know what a degree is.

Undergraduate degrees and some masters programs are by coursework, and U.S. PhDs also have a coursework element. Some honors, masters and PhD programs have a research component, more so as you step deeper into the system.

Great coursework subjects are highly structured, designed by an expert in the field to give you the best introduction in the subject matter. Great research programs give you an apprenticeship into the scientific method and research methods.

Degrees are also expensive, take a long time, are designed for the average student and teach older, even outdated, information.

A degree can be the right move if you have a lot of time and money and don’t want to design your own study program at all.

PhD Degree (research): Join a research lab and study a subject that fits into their overarching program of study. Your work will be highly academic and specialized and you will be measured by your formal work product in the form of papers. For example, take a look at the PhD programs in Machine Learning at CMU.

Masters Degree (by research): Like a smaller PhD program, but you are encouraged to bite off a smaller piece, such as reproducing existing results.

The degree is the starting point, not the end point. It’s a slow burn on a subject that gets you to a place where you are ready to begin practicing. It’s also the one time when you have the time to go deep into a subject with few other responsibilities.

Some people who ask for advice believe (deeply) that they cannot get into machine learning without going back to university for a handful of years and studying for a formal degree.

You do not need a degree to learn and practice machine learning. In fact, you don’t need a degree if you want to explore research in machine learning.

In-Person Courses

There are options that are not nearly as long and expensive as degree programs, but offer highly structured in-person training, and they are highly industry focused rather than academic.

Options include short training courses and bootcamps.

Training Courses and Workshops

You can take a short training course on a specific machine learning topic. The course will be highly targeted on a specific technique or a specific tool.

IT training companies have been around forever and have started offering training around specific data science and machine learning topics.

There are also new companies that only target this type of training. For example, Persontyle is a company that offers a vast array of short (1, 2, 3 and 5-day) courses on specific topics like Hadoop for Data Scientists and Introduction to Data Science Using R.

Finally, universities may offer short training courses for industry, local meet-up groups often offer training, and academic conferences often have workshops in modern methods designed for industry and graduate students.

Bootcamps

A popular approach is the data science or machine learning bootcamp. These are 6-12 week programs that professionals attend in person to learn applied skills. Often there is a hiring day at the end of the program to match employers with course participants.

Zipfian Academy is a popular example that offers a 12-week full-time program in data science with modules, a capstone project and a hiring day. Prices are in the range of $16,000.

Online Courses

Education throughout the rest of your life will be rooted in self-study and mentorship.

There are a lot of self-study programs available and some, like MOOCs, are also highly structured. Most, like MOOCs, are spun out of university subject material and therefore are generally more academically focused.

Massive Open Online Course (MOOC)

These are still a very popular method for getting started in machine learning, given the success of the Stanford Machine Learning MOOC that launched Coursera.

Courses are often 10-12 weeks in length, requiring many hours per week. Many are free or offered at a small cost. They are less industry focused and more academic than bootcamps, but offer training that was previously only accessible within a university graduate program. They often include lecture videos, homework, assignments and a community forum to discuss the material with fellow students.

Courses operate in batches to ensure that each cohort has support in the form of classmates on their path through the program.

Paid Courses

Some MOOC courses are paid (like the Johns Hopkins Data Science MOOC). There are also shorter paid courses available. Some are spun out of university subjects (like these MIT courses) and workshops, and others are completely standalone.

Lynda also offers lots of short videos on machine learning and data science. For example, check out their playlist titled “Data Science Basics” if you have a Lynda account.

Variations on MOOCs

There are also variations on MOOCs.

For example, you can get free access to the course materials for undergraduate and graduate machine learning university courses and study the course yourself. Some have lecture videos available as well.

Books

There are many amazing books on machine learning and data science, but you are probably reading the wrong books. This can throw you off track and crush your motivation.

I like to break the books down into three categories: academic, professional, practical.

Academic Books

These are the textbooks used in graduate and undergraduate programs.

You do not want to read these books until you are ready. Until you have been practicing for a while, have some confidence with some algorithms and tools and are ready to dig deeper into why the algorithms work rather than how.

Springer books come to mind readily, but there are many other publishers like The MIT Press, Cambridge University Press and more.

Textbooks are academic and require discipline to read, to take notes, to do the exercises and to dive into the references. The work is all on you. Textbooks are best used as a reference on select topics when needed.

Professional Books

These are the books you read if you are a software engineer or practicing data scientist and are looking to add more structure to your work or improve in a specific area.

Free Online Content

There is a lot of free content. Some of it is amazing, and a lot of it is dross.

The content is generally unstructured, or structured only within an individual piece, with no cohesive grand plan that links the content together. There is no study plan. You must construct your own.

You can use this content to learn what you want, when you want. Blog posts are typically too short to dive deeply into a topic, so you often need to jump into a book or course to get depth.

I think of free content in two classes: academic materials such as papers, and professional materials such as blog posts and YouTube.

Academic Materials

Academic materials include papers, articles, technical reports and theses. The onus is on you to extract what you are looking for, such as the details of an obscure algorithm or ideas on feature engineering for a specific data type.

Professional Materials

Professional materials are created by those learning or practicing machine learning. They may be students, programmers or data scientists. They may be creating materials to teach, to share or to better understand the material.

YouTube channels also fit into this and there are some excellent ones (and we’re not limited to university lectures).

The Mathematical Monk has a great channel on machine learning. You can get a lot out of recorded talks from industry conferences and meetups such as PyCon 2014 (search for machine learning related videos). Google tech talks are great (again, search for topics on machine learning). You can get a lot of industry news from O’Reilly Strata videos, such as those from the 2014 meeting.

Tools and Libraries

I separate out tools and libraries because they are an important area of machine learning education. They are the means by which you do and practice.

There are books, blogs and videos on the tools, and if you’re lucky, there are tutorials and documentation.

It is important both to survey the landscape of tools and libraries available to you and to go deep into specific examples.

Generally, this is wholly on the industrial side rather than the academic side, and wholly self-study. There are very few courses that teach you how to get the most out of tools and libraries.

Tools I often recommend learning a lot about, depending on where you are on your journey are:

There are suites of big data infrastructure to learn about as well as niche tools for specific domains and techniques.

There are a lot of tools and libraries available and a lot of room to go wide and deep.

Machine Learning Competitions

Out on the edge you have machine learning competitions.

These require a certain level of skill in a tool, data handling and algorithm usage before you can get started, and world class expertise to do well.

You are on your own in terms of guidance, but there is community and great opportunity for learning state-of-the-art algorithms and practices in a competitive environment.

The skills you learn are applicable in industry, but real-world problems do require more from you. This area of learning is not for everyone, but does offer a lot for those it does suit.

Competitions are often held in conjunction with academic conferences, and are now more often hosted by companies such as Kaggle and TunedIt. Recent popularity has meant more companies are opening up their data to competitions so that access to varied and interesting industrial datasets is now commonplace.

Summary

We have covered a lot of ground in this post and you have discovered that there are a lot more options available to you than you probably first thought.

I want to see comments like “I need a degree” and “there are no good resources” go away. There have never been more options and more resources available to start and practice machine learning, both on the academic side and the industrial side, both in a highly structured and supervised environment and self-study.

What are you going to study? Leave a comment.

About Jason Brownlee

Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.

23 Responses to 16 Options To Get Started and Make Progress in Machine Learning and Data Science

Data visualization is also important when it comes to communicating results of machine learning. That said, your chart above “Options Available to You In Machine Learning” is very difficult to read with tiny black letters and white shadowing over blue background.

Awesome list! For the time being, I am going to continue reading your articles before making any big decisions about what I spend my time on. However, I feel like I’m leaning towards MOOCs (Andrew Ng’s on Coursera and the CalTech one) and some of the non-academic books (Programming Collective Intelligence and Applied Predictive Modeling), hopefully finding a project to work on in the meantime. I don’t want to self-limit by waiting until I’ve finished books/MOOCs to start a project.

Andrew Ng’s Coursera course should have everything you want. My favorite class assignment/project was learning to recognize 5000 handwritten numbers. Another cool one was learning to classify emails into spam/not spam. All the software is free and runs on any PC or Mac. Goes all the way on how to build Neural networks from the ground up. Lecture notes have been transcribed for download, so no need to buy a book.

Hi Jason,
Thanks a lot for making this post! This has made me consider a lot of things that I wasn’t taking inventory of. My only remaining question is: do you think it’s possible that firms will hire people with training in machine learning WITHOUT a degree? From what I hear/read, job listing criteria always list that they want PhD / Master’s degree holders.

That said, I know a person who’s got the training is gonna be just as skilled no matter where he got it from, or how much he paid for it, so I think the current system is ridiculous.

But would you happen to know if circumventing degree requirements is doable?

Yes, organizations/people want results. They want value. A candidate only needs to demonstrate they can provide value.

A degree and higher degrees are a shortcut used to help in the hiring process, and some large organizations will be too inflexible in their process to consider someone without one. That’s their problem, not yours.

If you’re a developer, I’m sure you have been around a large number of developers who are killing it and who did not come through a CS degree program.

The same for ML skill – focus on delivering value and demonstrating this value to decision makers – to people who need help in these areas.

Thank you very much for providing this comprehensive and concise summary. It’s been very helpful for me. My question for you is – what is your (or the general community) opinion of the Master in Computer Science in Data Science from University of Illinois at Urbana-Champaign, offered via Coursera? Seems to be a great alternative for those that want to obtain an accredited masters, yet have a full time job.

Thanks for a very useful post! I am debating between the Coursera 10 course Data Science specialisation or Udacity’s nanodegree in data analysis. I am unable to decide between the two, except that Coursera’s seems to be much more elaborate.