This was inspired by a bright high school student who emailed me for advice about his interest in deep learning.

Q: Hello Dr. Thomas! I’ve been trying to find good resources for deep learning, but the field does seem rather cryptic and a bit technically prohibitive for me at this point. If you wouldn’t mind, I had a couple of questions I’d love to ask you about learning deep learning:

Is there a single book or a set that you’ve found that explains deep learning well? I’ve looked at ones like deeplearning.net or MIT’s free book, but all resources I’ve found are either too brief an introduction or wonderfully mathematically engaged but not applicable at all

Do you think it’s a good idea for me to frontload mathematical rigor at this point, or should I wait until I’m further down the path to try to get the technical details down?

When you take on a data science problem, how do you answer the classic “what to try next” question? For instance, sometimes on Kaggle problems, I’ll hit a wall where I don’t know what the best next move is.

A:
Your assessment that most deep learning resources are either too brief or too mathematical is spot-on! My partner Jeremy Howard and I feel the same way, and we are working to create more practical resources. We will soon be producing a MOOC based on the in-person course we taught this autumn in collaboration with the Data Institute at USF. Until then, here are my recommendations:

In my opinion, the best existing resource is the Stanford CNN course. I recommend working through all the assignments.

Below are some of my favorite tutorials, blog posts, and videos for those getting started with Deep Learning:

Embeddings

As for your question about whether to front-load mathematical rigor, I think it’s good to focus on practical coding, since that way you can experiment and develop a good intuition and understanding of what you’re doing. Math is best learned on an as-needed basis - if you can’t understand something you’re trying to learn because unfamiliar math concepts keep popping up, jump over to Khan Academy or to the absolutely beautiful 3 Blue 1 Brown Essence of Linear Algebra videos (great for visual thinkers) and get to work! Jeremy’s RNN tutorial above is a nice example of a code-oriented approach to deep learning, although I know this can be hard given the existing resources.

It’s great that you’re doing Kaggle competitions. That is a fantastic way to learn–and to see if you understand the theory that you’re reading about. I’d have to know more about what you’re trying to do before I could suggest what to try next.

I sometimes receive emails asking for guidance related to data science, and I’m going to start sharing my answers here as a data science advice column. If you have a data science related quandary, email me at rachel@fast.ai. Note that questions may be edited for clarity or brevity.

Q: Hello Rachel, I’m VP of Engineering at a start-up that is increasingly seeing our data & ML algorithms as our core asset. In thinking about the next few technical hires we want to make, we want to target engineers that will be able to accelerate the efforts of our Data Science team, so I’m trying to do some pre-recruiting research to understand how engineering teams focused on support of production ML pipelines are structured. Some of what I’m wondering about:

How are the Data Science & Engineering teams structured? e.g. Is there a notion of a “Data Engineering” team that is paired with the Data Science team? Or are Data Scientists & Engineers just integrated together in “vertical” product teams?

How do the Data Science & Engineering teams interact? How is the roadmap for Data Science coordinated with the roadmap for Data Engineering?

How are responsibilities split between Data Scientists vs. Engineers? Is there a notion of a hybrid role, and what does it look like if so?

A: This answer is based on my experience as a data scientist, my experience interviewing for data science roles, and talking with a number of data scientists. I’ve watched employers go through multiple data science re-orgs.

There are a lot of potential pitfalls related to data science and org structure (no matter what you choose). I’m going to take the liberty of expanding your question to cover the relationship between data science and other teams, as well as data engineering. Consider these scenarios:

The data science team interviews a candidate with impressive math modeling and engineering skills. Once hired, the candidate is embedded in a vertical product team that needs simple business analytics. The data scientist is bored and not utilizing their skills.

The data science team is separate (not embedded within other teams). They build really cool stuff that never gets used. There’s no buy-in from the rest of the org for what they’re working on, and some of the data scientists don’t have a good sense of what can realistically be put into production.

There is a backlog with data scientists producing models much faster than there is engineering support to put them in production.

The data infrastructure engineers are separate from the data scientists. The pipelines don’t have the data the data scientists are asking for now, and the data scientists are under-utilizing the data sources the infrastructure engineers have collected.

The company has definitely decided on feature/product X. They need a data scientist to gather some data that supports this decision. The data scientist feels like the PM is ignoring data that contradicts the decision; the PM feels that the data scientist is ignoring other business logic.

Recommendations

Having data scientists all on a separate team makes it nearly impossible for their work to be appropriately integrated with the rest of the company. Have your data scientists distributed throughout the company, but also have a team doing data science evangelism within the company. Vertical product teams need to know what is possible and how to best utilize data science. It’s too hard for a lone data scientist to advocate for the role of data-driven decisions within the team they’re embedded in.

Data scientists should report to both a data science manager and a manager within the product team. You need a lot of communication to make sure that the team is getting the most value and that the data scientist is finding fulfilling work. One approach that can work really well is to have half the data scientists switch to a different group each year (or even more often).

While it’s common to have machine learning, engineering, and data/pipeline/infrastructure engineering all as separate roles, try to avoid this as much as possible. This leads to a lot of duplicate or unused work, particularly when these roles are on separate teams. You want people who have some of all these skills: can build the pipelines for the data they need, create models with that data, and put those models in production. You’re not going to be able to hire many people who can do all of this. So you’ll need to provide them with training. In general, the most underused resource of most companies is their own employees, and the situation is even worse with data scientists (since “data science” encompasses such a wide variety of possible skills). Tech companies waste their employees’ potential by not offering enough opportunities for on-the-job learning, training, and mentoring. Your people are smart and eager to learn. Be prepared to offer training, pair-programming, or seminars to help your data scientists fill in skill gaps. I always tell students who are interested in both data science and engineering that the more you know about software development, the better a data scientist you will be.

Even when you have people who are both data scientists and engineers (that is, they can create machine learning models and put those models into production), you still need to have them embedded in other teams and not cordoned off together. Otherwise, there won’t be enough institutional understanding and buy-in of what they’re doing, and their work won’t be as integrated as it needs to be with other systems.

The term data scientist refers to at least 5 distinct jobs, so communication and clarity are key. Companies need to be clear on what their needs are and what they’re hiring for. I can tell you from firsthand experience as a job applicant that lots of companies want to hire a data scientist but aren’t sure why, or how they will use data science. You want to hire someone who is interested in the role they’d be working in. You probably won’t find a candidate who is interested both in writing machine learning implementations in C and in extensively using Google Analytics, although that is a real job description I’ve encountered. Note I say “interested in” and not “already has the skills”. Assume that any applicant will be learning lots of new skills on the job (if not, they will soon grow bored).

Further reading: After drafting this post, I came across an excellent article called The Data Science Delusion which details several additional problems companies may encounter when incorporating data science into their org.

At the Financial Times-Nikkei conference on The Future of AI, Robots, and Us a few weeks ago, Andreessen Horowitz partner Chris Dixon spoke just before Jeremy Howard and I were on stage. Dixon said many totally reasonable things in his talk–but it’s no fun to comment on them, so I’m going to focus on something rather unreasonable that he said, which was: “A few years ago it was called big data and then it was machine learning and now it’s called deep learning”. It was not entirely clear if he was saying that these are all terms for the same thing (they are most definitely not!) or if he was suggesting that the “in” data-driven approach changes from year to year. Either way, this obscures what a complete game-changer deep learning is. It is not just the 2016 version of “big data” (which has always been an empty buzzword). It is going to have an impact the size of the impact of the internet, or as Andrew Ng suggests, the impact of electricity. It is going to affect every industry, and leaders of every type of organization are going to be wishing that they had looked into it sooner.

First, to clear up some terms:

Big Data: This was an empty marketing term that falsely convinced many people that the size of your data is what matters. It also cost companies huge sums of money on Hadoop clusters they didn’t actually need. Vendors of these clusters did everything they could to maintain momentum on this nonsense because when CEOs believe it’s the size of your hardware that counts, it’s a very profitable situation if you make, sell, install, or service that hardware…

Machine Learning: Machine learning is the science of getting computers to act without being explicitly programmed. For instance, instead of coding rules and strategies of chess into a computer, the computer can watch a number of chess games and learn by example. Machine learning encompasses a wide variety of algorithms.

Deep Learning: Deep learning refers to many-layered neural networks, one specific class of machine learning algorithms. Deep learning is achieving unprecedented state-of-the-art results, by an order of magnitude, in nearly all fields to which it’s been applied so far, including image recognition, voice recognition, and language translation. I personally think deep learning is an unfortunate name, but that’s no reason to dismiss it. If you studied neural networks in the 80s and are wondering what has changed since then, the answer is the development of:

Using multiple hidden layers instead of just one. (Even though the Universal Approximation Theorem shows that it’s theoretically possible to just have one hidden layer, doing so can require exponentially more hidden units, which means exponentially more parameters to learn.)

GPGPU, programmable libraries for GPUs that allow them to be used for applications other than video games, resulting in orders of magnitude faster training and inference for deep learning

A number of algorithmic tweaks (especially the Adam optimizer, ReLU activation functions, batch normalization, and dropout) that have made training faster and more resilient

Larger datasets: although this has been a driver of progress, its value is often over-emphasized, as the “little data” examples above show.
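To make a couple of the items in this list more concrete, here is a minimal sketch in plain Python. It shows how the sizes of fully connected layers translate into a count of parameters to learn, and what two of the algorithmic tweaks named above (ReLU and dropout) do to a list of activations. The layer sizes and the function names count_parameters, relu, and dropout are assumptions chosen purely for illustration; they do not come from any particular library, and the dropout shown is the common "inverted" variant rather than the only possible formulation.

```python
import random


def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


def relu(values):
    """ReLU activation: negative inputs become zero, others pass through."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng):
    """Inverted dropout (training mode): zero each unit with probability
    drop_prob and scale the surviving units by 1 / (1 - drop_prob)."""
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


# Hypothetical layer sizes, chosen only for illustration.
two_hidden_layers = [784, 128, 64, 10]
one_hidden_layer = [784, 128, 10]

print(count_parameters(two_hidden_layers))  # 109386
print(count_parameters(one_hidden_layer))   # 101770
print(relu([-1.5, 0.0, 2.0]))               # [0.0, 0.0, 2.0]

# Which units get dropped depends on the random seed.
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=random.Random(0)))
```

The counts only show the bookkeeping: each pair of adjacent layers contributes a weight matrix and a bias vector. They say nothing about how wide a single hidden layer would need to be to match a deeper network on a given problem.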

Another common misconception Chris Dixon stated was that deep learning talent is incredibly scarce, and it will take years for graduate programs at the top schools to catch up to the demand. Although in the past a graduate degree from one of just a handful of schools was necessary to become a deep learning expert, this is a completely artificial barrier and no longer the case. As Josh Schwartz, chief of engineering and data science at Chartbeat, writes in the Harvard Business Review, “machine learning is not just for experts”. There has been a proliferation of cutting-edge commercially usable machine learning frameworks, machine learning specific services being released from major cloud providers Amazon and Google, tutorials, publicly released code, and publicly released data sets.

We are currently in the middle of teaching 100 students deep learning from scratch, with the only prerequisite being one year of programming experience. This will be turned into a MOOC shortly after the in-person class finishes. We’re in the 4th week of the course, and already the students are building world-class image recognition models in Python.

It is far better to take a domain expert within your organization and teach them deep learning than it is to take a deep learning expert and throw them into your organization. Deep learning PhD graduates are very unlikely to have the wide range of relevant experiences that you value in your most effective employees, and are much more likely to be interested in solving fun engineering problems, instead of keeping a razor-sharp focus on the most commercially important problems. In our experience across many industries and many years of applying machine learning to a range of problems, we’ve consistently seen organizations under-appreciate and under-invest in their existing in-house talent. In the days of the big data fad, this meant companies spent their money on external consultants. And in these days of the false “deep learning exclusivity” meme, it means searching for those unicorn deep learning experts, often including paying vastly inflated sums for failing deep learning startups.

We have been getting a lot of interest in our upcoming deep learning course over the last couple of days. With applications closing today, we’ve heard from USF that they will be able to extend the application deadline. The revised deadline is October 17. This will mean some late nights from the team at USF to ensure that all enrollments are processed in time for the start of the course–so many thanks to them! USF’s page with logistical details about the course is available here.

We are also excited to announce an additional diversity fellowship. After learning about our decision to sponsor a diversity fellowship, USF has generously said that they will match us, by sponsoring a 2nd diversity fellowship for our course! Women, people of color, and LGBTQ people are all invited to apply. To apply, please email datainstitute@usfca.edu a copy of your resume, a sentence stating that you are interested in the diversity fellowship, and a paragraph describing the problem you want to use deep learning to solve. We are excited to start addressing the diversity crisis in Artificial Intelligence.

Finally, we’d like to announce that we have launched an International Fellowship program for up to five people, who will be able to fully participate in the course remotely, for free. We are very excited to introduce our first successful applicant, Samar Haider from Pakistan. Samar first taught himself machine learning using online resources like Andrew Ng’s Coursera class and is now a researcher applying natural language processing to his native language of Urdu, at the Center for Language Processing in Lahore. Pakistan has a rich heritage of 70 different spoken languages, many of which have not been well-studied. At fast.ai, this is exactly the type of project that we want to equip people to work on–domains outside of mainstream deep learning research, meaningful but low-resource areas, problems that smart people from a wide variety of backgrounds are passionate about. And Samar is exactly the kind of passionate person that we want to support–as well as teaching himself machine learning, he has even invested his own money to get access to GPU time on Amazon so that he can train models. We hope to see Samar’s fellowship benefit the Pakistani community more widely, by making available Urdu deep learning resources for the first time.

To apply to join Samar as an international fellow of the program, please email rachel@fast.ai a copy of your resume, a sentence stating that you are interested in the international fellowship, and a paragraph describing the problem you want to use deep learning to solve. The program is open to people anywhere in the world (including the US) who cannot attend the course in person in SF.

International fellowship winners:

Must be willing to attend the course via Skype in realtime, even if the time is inconvenient in their home time zone

Must be willing to participate remotely with group members (who will be based in the San Francisco Bay Area)

I’ve been saying for some time that Deep Learning is going to be even more transformative than the internet. This view is shared by the always insightful Andrew Ng (Chief Scientist at Baidu, former CEO of Coursera, and head of Google Brain–and perhaps the only person I’m aware of who understands both business strategy and deep learning). This month’s Fortune magazine has Deep Learning as their cover story, and in it they quote Ng as saying: “In the past a lot of S&P 500 CEOs wished they had started thinking sooner than they did about their Internet strategy. I think five years from now there will be a number of S&P 500 CEOs that will wish they’d started thinking earlier about their AI strategy.”

I remember how many of my colleagues and clients reacted when I was at McKinsey & Co in the early nineties, and I was telling everyone I could that the internet was going to impact every part of every industry. At that time, as a very new consultant, I had very little success in getting heard. (In hindsight, I clearly should have left consulting and started a company based on my conclusions!) I hope that this time around I am in a better position to help organizations understand why they need to invest in deep learning as soon as possible.

I have had many opportunities to discuss this issue with the S&P 500 executives who have attended my data science classes as part of the executive program at Singularity University. Many execs have gone on to develop data-driven organization initiatives at their companies - but for those who don’t, these are some of the excuses that I’ve heard:

As a big company, we focus on competing, and our competitors aren’t doing this now

We run on expertise–we don’t trust models, but trust our instincts

Our data is too messy; our data projects aren’t ready yet

We can’t hire the right experts

Let’s look at each of these in turn.

1. You need to lead, not follow, on massive industry transformations

The lesson of the internet shows us the danger of being a follower when there’s a massive industry transformation going on. Whether it is Kodak vs Instagram, Amazon vs Borders, or any of the other pre-internet companies that got destroyed by new competitors, there are more than enough examples of the danger of waiting until you see your competitors completing transformation projects. You won’t know about your new competitors until it is far, far too late. And it’s much easier to get started early, when there’s time to build up the infrastructure and capabilities you need.

We can also see from the internet example that companies that are amongst the first into a space are the ones that win in the long term. Look at some of these examples:

2. Data and instinct work together

There is nothing wrong with trusting your instincts as an industry leader–for most execs, it’s your instincts that have gotten you to where you are today. But today’s data-driven companies are powering ahead on every metric that matters; the best role model is surely Google, which has nearly all leadership positions filled by computer science and math PhDs, and has used data to become one of the world’s largest companies in quick time.

Data and models should not be used to make decisions on their own, and neither should instinct. The best execs use a combination of both. Deep learning models provide deeper insight and greater accuracy, make existing products better, improve operations (e.g. Google used deep learning to reduce data center cooling requirements by 40%!) and make new classes of product available.

3. Using the data and infrastructure you have now is the best way to start

Every large organization I’ve ever worked with has been in the middle of a major data infrastructure project at all times. If you wait until your data infrastructure is perfect, you’ll never start actually using that data to create value! There are many benefits to using the infrastructure you already have to start creating value now:

You learn quickly what data is the most valuable in practice, and can focus your development efforts there

You create role model project results you can use to evangelize data-driven projects throughout the organization

Your further data infrastructure work can be funded by the value from your initial projects

You find out which of your team is most effective at delivering value from data, and can identify your recruiting needs more accurately

Deep learning is particularly effective at handling noise in data, and in handling unstructured data - so if your data infrastructure is not in a good state, it’s even more important that you invest in deep learning.

4. Rather than hiring experts, develop them internally

The people that best understand your business are the people who are in your business. Looking externally for deep learning experts, rather than developing deep learning expertise within your existing staff, means that you will be creating a gap between your domain experts and your new data experts. This gap can be nearly impossible to fill, and can lead to many organizational problems.

Furthermore, deep learning experts are like unicorns at the moment–there are very few available, and they are very expensive ($5m-$10m acquihire value, according to VC Steve Jurvetson). But any reasonably numerate coder can develop deep learning skills within a few months; in fact, we’re trying to teach the best practices in just seven weeks in our deep learning certificate!

The best approach, of course, is to do both: hire existing deep learning experts if you can, whilst developing your own team’s skills at the same time.

In closing

If you think that your organization should heed Andrew Ng’s advice, please send this article to the manager of every team that you think could benefit. My talk Deep Learning Changes Everything is included on USF’s deep learning certificate site, and has more information, including a sample deep learning lesson.