What is machine learning, and what kinds of problems can it solve? Google thinks about machine learning slightly differently: as being about logic, rather than just data. We talk about why such a framing is useful for data scientists when thinking about building a pipeline of machine learning models.
Then, we discuss the five phases of converting a candidate use case to be driven by machine learning, and consider why it is important that the phases not be skipped. We end with a recognition of the biases that machine learning can amplify and how to recognize them.

Taught By

Google Cloud Training

Transcript

In this lecture, we're going to talk about The Secret Sauce. So Google is going to share The Secret Sauce with you. But that secret sauce is not code, it's not just an algorithm, it's actually this organizational know-how that we've acquired over years of managing probably more value-generating ML systems than any other company in the world. So if we're going to share this organizational know-how, why start with technical ML skills? Well, we want you to become great ML strategists. And to do that, we believe that you need to get your hands dirty. You actually need to go out and you need to build some of these systems and learn about them. And the good news about that is that these technical ML skills that you're here looking for on Coursera, well, they're mostly software and data handling skills anyway. They are things you may already be very comfortable with. And as we talk about these technical skills, it also gives us an opportunity to leverage Google's experience to help you avoid some of these common pitfalls. What are some of these common pitfalls? I'm glad you asked. So here is our kind of clickbaity, fun top ten pitfalls organizations hit when they first try ML. And here's a list I've very informally aggregated after several years of talking with new ML practitioners who come to us and say, "We're so excited about this great new thing, it's going to be awesome." And then they might fall into some common pitfalls. I've seen it at Google, and I've seen it with our partners as well. First one, perhaps one of the most common: you thought training your own ML algorithm would be faster than writing the software. Usually, this is not the case. And the reason is that to make a great ML system beyond just the algorithm, you're going to need lots of things around the algorithm, like a whole software stack to serve it, to make sure that it's robust and it's scalable and has great uptime. And all of this, you're going to have to do for software anyway.
But then if you try to use an ML algorithm, you put in additional complexities around data collection and training, and all of that just gets a little bit more complicated. So usually, we really push people to start with something simpler in software only. Next one, one of my favorites. You want to do ML, but you haven't collected the data yet. Full stop, you need the data. There's really no use talking about doing great ML if you have not collected great data or you do not have access to great data. And let's say you do have that data, you've been logging it for years, so it's written on some system that someone in another department controls, but you haven't looked at it. I'm willing to bet that if you haven't looked, that data is not really ready to use. And it goes even beyond that. If there's not someone in your organization who's regularly reviewing that data or generating reports or new insights, if that data is not generating value already, then likely the effort to maintain it is not being put in, and data has this kind of magical way of going stale. Of all the clients I've ever talked to, I've never met one who overestimated the amount of effort it would take to collect clean data. No one has ever said, "That was easier than I expected." Expect there to be a lot of pain and friction here. What's the next one? You forgot to put and keep humans in the loop. So when we get into these ML systems that start to perform core tasks or core business processes in our organizations, they become really important. And appropriately, organizations become risk averse around these systems because they are the breadwinners of the organization, and it then becomes very important to mitigate this risk. And one of the myriad ways we do that is we keep humans inside the loop so that they are reviewing the data, handling cases the ML did not handle very well, and curating its training inputs.
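One common way to keep humans in the loop can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any Google system works: the `Prediction` type, the `route` and `record_human_label` helpers, and the 0.9 confidence threshold are all hypothetical names and values chosen for the example. The idea is that confident predictions are accepted automatically, uncertain ones go to a person, and the person's answer is kept as a new training input.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]; hypothetical field


def route(predictions, threshold=0.9):
    """Split predictions into auto-accepted ones and ones queued for human review.

    The threshold is an assumed value; in practice it would be tuned
    to the cost of a wrong automatic decision.
    """
    accepted, review_queue = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            review_queue.append(p)
    return accepted, review_queue


def record_human_label(training_set, prediction, human_label):
    """Store the reviewer's label so it can be used in the next retraining run."""
    training_set.append((prediction.item_id, human_label))
    return training_set


if __name__ == "__main__":
    preds = [Prediction("a", "cat", 0.97), Prediction("b", "dog", 0.55)]
    accepted, review_queue = route(preds)
    print("auto-accepted:", [p.item_id for p in accepted])
    print("needs review:", [p.item_id for p in review_queue])
```

In this sketch the review queue covers the cases the ML did not handle well, and `record_human_label` stands in for curating the training inputs.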
And we're going to talk about this more later, but this is a feature of every production ML system I know of at Google: it has humans in the loop. What about this one? You launched a product whose initial value prop was its ML algorithm instead of some other feature. So this is a problem because A, your users probably don't care if what you're giving them is ML, they just care if it's got that new cool feature or if its recommendations are really good. And if you launch something whose initial value prop is just ML, it has no data to operate on. It needs lots of users to generate that data so it may learn how to interact better. What about this: you made a great ML system, it just happens to optimize for the wrong thing. So imagine if Google Search was optimizing for, let's say, user engagement as measured by how often someone clicked on search results. It sounds good, right? We want our users to like our product, we want our users to stay engaged. But if we optimize for how often they click, maybe then the ML algorithm will learn to kind of serve bad content because it forces users to come back and keep clicking. So we always want to be careful about optimizing for something that's pretty good. It need not be perfect, but we always want to look out for perverse incentives. So what happens if you forget to measure if your ML algorithm is actually improving things in the real world? You put it out there, you turned it on, it serves users, but you can't tell how much better it is, you can't tell if there's any uplift in customer engagement or lifetime value. That's always really worrisome because then how are you going to go back to your boss or your boss's boss and say, "Hey, I want to do this for another product," if you cannot show the impact of the success? And then I've seen a couple of customers do this next one: you confused the ease of use and the value add of somebody else's pre-trained ML algorithm with building your own.
So Google Cloud has a couple of what we call "ML APIs." For instance, with Vision, you can send it an image and it will perform image classification on some predefined labels. Well that's great, it's super easy to use. You don't have to worry about any infrastructure, or any training data, or any data collection, very easy to use. It is a very different ballgame if you set out to build your own, especially if you want to do your own ML algorithm that does not kind of come pre-canned. It's a lot more effort. You thought after research that production ML algorithms were trained only once. You're like, "Hey, it's on my laptop, it's doing great on that data set. I'm basically done." No, you're probably about 10 percent of the way through. It turns out that if you're going to have an ML algorithm that's going to be part of your core business processes, it's going to be retrained many, many times and you're going to want to invest the effort to make that process very easy and seamless. And the final one is actually the only one of these I have that addresses a confusion about the challenge involved in optimizing the ML algorithm, and that's: you want to design your own in-house perception, i.e., image or speech, or NLP (that is, natural language processing) classification. So these are kind of a peculiar pitfall in the sense that they seem much easier than they really are. And in fact, all the algorithms we have to address these are very highly tuned from decades of academic research, and you should almost always take one off the shelf, already made or already kind of defined, instead of trying to do your own research. It's very expensive. So that's a lot of bad news. That's a lot of pitfalls. That's a lot of problems. What's the good news? So the good news is, most of the value comes along the way. As you march towards ML, you may not get there, and you will still greatly improve everything you're working on.
And if you do get there, ML improves almost everything it touches once you're ready. And think about this: if the process to build and use ML is hard for your company, it's likely hard for the other members of your industry, right? And once you have that ML-enabled product or internal process, it's going to provide the users or the consumers of that process great experiences that become very hard to duplicate or catch up to because of this beautiful feedback loop where it's collecting more data and learning all the time. So, I would like to double click into this idea that value comes along the way. I know it's tempting to try to jump to a fully machine-learned, automated, end-to-end, automagic everything solution. We all want to make this leap, but it usually doesn't lead to great products or organizational outcomes. I've seen that at Google, and I've seen that in our partner organizations as well. So what I want to do now is review a more realistic path and all the great things that come along the way.
