If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This Specialization will teach you best practices for using TensorFlow, a popular open-source framework for machine learning.
In Course 3 of the deeplearning.ai TensorFlow Specialization, you will build natural language processing systems using TensorFlow. You will learn to process text, including tokenizing and representing sentences as vectors, so that they can be input to a neural network. You’ll also learn to apply RNNs, GRUs, and LSTMs in TensorFlow. Finally, you’ll get to train an LSTM on existing text to create original poetry!
The Machine Learning course and Deep Learning Specialization from Andrew Ng teach the most important and foundational principles of Machine Learning and Deep Learning. This new deeplearning.ai TensorFlow Specialization teaches you how to use TensorFlow to implement those principles so that you can start building and applying scalable models to real-world problems. To develop a deeper understanding of how neural networks work, we recommend that you take the Deep Learning Specialization.


This course gives an overview of NLP without going into much mathematical detail. In a short time span, many things can be learned from this course, which is helpful for beginners.


Jun 22, 2019


Amazing course by Laurence Moroney. But only after finishing Sequence Models by Andrew Ng was I able to understand the concepts taught here.

From the lesson

Sentiment in text

The first step in understanding sentiment in text, particularly when training a neural network to do so, is the tokenization of that text. This is the process of converting the text into numeric values, with a number representing each word or character. This week you'll learn about the Tokenizer and pad_sequences APIs in TensorFlow and how they can be used to prepare and encode text and sentences to get them ready for training neural networks!
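To make the padding idea concrete before diving into the TensorFlow APIs, here is a minimal pure-Python sketch of the behavior the lesson describes for pad_sequences: shorter sequences are filled with zeros at the front so every sequence ends up the same length. The function name and signature here are illustrative, not the Keras API itself.

```python
def pad_sequences_sketch(sequences, maxlen=None):
    """Pre-pad integer sequences with zeros to a common length.

    Illustrative sketch of the behavior described in the lesson,
    not the TensorFlow implementation: zeros go at the front, and
    over-long sequences lose tokens from the front.
    """
    maxlen = maxlen or max(len(s) for s in sequences)
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # truncate from the front if too long
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

print(pad_sequences_sketch([[4, 2, 1, 3], [5, 2, 1, 3, 6]]))
# [[0, 4, 2, 1, 3], [5, 2, 1, 3, 6]]
```

With uniform lengths like this, the sequences can be fed into a network with a fixed-size input layer, just as resized images can.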

Taught by

Laurence Moroney

AI Advocate

Transcript

In the previous video, you saw how to tokenize words and sentences, building up a dictionary of all the words to make a corpus. The next step will be to turn your sentences into lists of values based on these tokens. Once you have them, you'll likely also need to manipulate these lists, not least to make every sentence the same length; otherwise, it may be hard to train a neural network with them.

Remember, when we were doing images, we defined an input layer with the size of the image that we're feeding into the neural network. In cases where images were differently sized, we would resize them to fit. Well, you're going to face the same thing with text. Fortunately, TensorFlow includes APIs to handle these issues. We'll look at those in this video.

Let's start with creating a list of sequences: the sentences encoded with the tokens that we generated. I've updated the code that we've been working on to this. First of all, I've added another sentence to the end of the sentences list. Note that all of the previous sentences had four words in them, so this one's a bit longer. We'll use that to demonstrate padding in a moment.

The next piece of code is this one, where I simply call the tokenizer's texts_to_sequences method, and it will turn the sentences into a set of sequences for me. So if I run this code, this will be the output. At the top is the new dictionary, with new tokens for my new words like amazing, think, is, and do. At the bottom is my list of sentences that have been encoded into integer lists, with the tokens replacing the words. So, for example, 'I love my dog' becomes 4, 2, 1, 3.

One really handy thing about this that you'll use later is the fact that the texts_to_sequences call can take any set of sentences, so it can encode them based on the word set that it learned from the sentences passed into fit_on_texts. This is very significant if you think ahead a little bit.
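The tokenizing-and-encoding step just described can be sketched in plain Python. This is a hedged re-implementation of what Tokenizer's fit_on_texts and texts_to_sequences do (not the TensorFlow code itself): more frequent words get lower index numbers, ties keep first-seen order, and each sentence becomes a list of those numbers.

```python
import re
from collections import Counter

def fit_on_texts(sentences):
    """Build a word index: more frequent words get lower numbers;
    ties keep first-seen order. Illustrative sketch, not the Keras API."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: i + 1 for i, word in enumerate(ranked)}

def texts_to_sequences(sentences, word_index):
    """Encode each sentence as a list of token numbers."""
    return [[word_index[w]
             for w in re.findall(r"[a-z']+", s.lower())
             if w in word_index]
            for s in sentences]

sentences = [
    'I love my dog',
    'I love my cat',
    'You love my dog!',
    'Do you think my dog is amazing?',
]
word_index = fit_on_texts(sentences)
print(word_index)  # {'my': 1, 'love': 2, 'dog': 3, 'i': 4, 'you': 5, ...}
print(texts_to_sequences(sentences, word_index)[0])  # [4, 2, 1, 3]
```

Run on the lesson's four sentences, this reproduces the encoding from the video: 'I love my dog' becomes 4, 2, 1, 3.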
If you train a neural network on a corpus of texts, and a word index is generated from that text, then when you want to do inference with the trained model, you'll have to encode the text that you want to infer on with the same word index; otherwise it would be meaningless. So if you consider this code, what do you expect the outcome to be? There are some familiar words here, like love, my, and dog, but also some previously unseen ones. If I run this code, this is what I would get; I've added the dictionary underneath for convenience. So 'I really love my dog' would still be encoded as 4, 2, 1, 3, which is 'I love my dog', with 'really' being lost as the word is not in the word index, and 'my dog loves my manatee' would get encoded as 1, 3, 1, which is just 'my dog my'.
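That loss of unseen words is easy to demonstrate with a small pure-Python sketch (again illustrative, not the TensorFlow API): encoding new sentences with a previously learned word index simply drops every word the index has never seen.

```python
import re

# Word index learned from the lesson's training sentences
# (my=1, love=2, dog=3, i=4, you=5, ...).
word_index = {'my': 1, 'love': 2, 'dog': 3, 'i': 4, 'you': 5,
              'cat': 6, 'do': 7, 'think': 8, 'is': 9, 'amazing': 10}

def encode(sentence, word_index):
    """Encode a sentence using an existing word index; unseen words are skipped."""
    return [word_index[w]
            for w in re.findall(r"[a-z']+", sentence.lower())
            if w in word_index]

print(encode('I really love my dog', word_index))     # [4, 2, 1, 3] -- 'really' is lost
print(encode('My dog loves my manatee', word_index))  # [1, 3, 1]    -- just 'my dog my'
```

This is why, at inference time, you must reuse the exact word index built during training; it is also the motivation for the out-of-vocabulary token that the course introduces to mark unseen words instead of silently dropping them.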