This course is for finance professionals, investment management professionals, and traders. Alternatively, this course can be for machine learning professionals who seek to apply their craft to trading strategies.
At the end of the course you will be able to do the following:
- Understand the fundamentals of trading, including the concept of trend, returns, stop-loss and volatility
- Understand the differences between supervised/unsupervised and regression/classification machine learning models
- Identify the profit source and structure of basic quantitative trading strategies
- Gauge how well the model generalizes its learning
- Explain the differences between regression and forecasting
- Identify the steps needed to create development and implementation backtesters
- Use Google Cloud Platform to build basic machine learning models in Jupyter Notebooks
To be successful in this course, you should have a basic competency in Python programming and familiarity with pertinent libraries for machine learning, such as Scikit-Learn, StatsModels, and Pandas. Experience with SQL will be helpful. You should have a background in statistics (expected values and standard deviation, Gaussian distributions, higher moments, probability, linear regressions) and foundational knowledge of financial markets (equities, bonds, derivatives, market structure, hedging).

Reviews

AA

Good course that gives a lot of breadth as an introduction to machine learning in finance. Well put together

CR

Jan 02, 2020

5/5 stars

Other courses are recommended before doing this one: basics of ML, basics of the stock market, Python, and SQL.

From the lesson

Introduction to Neural Networks and Deep Learning

In this module, you'll learn about neural networks and how they relate to deep learning. You'll also learn how to gauge model generalization using regularization and cross-validation, and you'll be introduced to Google Cloud Platform (GCP). Specifically, you'll be shown how to leverage GCP for implementing trading techniques.

Taught by:

Jack Farmer

Ram Seshadri

Transcript

So in addition to helping you choose between two different ML models (should I use a linear regression or a neural network?), you can also use your validation dataset to fine-tune the hyperparameters of a single model. Recall that hyperparameters are set before training. This tuning is accomplished through successive training runs, comparing each run against the independent validation dataset to check for overfitting. So here's how your validation set is actually used during training. As you saw when we covered optimization, training a model means starting with random weights, calculating the derivative, stepping in the direction of descent down the gradient of the loss curve to minimize your loss metric, and then repeating. Periodically, you want to assess the performance of the model against data it has not yet seen in training, which is where the validation dataset comes in. So after a completed training run, validate the model's results against your validation dataset to see whether those hyperparameters are any good or whether you can tune them a little bit more. If there is no significant divergence between the loss metric from the training run and the loss metric from the validation run, you can potentially go back and optimize your hyperparameters a little more. Once the loss metrics have been sufficiently optimized against the validation dataset, and once you start to see the training and validation losses diverge (the signal that the model is beginning to overfit), that's when you know you need to stop and say the model is tuned and ready for production. You can use a loop similar to this one to also figure out architecture choices for your individual models, just as we did for the hyperparameters set before training: for example, the number of layers in a network or the number of nodes you should use.
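The tuning loop described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not code from the course: it assumes scikit-learn's `MLPRegressor` as the neural network and mean squared error as the loss metric, and it compares several hidden-layer sizes against a held-out validation set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data (a stand-in for real market features and returns)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.0, 0.3]) + rng.normal(scale=0.1, size=1000)

# Hold out a validation set the model never trains on
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Successive training runs: one per candidate configuration,
# each scored on the independent validation set
best_cfg, best_val_loss = None, float("inf")
for hidden in [(4,), (8,), (16,)]:
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000,
                         random_state=0)
    model.fit(X_train, y_train)
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    if val_loss < best_val_loss:
        best_cfg, best_val_loss = hidden, val_loss

print("best configuration:", best_cfg, "validation MSE:", best_val_loss)
```

The key point is that the comparison happens on `X_val`, data the models never saw during fitting, so a configuration that merely memorized the training set won't win.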
Essentially, you'll train with one configuration, say six nodes in your neural network, then train with another one, and then evaluate to see which performs better on your validation dataset. You'll end up choosing the model configuration that results in a lower loss on the validation dataset, not the configuration that results in a lower loss on the training one. Later in this specialization, we're going to show you how Cloud ML Engine can carry out a Bayesian search through hyperparameter space, so you don't have to do this kind of experimentation one hyperparameter at a time. Cloud Machine Learning Engine helps you do this sort of experimentation in parallel, using optimized search strategies. Now, once you're done training, you need to tell your boss how well your model is doing. What dataset are you going to use for that final go or no-go evaluation? Can you simply report the loss or the error on your validation dataset, even if it's consistent with your training dataset? Actually, you can't. Why not? Because you used your validation dataset to decide when to stop training. It's no longer independent; the model has seen it. So what do you have to do? You have to split your data into three parts: training, validation, and a brand-new, completely isolated silo called the test set. Once your model has been trained and validated, you can run it once, and only once, against the independent test dataset, and that's the loss metric you report to your boss. The loss metric on your test dataset is what decides whether or not you put this model into production. What happens if you fail on your test dataset even though you passed validation? It means you can't simply retest the same ML model. You've got to either retrain a brand-new machine learning model, or go back to the drawing board and collect more data samples to provide new data for your ML model.
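The three-way split can be sketched as two successive calls to `train_test_split`. The 60/20/20 proportions below are an illustrative choice, not something the transcript prescribes; the essential point is that the test set is carved off first and only touched once at the very end.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# First carve off the held-out test set: used once, for the final go/no-go call
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remainder into training and validation sets
# (0.25 of the remaining 80% gives a 60/20/20 overall split)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Training and hyperparameter tuning then use only `X_train` and `X_val`; `X_test` stays untouched until the single final evaluation.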
While this is a good approach, there's one teeny-tiny problem. Nobody likes to waste data, and it seems like the test data is essentially wasted: you only use it once, and otherwise it's held out. Can't you use all your data in training and still get a reasonable indication of how well your model is going to perform? Well, the answer is you can. The compromise between these methods is to do the training/validation split many different times. Train, then compute the loss on the validation dataset, keeping in mind that this validation set may consist of points that were not used in training that first time; then split the data again. Now your training data might include some points that were in your original validation set on the first run, but because you're doing multiple iterations, everything gets shuffled around. Finally, after a few rounds of this blending, you average the validation loss metrics across all the splits. You'll also get a standard deviation of the validation losses, which helps you analyze the spread and settle on a final number. This process is called bootstrapping or cross-validation. The upside is that you get to use all of your data, but you have to train many more times, because you're creating more splits. So at the end of the day, here's what you have to remember. If you have lots of data, use the approach of a completely independent, held-out test dataset for your go or no-go decision. If you don't have that much data, use the cross-validation approach.
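The repeated split-train-validate procedure described above is exactly what scikit-learn's `cross_val_score` automates. A minimal sketch on synthetic data, using an illustrative linear regression and five folds, each fold serving once as the validation split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=500)

# 5-fold cross-validation: the data is split five times, and each fold
# takes a turn as the held-out validation set.
# scikit-learn reports negated MSE, so negate it back to a positive loss.
losses = -cross_val_score(LinearRegression(), X, y, cv=5,
                          scoring="neg_mean_squared_error")

# Average the validation losses across splits, and use the standard
# deviation to judge the spread before settling on a final number.
print("mean validation MSE:", losses.mean())
print("std of validation MSE:", losses.std())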