I wrote (almost) all of the posts on this site and I wrote all of the books in the catalog.

I make it a priority to write and dedicate time every single day to replying to blog comments, replying to emails from readers, and to writing and editing tutorials on applied machine learning.

I very rarely take a break and even write while on vacation.

I aim to be super responsive and accessible, so much so that some readers and customers get frustrated when I take a long-haul flight.

I’m also fortunate that working on Machine Learning Mastery is my full-time job, and has been since May 2016. Before that, it was a side project where I wrote early in the morning before work (4:30 am to 5:30am) and on weekends.

For example, a classical academic approach is to start by learning the mathematical prerequisite subjects, then learn general machine learning theories and their derivations, then the derivations of machine learning algorithms. One day you might be able to run an algorithm on real data.

This approach can crush the motivation of a developer interested in learning how to get started and use machine learning to add value in business.

An alternative is to study machine learning as a suite of tools for delivering results on problems.

In this way, machine learning can be broken down into a suite of tutorials for studying the tools of machine learning. Specifically, applied machine learning, because methods that are not useful or do not generate results are not considered, at least not at first.

By focusing on how to generate results, a practitioner can start adding value very quickly. They can also filter possible areas to study in broader machine learning and narrow their focus to those areas that produce results that are directly useful and relevant to their project or goals.

I call this approach to studying machine learning “results-first“, as opposed to “theory-first“.

Undergraduate and postgraduate courses on machine learning are generally designed to teach you theoretical machine learning. They are training academics.

This also applies to machine learning textbooks, designed to be used in these courses.

These courses are great if you want to be a machine learning academic. They may not be great if you want to be a machine learning practitioner.

The approach starts from first principles and is rooted in theory and math. I refer to this as bottom-up machine learning.

An alternate approach is to focus on what practitioners need to know in order to add value using the tools of machine learning in business. Specifically, how to work through predictive modeling problems end-to-end.

Theory and math may be used, but they are touched on later, in the context of the process of working through a project, and only in ways that make working through a project clearer or allow the practitioner to achieve better results.

I recommend that developers who are interested in becoming machine learning practitioners use this approach.

I refer to it as top-down machine learning.

You can discover more about how you are already using the top-down approach to learning in this post:

It’s an interesting (and personal) question. It presupposes a few things:

That I think I’m good because I run this site (maybe there are other reasons).

That the best practitioners are at big tech companies (perhaps they are).

That the goal of a practitioner is to work at big tech companies (it’s not my goal).

That being said, thousands of employees from top companies, as well as students and faculty from top universities buy and use my training material.

I don’t think I’m better or worse than other practitioners. I write to help developers, not to prove anything. This blog is full of tutorials for you, the reader. Mostly based on questions that I’m asked.

Nevertheless, why wouldn’t I take a job offer at some big tech firm?

I prefer to focus on this website instead of working for a tech company because (in no particular order):

I prefer my freedom, not having a boss.

I prefer working from home, not a random office building.

I prefer to work towards my own goals, not the impersonal goals of a company.

I prefer to see 100% of the upside of my hard work, not a small salary.

I prefer to follow my curiosity, not a fixed plan controlled by someone else.

I prefer to have direct access to the people I am helping, not to be anonymous or filtered.

I prefer a mix of research (reading/writing) and business (sales/marketing), not just one or the other.

I have also received many comments and criticisms of these preferences (thanks for the feedback).

Some common criticisms and my comments include:

But you can impact millions of people by working at a big tech company!

I’m already impacting millions of people as this site gets millions of pageviews each month.

But you can make a big salary at a big tech company!

The sales from my Ebooks support the site and my family just fine.

But you are taking a big risk running a small business!

I think it’s riskier to have a single source of income rather than getting paid by many customers.

I’ve studied and worked in a few different areas, including artificial intelligence, computational intelligence, multi-agent systems, and severe weather forecasting, but I keep coming back to applied machine learning.

2) I want to help developers get started and get good at machine learning.

I see so many developers wasting so much time. Studying the wrong way, focusing on the wrong things, getting ready to get started in machine learning but never pulling the trigger. It’s a waste and I hate it.


Practitioner Questions (88)

Generally, you do not need special hardware for developing deep learning models.

You can use a sample of your data to develop and test small models on your workstation with the CPU. You can then develop larger models and run long experiments on server hardware that has GPU support.

For beginners, I recommend learning machine learning on small data first, before tackling machine learning on big data. I think that you can learn the processes and methods fast using small in-memory datasets. Learn more here:

Predictive Modeling is the subfield of machine learning that most people mean when they talk about machine learning. It involves developing models from data with the goal of making predictions on new data. You can learn more about predictive modeling in this post:

Artificial Intelligence, or AI, is a subfield of computer science that focuses on developing intelligent systems, where intelligence comprises many aspects, such as learning, memory, goals, and much more.

Big data involves methods and infrastructure for working with data that is too large to fit on a single computer, e.g. on a single hard drive or in RAM.

An exciting aspect of big data is that simple statistical methods can reveal surprising insights, and simple models can produce surprising results when trained on big data. An example is the use of simple word frequencies prepared on a very big dataset instead of the use of sophisticated spelling correction algorithms. For some problems data can be more valuable than complex hand-crafted models. You can learn more about this here:

Data science is a new term that means using computational and scientific methods to learn from and harness data.

A data scientist is someone with skills in software development and machine learning who may be tasked with both discovering ways to better harness data within the organization toward decision making and developing models and systems to capitalize on those discoveries.

A data scientist uses the tools of machine learning, such as predictive modeling.

I generally try not to use the terms “data science” or “data scientist” as I think they are ill-defined. I prefer to focus on and describe the required skill of “applied machine learning” or “predictive modeling” that can be used in a range of roles within an organization.

There are many roles in an organization where machine learning may be used. For a fuller explanation, see the post:


A neural network designed for a regression problem can easily be changed to classification.

It requires two changes to the code:

A change to the output layer.

A change to the loss function.

A neural network designed for regression will likely have an output layer with one node to output one value and a linear activation function, for example, in Keras this would be:

...
model.add(Dense(1, activation='linear'))

We can change this to a binary classification problem (two classes) by changing the activation to sigmoid, for example:

...
model.add(Dense(1, activation='sigmoid'))

We can change this to a multi-class classification problem (more than two classes) by changing the number of nodes in the layer to the number of classes (e.g. 3 in this example) and the activation function to softmax, for example:

model.add(Dense(3, activation='softmax'))

Finally, the regression model will have an error-based loss function, such as 'mse' or 'mae', for example:

...
# compile model
model.compile(loss='mse', optimizer='adam')

We must change the loss function for a binary classification problem (two classes) to binary_crossentropy, for example:

...
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam')

We must change the loss function for a multi-class classification problem (more than two classes) to categorical_crossentropy, for example:

...
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam')
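
Putting the pieces together, a minimal end-to-end sketch of a multi-class model definition might look like the following (the four input features, the hidden layer size, and the three classes are assumptions for illustration, using the standalone Keras API):

# a minimal sketch: hypothetical problem with 4 input features and 3 classes
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=4))   # hidden layer, size chosen arbitrarily
model.add(Dense(3, activation='softmax'))              # one output node per class
# compile model with a loss suited to multi-class classification
model.compile(loss='categorical_crossentropy', optimizer='adam')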

That is it.

Generally, I would recommend re-tuning the hyperparameters of the neural network for your specific predictive modeling problem.

For some examples of neural networks for classification, see the posts:

You can evaluate a machine learning algorithm on your specific predictive modeling problem.

I assume that you have already collected a dataset of observations from your problem domain.

Your objective is to estimate the skill of a model trained on a dataset of a given size by making predictions on a new test dataset of a given size. Importantly, the test set must contain observations not seen during training. This is so that we get a fair idea of how the model will perform when making predictions on new data.

The size of the train and test datasets should be sufficiently large to be representative of the problem domain. You can learn more about how much data is required in this post:

A simple way to estimate the skill of the model is to split your dataset into two parts (e.g. a 67%/33% train/test split), train on the training set, and evaluate on the test set. This approach is fast and is suitable if your model is very slow to train, or if you have enough data that both the train and test sets are large and representative.

This approach is called a train/test split.
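
For example, a minimal sketch of a train/test split evaluation with scikit-learn might look like the following (the synthetic dataset and the choice of logistic regression are assumptions for illustration):

# a minimal sketch of a 67%/33% train/test split with scikit-learn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)          # train on the training set only
print(model.score(X_test, y_test))   # evaluate on the unseen test set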

Often we don’t have enough data for the train and test sets to be representative. There are statistical methods called resampling methods that allow us to economically reuse the one dataset and split it multiple times. We can use the multiple splits to train and evaluate multiple models, then calculate the average performance across each model to get a more robust estimate of the skill of the model on unseen data.

Two popular statistical resampling methods are:

k-fold cross-validation.

bootstrap.

I have tutorials on how to create train/test splits and use resampling methods for a suite of machine learning platforms on the blog. Use the search feature. Here are some tutorials that may help to get started:

This means that you very quickly learn how to work through predictive modeling problems and deliver results.

As part of this process, I teach a method of developing a portfolio of completed projects. This demonstrates your skill and gives you a platform from which to take on ever more challenging projects.

It is this ability to deliver results, and the projects that demonstrate you can deliver results, that will get you a position.

Businesses use credentials as a shortcut for hiring, but they want results more than anything else. Smaller companies are more likely to value results above credentials. Perhaps focus your attention on smaller companies and start-ups seeking developers with skills in machine learning.

If you don’t know how to code, I recommend getting started with the Weka machine learning workbench. It allows you to learn and practice applied machine learning without writing a line of code. You can learn more here:

A p-value is the probability of observing the result, assuming that the null hypothesis is true (e.g. no change, no difference, or no effect):

p-value = Pr(data | hypothesis)

A p-value is not the probability of the hypothesis being true, given the result.

p-value is not Pr(hypothesis | result)

The p-value is interpreted in the context of a pre-chosen significance level, called alpha. A common value for alpha is 0.05, or 5%. It can also be thought of as a confidence level of 95% calculated as (1.0 – alpha).

The p-value can be interpreted with the significance level as follows:

A significance level of 5% means that, if there is no effect, there is a 95% likelihood that we will correctly fail to reject H0. Put another way, there is a 5% likelihood of finding an effect (rejecting H0) when there is no effect, called a false positive or, more technically, a Type I error.
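
As a worked example, the decision rule might look like the following (the p-value below is hypothetical):

# a worked example with a hypothetical p-value
alpha = 0.05     # pre-chosen significance level (95% confidence)
p_value = 0.03   # hypothetical result from a statistical test

if p_value <= alpha:
    print('Significant result: reject H0')       # unlikely to observe this result under H0
else:
    print('Not significant: fail to reject H0')  # insufficient evidence against H0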

Statistics, specifically applied statistics, is concerned with using models that are well understood, such that it can clearly be shown why a specific prediction was made by the model.

Explaining why a prediction is made by a model for a given input is called model interpretability.

Examples of predictive models where it is straightforward to interpret a prediction by the model are linear regression and logistic regression. These are simple and well understood methods from a theoretical perspective.

Note, interpreting a prediction does not (only) mean showing “how” the prediction was made (e.g. the equation for how the output was arrived at), it means “why“, as in the theoretical justification for why the model made a prediction. An interpretable model can show the relationship between input and output, or cause and effect.

It’s worse than this: there are great claims for the need for model interpretability, but little definition of what it is or why it is so important. See the paper:

I think the goal of model interpretability may be misguided by machine learning practitioners.

In medicine, we use drugs that give a quantifiable result using mechanisms that are not understood. The how may (or may not) be demonstrated, but the cause and effect for individuals is not. We allow the use of poorly understood drugs through careful and systematic experimental studies (clinical trials) demonstrating efficacy and limited harm. It mostly works too.

As a pragmatist, I would recommend that you focus on model skill, on delivering results that add value, and on a high level of rigor in the evaluation of models in your domain.

A model has skill if its performance is better than the performance of the baseline model. This is what we mean when we say that model skill is relative, not absolute: it is relative to the skill of the baseline method.

Additionally, model skill is best interpreted by experts in the problem domain.

I strongly believe that self-study is the path to getting started and getting good at applied machine learning.

I have dedicated this site to help you with your self-study journey toward machine learning mastery (hence the name of the site).

I teach an approach to machine learning that is different to the way it is taught in universities and textbooks. I refer to the approach as top-down and results-first. You can learn more about this approach here:

A big part of doing well at school, especially the higher degrees, is hacking your own motivation. This, and the confidence it brings, was what I learned at university.

You must learn how to do the work, even when you don’t feel like it, even when the stakes are low, even when the work is boring. It is part of the learning process. This is called meta-learning, or learning how to learn effectively. It is about how you specifically learn, not how humans learn in general.

Learning how you learn effectively is a big part of self-study.

Find and use what motivates you.

Find and use the mediums that help you learn better.

Find and listen to teachers and material to which you relate strongly.

External motivators like getting a coach or an accountability partner sound short-term to me. You want to solve “learning how to self-study” in a way that gives you the mental tools for the rest of your life.

A big problem with self-directed learning is that it is curiosity-driven. It means you are likely to jump from topic to topic based on whim. It also offers a great benefit, because you read broadly and deeply depending on your interests, making you a semi-expert on disparate topics.

An approach I teach to keep this on track is called “small projects“.

You design a project that takes a set time and has a defined endpoint, such as a few man-hours and a report, blog post or code example. The projects are small in scope, have a clear endpoint and result in a work product at the end to add to the portfolio or knowledge base that you’re building up.

You then repeat the process of designing and executing small projects. Projects can be on a theme of interest, such as “Deep Learning for NLP”, or on questions you have along the way, such as “How do you use SVM in scikit-learn?”

Generally, you cannot use k-fold cross-validation to estimate the skill of a model for time series forecasting.

The k-fold cross-validation method will randomly shuffle the observations, which will cause you to lose the temporal dependence in the data, e.g. the ordering of observations by time. The model will no longer be able to learn how prior time steps influence the current time step. Finally, the evaluation will not be fair, as the model will be able to cheat by looking at future observations.

The recommended method for estimating the skill of a model for time series forecasting is to use walk-forward validation.
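
For illustration, a minimal sketch of walk-forward validation on a contrived univariate series might look like the following (a persistence forecast, predicting the last observed value, stands in for a real model):

# a minimal sketch of walk-forward validation on a contrived univariate series
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
n_test = 4
errors = []
for i in range(len(data) - n_test, len(data)):
    history, actual = data[:i], data[i]   # only observations before time i are available
    prediction = history[-1]              # persistence forecast
    errors.append(abs(actual - prediction))
print('MAE: %.1f' % (sum(errors) / len(errors)))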


The k-fold cross-validation method is used to estimate the skill of a model when making predictions on new data.

It is a resampling method, which makes efficient use of your small training dataset to evaluate a model.

It works by first splitting your training dataset into k groups of the same size. A model is trained on all but one of these groups, and then is evaluated on the hold-out group. This process is repeated so that each of the k sub-groups of the training dataset is given a chance to be used as the hold-out test set.

This means that k-fold cross-validation will train and evaluate k models and give you k skill scores (e.g. accuracy or error). You can then calculate the average and standard deviation of these scores to get a statistical impression of how well the model performs on your data.
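
For example, a minimal sketch of 10-fold cross-validation with scikit-learn might look like the following (the synthetic dataset and the choice of random forest are assumptions for illustration):

# a minimal sketch of 10-fold cross-validation with scikit-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
scores = cross_val_score(RandomForestClassifier(), X, y, cv=10, scoring='accuracy')
print('Accuracy: %.3f (%.3f)' % (scores.mean(), scores.std()))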

Regardless of the configuration you choose, you must carefully and systematically evaluate the configuration of the model on your dataset and compare it to a baseline method in order to demonstrate skill.

I recommend an approach to self-study that I call “small projects” or the “small project methodology” (early customers may remember that I even used to sell a guide by this name).

The small project methodology is an approach that you can use to very quickly build up practical skills in technical fields of study, like machine learning. The general idea is that you design and execute small projects that target a specific question you want to answer. You can learn more about this approach to studying machine learning here:

Keras wraps powerful computational engines, such as Google’s TensorFlow library, and allows you to create sophisticated neural network models such as Multilayer Perceptrons, Convolutional Neural Networks and Recurrent Neural Networks with just a few lines of code.

Verbose is an argument in Keras on functions such as fit(), evaluate(), and predict().

It controls the output printed to the console during the operation of your model.

Verbose takes three values:

verbose=0: Turn off all verbose output.

verbose=1: Show a progress bar for each epoch.

verbose=2: Show one line of output for each epoch.

When verbose output is turned on, it will include a summary of the loss for the model on the training dataset; it may also show other metrics if they have been configured via the metrics argument.

The verbose argument does not affect the training of the model. It is not a hyperparameter of the model.

Note that if you are using an IDE or a notebook, verbose=1 can cause issues or even errors during the training of your model. I recommend turning off verbose output if you are using an IDE or a notebook.
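
For example, a minimal sketch of setting verbose on fit() might look like the following (the random data and tiny model are assumptions for illustration, using the standalone Keras API):

# a minimal sketch of the verbose argument on fit(), using random data and a tiny model
from numpy import random
from keras.models import Sequential
from keras.layers import Dense

X = random.rand(100, 8)            # 100 samples, 8 input features
y = random.randint(2, size=100)    # binary labels

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=8))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

model.fit(X, y, epochs=5, batch_size=16, verbose=2)   # one line of output per epoch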

Typically, a model is overfit if the skill of the model is better on the training dataset than on the test dataset.

If the model skill is poor on both the training and the test datasets, the model may be underfit.

Sometimes, it can be the case that the skill of the model is better on the test dataset than on the training dataset.

This is likely because the test dataset is not representative of the broader prediction problem, for example the size of the test dataset is too small.

To remedy this problem, I would recommend experimenting with different variations of the test harness, including at least different (or differently sized) train and test datasets and different model configurations.

Some algorithms, such as Random Forest and Stochastic Gradient Boosting, have been shown to work well, if not best, on a large number of classification and regression predictive modeling problems. Perhaps try these methods first? You can learn more about this here:

Regression involves predicting a numerical quantity for an observation.

Some examples include:

Predicting the price of a house from its description.

Predicting the number of bugs given a sample of code.

Predicting the number of pageviews for a given news article.

Classification and regression predictive modeling problems are two high-level types of problems, although there are many specializations, such as recommender systems, time series forecasting, and much more.

Supervised learning is used on problems where the goal is to learn a mapping from inputs to outputs.

The methods are referred to as “supervised” because the learning process operates like a teacher supervising a student. The model continually makes predictions, the predictions are compared to the expected outcomes, error is calculated, and the model is corrected using these errors.

Examples of supervised machine learning problems include:

Classification or the mapping of input variables to a label.

Regression or the mapping of input variables to a quantity.

Examples of supervised machine learning algorithms include:

k-nearest neighbours.

support vector machines.

multilayer perceptron neural networks.

Unsupervised methods are used on a problem where there are only the inputs, and the goal is to learn or capture the inherent interesting structure in the data.

The methods are referred to as “unsupervised” to distinguish them from the “supervised” methods. There is no teacher, instead the models are updated based on repeated exposure to examples from the problem domain.

Examples of unsupervised machine learning problems include:

Clustering or the learning of the groups in the data.

Association or the learning of relationships in the data.

Examples of unsupervised machine learning algorithms include:

k-means.

apriori.

self-organizing map neural network.
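
For illustration, a minimal sketch contrasting one supervised and one unsupervised method in scikit-learn might look like the following (the synthetic data and algorithm choices are assumptions for illustration):

# a minimal sketch contrasting one supervised and one unsupervised method
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=100, n_features=4, random_state=1)

# supervised: learn a mapping from inputs X to known labels y
clf = KNeighborsClassifier()
clf.fit(X, y)
print(clf.predict(X[:3]))

# unsupervised: find groups in X alone, no labels are used
km = KMeans(n_clusters=2, n_init=10, random_state=1)
km.fit(X)
print(km.labels_[:3])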

You can learn more about supervised vs unsupervised methods in this post:

Larger weights in a neural network (weights larger than they need to be) are a sign of overfitting and can make the model unstable.

Both weight regularization and weight constraints are regularization approaches intended to reduce overfitting (improve the generalization) of a neural network model.

Weight regularization updates the loss function used during training to penalize the model based on the size of the weights, calculated as the vector norm (magnitude) like L1 (sum of absolute weights) or L2 (sum of squared weights). Use of the L2 vector norm is often called “weight decay“.

Weight constraint is an if-then check during optimization for the size of the weights. If triggered, e.g. if the size of the weights calculated as the vector norm (often max norm) is larger than a pre-defined value, all weights are scaled so that the norm of the weights is below the desired level.

So “weight regularization” encourages the model to have small weights, whereas “weight constraints” force the model to have small weights.
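
For example, in Keras both approaches can be configured on a layer as follows (a minimal sketch; the layer size, the L2 penalty of 0.01, and the max-norm value of 3 are arbitrary choices):

# a minimal sketch of both approaches on a Dense layer
from keras.layers import Dense
from keras.regularizers import l2
from keras.constraints import max_norm

# weight regularization: add a penalty on the size of the weights to the loss function
regularized = Dense(32, activation='relu', kernel_regularizer=l2(0.01))

# weight constraint: rescale the weights if their norm exceeds the threshold
constrained = Dense(32, activation='relu', kernel_constraint=max_norm(3))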

In neural networks, a batch and an epoch are two hyperparameters that you must choose when training the network.

They are used in stochastic gradient descent.

A sample is a single row of data, including the inputs for the network and the expected output.

A batch is a collection of samples that the network will process, after which the model weights will be updated. The model will make predictions for each sample in the batch, the error will be calculated by comparing the prediction to the expected value, an error gradient will be estimated and the weights will be updated. A training dataset is split into one or more batches.

An epoch involves one pass over the training dataset. One epoch is comprised of one or more batches, depending on the chosen batch size.
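
As a worked example with hypothetical numbers, the relationship between samples, batches, and epochs looks like this:

# a worked example with hypothetical numbers
n_samples = 200    # size of the training dataset
batch_size = 5     # samples processed before each weight update
epochs = 1000      # passes over the training dataset

batches_per_epoch = n_samples // batch_size   # 40 weight updates per epoch
total_updates = batches_per_epoch * epochs    # 40,000 weight updates in total
print(batches_per_epoch, total_updates)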

You can learn more about the difference between a batch and an epoch here:

Generally, the no free lunch theorem suggests that no single machine learning method will perform better than any other when averaged across all possible problems.

The theorem concerns optimization and search, although it has implications for predictive modeling with machine learning, as most methods solve an optimization problem in order to approximate a function.

The implication is that there is no single algorithm that will be the best, let alone perform well, across all problems. Stated another way, all algorithms perform equally well when their performance is averaged across all possible problems. There is no silver bullet.

It is theoretical and assumes that we do not know anything about the problem, therefore cannot use knowledge of the problem to narrow the selection of algorithms. This is not true in practice.

Although there are no silver bullets, there are algorithms (such as Random Forest and Stochastic Gradient Boosting) that perform surprisingly well across many predictive modeling problems that we are interested in, i.e. a subset of all possible problems.

Not all of these areas of math are relevant, only parts. Also, you need the intersections of these fields. For example, Linear Algebra + Statistics is Multivariate Analysis, which is needed to get into PCA and other projection methods; Linear Algebra + Calculus is Multivariate Calculus, which is needed for the learning algorithms in deep learning. These are really hard postgraduate topics and not the place to start for developers who have a problem to solve.

Don’t make the beginner’s mistake of thinking you need to start here. Circle back after you know how to work through a predictive modeling problem end-to-end.

Learn math when you’re ready and only learn the relevant parts to help you get the most out of a method, or better results on your next project.

It is possible for the performance of your model to get stuck during training.

This can happen with neural networks that achieve a specific loss, error or accuracy and no longer improve, showing the same score at the end of each subsequent epoch.

In the simplest case, if you have a fixed random number seed for the code example, then try changing the random seed, or do not specify the seed so that different random numbers are used for each run of the code.

Machine learning algorithms use randomness, such as in the initialization, during learning, and in the evaluation of the algorithm.

Random numbers are calculated using a pseudorandom number generator.

Pseudorandom number generators can be seeded such that they produce the same sequence of numbers each time they are run. This can be useful to reproduce a model exactly, such as during a tutorial or as a final model trained on all available data.

The value used to seed the pseudorandom number generator does not matter. You can use any number you wish.
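
For example, a minimal sketch of seeding the generators in a typical Python stack might look like the following (the seed value of 7 is arbitrary and the library choices are illustrative):

# a minimal sketch of seeding the pseudorandom number generators
import random
import numpy as np

random.seed(7)      # seed Python's built-in generator (the value 7 is arbitrary)
np.random.seed(7)   # seed NumPy's generator

print(random.random(), np.random.rand())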

To learn more about pseudorandom number generators and when it is appropriate to seed them, see the post:

By definition, a user parameter cannot be set automatically. There may exist heuristics to set the hyperparameter, but if they were reliable, they would be used instead of requiring you to set the value for the hyperparameter.

Standardization refers to scaling a variable that has a Gaussian distribution such that it has a mean of zero and a standard deviation of one.

Normalization refers to scaling a variable that has any distribution so that all values are between zero and one.

It is possible to normalize after standardizing a variable.
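
For illustration, a minimal sketch of both transforms with scikit-learn might look like the following (the single-column dataset is contrived):

# a minimal sketch of standardization and normalization on a contrived column of data
from numpy import array
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# standardization: zero mean and unit standard deviation
print(StandardScaler().fit_transform(data).ravel())

# normalization: rescale values to the range [0, 1]
print(MinMaxScaler().fit_transform(data).ravel())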

Generally, normalization and standardization are data scaling methods, and there are other methods that you may want to use.

Scaling methods are often appropriate with machine learning models that learn or make predictions using the distance between observations (e.g. k-nearest neighbors and support vector machines) and with methods that calculate weighted sums of inputs (e.g. linear regression, logistic regression, and neural networks).

If you are still in doubt as to whether you should standardize, normalize, both, or something else, then I would recommend establishing a baseline model performance on your raw data, then experiment with each scaling method and compare the resulting skill of the model.

For more help with data standardization, normalization and scaling in general, see the posts (and further reading sections):

A Multilayer Perceptron or MLP can approximate a mapping function from inputs to outputs. They are flexible and can be adapted to most problems; nevertheless, they are perhaps best suited to classification and regression problems.

A Convolutional Neural Network or CNN was developed for and is best used for image classification. They can also be used more generally for data that has a spatial structure, such as a sequence of words, and so can be used for document classification.

A Recurrent Neural Network or RNN (such as the LSTM network) was developed for sequence prediction and is well suited to problems that have a sequence of input observations or a sequence of output observations. They are suitable for text data, audio data, and similar applications.

Most useful network architectures are a hybrid, combining MLP, CNN and/or RNNs in some way.

Some model evaluation metrics such as mean squared error (MSE) are negative when calculated in scikit-learn.

This is confusing, because error scores like MSE cannot actually be negative, with the smallest value being zero or no error.

The scikit-learn library has a unified model scoring system in which it assumes that all model scores are maximized. In order for this system to work with scores that are minimized, like MSE and other measures of error, the scores that are minimized are inverted by making them negative.

This can also be seen in the specification of the metric, e.g. ‘neg‘ is used in the name of the metric ‘neg_mean_squared_error‘.

When interpreting the negative error scores, you can ignore the sign and use them directly.
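
For example, a minimal sketch showing the negated MSE scores returned by scikit-learn might look like the following (the synthetic regression data and linear regression model are assumptions for illustration):

# a minimal sketch of the negated MSE scores returned by scikit-learn
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mean_squared_error', cv=5)
print(scores)            # the scores are negative
print(-scores.mean())    # ignore the sign to report the mean MSE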

The persistence forecast is the best that we can do on challenging time series forecasting problems, such as those series that are a random walk, like short range movements of stock prices. You can learn more about this here:

I call learning the math and theory for machine learning first the “bottom-up” approach to machine learning.

It is the approach taught by universities and used in textbooks.

It requires that you learn the mathematical prerequisites, then the general theories of the field, then the equations and their derivations for each algorithm.

It is much slower.

It is much harder.

It is great for training academics (not practitioners).

A final problem is that this is often where the bottom-up approach ends.

I teach an alternative approach, called “top-down” machine learning, that inverts the process.

We start by learning the process of how to work through predictive modeling problems end to end, from defining the problem to making predictions. Then we practice this process and get good at it. We start by learning how to deliver results and add value.

Later we circle back to the math and theory, but only in the context of the process. Meaning, only the theory and math that helps us deliver better results faster is considered.

You can learn more about the contrast between these two approaches here:

But Math is Required!

Learning how algorithms work and about machine learning theory can make you a better machine learning practitioner.

But, it can come later, and it can come progressively.

You can iteratively dip into textbooks and papers, as needed, with a specific focus of learning a specific thing that will make you better, faster or more productive.

Knowing how an algorithm works is important, but it cannot tell you much about when to use it.

In supervised machine learning, we are using data to build a model to approximate an unknown and noisy mapping function. If we knew enough about this function in order to correctly choose the right algorithm, we probably don’t need machine learning (e.g. we could use statistics and descriptive modeling of already understood relationships).

The badly kept secret in machine learning is that you can use machine learning algorithms like black boxes, at least initially, because the hard part is actually figuring out how to best frame the problem, prepare the data and figure out which of one thousand methods might perform well.

My focus is on writing and helping readers through the website and books.

Consulting is a huge investment of time and resources for me. It takes too much time and mental space away from writing new tutorials and books, which I think is the most effective way I can help the most people.

I’m eager to help, but reading a paper (or someone else’s material) to the level required to then explain it to you requires a large amount of time and effort. I just don’t have the capacity to do this for every request that I get.

My best advice is to contact the author and ask your questions directly.

I believe an honest academic will want their work read and understood.

If you prepare well (e.g. do your homework), if you’re courteous (e.g. humble, polite, and not demanding), and if you are clear with your questions (e.g. specific), I would expect a helpful response.

I recommend practicing this process with your chosen tools/libraries and develop a portfolio of completed machine learning projects. This portfolio can be used to demonstrate your growing skills and provide a code base that you can leverage on larger and more sophisticated projects.

You can learn more about developing a machine learning portfolio here:

I recommend searching for a job in smaller companies and start-ups that value your ability to deliver results (demonstrated by your portfolio) over old-fashioned ways of hiring (e.g. having a degree on the topic).

Sorry, I cannot help you with machine learning for predicting the stock market, foreign exchange, or bitcoin prices.

I do not have a background or interest in finance.

I’m really skeptical.

I understand that unless you are operating at the highest level, you will be eaten for lunch by the fees, by other algorithms, or by people who are operating at the highest level. I love this quote from a recent Freakonomics podcast, asking about people picking stocks:

It’s a tax on smart people who don’t realize their propensity for doing stupid things.

I also understand that short-range movements of security prices (stocks) are a random walk and that the best that you can do is to use a persistence model. I love this quote from the book “A Random Walk Down Wall Street“:

A random walk is one in which future steps or directions cannot be predicted on the basis of past history. When the term is applied to the stock market, it means that short-run changes in stock prices are unpredictable.

Generally, I recommend that you complete homework and assignments yourself.

You have chosen a course and (perhaps) have even paid money to take the course. You have chosen to invest in yourself via self-education.

In order to get the most out of this investment, you must do the work.

Also, you (may) have paid the teachers, lecturers, and support staff to teach you. Use that resource and ask them for help and clarification about your homework or assignment. They work for you in some sense, and no one knows more about your homework or assignment and how it will be assessed than them.

Nevertheless, if you are still struggling, perhaps you can boil your difficulty down to one sentence and contact me.

Generally, I would recommend picking a topic that you are really excited about. Motivation is important and research projects can take many years to complete. It is important that you spend that time on a project in which you are deeply interested.

I also think that the best person for you to talk to about research topics is your research advisor. This is their job.


Website Questions (16)

Machine Learning Mastery is based in Australia, although we have readers and customers in the EU. Therefore, I have done my best in good faith to make Machine Learning Mastery compliant with the General Data Protection Regulation (GDPR).

Sample code is presented in tutorials on this website using a special plugin that lets you easily copy and paste the code into your editor.

It is important that when you copy code from a tutorial that white space is preserved. This is because in languages such as Python tabs and new lines are part of the language and must be used exactly as they appear in the tutorial code for the example to work correctly.

When you hover on the code box you will see a small menu appear with buttons.

For example:

Click on the second button from the far right on this menu.

It looks like two sheets of paper.

For example:

This will highlight or select all of the code in the code box.

Copy the highlighted code. The specifics of how to copy selected code depend on your platform.

Right click on the code and click “copy“.

Or, if on Windows or Linux: hold down the “control” key and press the “c” key on the keyboard.

Or, if on Mac: hold down the “command” key and press the “c” key on the keyboard.

You will now have code copied onto your clipboard.

Open your editor and paste the code from your clipboard into the editor.

This will vary depending on your platform and your editor. Ensure you have a new document open in your editor:

Right click on the new document in the editor and click “paste“.

Or, if your editor has a menu, click the “Edit” menu and click “Paste“.

Or, if on Windows or Linux: click on the new document and hold the “control” key and press the “v” key.

Or, if on Mac: click on the new document and hold the “command” key and press the “v” key.

You will now have the code in your editor with all white space preserved.

You can now run the code example. Ensure that any data files that the code example depends upon are in the same directory as the code file.

The books are full of tutorials that must be completed on the computer.

The books assume that you are working through the tutorials, not reading passively.

The books are intended to be read on the computer screen, next to a code editor.

The books are playbooks; they are not intended to be used as reference texts that sit on the shelf.

The books are updated frequently, to keep pace with changes to the field and APIs.

I hope that explains my rationale.

If you really do want a hard copy, you can purchase the book or bundle and create a printed version for your own personal use. There is no digital rights management (DRM) on the PDF files to prevent you from printing them.

I release new books every few months and develop a new super bundle at those times.

All existing customers will get early access to new books at a discount price.

Note, that you do get free updates to all of the books in your super bundle. This includes bug fixes, changes to APIs and even new chapters sometimes. I send out an email to customers for major book updates or you can contact me any time and ask for the latest version of a book.

The book “Master Machine Learning Algorithms” is for programmers and non-programmers alike. It teaches you how 10 top machine learning algorithms work, with worked examples in arithmetic and spreadsheets, not code. The focus is on an understanding of how each model learns and makes predictions.

The book “Machine Learning Algorithms From Scratch” is for programmers who learn by writing code. It provides step-by-step tutorials on how to implement top algorithms, as well as how to load data, evaluate models, and more. It spends less time on how the algorithms work, focusing instead on how to implement each in code.

The book “Deep Learning With Python” could be a prerequisite to “Long Short-Term Memory Networks with Python”. It teaches you how to get started with Keras and how to develop your first MLP, CNN, and LSTM.

The book “Long Short-Term Memory Networks with Python” goes deep on LSTMs and teaches you how to prepare data, how to develop a suite of different LSTM architectures, parameter tuning, updating models and more.

The book “Deep Learning for Time Series Forecasting” focuses on how to use a suite of different deep learning models (MLPs, CNNs, LSTMs, and hybrids) to address a suite of different time series forecasting problems (univariate, multivariate, multistep and combinations).

The LSTM book teaches LSTMs only and does not focus on time series. The Deep Learning for Time Series book focuses on time series and teaches how to use many different models including LSTMs.

I do test my tutorials and projects on the blog first. It’s like early access to ideas, and many of them do not make it into my training material.

Much of the material in the books appeared in some form on my blog first and is later refined, improved and repackaged into a chapter format. I find this helps greatly with quality and bug fixing.

The books provide a more convenient packaging of the material, including source code, datasets and PDF format. They also include updates for new APIs, new chapters, bug and typo fixing, and direct access to me for all the support and help I can provide.

I believe my books offer thousands of dollars of education for tens of dollars each.

They are months if not years of experience distilled into a few hundred pages of carefully crafted and well-tested tutorials.

I think they are a bargain for professional developers looking to rapidly build skills in applied machine learning or use machine learning on a project.

Also, what are skills in machine learning worth to you? To your next project? And to your current or next employer?

Nevertheless, the price of my books may appear expensive if you are a student or if you are not used to the high salaries for developers in North America, Australia, UK and similar parts of the world. For that, I am sorry.

It is a matching problem between an organization looking for someone to fill a role and you with your skills and background.

That being said, there are companies that are more interested in the value that you can provide to the business than the degrees that you have. Often, these are smaller companies and start-ups.

You can focus on providing value with machine learning by learning and getting very good at working through predictive modeling problems end-to-end. You can show this skill by developing a machine learning portfolio of completed projects.

My books are specifically designed to help you toward these ends. They teach you exactly how to use open source tools and libraries to get results in a predictive modeling project.