Quick Bias/Variance Trade-Off


The Bias/Variance trade-off easily explained

Mar 9 · 7 min read

This post will explain one of the most common issues in Machine Learning: The Bias/Variance Trade-off. We will see what it is, why it’s important to take it into account when building a Machine Learning model, and we will explain it intuitively and with zero math.

Image from Unsplash.

What is the Bias/Variance trade-off?

As stated above, the bias/variance trade-off is one of the most common issues that has to be addressed when building an application that will use a supervised Machine Learning model to make predictions.

It is a problem that has to do with the error of our models and also with their flexibility. The overall problem is denoted as a trade-off because generally it is not possible to improve the bias and variance of our models at the same time: usually when one goes down, the other goes up, and vice versa.

Why is the Bias/Variance trade-off important?

We want our Machine Learning models to be as accurate as possible when put into production. This means it is important to reduce the error they might make, an error which has three main terms:

Total Error = Bias² + Variance + Irreducible Error

Source: Codecogs

This formula means the following: the overall error of our model can be divided into three terms: the error due to variance, the error due to bias, and the irreducible error. This last one is the error that can't be reduced by tweaking our models or data; it is generally due to noise in the data, or arises because the model's performance can't be increased any further (i.e. it has reached the same performance as the top human experts on a specific task).

Knowing this, it is clear that we have two ways of reducing the overall error: since we can't touch the irreducible error, we must reduce the errors that come from variance or bias.
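To make the two reducible terms concrete, below is a minimal sketch (assuming numpy and scikit-learn, and using a made-up regression problem, so every name and number in it is purely illustrative) that estimates bias² and variance empirically: we retrain the same kind of model on many freshly drawn training sets and look at how its predictions on a fixed test grid behave.

```python
# Minimal sketch: empirical bias^2 / variance estimate via repeated retraining.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_fn(x):
    # The noise-free target, known because the data is synthetic.
    return np.sin(3 * x)

# Fixed test grid (since we know the true function, we can measure bias).
x_test = np.linspace(0, 2, 200).reshape(-1, 1)
y_test_true = true_fn(x_test).ravel()

n_rounds = 200
preds = np.empty((n_rounds, len(x_test)))

for i in range(n_rounds):
    # Fresh noisy training sample each round.
    x_train = rng.uniform(0, 2, size=(100, 1))
    y_train = true_fn(x_train).ravel() + rng.normal(0, 0.3, size=100)
    model = DecisionTreeRegressor()   # deliberately very flexible
    model.fit(x_train, y_train)
    preds[i] = model.predict(x_test)

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_test_true) ** 2)   # error due to bias
variance = np.mean(preds.var(axis=0))               # error due to variance

print(f"bias^2   ~ {bias_sq:.3f}")
print(f"variance ~ {variance:.3f}")
```

Swapping the unconstrained tree for a very simple model (say, a depth-1 tree) would move error from the variance term into the bias term, which is exactly the trade-off this post is about.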

Another way to see these errors is the following: the difference between human-level performance on some task (the errors made by the humans who labelled the data) and the error our model makes on the training data is the error due to the bias of our model. The difference between our training error and the error on our test data is the error our model makes due to variance.

The difference between human error and training error is the bias error, and the difference between training error and test error is the variance error. Self-made figure.
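As a tiny worked example with hypothetical numbers (none of these come from a real experiment), the rule of thumb above boils down to two subtractions:

```python
# Hypothetical error rates, purely for illustration.
human_error = 0.02      # error of the human labellers (human-level performance)
training_error = 0.10   # error of our model on the training data
test_error = 0.18       # error of our model on held-out test data

bias_error = training_error - human_error      # 0.08 -> the model underfits
variance_error = test_error - training_error   # 0.08 -> the model also overfits
print(bias_error, variance_error)
```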

Let's see where these errors come from, how we can reduce them, and discuss their trade-off.

The Bias/Variance trade-off explained intuitively

Alright, we've briefly described the bias/variance trade-off; let's see what each of these terms means and then describe the problem in depth:

  • Bias: The bias of our model has to do with the assumptions it makes about the data, and how well it fits the data when it is trained. A model with high bias doesn't fit the training data well, has limited flexibility, or is too simple for the data that we have, generally resulting in a high training error.

The bias tells us how well our model approximates reality.

  • Variance: The variance of our model has to do with how much its results vary depending on the sample of data used for its training. A model with high variance fits its specific training data too closely, so it has problems generalising to unseen data, resulting in a high test error.

The variance tells us how sensitive our model is to the training data.

The following figure is commonly used to illustrate what variance and bias are:

Illustration of the Bias/Variance bullseye. Icon from Flaticon.

The explanation of this figure is the following: each dartboard gives us an idea of how well our model performs; the red crosses represent predictions, which are better the closer they are to the bullseye (the centre of the board).

  • When we have high variance and high bias our predictions are very spread out and not close to the centre. Our model does not make good predictions on any data samples.
  • When we have high variance and low bias our predictions are spread out but around the centre of the board, so some of them hit the bullseye but others don’t. On some data, our model predicts well but on other data samples it doesn’t.
  • High bias and low variance means that our predictions are close together, but not near the centre of the board. Generally our model does not predict well, although it makes similar predictions on different samples.
  • Low variance and low bias means that our predictions are close together and centred: this is the best scenario, where our model predicts well for all kinds of data.

Let's now look at a real-world example to finish building an intuition for the problem.

Imagine we are building an application for recognising cats in images. If we train a model with high bias, it will predict cat images very badly, independently of the samples of cat data we train it with.

Our first model, with high bias, would predict this dog as a cat. Original image from Unsplash.

A model with high variance would predict well for the specific cat species (for example) it was trained with, but it would make errors when facing images of cats that are not very similar to the images it was trained on. It would generalise badly to new cats.

Our second model, with high variance, would predict very well on some species of cats (top) but badly on others (bottom). Original images from Unsplash.

Lastly, a model with low variance and low bias would predict well independently of the data samples it was trained with; in our example, it would generalise well enough to tell whether an animal is a cat or a dog, without being tricked by different cat species.

Our last model, with low variance, classifies different kinds of cats well. Original image from Unsplash.

Examples of models with high bias/variance

Now that we know what bias and variance are, how they relate, and have some intuition for them, let's look at some examples of models with high variance or high bias.

As we said, a model with high bias does not fit the training data well. Linear models, for example, can suffer from this problem, as they assume a linear relationship between the features and the target variable that does not always exist.

If you are not familiar with Linear regression, you can learn about it here:

In the following image we can see a linear regression model fitted to data that clearly has no linear correlation: as a result, our model will have a high bias and not perform very well.

A linear model fit to non-linear data. Self-made image.
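Here is a minimal sketch of that situation, assuming scikit-learn and using made-up data: a straight-line model fitted to a sine-shaped relationship keeps a high error even on the data it was trained on, which is the signature of high bias.

```python
# Minimal sketch: a linear model on clearly non-linear data (high bias).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(0, 0.1, size=200)   # non-linear relationship

linear = LinearRegression().fit(x, y)
print("training MSE:", mean_squared_error(y, linear.predict(x)))
# The straight line cannot follow the sine shape, so even the *training*
# error stays high: no amount of extra data fixes this, only a more
# flexible model does.
```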

Decision Trees, for example, are models with high variance: they create branches and splits tailored to the specific samples of the training data. Moreover, if we let a decision tree grow forever, it will grow as many leaf nodes as there are data samples, creating a specific path for each data point. This means that when it encounters a new sample whose feature values do not exactly match any of the training samples, it will not classify it very well.

You can find a simple explanation of Decision Trees in the following article:

This is the main reason why Decision Trees always have some sort of stopping condition (number of leaf nodes, minimum samples per leaf node, maximum depth…). By doing this, we make our tree generalise better to new data points.
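A minimal sketch of this effect, assuming scikit-learn and a synthetic dataset (all names and numbers are illustrative): an unconstrained tree versus one limited by max_depth, compared on training and test accuracy.

```python
# Minimal sketch: stopping conditions reduce a decision tree's variance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
limited = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

for name, tree in [("fully grown", full_tree), ("max_depth=4", limited)]:
    print(name,
          "train:", round(tree.score(X_train, y_train), 3),
          "test:", round(tree.score(X_test, y_test), 3))
# The fully grown tree typically scores ~1.0 on the training split but
# noticeably worse on the test split; that gap is the error due to variance.
```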

How to fix the bias/variance problem

As we have seen, fitting the training data too well results in high variance but low bias, and not fitting the training data well results in high bias and low variance. How can we fix the problem of having high bias or high variance?

If we have a high bias (high training error), we can do the following to try to fix the problem:

  • Change the optimisation algorithm of our model.
  • Do better hyper-parameter tuning (run a coarse grid search and then a more specific one around the best results from the first; see the sketch after this list).
  • Switch the model type.
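Here is a minimal sketch of the coarse-then-fine grid search mentioned above, assuming scikit-learn; the SVC model and the parameter ranges are only illustrative placeholders.

```python
# Minimal sketch: coarse grid search, then a finer one around the best value.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# 1) Coarse search over orders of magnitude.
coarse = GridSearchCV(SVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5).fit(X, y)
best_c = coarse.best_params_["C"]

# 2) Finer search around the best coarse value.
fine = GridSearchCV(SVC(), {"C": [best_c / 3, best_c, best_c * 3]}, cv=5).fit(X, y)
print("best C:", fine.best_params_["C"], "cv score:", round(fine.best_score_, 3))
```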

If we have high variance (high test error), we can try some of the following solutions:

  • Regularise our algorithm using L1 or L2 regularisation, dropout, tree pruning, etc. (a minimal regularisation sketch follows this list).
  • Get more data to train on, or try data augmentation techniques.
  • Try a different model type.
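Here is a minimal sketch of the L2 (Ridge) regularisation option above, assuming scikit-learn; the polynomial model, data, and alpha value are illustrative only.

```python
# Minimal sketch: L2 regularisation tames a high-variance polynomial fit.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * x).ravel() + rng.normal(0, 0.2, size=30)

# Same degree-15 polynomial model, with and without an L2 penalty.
plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(x, y)
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(x, y)

x_new = np.linspace(-1, 1, 200).reshape(-1, 1)
# The unpenalised fit tends to oscillate wildly between training points
# (high variance); the Ridge penalty keeps coefficients small and the
# fitted curve much smoother.
print("max |prediction| without penalty:", np.abs(plain.predict(x_new)).max())
print("max |prediction| with Ridge     :", np.abs(ridge.predict(x_new)).max())
```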

Conclusion and Other resources

We have seen what the bias/variance trade-off is, how it relates to our Machine Learning models, looked at various examples, and seen how to tackle it. If you want to dive any deeper into it, check out the following resources:

That is all, I hope you liked the post. Feel free to follow me on Twitter at @jaimezorno. Also, you can take a look at my posts on Data Science and Machine Learning here. Have a good read!

