Bias-Variance Tradeoff — Fundamentals of Machine Learning


The bias-variance tradeoff is the fundamental design decision of machine learning: the tradeoff between model capacity and the variance of predictions. In effect, we have to decide in which way our model will be wrong. This article explains how the tradeoff relates to the No Free Lunch theorem and shows how it arises from probability distributions.

All models are wrong, but some are useful — George Box

[Photo: Playa Hermosa, Costa Rica. Source: the author.]

No Free Lunch

Whenever an optimization, program, or algorithm gains information or specialization in one region of interest, it loses ability elsewhere. The idea that you cannot gain anything without cost is the No Free Lunch theorem, and it is crucial to many machine learning topics, including the bias-variance tradeoff.

[Figure: possible accuracy distributions over a target space. No model can achieve perfect accuracy everywhere; its capacity is simply spread differently. From most to least specialized: green, yellow, red, blue.]

Model Accuracy as Distributions

Mapping model accuracy to a probability density function over the target space is a useful mental model. It captures the constraint that the integral of any model's accuracy distribution, like any probability distribution, must equal 1, or some constant set by the design space. The figure above sketches how different models can predict over a target space.
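As a rough numerical sketch of this mental model (the shapes and numbers here are made up for illustration, not taken from the figure), two accuracy profiles with the same total capacity might look like this:

    import numpy as np

    # A 1-D "target space" of inputs the model could be asked to handle.
    x = np.linspace(-5, 5, 1001)
    dx = x[1] - x[0]

    # Two hypothetical accuracy profiles: one specialized (narrow peak), one general (flat).
    specialized = np.exp(-x**2 / (2 * 0.5**2))
    general = np.exp(-x**2 / (2 * 3.0**2))

    # Normalize so both profiles have the same total "capacity" (integral = 1).
    specialized /= specialized.sum() * dx
    general /= general.sum() * dx

    print(specialized.sum() * dx, general.sum() * dx)  # both ~1.0: equal total capacity
    print(specialized.max(), general.max())            # but the peak accuracies differ a lot

The specialized profile is far more accurate near its region of interest, but only because it gives up coverage elsewhere; the two integrals are identical.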

Capacity In Neural Networks

The No Free Lunch theorem applies to neural networks as well. A network with finite capacity (a finite number of layers and neurons) has finite predictive power. Changing the training method on a fixed network changes where the outputs focus that power. Because the total capacity is finite, the integral of the accuracy curve is constant: a model can gain accuracy in one area (the specific testing distribution), but it will lose ability in other areas.

When training networks, engineers normally search for the minimum of the test error, the point where predictions on unseen data are most accurate. Maximizing accuracy on unknown elements amounts to flattening the distribution of prediction accuracy (larger coverage, less specificity).
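To make this concrete, here is a minimal sketch of that search, assuming scikit-learn is available; the synthetic data, polynomial degrees, and split sizes are illustrative assumptions, with polynomial degree standing in for model capacity:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, 80)
    y = np.sin(x) + rng.normal(0, 0.3, x.size)   # y = f(x) + noise

    # Hold out part of the data to stand in for "the unknown".
    x_train, y_train = x[:60, None], y[:60]
    x_test, y_test = x[60:, None], y[60:]

    for degree in [1, 3, 9, 15]:                  # increasing model capacity
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train, y_train)
        print(degree,
              mean_squared_error(y_train, model.predict(x_train)),
              mean_squared_error(y_test, model.predict(x_test)))

Training error keeps falling as capacity grows, but the held-out error typically bottoms out at a moderate degree and then rises again; that minimum is what engineers search for.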

The design decision is specificity versus generalizability. How does this play out numerically?

Bias-Variance Tradeoff

Again, at the core of useful machine learning models is the inverse tradeoff between the underlying structure of a model and the resulting variation in its downstream predictions. Numerically, this relationship is known as the bias-variance tradeoff.

Numerically

[Notation for the predictive model: the fitted estimate of the true function f is written f̂.]

If we look at the accuracy of a predictor f̂ ("f-hat") over a dataset, a useful equation emerges. Consider how the model fit differs from the true function f plus noise ε, that is, y = f(x) + ε.
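Written out in that notation (with the expectation taken over draws of the dataset D and the noise ε), the quantity being analyzed is the expected squared error of the predictor:

    \mathrm{MSE} = \mathbb{E}\Big[\big(y - \hat f(x; D)\big)^2\Big]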

Mean squared error of a predictor (MSE).

From here, probability rules and axioms let us derive the numerical bias-variance tradeoff. First, the definitions of the bias and variance of a model.

Bias: how well the model's structure can match the true function underlying the data.

Bias measures structure. Looking at the definition below, the bias is the difference between the mean value of the model over datasets and the true values. Think of fitting a line to noisy data drawn from an unknown y = mx + b: we can add terms such as an offset or slope to get closer to the data. A model of the form y = cx or y = c will always be far from the truth, meaning high bias. Adding terms, as in y = cx + d + ex², may lower the bias, but at a cost.
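For reference, the standard definition being described here, with the expectation taken over draws of the training dataset D, is:

    \operatorname{Bias}\big[\hat f(x)\big] = \mathbb{E}_D\big[\hat f(x)\big] - f(x)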

Variance: how much the model's predictions change when the data used to fit it change slightly.

Variance carries the uncertainty in the model. How much will a small change in the training data change the prediction? Consider again the last example. The high-bias solutions y = cx or y = c will change very little when the data are perturbed. But when we add higher-order terms to lower the bias further, the risk in variance increases. Intuitively, one can see this in the definition below: the first term takes the expectation of the square of the model, and the second subtracts the square of its mean to account for offsets.
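The corresponding definition of the variance, in the same notation, is:

    \operatorname{Var}\big[\hat f(x)\big] = \mathbb{E}_D\big[\hat f(x)^2\big] - \mathbb{E}_D\big[\hat f(x)\big]^2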

After a few steps (sketched below; the full derivation is on Wikipedia), we arrive at the equation we want. Note that σ is the standard deviation of the noise ε added to the original function.
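For completeness, those steps follow the standard argument, assuming the noise ε has zero mean, variance σ², and is independent of the fitted model f̂ (all expectations are over the dataset D and the noise):

    \begin{aligned}
    \mathbb{E}\big[(y - \hat f)^2\big]
      &= \mathbb{E}\big[(f + \varepsilon - \hat f)^2\big] \\
      &= \mathbb{E}\big[(f - \hat f)^2\big] + 2\,\mathbb{E}\big[(f - \hat f)\,\varepsilon\big] + \mathbb{E}\big[\varepsilon^2\big] \\
      &= \big(f - \mathbb{E}[\hat f]\big)^2 + \mathbb{E}\big[(\mathbb{E}[\hat f] - \hat f)^2\big] + \sigma^2 \\
      &= \operatorname{Bias}\big[\hat f\big]^2 + \operatorname{Var}\big[\hat f\big] + \sigma^2
    \end{aligned}

The cross term drops out because ε has zero mean and is independent of f̂; the third line adds and subtracts E[f̂] inside the square, and the leftover E[ε²] is exactly σ².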

The bias-variance tradeoff.

What happens here is that, no matter how the model is changed, the bias and variance terms pull against each other. There is a point where the error is minimized over the training dataset D, but there is no guarantee that the dataset perfectly mirrors the real world.
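One can check this decomposition numerically. Below is a toy sketch (the true function, noise level, and polynomial degrees are all assumptions made for illustration) that refits a model on many freshly drawn datasets and estimates each term:

    import numpy as np

    rng = np.random.default_rng(1)
    f = np.sin                       # the "true" function
    sigma = 0.3                      # noise standard deviation
    x_test = np.linspace(-3, 3, 50)  # points where the decomposition is evaluated

    def fit_predict(degree):
        """Draw a fresh training set D, fit a polynomial of the given degree, predict at x_test."""
        x = rng.uniform(-3, 3, 40)
        y = f(x) + rng.normal(0, sigma, x.size)
        coeffs = np.polyfit(x, y, degree)
        return np.polyval(coeffs, x_test)

    for degree in [0, 1, 3, 9]:
        preds = np.array([fit_predict(degree) for _ in range(500)])  # 500 datasets D
        bias2 = ((preds.mean(axis=0) - f(x_test)) ** 2).mean()
        var = preds.var(axis=0).mean()
        print(f"degree {degree}: bias^2={bias2:.3f}  var={var:.3f}  "
              f"expected error={bias2 + var + sigma**2:.3f}")

As the degree grows, the bias term shrinks while the variance term grows, and their sum plus σ² gives the expected test error at each capacity.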

An Example

Consider an example where we are trying to fit data points to a model (source: Wikipedia).

[Figure: the sampled data points and the true function.]

The underlying function (red) is sampled with noise. We then fit approximations to the sampled points; the models are constructed using radial basis functions (blue, below). From left to right, the models gain terms and capacity (there are multiple curves because multiple models are trained on different subsets of the data). The models on the left clearly have higher bias: every fit has a similar structure, but none follows the variation in the data points. Towards the right, the variance rises: the fits track the data more closely, but they differ substantially from one subset to the next.

[Figure: different model fits, with an increasing number of terms used from left to right. Each curve is trained on a different subset of the sampled points, visualizing the bias-variance tradeoff. Source: Wikipedia.]

This change in models is the bias-variance tradeoff.
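If you want to reproduce this kind of figure yourself, here is a rough sketch (the settings and the Gaussian radial-basis fit are assumptions for illustration, not the code behind the Wikipedia figure) that fits a small radial-basis-function model to several random subsets of noisy samples and overlays the fits:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    f = np.sin
    x_all = np.linspace(-3, 3, 60)
    y_all = f(x_all) + rng.normal(0, 0.2, x_all.size)
    x_grid = np.linspace(-3, 3, 200)

    def rbf_design(x, centers, width=0.5):
        # Gaussian radial basis functions centered at `centers`.
        return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

    fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
    for ax, n_centers in zip(axes, [2, 5, 12]):          # increasing capacity
        centers = np.linspace(-3, 3, n_centers)
        for _ in range(10):                              # 10 random subsets of the data
            idx = rng.choice(x_all.size, 30, replace=False)
            w, *_ = np.linalg.lstsq(rbf_design(x_all[idx], centers), y_all[idx], rcond=None)
            ax.plot(x_grid, rbf_design(x_grid, centers) @ w, alpha=0.5)
        ax.plot(x_grid, f(x_grid), "k--")                # true function
        ax.set_title(f"{n_centers} basis functions")
    plt.show()

With few basis functions the ten fits nearly coincide but miss the shape of the data (high bias); with many, each fit chases its own subset (high variance).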

