Bias-Variance Tradeoff — Fundamentals of Machine Learning


The bias-variance tradeoff is the fundamental design decision of machine learning: the tradeoff between model capacity and the variance of predictions. In effect, we have to decide in which way our model will be wrong. This article explains how the tradeoff relates to the No Free Lunch theorem and shows how it arises from probability distributions.

All models are wrong, but some are useful — George Box

[Photo: Playa Hermosa, Costa Rica. Source: the author.]

No Free Lunch

Whenever an optimization, program, or algorithm gains information or specialization in one region of interest, it loses ability elsewhere. The idea that you cannot gain anything without cost is the No Free Lunch theorem, and it is crucial to many machine learning topics, including the bias-variance tradeoff.

[Figure: possible accuracy distributions over a target space. No model can achieve perfect accuracy everywhere; its capacity is simply spread differently. From most to least specialized: green, yellow, red, blue.]

Model Accuracy as Distributions

Mapping model accuracy to a probability density function over the target space is a useful mental model. It captures the constraint that the integral of any model's accuracy distribution, like any probability distribution, must equal 1, or some constant set by the design space. The figure above sketches how different models can predict over a target space.
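As a rough numerical sketch of this mental model (the shapes and numbers here are made up for illustration, not taken from the figure), two accuracy profiles with the same total capacity might look like this:

    import numpy as np

    # A 1-D "target space" of inputs the model could be asked to handle.
    x = np.linspace(-5, 5, 1001)
    dx = x[1] - x[0]

    # Two hypothetical accuracy profiles: one specialized (narrow peak), one general (flat).
    specialized = np.exp(-x**2 / (2 * 0.5**2))
    general = np.exp(-x**2 / (2 * 3.0**2))

    # Normalize so both profiles have the same total "capacity" (integral = 1).
    specialized /= specialized.sum() * dx
    general /= general.sum() * dx

    print(specialized.sum() * dx, general.sum() * dx)  # both ~1.0: equal total capacity
    print(specialized.max(), general.max())            # but the peak accuracies differ a lot

The specialized profile is far more accurate near its region of interest, but only because it gives up coverage elsewhere; the two integrals are identical.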

Capacity In Neural Networks

The No Free Lunch theorem applies to neural networks as well. A network with finite capacity (a finite number of layers and neurons) has finite predictive power. Changing the training method on a fixed network changes where the outputs focus that power. Because the total capacity is finite, the integral of the accuracy curve is constant: a model can gain accuracy in one area (the specific testing distribution), but it will lose ability in other areas.

When training networks, engineers normally search for the minimum of the test error, the point where predictions on unseen data are most accurate. Maximizing accuracy on unknown elements amounts to flattening the distribution of prediction accuracy (larger coverage, less specificity).
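To make this concrete, here is a minimal sketch of that search, assuming scikit-learn is available; the synthetic data, polynomial degrees, and split sizes are illustrative assumptions, with polynomial degree standing in for model capacity:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, 80)
    y = np.sin(x) + rng.normal(0, 0.3, x.size)   # y = f(x) + noise

    # Hold out part of the data to stand in for "the unknown".
    x_train, y_train = x[:60, None], y[:60]
    x_test, y_test = x[60:, None], y[60:]

    for degree in [1, 3, 9, 15]:                  # increasing model capacity
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train, y_train)
        print(degree,
              mean_squared_error(y_train, model.predict(x_train)),
              mean_squared_error(y_test, model.predict(x_test)))

Training error keeps falling as capacity grows, but the held-out error typically bottoms out at a moderate degree and then rises again; that minimum is what engineers search for.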

The design decision is specificity versus generalizability. How does this play out numerically?

Bias-Variance Tradeoff

Again, at the core of useful machine learning models is the inverse tradeoff between the underlying structure of a model and the resulting variation in its downstream predictions. Numerically, this relationship is known as the bias-variance tradeoff.

Numerically

[Notation for the predictive model: the fitted estimate of the true function f is written f̂.]

If we look at the accuracy of a predictor f̂ ("f-hat") over a dataset, a useful equation emerges. Consider how the model fit differs from the true function f plus noise ε, that is, y = f(x) + ε.
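Written out in that notation (with the expectation taken over draws of the dataset D and the noise ε), the quantity being analyzed is the expected squared error of the predictor:

    \mathrm{MSE} = \mathbb{E}\Big[\big(y - \hat f(x; D)\big)^2\Big]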

Mean squared error of a predictor (MSE).

From here, probability rules and axioms let us derive the numerical bias-variance tradeoff. First, the definitions of the bias and variance of a model.

Bias: how well the model's structure can match the true function underlying the data.

Bias measures structure. Looking at the definition below, the bias is the difference between the mean value of the model over datasets and the true values. Think of fitting a line to noisy data drawn from an unknown y = mx + b: we can add terms such as an offset or slope to get closer to the data. A model of the form y = cx or y = c will always be far from the truth, meaning high bias. Adding terms, as in y = cx + d + ex², may lower the bias, but at a cost.
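For reference, the standard definition being described here, with the expectation taken over draws of the training dataset D, is:

    \operatorname{Bias}\big[\hat f(x)\big] = \mathbb{E}_D\big[\hat f(x)\big] - f(x)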

Variance: how much the model's predictions change when the data used to fit it change slightly.

Variance carries the uncertainty in the model. How much will a small change in the training data change the prediction? Consider again the last example. The high-bias solutions y = cx or y = c will change very little when the data are perturbed. But when we add higher-order terms to lower the bias further, the risk in variance increases. Intuitively, one can see this in the definition below: the first term takes the expectation of the square of the model, and the second subtracts the square of its mean to account for offsets.
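The corresponding definition of the variance, in the same notation, is:

    \operatorname{Var}\big[\hat f(x)\big] = \mathbb{E}_D\big[\hat f(x)^2\big] - \mathbb{E}_D\big[\hat f(x)\big]^2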

After a few steps (sketched below; the full derivation is on Wikipedia), we arrive at the equation we want. Note that σ is the standard deviation of the noise ε added to the original function.
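For completeness, those steps follow the standard argument, assuming the noise ε has zero mean, variance σ², and is independent of the fitted model f̂ (all expectations are over the dataset D and the noise):

    \begin{aligned}
    \mathbb{E}\big[(y - \hat f)^2\big]
      &= \mathbb{E}\big[(f + \varepsilon - \hat f)^2\big] \\
      &= \mathbb{E}\big[(f - \hat f)^2\big] + 2\,\mathbb{E}\big[(f - \hat f)\,\varepsilon\big] + \mathbb{E}\big[\varepsilon^2\big] \\
      &= \big(f - \mathbb{E}[\hat f]\big)^2 + \mathbb{E}\big[(\mathbb{E}[\hat f] - \hat f)^2\big] + \sigma^2 \\
      &= \operatorname{Bias}\big[\hat f\big]^2 + \operatorname{Var}\big[\hat f\big] + \sigma^2
    \end{aligned}

The cross term drops out because ε has zero mean and is independent of f̂; the third line adds and subtracts E[f̂] inside the square, and the leftover E[ε²] is exactly σ².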

The bias-variance tradeoff.

What happens here is that, no matter how the model is changed, the bias and variance terms pull against each other. There is a point where the error is minimized over the training dataset D, but there is no guarantee that the dataset perfectly mirrors the real world.
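One can check this decomposition numerically. Below is a toy sketch (the true function, noise level, and polynomial degrees are all assumptions made for illustration) that refits a model on many freshly drawn datasets and estimates each term:

    import numpy as np

    rng = np.random.default_rng(1)
    f = np.sin                       # the "true" function
    sigma = 0.3                      # noise standard deviation
    x_test = np.linspace(-3, 3, 50)  # points where the decomposition is evaluated

    def fit_predict(degree):
        """Draw a fresh training set D, fit a polynomial of the given degree, predict at x_test."""
        x = rng.uniform(-3, 3, 40)
        y = f(x) + rng.normal(0, sigma, x.size)
        coeffs = np.polyfit(x, y, degree)
        return np.polyval(coeffs, x_test)

    for degree in [0, 1, 3, 9]:
        preds = np.array([fit_predict(degree) for _ in range(500)])  # 500 datasets D
        bias2 = ((preds.mean(axis=0) - f(x_test)) ** 2).mean()
        var = preds.var(axis=0).mean()
        print(f"degree {degree}: bias^2={bias2:.3f}  var={var:.3f}  "
              f"expected error={bias2 + var + sigma**2:.3f}")

As the degree grows, the bias term shrinks while the variance term grows, and their sum plus σ² gives the expected test error at each capacity.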

An Example

Consider an example where we are trying to fit data points to a model (source: Wikipedia).

[Figure: the sampled data points and the true function.]

The underlying function (red) is sampled with noise. We then fit approximations to the sampled points; the models are constructed using radial basis functions (blue, below). From left to right, the models gain terms and capacity (there are multiple curves because multiple models are trained on different subsets of the data). The models on the left clearly have higher bias: every fit has a similar structure, but none follows the variation in the data points. Towards the right, the variance rises: the fits track the data more closely, but they differ substantially from one subset to the next.

[Figure: different model fits, with an increasing number of terms used from left to right. Each curve is trained on a different subset of the sampled points, visualizing the bias-variance tradeoff. Source: Wikipedia.]

This change in models is the bias-variance tradeoff.
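If you want to reproduce this kind of figure yourself, here is a rough sketch (the settings and the Gaussian radial-basis fit are assumptions for illustration, not the code behind the Wikipedia figure) that fits a small radial-basis-function model to several random subsets of noisy samples and overlays the fits:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    f = np.sin
    x_all = np.linspace(-3, 3, 60)
    y_all = f(x_all) + rng.normal(0, 0.2, x_all.size)
    x_grid = np.linspace(-3, 3, 200)

    def rbf_design(x, centers, width=0.5):
        # Gaussian radial basis functions centered at `centers`.
        return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

    fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
    for ax, n_centers in zip(axes, [2, 5, 12]):          # increasing capacity
        centers = np.linspace(-3, 3, n_centers)
        for _ in range(10):                              # 10 random subsets of the data
            idx = rng.choice(x_all.size, 30, replace=False)
            w, *_ = np.linalg.lstsq(rbf_design(x_all[idx], centers), y_all[idx], rcond=None)
            ax.plot(x_grid, rbf_design(x_grid, centers) @ w, alpha=0.5)
        ax.plot(x_grid, f(x_grid), "k--")                # true function
        ax.set_title(f"{n_centers} basis functions")
    plt.show()

With few basis functions the ten fits nearly coincide but miss the shape of the data (high bias); with many, each fit chases its own subset (high variance).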

