My Deep Learning Model Says: “sorry, I don’t know the answer”. That’s Absolutely OK.



Motivation

Although deep learning works, it is most of the time unclear why it works. This makes it tricky to deploy artificial intelligence in high-risk areas such as aviation, the judiciary, and medicine.

A neural network identifies that a cell biopsy is cancerous, but it does not tell us why.

Typically, a classifier model is forced to decide between two possible outcomes even when it has no clue; it has effectively just flipped a coin. In real life, a model for medical diagnosis should care not only about accuracy but also about how certain its prediction is. If the uncertainty is too high, the doctor should take this into account in the decision process.

A deep learning model should be able to say: “sorry, I don’t know”.

A model for self-driving cars that has learned from an insufficiently diverse training set is another interesting example. If the car is unsure whether there is a pedestrian on the road, we would expect it to let the driver take charge.

Networks with greater generalization are less interpretable. Interpretable networks don’t generalize well. (source)

Some models may not require explanations because they are used in low-risk applications, such as a product recommender system. Nevertheless, integrating critical models into our daily lives requires interpretability to increase the social acceptance of AI. This is because people like to attribute beliefs, desires, and intentions to things (source).

Understanding and explaining what a neural network doesn’t know is crucial for end-users. Practitioners also seek better interpretability to build more robust models that are resistant to adversarial attacks.


Image by Goodfellow et al., ICLR 2015, Explaining and Harnessing Adversarial Examples. Adding a little noise to a photo of a panda causes it to be misclassified as a gibbon.

In the following sections, we take a closer look at the concept of uncertainty. We also introduce simple techniques for assessing uncertainty in deep learning models.

Types of uncertainty

There are two major types of uncertainty in deep learning: epistemic uncertainty and aleatoric uncertainty. Neither term rolls off the tongue easily.

Epistemic uncertainty describes what the model does not know because the training data was inadequate. It is due to limited data and knowledge: given enough training samples, epistemic uncertainty decreases. It arises in regions where there are few samples for training.

Aleatoric uncertainty is the uncertainty arising from the natural stochasticity of observations. It cannot be reduced even when more data is provided. When it comes to measurement errors, we call it homoscedastic uncertainty because it is constant for all samples. Input-data-dependent uncertainty is known as heteroscedastic uncertainty.

The illustration below represents a real linear process (y = x) that was sampled around x = -2.5 and x = 2.5.


An exhibit of the different kinds of uncertainty in a linear regression context (Image by Michel Kana).

A sensor malfunction introduced noise in the left cloud. These noisy measurements of the underlying process lead to high aleatoric uncertainty in the left cloud. This uncertainty cannot be reduced by additional measurements, because the sensor keeps producing errors around x = -2.5 by design.

High epistemic uncertainty arises in regions where there are few observations for training, because too many plausible model parameters can explain the underlying ground-truth phenomenon. This is the case to the left and to the right of our two clouds: there, we are not sure which model parameters describe the data best. Given more data in those regions, uncertainty would decrease. In high-risk applications, it is important to identify such regions.

How to assess uncertainty using Dropout

Bayesian statistics allow us to derive conclusions based on both data and our prior knowledge about the underlying phenomenon. A key distinction is that parameters are treated as distributions instead of fixed weights.

If instead of learning the model’s parameters, we could learn a distribution over them, we would be able to estimate uncertainty over the weights.

How can we learn the weights’ distribution? Deep ensembling is a powerful technique in which a large number of models, or multiple copies of one model, are trained on their respective datasets, and their resulting predictions collectively build a predictive distribution.

Because ensembling can require plentiful computing resources, an alternative approach was suggested: dropout as a Bayesian approximation of a model ensemble. This technique was introduced by Yarin Gal and Zoubin Ghahramani in their 2016 paper.

Dropout is a widely used regularization practice in deep learning to avoid overfitting. It consists of randomly sampling network nodes and dropping them out during training: neurons are zeroed out at random according to a Bernoulli distribution.
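As a toy illustration (not the article's original code), here is how such a Bernoulli mask acts on a layer's activations; the rescaling by the keep probability is the common "inverted dropout" convention, which keeps the expected activation unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(1, 8))     # toy activations of one layer
keep_prob = 0.95                          # i.e., a 5% dropout rate

# Bernoulli mask: each neuron survives independently with probability keep_prob
mask = rng.binomial(1, keep_prob, size=activations.shape)

# zero out the dropped neurons and rescale the survivors
dropped = activations * mask / keep_prob
```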

In general, there seems to be a strong link between regularization and prior distributions in Bayesian models. Dropout is not the only example: the frequently used L2 regularization is essentially equivalent to a Gaussian prior on the weights.

In their paper, Gal and Ghahramani showed that a neural network with dropout applied before every weight layer is mathematically equivalent to a Bayesian approximation of a Gaussian process.


Image by Yu Ri Tan on yuritan.nl — Dropout changes the model architecture at each forward pass, allowing Bayesian approximation. (Image reproduced with the permission of Yu Ri Tan.)

With dropout, each subset of nodes that is not dropped out defines a new network. The training process can be thought of as training 2^m different models simultaneously, where m is the number of nodes in the network. For each batch, a randomly sampled set of these models is trained.

The key idea is to apply dropout at both training and testing time. At test time, the paper suggests repeating the prediction a few hundred times with random dropout. The average of all predictions is the estimate, and the variance of the predictions gives the ensemble's uncertainty.
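A minimal sketch of this procedure, assuming a tf.keras model that contains Dropout layers (calling the model with training=True keeps dropout active at inference time):

```python
import numpy as np

def mc_dropout_predict(model, x, n_iter=300):
    """Monte Carlo dropout: run the model n_iter times with dropout active."""
    # each call samples a different dropout mask, i.e., a different sub-network
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_iter)])
    # mean across runs is the estimate; variance approximates the uncertainty
    return preds.mean(axis=0), preds.var(axis=0)
```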

Predicting Epistemic Uncertainty

We will assess epistemic uncertainty on a regression problem, using data generated by adding normally distributed noise to the function y = x as follows (a generation sketch follows the list):

  • 100 data points are generated in the left cloud, between x = -3 and x = -2;
  • 100 data points are generated in the right cloud, between x = 2 and x = 3;
  • noise is added to the left cloud with 10 times higher variance than in the right cloud.
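A sketch of this data generation; the base noise level and random seed are assumed, as the article does not state them:

```python
import numpy as np

rng = np.random.default_rng(42)                  # assumed seed

x_left = rng.uniform(-3, -2, 100)                # left cloud
x_right = rng.uniform(2, 3, 100)                 # right cloud
x = np.concatenate([x_left, x_right]).reshape(-1, 1)

base_std = 0.1                                   # assumed base noise level
noise = np.concatenate([
    rng.normal(0, base_std * np.sqrt(10), 100),  # 10x higher variance on the left
    rng.normal(0, base_std, 100),
]).reshape(-1, 1)

y = x + noise                                    # underlying process: y = x
```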


Below we design two simple neural networks: one without dropout layers, and a second one with a dropout layer between the hidden layers. The dropout layer randomly disables 5% of neurons during each training and inference batch. We also include L2 regularizers, which apply penalties to layer parameters during optimization.

Network without dropout layers

Network with dropout layers
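The article shows these two architectures as images; a sketch of them in tf.keras follows, with assumed layer sizes (the original does not state them):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(dropout_rate=0.0):
    # two hidden layers with L2 penalties; sizes (64, 64) are assumed
    inputs = keras.Input(shape=(1,))
    h = layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))(inputs)
    if dropout_rate > 0:
        h = layers.Dropout(dropout_rate)(h)  # between the hidden layers
    h = layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))(h)
    outputs = layers.Dense(1)(h)
    return keras.Model(inputs, outputs)

model_plain = build_model()
model_dropout = build_model(dropout_rate=0.05)   # drops 5% of neurons
```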

The rmsprop optimizer is used to train on batches of 10 points by minimizing the mean squared error. The training performance is displayed below. Convergence is very fast for both models. The model with dropout exhibits a slightly higher loss with more stochastic behavior, because random regions of the network are disabled during training, causing the optimizer to jump across local minima of the loss function.
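A minimal compile-and-fit sketch matching that description (the epoch count is assumed):

```python
model_plain.compile(optimizer="rmsprop", loss="mse")
model_dropout.compile(optimizer="rmsprop", loss="mse")

# batches of 10 points, as in the article; 100 epochs is an assumption
model_plain.fit(x, y, batch_size=10, epochs=100, verbose=0)
model_dropout.fit(x, y, batch_size=10, epochs=100, verbose=0)
```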

Training loss curves for both models.

Below, we show how the models perform on test data. The model without dropout predicts a straight line with a perfect R² score. Including dropout causes a nonlinear prediction line with an R² score of 0.79. Although the dropout model overfits less, it has higher bias and lower accuracy; it highlights uncertainty in its predictions in the regions without training samples. The prediction line has higher variance in those regions, which can be used to compute epistemic uncertainty.


The model with dropout exhibits predictions with high variance in regions without training samples. This property is used to approximate epistemic uncertainty.

Below, we evaluate both models (with and without dropout) on a test dataset, applying the dropout layers at evaluation time a few hundred times. This approximates sampling from a Gaussian process: for each scalar input from the test data, we obtain a range of output values. This allows us to compute the standard deviation of the posterior distribution and display it as a measure of epistemic uncertainty.
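A sketch of this evaluation for the dropout model; the test range and the number of stochastic forward passes are assumptions, and the shaded band visualizes the epistemic uncertainty:

```python
import numpy as np
import matplotlib.pyplot as plt

x_test = np.linspace(-5, 5, 200).reshape(-1, 1)   # assumed test range

# a few hundred stochastic forward passes with dropout kept active
preds = np.stack([model_dropout(x_test, training=True).numpy().ravel()
                  for _ in range(300)])
mean, std = preds.mean(axis=0), preds.std(axis=0)

plt.plot(x_test.ravel(), mean, label="MC-dropout mean")
plt.fill_between(x_test.ravel(), mean - 2 * std, mean + 2 * std,
                 alpha=0.3, label="epistemic uncertainty (±2 std)")
plt.legend()
plt.show()
```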


The model without dropout predicts fixed values with 100% certainty, even in regions without training samples.


The model with dropout estimates high epistemic uncertainty in regions without training samples.

As expected, predictions for x < -3 and x > 3 have high epistemic uncertainty, as no training data is available at these points.

Dropout allows the model to say: “all my predictions for x < -3 and x > 3 are just my best guess.”


Polynomial Regression

In this section, we investigate how to assess epistemic uncertainty using dropout for more complex tasks, such as polynomial regression.

For this purpose, we generate a synthetic training dataset by randomly sampling a sinusoidal function and adding noise of different amplitudes, as sketched below.
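A sketch of such a dataset; the article only states "sinusoidal with noise of different amplitudes", so the exact function, ranges, and noise levels here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

x_sin = np.sort(rng.uniform(-3, 3, 200))
amplitude = np.where(x_sin < 0, 0.3, 0.1)        # assumed: larger noise on the left
y_sin = np.sin(x_sin) + rng.normal(0.0, amplitude)
```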

The results below suggest that including dropout provides a way to assess epistemic uncertainty in regions where there is no data, even for nonlinear data. Although dropout affects model performance, it clearly shows that predictions are less certain in data regions where there were not enough training samples.


The model without dropout overfits the training samples and shows overconfidence when predicting in regions without training data.


The model with dropout has a high bias, but is less confident in regions without training data. Epistemic uncertainty is higher where training samples are missing.

Predicting Aleatoric Uncertainty

While epistemic uncertainty is a property of the model, aleatoric uncertainty is a property of the data. Aleatoric uncertainty captures our uncertainty concerning information that our data cannot explain.

When aleatoric uncertainty is constant and does not depend on the input data, it is called homoscedastic uncertainty; otherwise, the term heteroscedastic uncertainty is used.

Heteroscedastic uncertainty depends on the input data and therefore can be predicted as a model output. Homoscedastic uncertainty can be estimated as a task-dependent model parameter.

Learning heteroscedastic uncertainty is done by replacing the mean-squared-error loss function with the following (source):

$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \frac{\lVert y_i - \hat{y}_i \rVert^2}{2\sigma(x_i)^2} + \frac{1}{2}\log \sigma(x_i)^2$$

The model predicts both a mean ŷ and a variance σ². If the residual is very large, the model will tend to predict a large variance; the log term prevents the variance from growing infinitely large. An implementation of this aleatoric loss function in Python is provided below.
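The original code listing is not preserved here; the following is a sketch of this loss for tf.keras. It uses the common reparameterization of predicting log σ² instead of σ² for numerical stability, which is mathematically equivalent to the formula above:

```python
import tensorflow as tf

def aleatoric_loss(y_true, y_pred):
    # the network emits two outputs per sample:
    # y_pred[:, 0:1] -> predicted mean, y_pred[:, 1:2] -> predicted log variance
    mean = y_pred[:, 0:1]
    log_var = y_pred[:, 1:2]
    # exp(-log_var) down-weights the squared residual where the predicted
    # variance is large; 0.5 * log_var keeps the variance from growing unboundedly
    return tf.reduce_mean(
        0.5 * tf.exp(-log_var) * tf.square(y_true - mean) + 0.5 * log_var
    )
```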

The aleatoric loss can be used to train a neural network. Below, we illustrate an architecture similar to the one used for epistemic uncertainty in the previous sections, with two differences:

  • there is no dropout layer between the hidden layers;
  • the output is a 2D tensor instead of a 1D tensor, which allows the network to learn not only the response ŷ but also the variance σ² (see the sketch after this list).
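A minimal sketch of such a model, reusing the assumed layer sizes from the earlier sketches and the aleatoric_loss defined above (here the second output is interpreted as the log variance):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(1,))
h = layers.Dense(64, activation="relu")(inputs)   # assumed layer sizes
h = layers.Dense(64, activation="relu")(h)        # no dropout between hidden layers
outputs = layers.Dense(2)(h)                      # [mean, log variance]

aleatoric_model = keras.Model(inputs, outputs)
aleatoric_model.compile(optimizer="rmsprop", loss=aleatoric_loss)

# trained on the same two-cloud data (x, y) generated earlier
aleatoric_model.fit(x, y, batch_size=10, epochs=100, verbose=0)
```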


The learned loss attenuation forces the network to find weights and variances that minimize the loss during training.

Inference for aleatoric uncertainty is done without dropout. The result below confirms our expectation: the aleatoric uncertainty is higher for data on the left than on the right. The left region has noisy data due to a sensor error around x = -2.5. Adding more samples would not fix the problem, since noise would still be present in that region. By including aleatoric uncertainty in the loss function, the model predicts with less confidence for test data falling in the regions where the training samples were noisy.


The model trained with the aleatoric loss detects regions with noisy training data. This helps it predict with higher aleatoric uncertainty in those regions.

Measuring aleatoric uncertainty can become crucial in computer vision. Such uncertainty in images can be attributed to occlusions when cameras can’t see through objects. Aleatoric uncertainty can also be caused by over-exposed regions of images or the lack of some visual features.

Epistemic and aleatoric uncertainty can be summed to obtain the total predictive uncertainty. Including the total level of uncertainty in the predictions of a self-driving car can be very useful.
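Assuming the two sources are independent, their variances add; a short sketch reusing the models and test points defined in the earlier sketches:

```python
import numpy as np

# epistemic variance from the MC-dropout loop over the regression model
mc_preds = np.stack([model_dropout(x_test, training=True).numpy().ravel()
                     for _ in range(300)])
epistemic_var = mc_preds.var(axis=0)

# aleatoric variance from the second (log variance) output of the aleatoric model
aleatoric_var = np.exp(aleatoric_model(x_test).numpy()[:, 1])

total_std = np.sqrt(epistemic_var + aleatoric_var)  # total predictive uncertainty
```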


Image by Alex Kendall, University of Cambridge, on arXiv — Aleatoric and epistemic uncertainty for semantic segmentation in computer vision. Aleatoric uncertainty (d) captures object boundaries where labels are noisy due to occlusion or distance. Epistemic uncertainty (e) highlights regions where the model is unfamiliar with image features, such as an interrupted footpath.

Conclusion

In this article, we demonstrated how using dropout at inference time amounts to a Bayesian approximation for assessing uncertainty in deep learning predictions.

Knowing how confident a model is about its predictions is important in a business context. Uber has been using this technique to assess uncertainty in time-series predictions.

Properly including uncertainty in machine learning can also help to debug models and make them more robust against adversarial attacks. The new TensorFlow Probability library offers probabilistic modeling as an add-on for deep learning models.

You can read further in my article about responsible data science and see what can go wrong when we trust our machine learning models a little too much. This comprehensive introduction to deep learning and practical guide to Bayesian inference can help deepen and challenge classical approaches to deep learning.

Thanks to Anne Bonner from Towards Data Science for her editorial notes.

Stay safe in uncertain times.

