A Beginner’s Guide to Machine Learning Model Monitoring

Category: IT Technology · Published: 4 years ago


Metrics in Model Monitoring

There are several metrics that you can use to monitor an ML model. Which metric(s) you choose depends on several factors:

  • Is it a regression or classification task?
  • What is the business objective? E.g., is precision or recall more important?
  • What is the distribution of the target variable?

Below are various metrics that are commonly used in model monitoring:

Type 1 Error

Also known as a false positive, a type 1 error is an outcome where the model incorrectly predicts the positive class. For example, a pregnancy test that comes back positive when you aren’t pregnant is a type 1 error.

Type 2 Error

Also known as a false negative, a type 2 error is an outcome where the model incorrectly predicts the negative class. For example, a test result that says you don’t have cancer when you actually do is a type 2 error.
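The two error types can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal Python sketch; the function name `confusion_counts`, the `positive` parameter, and the sample labels are hypothetical and only serve as an illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # correctly predicted positive
            else:
                fp += 1  # type 1 error (false positive)
        else:
            if actual == positive:
                fn += 1  # type 2 error (false negative)
            else:
                tn += 1  # correctly predicted negative
    return tp, fp, tn, fn


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"type 1 errors: {fp}, type 2 errors: {fn}")
```

In a monitoring setting, these four counts are the raw material for most of the classification metrics that follow.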

Accuracy

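The accuracy of a model is the fraction of predictions that the model got right. In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), it can be written as:

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$$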
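As a rough illustration, accuracy can be computed from the four confusion-matrix counts. This is a sketch, not a reference implementation: the function name and the sample counts are assumptions, and returning 0.0 when there are no predictions is an arbitrary choice.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    # Assumption: report 0.0 rather than raising when there are no predictions.
    return (tp + tn) / total if total else 0.0


# Hypothetical counts: 3 true positives, 1 false positive,
# 3 true negatives, 1 false negative.
print(accuracy(tp=3, fp=1, tn=3, fn=1))  # 0.75
```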

Precision

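Precision attempts to answer “What proportion of positive identifications was actually correct?” With TP as the number of true positives and FP as the number of false positives, it can be represented by the following equation:

$$\text{Precision} = \frac{TP}{TP + FP}$$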

Recall

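Recall attempts to answer “What proportion of actual positives was identified correctly?” With TP as the number of true positives and FN as the number of false negatives, it can be represented by the following equation:

$$\text{Recall} = \frac{TP}{TP + FN}$$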
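Precision and recall differ only in their denominator, which a short Python sketch makes visible. The function names and counts below are hypothetical, and returning 0.0 for an empty denominator is an assumption (some libraries warn or raise instead).

```python
def precision(tp, fp):
    """Of everything predicted positive, how much really was positive?"""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """Of everything that really was positive, how much did the model find?"""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts: the model flags 10 items, 8 of them correctly,
# but misses 8 other true positives.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5
```

Here the model is usually right when it predicts the positive class, yet it finds only half of the actual positives, which is why the two metrics are typically monitored together.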

F1 score

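The F1 score is the harmonic mean of precision and recall, which makes it a single measure of a test’s accuracy on the positive class. It ranges from a minimum of 0 to a maximum of 1 (perfect precision and recall). Overall, it reflects both how precise your model is and how robust it is (how few positives it misses), and can be represented with the following equation:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$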
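The effect of using a harmonic mean rather than a plain average can be seen in a small Python sketch. The function name and the input values are hypothetical; returning 0.0 when both inputs are zero is an assumption made to avoid a division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: the harmonic mean sits closer to the lower input
# than the arithmetic mean (0.65) would.
print(round(f1_score(0.8, 0.5), 3))

# A model with perfect precision but zero recall gets no credit at all.
print(f1_score(1.0, 0.0))  # 0.0
```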

R-Squared

R-squared (R²) measures the proportion of the variance in the dependent variable that is explained by the independent variables. In simpler terms, while the coefficients estimate trends, R-squared represents the scatter around the line of best fit.

For example, if the R² is 0.80, then 80% of the variation can be explained by the model’s inputs.

If the R² is 1.0 or 100%, that means that all movements of the dependent variable can be entirely explained by the movements of the independent variables.
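One common way to compute R² is as one minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below follows that definition; the function name and the sample values are hypothetical, and it assumes the actual values are not all identical (otherwise the total sum of squares is zero).

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Unexplained variation: squared distance between actual and predicted.
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    # Total variation: squared distance between actual values and their mean.
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(round(r_squared(y_true, y_pred), 3))
```

Perfect predictions give exactly 1.0, and always predicting the mean of the actual values gives 0.0.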

Adjusted R-Squared

Adding another independent variable to a model never decreases the R² value, and in practice almost always increases it. A model with many independent variables may therefore seem to be a better fit even if it isn’t. This is where adjusted R² comes in. The adjusted R² penalizes each additional independent variable and only increases if the new variable improves the model more than would be expected by chance.
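The standard adjustment rescales the unexplained share of variance by the degrees of freedom, using the number of observations n and the number of independent variables p. The Python sketch below applies that formula; the function and parameter names and all of the numbers are hypothetical.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    # Assumes n_samples > n_features + 1, otherwise the denominator is <= 0.
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical scenario: adding 5 more variables nudges R^2 from 0.800 to 0.805.
small_model = adjusted_r_squared(r2=0.800, n_samples=50, n_features=5)
large_model = adjusted_r_squared(r2=0.805, n_samples=50, n_features=10)
print(round(small_model, 3), round(large_model, 3))
```

In this made-up case the raw R² goes up slightly, while the adjusted value drops from roughly 0.777 to roughly 0.755, signaling that the extra variables are not pulling their weight.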

Mean Absolute Error (MAE)

The absolute error is the absolute value of the difference between a predicted value and the corresponding actual value. The mean absolute error is the average of these absolute errors over all predictions.
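A minimal Python sketch of the MAE, assuming two equal-length, non-empty lists of actual and predicted values (the function name and the numbers are made up for illustration):

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: absolute errors are 1, 0, 1 and 4.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 13]
print(mean_absolute_error(y_true, y_pred))  # 1.5
```

The result is in the same units as the target variable, which makes it easy to explain to non-technical stakeholders.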

Mean Squared Error (MSE)

The mean squared error or MSE is similar to the MAE, except you take the average of the squared differences between the predicted values and the actual values.

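Because the differences are squared, larger errors are weighted more heavily, so the MSE should be preferred over the MAE when large errors are especially costly. With actual values $y_i$, predicted values $\hat{y}_i$, and $n$ observations, the MSE is:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$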
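The original code for this section is not available, so the following is a minimal Python sketch of the MSE rather than the author's implementation. It assumes two equal-length, non-empty lists; the function name and the numbers are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all observations."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: the errors are 1, 0, 1 and 4, so the squared errors
# are 1, 0, 1 and 16. The single large miss dominates the total.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 13]
print(mean_squared_error(y_true, y_pred))  # 4.5
```

On this data the one prediction that is off by 4 contributes 16 of the 18 units of squared error, which is the sense in which the MSE weights large errors more heavily.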

Overall, the metric(s) that you choose to monitor ultimately depend on the task at hand and the business context that you’re working in.

For example, it is widely understood in the data science world that accuracy is a poor metric for fraud detection models, because the percentage of fraudulent transactions is usually less than 1%. A fraud detection model can reach an accuracy of 99% simply by classifying every transaction as non-fraudulent, so a high accuracy tells us nothing about whether the model is actually effective.
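The point can be checked with a few lines of Python. The data here is entirely made up: 1,000 hypothetical transactions with a 1% fraud rate, and a "model" that labels every transaction as non-fraudulent.

```python
# Hypothetical data: 10 fraudulent (1) and 990 legitimate (0) transactions.
y_true = [1] * 10 + [0] * 990
# A useless model that predicts "not fraud" for everything.
y_pred = [0] * 1000

correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
accuracy = correct / len(y_true)

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
recall = tp / (tp + fn)

print(accuracy)  # 0.99
print(recall)    # 0.0
```

Accuracy looks excellent while recall shows that not a single fraudulent transaction was caught, which is why recall, precision, or F1 are usually more informative on imbalanced data.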

Another example is cancer screening, where a false negative is far more costly than a false positive. Telling a patient with cancer that they don’t have cancer can ultimately lead to the patient’s death. This is much worse than telling a patient that they have cancer, conducting further tests, and then finding that they do not have cancer. (It’s always better to be safe than sorry!)

