Use a Negative Binomial for Count Data

栏目: IT技术 · 发布时间: 4年前

内容简介：The Negative Binomial distribution is a discrete probability distribution that you should have in your toolkit for count data. For example, you might have data on the number of pages someone visited before making a purchase or the number of complaints or e

Up your Statistics Game

Ravi Charan

Aug 2 ·8min read

The Negative Binomial distribution is a discrete probability distribution that you should have in your toolkit for count data. For example, you might have data on the number of pages someone visited before making a purchase or the number of complaints or escalations associated with each customer service representative. Given this data, you might want to model the process and, later, see if some covariates affect the parameters. And in many contexts, you might find that a negative binomial distribution is a good fit.

In this article we’ll introduce the distribution and compute its probability mass function (PMF). We’ll cover its basic properties (mean and variance) by using the binomial theorem. This is in contrast to the usual treatments you will find which either just give you a formula or use more complicated tools to derive the results. Finally, we’ll turn to focus on the distributions’ interpretations.

The Negative Binomial Distribution

Suppose you are going to flip a biased coin that has probability p of coming up heads, which we will call a “success.” Furthermore, you are going to flip the coin continuously until at r successes occur. Let k be the number of failures along the way (so k+r coin flips happen in total).

In the context of our examples, we could imagine:

A user might browse your website. On each page they have a probability of p= 1% of seeing an item they want to buy. We imagine that when they have put r =3 items in their basket, they are ready to checkout. k is the number of pages they will browse and not buy from. Of course we will want to fit the model to find the true values of r and p as well as if/how they vary between users.
A customer service representative might in general receive complaints. After receiving complaints, there is a probability p that they will be reprimanded. Then after r times being told off, they will stop getting complaints due to changed behavior. k is the number of complaints on which they are not reprimanded before they change their behavior.

Whether you actually think this is true is, as always, up to your prior beliefs and how well the model fits the data. Also, note that the number of failures is closely related to the number of events (k versus k plus r).

It is relatively straightforward to write down the probability mass function using some combinatorics. The probability that the r -th success happens on the (k+r)-th coin flip is:

The probability that there are r–1 successes on the first k+r–1 flips, times
The probability of success on the ( k+r)- th flip.

There are (k+r–1) choose k orderings of (r–1) successes and k failure on the first k+r–1 flips. (The number of ways to arrange k A’s and (r–1) B’s in a line). Each has the same probability of occurring. This gives the PMF:

Hopefully you remember some basic facts about combinations and permutations. If not, here is a brief review of facts you can convince yourself of to help you out. Suppose there are 3 A’s and 2 B’s and you want to arrange them into a string like “AAABB” or “ABABA”. The number of ways to do this is 5 choose 2 (there are 5 total things and 2 B’s) which is the same as 5 choose 3 (there are 3 A’s). To see this, pretend that each letter is actually a distinct symbols (so the 5 symbols are A1, A2, A3, B1, B2). Then there are 5!=120 ways to arrange the distinct symbols. But there are 3!=6 ways to rearrange the A1 A2 A3 without changing the placements of the A’s, and 2!=2 ways to arrange the B’s. So the total number is 5!/2!3! = 10.

Now, the trick is, binomials also work for negative numbers on top, or with non-integers. For example, if we expand what we have above, we can add a minus sign to each of the k terms in the numerator:

Hence the name “negative binomial.”

The other trick to keep in mind is that we can define binomials with non-integer numbers. Using the fact that the Γ function ( Gamma function ) satisfies, for positive integers n ,

We can write our binomial coefficients in the form

And this enables us to allow that, in the negative binomial distribution, the parameter r does not have to be an integer. This will be useful because when we estimate our models, we generally don’t have a way to constrain r to be an integer. So a non-integer value for r won’t be a problem. (We will require r to be positive, however). We’ll come back to how to interpret a non-integer value of r .

Properties of the Negative Binomial Distribution

We would like to compute the expectation and variance. As a warmup, let’s check that the negative binomial distribution is in fact a probability distribution. For convenience, let q=1–p .

The crucial point is the third line, where we used the binomial theorem (yes, it works with negative exponents).

Now let’s compute the expectation:

To get the third line, we used the identity

Where we used the binomial theorem again to get the third to last line.

Warning: this is the opposite of what you will find on Wikipedia as of this writing. It is what you will find from Wolfram (the makers of Mathematica). This is because Wikipedia thinks about the number of successes before r failures, where as we count failures before r successes. In general, there is a variety of similar ways to parameterize/interpret the distribution, so be careful you have everything straight when looking at formulas in different places.

Next, we can compute the variance in two steps. First, we repeat the trick from above, using the identity twice this time to get the third line. We again use the binomial theorem to compute the sum and obtain the third-to-last line.

Now we can compute:

Again, this is the opposite of what is on Wikipedia.

Interpretation of the Negative Binomial Distribution

We have covered the “defining interpretation” of the Negative Binomial Distribution: it is the number of failures before r success occur, with the probability of success at each step being p . But there are a few other ways to look at the distribution that can be illuminating and also help interpret the case where r is not an integer.

Over-Dispersed Poisson Distribution

The Poisson distribution is a very simple model for count data, which assumes that events happen randomly at a certain rate. Then it models the distribution of how many events will occur in a given time interval. In the context of our examples, it would say that:

Customer service representatives get complaints at a constant rate. The variation in counts is just determined by random variation. (Compare the model where their behavior eventually changes). Again, in modeling this, we could model a difference in rate between representatives based on exogenous covariates.

One big problem with the Poisson distribution is that the variance is equal to the mean. This may not fit our data. Let’s say we parameterize our Negative Binomial distribution with a mean λ and stopping parameter r . Then we have

Our probability mass function becomes

Now let’s consider what happens if we take the limit as r →∞ holding λ fixed. (This means that the probability of success goes to 1 as well, in the way defined by p=r/[λ+r]). In this limit, the binomial term approaches (–r) to the power of k divided by k! and r + λ approaches r.

In the last line, the r to the k-th powers cancel and we have used the definition of the exponential. The result is that we recover the Poission distribution.

Therefore, we can interpret the Negative Binomial Distribution as a generalization of the Poisson distribution. If the distribution is in fact Poission, we will see a large r and p close to 1. This makes sense because as p approaches 1, the variance approaches the mean. When p is smaller than one, the variance is higher than that of a Poisson distribution with the same mean, so we can see that the Negative Binomial distribution generalizes Poisson by increasing the variance.

Mixture of Poisson Distributions

The Negative Binomial Distribution also arises as a mixture of Poisson random variables. For example, suppose that our customer service representatives each receive complaints at a given rate (they never change their behavior), but that rate varies between representatives. If that rate is randomly distributed according to a Gamma distribution , we get a Negative Binomial Distribution for the ensemble.

The intuition behind this is as follows. We initially said the Negative Binomial Distribution was the count of failures before r successes when we do coin flips. Instead, replace the coin flip with two Poisson processes. Process one (the “success” process) has rate p and process two, the “failure” process, has rate (1-p). This means that instead of thinking of the Negative Binomial Distribution as counting coin flips, we think that there are independent processes generating “success” and “failure” independently and we just count how many failures before a certain number of successes.

Now, the Gamma Distribution is the distribution of waiting times for Poisson processes. Let T be the waiting time for r successes from the “success” process. T is Gamma distributed. Then the number of failures has a mean of (1–p)T and is Poisson distributed.

Conclusion

The last few points worth pointing out. First of all, there is no analytic way to fit the Negative Binomial Distribution to data. Instead, use the Maximum Likelihood Estimator and numerical estimation. You can use the statsmodels package to do this in Python.

Also, it is possible to do Negative Binomial regression, modeling the effects of covariates. We’ll save that for a future article.

以上所述就是小编给大家介绍的《Use a Negative Binomial for Count Data》，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对码农网的支持！

查看所有标签

猜你喜欢:

Use a Negative Binomial for Count Data

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

因计算机而强大

[美]西摩佩珀特 Seymour Papert / 梁栋 / 新星出版社 / 2019-1 / 38

本书有两个中心主题—— 孩子可以轻松自如地学习使用计算机；学习使用计算机能够改变他们学习其他知识的方式。（前苹果公司总裁约翰·斯卡利）最有可能带来文化变革的就是计算机的不断普及。计算机不仅是一个工具，它对我们的心智有着根本和深远的影响。计算机不仅帮助我们学习，还帮助我们学习怎样学习。计算机是一种调解人与人之间关系的移情对象。一个数学的头脑......一起来看看《因计算机而强大》这本书的介绍吧!

码农工具