Use a Negative Binomial for Count Data

栏目: IT技术 · 发布时间: 3年前

内容简介:The Negative Binomial distribution is a discrete probability distribution that you should have in your toolkit for count data. For example, you might have data on the number of pages someone visited before making a purchase or the number of complaints or e

Up your Statistics Game

Aug 2 ·8min read

The Negative Binomial distribution is a discrete probability distribution that you should have in your toolkit for count data. For example, you might have data on the number of pages someone visited before making a purchase or the number of complaints or escalations associated with each customer service representative. Given this data, you might want to model the process and, later, see if some covariates affect the parameters. And in many contexts, you might find that a negative binomial distribution is a good fit.

In this article we’ll introduce the distribution and compute its probability mass function (PMF). We’ll cover its basic properties (mean and variance) by using the binomial theorem. This is in contrast to the usual treatments you will find which either just give you a formula or use more complicated tools to derive the results. Finally, we’ll turn to focus on the distributions’ interpretations.

The Negative Binomial Distribution

Suppose you are going to flip a biased coin that has probability p of coming up heads, which we will call a “success.” Furthermore, you are going to flip the coin continuously until at r successes occur. Let k be the number of failures along the way (so k+r coin flips happen in total).

In the context of our examples, we could imagine:

  • A user might browse your website. On each page they have a probability of p= 1% of seeing an item they want to buy. We imagine that when they have put r =3 items in their basket, they are ready to checkout. k is the number of pages they will browse and not buy from. Of course we will want to fit the model to find the true values of r and p as well as if/how they vary between users.
  • A customer service representative might in general receive complaints. After receiving complaints, there is a probability p that they will be reprimanded. Then after r times being told off, they will stop getting complaints due to changed behavior. k is the number of complaints on which they are not reprimanded before they change their behavior.

Whether you actually think this is true is, as always, up to your prior beliefs and how well the model fits the data. Also, note that the number of failures is closely related to the number of events (k versus k plus r).

It is relatively straightforward to write down the probability mass function using some combinatorics. The probability that the r -th success happens on the (k+r)-th coin flip is:

  1. The probability that there are r–1 successes on the first k+r–1 flips, times
  2. The probability of success on the ( k+r)- th flip.

There are (k+r–1) choose k orderings of (r–1) successes and k failure on the first k+r–1 flips. (The number of ways to arrange k A’s and (r–1) B’s in a line). Each has the same probability of occurring. This gives the PMF:

Use a Negative Binomial for Count Data

Hopefully you remember some basic facts about combinations and permutations. If not, here is a brief review of facts you can convince yourself of to help you out. Suppose there are 3 A’s and 2 B’s and you want to arrange them into a string like “AAABB” or “ABABA”. The number of ways to do this is 5 choose 2 (there are 5 total things and 2 B’s) which is the same as 5 choose 3 (there are 3 A’s). To see this, pretend that each letter is actually a distinct symbols (so the 5 symbols are A1, A2, A3, B1, B2). Then there are 5!=120 ways to arrange the distinct symbols. But there are 3!=6 ways to rearrange the A1 A2 A3 without changing the placements of the A’s, and 2!=2 ways to arrange the B’s. So the total number is 5!/2!3! = 10.

Now, the trick is, binomials also work for negative numbers on top, or with non-integers. For example, if we expand what we have above, we can add a minus sign to each of the k terms in the numerator:

Use a Negative Binomial for Count Data

The Negative Binomial Distribution as an actual Negative Binomial

Hence the name “negative binomial.”

The other trick to keep in mind is that we can define binomials with non-integer numbers. Using the fact that the Γ function ( Gamma function ) satisfies, for positive integers n ,

Use a Negative Binomial for Count Data
The Gamma Function extends the Factorial

We can write our binomial coefficients in the form

Use a Negative Binomial for Count Data
Binomial Coefficients with n not an integer

And this enables us to allow that, in the negative binomial distribution, the parameter r does not have to be an integer. This will be useful because when we estimate our models, we generally don’t have a way to constrain r to be an integer. So a non-integer value for r won’t be a problem. (We will require r to be positive, however). We’ll come back to how to interpret a non-integer value of r .

Properties of the Negative Binomial Distribution

We would like to compute the expectation and variance. As a warmup, let’s check that the negative binomial distribution is in fact a probability distribution. For convenience, let q=1–p .

Use a Negative Binomial for Count Data

The Negative Binomial Distribution is in fact a Probability Distribution

The crucial point is the third line, where we used the binomial theorem (yes, it works with negative exponents).

Now let’s compute the expectation:

Use a Negative Binomial for Count Data

Expected Value of the Negative Binomial Distribution

To get the third line, we used the identity

Use a Negative Binomial for Count Data

Where we used the binomial theorem again to get the third to last line.

Warning: this is the opposite of what you will find on Wikipedia as of this writing. It is what you will find from Wolfram (the makers of Mathematica). This is because Wikipedia thinks about the number of successes before r failures, where as we count failures before r successes. In general, there is a variety of similar ways to parameterize/interpret the distribution, so be careful you have everything straight when looking at formulas in different places.

Next, we can compute the variance in two steps. First, we repeat the trick from above, using the identity twice this time to get the third line. We again use the binomial theorem to compute the sum and obtain the third-to-last line.

Use a Negative Binomial for Count Data

Now we can compute:

Use a Negative Binomial for Count Data

Variance of the Negative Binomial Distribution

Again, this is the opposite of what is on Wikipedia.

Interpretation of the Negative Binomial Distribution

We have covered the “defining interpretation” of the Negative Binomial Distribution: it is the number of failures before r success occur, with the probability of success at each step being p . But there are a few other ways to look at the distribution that can be illuminating and also help interpret the case where r is not an integer.

Over-Dispersed Poisson Distribution

The Poisson distribution is a very simple model for count data, which assumes that events happen randomly at a certain rate. Then it models the distribution of how many events will occur in a given time interval. In the context of our examples, it would say that:

  • Customer service representatives get complaints at a constant rate. The variation in counts is just determined by random variation. (Compare the model where their behavior eventually changes). Again, in modeling this, we could model a difference in rate between representatives based on exogenous covariates.

One big problem with the Poisson distribution is that the variance is equal to the mean. This may not fit our data. Let’s say we parameterize our Negative Binomial distribution with a mean λ and stopping parameter r . Then we have

Use a Negative Binomial for Count Data

Re-parametrization of the Negative Binomial Distribution

Our probability mass function becomes

Use a Negative Binomial for Count Data

Probability Mass Function for the Negative Binomial parameterized with Mean λ

Now let’s consider what happens if we take the limit as r →∞ holding λ fixed. (This means that the probability of success goes to 1 as well, in the way defined by p=r/[λ+r]). In this limit, the binomial term approaches (–r) to the power of k divided by k! and r + λ approaches r.

Use a Negative Binomial for Count Data

Limit of the Negative Binomial for Large r with fixed mean λ

In the last line, the r to the k-th powers cancel and we have used the definition of the exponential. The result is that we recover the Poission distribution.

Therefore, we can interpret the Negative Binomial Distribution as a generalization of the Poisson distribution. If the distribution is in fact Poission, we will see a large r and p close to 1. This makes sense because as p approaches 1, the variance approaches the mean. When p is smaller than one, the variance is higher than that of a Poisson distribution with the same mean, so we can see that the Negative Binomial distribution generalizes Poisson by increasing the variance.

Mixture of Poisson Distributions

The Negative Binomial Distribution also arises as a mixture of Poisson random variables. For example, suppose that our customer service representatives each receive complaints at a given rate (they never change their behavior), but that rate varies between representatives. If that rate is randomly distributed according to a Gamma distribution , we get a Negative Binomial Distribution for the ensemble.

The intuition behind this is as follows. We initially said the Negative Binomial Distribution was the count of failures before r successes when we do coin flips. Instead, replace the coin flip with two Poisson processes. Process one (the “success” process) has rate p and process two, the “failure” process, has rate (1-p). This means that instead of thinking of the Negative Binomial Distribution as counting coin flips, we think that there are independent processes generating “success” and “failure” independently and we just count how many failures before a certain number of successes.

Now, the Gamma Distribution is the distribution of waiting times for Poisson processes. Let T be the waiting time for r successes from the “success” process. T is Gamma distributed. Then the number of failures has a mean of (1–p)T and is Poisson distributed.

Conclusion

The last few points worth pointing out. First of all, there is no analytic way to fit the Negative Binomial Distribution to data. Instead, use the Maximum Likelihood Estimator and numerical estimation. You can use the statsmodels package to do this in Python.

Also, it is possible to do Negative Binomial regression, modeling the effects of covariates. We’ll save that for a future article.


以上所述就是小编给大家介绍的《Use a Negative Binomial for Count Data》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

新机器的灵魂

新机器的灵魂

Tracy Kidder / 龚益、高宏志 / 机械工业出版社华章公司 / 2011-10 / 45.00元

计算机从1981年开始发生巨大的变化。《新机器的灵魂》完整地记录下了当时一家公司齐心协力把一种新的小型计算机推向市场的过程中所发生的一系列戏剧性的、充满戏剧色彩的、激动人心的故事。 本书以美国通用数据公司研制鹰电子计算机的全过程为主线,对美国计算机工业的发展和管理中鲜为人知的侧面,作了条理清晰、颇具诗情画意的描述。 你想知道一代新型计算机怎样诞生,精明干练而又富于幽默感的工程技术人员怎......一起来看看 《新机器的灵魂》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

html转js在线工具
html转js在线工具

html转js在线工具