One-proportion and goodness of fit test (in R and by hand)

栏目: IT技术 · 发布时间: 5年前

One-proportion and goodness of fit test (in R and by hand)

Learn how to perform the one proportion and goodness of fit test, useful to check if a distribution follows a specific known distribution

Antoine Soetewey

May 13 ·15min read

One-proportion and goodness of fit test (in R and by hand) — Photo by Roman Mager

Introduction

In a previous article, I presented the Chi-square test of independence in R which is used to test the independence between two categorical variables. In this article, I show how to perform, first in R and then by hand, the:

one-proportion test (also referred as one-sample proportion test)
Chi-square goodness of fit test

The first test is used to compare an observed proportion to an expected proportion, when the qualitative variable has only two categories . The second test is used to compare multiple observed proportions to multiple expected proportions, in a situation where the qualitative variable has two or more categories .

Both tests allow to test the equality of proportions between the levels of the qualitative variable or to test the equality with given proportions. These given proportions could be determined arbitrarily or based on the theoretical probabilities of a known distribution.

In R

Data

For this section, we use the same dataset than in the article on descriptive statistics . It is the well-known iris dataset, to which we add the variable size . The variable size corresponds to small if the length of the petal is smaller than the median of all flowers, big otherwise:

# load iris dataset
dat <- iris# create size variable
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length),
  "small", "big"
)# show first 5 observations
head(dat, n = 5)##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  size
## 1          5.1         3.5          1.4         0.2  setosa small
## 2          4.9         3.0          1.4         0.2  setosa small
## 3          4.7         3.2          1.3         0.2  setosa small
## 4          4.6         3.1          1.5         0.2  setosa small
## 5          5.0         3.6          1.4         0.2  setosa small

One-proportion test

For this example, we have a sample of 150 flowers and we want to test whether the proportion of small flowers is the same than the proportion of big flowers (measured by the variable size ). Here are the number of flowers by size, and the corresponding proportions:

# barplot
library(ggplot2)
ggplot(dat) +
  aes(x = size) +
  geom_bar(fill = "#0c4c8a") +
  theme_minimal()

# counts by size
table(dat$size)## 
##   big small 
##    77    73# proportions by size, rounded to 2 decimals
round(prop.table(table(dat$size)), 2)## 
##   big small 
##  0.51  0.49

Among the 150 flowers forming our sample, 51% and 49% are big and small, respectively. To test whether the proportions are the same among both sizes, we use the prop.test() function which accepts the following arguments:

number of successes
number of observations/trials
expected probability (the one we want to test against)

Considering (arbitrarily) that big is the success, we have: 1

# one-proportion test
test <- prop.test(
  x = 77, # number of successes
  n = 150, # total number of trials (77 + 73)
  p = 0.5
) # we test for equal proportion so prob = 0.5 in each grouptest## 
##  1-sample proportions test with continuity correction
## 
## data:  77 out of 150, null probability 0.5
## X-squared = 0.06, df = 1, p-value = 0.8065
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4307558 0.5952176
## sample estimates:
##         p 
## 0.5133333

We obtain an output with, among others, the null probability ( 0.5 ), the test statistic ( X-squared = 0.06 ), the degrees of freedom ( df = 1 ), the p -value ( p-value = 0.8065 ) and the alternative hypothesis ( true p is not equal to 0.5 ). The p -value is 0.806 so, at the 5% significance level, we do not reject the null hypothesis that the proportions of small and big flowers are the same.

Assumption of `prop.test()` and `binom.test()`

Note that prop.test() uses a normal approximation to the binomial distribution. Therefore, one assumption of this test is that the sample size is large enough (usually, n > 30 ). If the sample size is small, it is recommended to use the exact binomial test.

The exact binomial test can be performed with the binom.test() function and accepts the same arguments as the prop.test() function. For this example, suppose now that we have a sample of 12 big and 3 small flowers and we want to test whether the proportions are the same among both sizes:

# barplot
barplot(c(12, 3), # observed counts
  names.arg = c("big", "small"), # rename labels
  ylab = "Frequency", # y-axis label
  xlab = "Size" # x-axis label
)
abline(
  h = 15 / 2, # expected counts in each level
  lty = 2 # dashed line
)

# exact binomial test
test <- binom.test(
  x = 12, # counts of successes
  n = 15, # total counts (12 + 3)
  p = 0.5 # expected proportion
)test## 
##  Exact binomial test
## 
## data:  12 and 15
## number of successes = 12, number of trials = 15, p-value = 0.03516
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5191089 0.9566880
## sample estimates:
## probability of success 
##                    0.8

The p -value is 0.035 so, at the 5% significance level, we reject the null hypothesis and we conclude that the proportions of small and big flowers are significantly different. This is equivalent than concluding that the proportion of big flowers is significantly different from 0.5 (since there are only two sizes).

If you want to test that the proportion of big flowers is greater than 50%, add the alternative = "greater" argument into the binom.test() function: 2

test <- binom.test(
  x = 12, # counts of successes
  n = 15, # total counts (12 + 3)
  p = 0.5, # expected proportion
  alternative = "greater" # test that prop of big flowers is > 0.5
)test## 
##  Exact binomial test
## 
## data:  12 and 15
## number of successes = 12, number of trials = 15, p-value = 0.01758
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.5602156 1.0000000
## sample estimates:
## probability of success 
##                    0.8

The p -value is 0.018 so, at the 5% significance level, we reject the null hypothesis and we conclude that the proportion of big flowers is significantly larger than 50%.

Chi-square goodness of fit test

Suppose now that the qualitative variable has more than two levels as it is the case for the variable Species :

# barplot
ggplot(dat) +
  aes(x = Species) +
  geom_bar(fill = "#0c4c8a") +
  theme_minimal()

# counts by Species
table(dat$Species)## 
##     setosa versicolor  virginica 
##         50         50         50

The variable Species has 3 levels, with 50 observations in each level. Suppose for this example that we want to test whether the 3 species are equally common. If they were equally common, they would be equally distributed and the expected proportions would be 1/3 for each of the species.

This test can be done with the chisq.test() function, accepting the following arguments:

a numeric vector representing the observed proportions
a vector of probabilities (of the same length of the observed proportions) representing the expected proportions

Applied to our research question (i.e., are the 3 species equally common?), we have:

# goodness of fit test
test <- chisq.test(table(dat$Species), # observed proportions
  p = c(1 / 3, 1 / 3, 1 / 3) # expected proportions
)test## 
##  Chi-squared test for given probabilities
## 
## data:  table(dat$Species)
## X-squared = 0, df = 2, p-value = 1

The p -value is 1 so, at the 5% significance level, we do not reject the null hypothesis that the proportions are equal among all species.

This was quite obvious even before doing the statistical test given that there are exactly 50 flowers of each species, so it was easy to see that the species are equally common. We however still did the test to show how it works in practice.

Does my distribution follow a given distribution?

In the previous section, we chose the proportions ourselves. The goodness of fit test is also particularly useful to compare observed proportions with expected proportions that are based on some known distribution.

Remember the hypotheses of the test:

H0: there is no significant difference between the observed and the expected frequencies
H1: there is a significant difference between the observed and the expected frequencies

For this example, suppose that we measured the number of girls in 100 families of 5 children. We want to test whether the (observed) distribution of number girls follows a binomial distribution.

Observed frequencies

Here is the distribution of the number of girls per family in our sample of 100 families of 5 children:

And the corresponding frequencies and relative frequencies (remember that the relative frequency is the frequency divided by the total sample size):

# counts
dat##   Girls Frequency Relative_freq
## 1     0         5          0.05
## 2     1        12          0.12
## 3     2        28          0.28
## 4     3        33          0.33
## 5     4        17          0.17
## 6     5         5          0.05

Expected frequencies

In order to compare the observed frequencies to a binomial distribution and see if both distributions match, we first need to determine the expected frequencies that would be obtained in case of a binomial distribution. The expected frequencies assuming a probability of 0.5 of having a girl (for each of the 5 children) are as follows:

# create expected frequencies for a binomial distribution
x <- 0:5
df <- data.frame(
  Girls = factor(x),
  Expected_relative_freq = dbinom(x, size = 5, prob = 0.5)
)
df$Expected_freq <- df$Expected_relative_freq * 100 # *100 since there are 100 families# create barplot
p <- ggplot(df, aes(x = Girls, y = Expected_freq)) +
  geom_bar(stat = "identity", fill = "#F8766D") +
  xlab("Number of girls per family") +
  ylab("Expected frequency") +
  labs(title = "Binomial distribution Bi(x, n = 5, p = 0.5)") +
  theme_minimal()
p

# expected relative frequencies and (absolute) frequencies
df##   Girls Expected_relative_freq Expected_freq
## 1     0                0.03125         3.125
## 2     1                0.15625        15.625
## 3     2                0.31250        31.250
## 4     3                0.31250        31.250
## 5     4                0.15625        15.625
## 6     5                0.03125         3.125

Observed vs. expected frequencies

We now compare the observed frequencies to the expected frequencies to see whether the two differ significantly. If the two differ significantly, we reject the hypothesis that the number of girls per family of 5 children follows a binomial distribution. On the other hand, if the observed and expected frequencies are similar, we do not reject the hypothesis that the number of girls per family follows a binomial distribution.

Visually we have:

# create data
data <- data.frame(
  num_girls = factor(rep(c(0:5), times = 2)),
  Freq = c(dat$Freq, df$Expected_freq),
  obs_exp = c(rep("observed", 6), rep("expected", 6))
)# create plot
ggplot() +
  geom_bar(
    data = data, aes(
      x = num_girls, y = Freq,
      fill = obs_exp
    ),
    position = "dodge", # bar next to each other
    stat = "identity"
  ) +
  ylab("Frequency") +
  xlab("Number of girls per family") +
  theme_minimal() +
  theme(legend.title = element_blank()) # remove legend title

We see that the observed and expected frequencies are quite similar, so we expect that the number of girls in families of 5 children follows a binomial distribution. However, only the goodness of fit test will confirm our belief:

# goodness of fit test
test <- chisq.test(dat$Freq, # observed frequencies
  p = df$Expected_relative_freq # expected proportions
)test## 
##  Chi-squared test for given probabilities
## 
## data:  dat$Freq
## X-squared = 3.648, df = 5, p-value = 0.6011

The p -value is 0.601 so, at the 5% significance level, we do not reject the null hypothesis that the observed and expected frequencies are equal. This is equivalent than concluding that we cannot reject the hypothesis that the number of girls in families of 5 children follows a binomial distribution (since the expected frequencies were based on a binomial distribution).

Note that the goodness of fit test can of course be performed with other types of distribution than the binomial one. For instance, if you want to test whether an observed distribution follows a Poisson distribution, this test can be used to compare the observed frequencies with the expected proportions that would be obtained in case of a Poisson distribution.

By hand

Now that we showed how to perform the one-proportion and goodness of fit test in R, in this section we show how to do these tests by hand. We first illustrate the one-proportion test then the Chi-square goodness of fit test.

One-proportion test

For this example, suppose that we tossed a coin 100 times and noted that it landed on heads 67 times. Following this, we want to test whether the coin is fair, that is, test whether the probability of landing on heads or tails is equal to 50%.

As for many hypothesis tests, we do it through 4 easy steps:

State the null and alternative hypotheses
Compute the test-statistic (also known as t-stat)
Find the rejection region
Conclude by comparing the test-statistic with the rejection region

Step 1.

In our example, the null and alternative hypotheses are:

H0: p0 = 0.5
H1: p0 ≠ 0.5

where p0 is the expected proportion of landing on heads.

Step 2.

The test statistic is: 3

(See how to perform hypothesis tests in a Shiny app if you need more help in computing the test statistic.)

Step 3.

The rejection region is found via the normal distribution table. Assuming a significance level α = 0.05, we have:

±z(α/2) = ±z(0.025) = ±1.96

Step 4.

We compare the test statistic (found in step 2) with the rejection region (found in step 3) and we conclude. Visually, we have:

The test statistic lies within the rejection region (i.e., the grey shaded areas). Therefore, at the 5% significance level, we reject the null hypothesis and we conclude that the proportion of heads (and thus tails) is significantly different than 50%. In other words, still at the 5% significance level, we conclude that the coin is unfair.

If you prefer to compute the p -value instead of comparing the t-stat and the rejection region, you can use this Shiny app to easily compute p -values for different probability distributions. After having opened the app, set the t-stat, the corresponding alternative and you will find the p -value at the top of the page.

Verification in R

Just for the sake of illustration, here is the verification of the above example in R:

# one-proportion test
test <- prop.test(
  x = 67, # number of heads
  n = 100, # number of trials
  p = 0.5 # expected probability of heads
)test## 
##  1-sample proportions test with continuity correction
## 
## data:  67 out of 100, null probability 0.5
## X-squared = 10.89, df = 1, p-value = 0.0009668
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.5679099 0.7588442
## sample estimates:
##    p 
## 0.67

The p -value is 0.001 so, at the 5% significance level, we reject the null hypothesis that the proportions of heads and tails are equal, and we conclude that the coin is biased. This is the same conclusion than the one found by hand.

Goodness of fit test

We now illustrate the goodness of fit test by hand with the following example.

Suppose that we toss a dice 100 times, we note how many times it lands on each face (1 to 6) and we test whether the dice is fair. Here are the observed counts by dice face:

## dice_face
##  1  2  3  4  5  6 
## 15 24 10 19 19 13

With a fair dice, we would expect it to land 100/6 ≈ 16.67 times on each face (this expected value is represented by the dashed line in the above plot). Although the observed frequencies are different than the expected value of 16.67:

##   dice_face observed_freq expected_freq
## 1         1            15         16.67
## 2         2            24         16.67
## 3         3            10         16.67
## 4         4            19         16.67
## 5         5            19         16.67
## 6         6            13         16.67

we need to test whether they are significantly different. For this, we perform the appropriate hypothesis test following the 4 easy steps mentioned above:

State the null and alternative hypotheses
Compute the test-statistic (also known as t-stat)
Find the rejection region
Conclude by comparing the test-statistic with the rejection region

Step 1.

The null and alternative hypotheses of the goodness of fit test are:

H0: there is no significant difference between the observed and the expected frequencies
H1: there is a significant difference between the observed and the expected frequencies

Step 2.

The test statistic is:

where Oi is the observed frequency, Ei is the expected frequency and k is the number of categories (in our case, there are 6 categories, representing the 6 dice faces).

This χ2 statistic is obtained by calculating the difference between the observed number of cases and the expected number of cases in each category. This difference is squared (to avoid negative and positive differences being compensated) and divided by the expected number of cases in that category. These values are then summed for all categories, and the total is referred to as the χ2 statistic. Large values of this test statistic lead to the rejection of the null hypothesis, small values mean that the null hypothesis cannot be rejected. 4

Given our data, we have:

Step 3.

Whether the χ2 test statistic is small or large depends on the rejection region. The rejection region is found via the χ2 distribution table. With a degrees of freedom equals to k − 1 (where k is the number of categories) and assuming a significance level α = 0.05, we have:

χ2(α; k−1) = χ2(0.05; 5) = 11.0705

Step 4.

We compare the test statistic (found in step 2) with the rejection region (found in step 3) and we conclude. Visually, we have:

The test statistic does not lie within the rejection region (i.e., the grey shaded area). Therefore, at the 5% significance level, we do not reject the null hypothesis that there is no significant difference between the observed and the expected frequencies. In other words, still at the 5% significance level, we cannot reject the hypothesis that the dice is fair.

Again, you can use the Shiny app to easily compute the p -value given the test statistic if you prefer this method over the comparison between the t-stat and the rejection region.

Verification in R

Just for the sake of illustration, here is the verification of the above example in R:

# goodness of fit test
test <- chisq.test(dat$observed_freq, # observed frequencies for each dice face
  p = rep(1 / 6, 6) # expected probabilities for each dice face
)test## 
##  Chi-squared test for given probabilities
## 
## data:  dat$observed_freq
## X-squared = 7.52, df = 5, p-value = 0.1847

The test statistic and degrees of freedom are exactly the same than the ones found by hand. The p -value is 0.185 which, still at the 5% significance level, leads to the same conclusion than by hand (i.e., failing to reject the null hypothesis).

Thanks for reading. I hope this article helped you to understand and perform the one-proportion and goodness of fit test in R and by hand.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

Get updates every time a new article is published by subscribing to this blog.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

One-proportion and goodness of fit test (in R and by hand)

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

数据结构与算法分析

Frank.M.Carrano / 金名 / 清华大学出版社 / 2007-11 / 98.00元

“数据结构”是计算机专业的基础与核心课程之一，Java是现今一种热门的语言。本书在编写过程中特别考虑到了面向对象程序设计（OOP）的思想与Java语言的特性。它不是从基于另一种程序设计语言的数据结构教材简单地“改编”而来的，因此在数据结构的实现上更加“地道”地运用了Java语言，并且自始至终强调以面向对象的方式来思考、分析和解决问题。本书是为数据结构入门课程（通常课号是CS-2）而编写的教......一起来看看《数据结构与算法分析》这本书的介绍吧!

码农工具

One-proportion and goodness of fit test (in R and by hand)