## It’s A Math, Math World (Chi-Square Goodness-of-Fit Test)

In this week’s post, we will be analyzing categorical data. When dealing with categorical data, we are concerned with how frequently each value of a random variable occurs. We will be testing hypotheses about multiple proportions (more than two at a time). The technique we will use, called the **Chi-Square Goodness-of-Fit Test**, works for any set of hypothesized proportions that add up to 1.

This test involves taking the observed frequencies (O) given in the problem and comparing these to the expected frequencies (E) which we calculate.

We are given, for example:

H0: p1 = 0.3, p2 = 0.2, p3 = 0.5; where 0.3 + 0.2 + 0.5 = 1

There are a total of n observations, so the expected frequencies are:

0.3 × n, 0.2 × n, and 0.5 × n
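As a minimal sketch of this step, the expected frequencies can be computed by multiplying each hypothesized proportion by n. The function name `expected_frequencies` and the sample size n = 100 are assumptions chosen for illustration; only the proportions come from the hypothesis above.

```python
import math

def expected_frequencies(proportions, n):
    """Return the expected count n * p for each hypothesized proportion p."""
    # The goodness-of-fit setup requires the proportions to add up to 1.
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("hypothesized proportions must add up to 1")
    return [n * p for p in proportions]

# Proportions from H0 above; n = 100 is an assumed sample size for illustration.
h0 = [0.3, 0.2, 0.5]
expected = expected_frequencies(h0, 100)
print(expected)
```

Because the proportions add up to 1, the expected frequencies always add up to n.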

*If the observed frequencies differ too much from the expected frequencies, we would reject the null hypothesis.*

We calculate the test statistic, $\chi^2 = \sum \frac{(O-E)^2}{E}$, which has the Chi-Square distribution. The properties of the Chi-Square are as follows:

- There is an infinite number of Chi-Square distributions, each one associated with a number called its degrees of freedom. We calculate df = k-1 where k is the number of categories. We use this number to specify which Chi-Square distribution we are using.
- The test statistic has (approximately) a Chi-Square distribution if the sample size is sufficiently large that the expected frequency of each category is at least 5.
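The two properties above can be sketched in a few lines. The helper names `degrees_of_freedom` and `large_enough` are hypothetical, and the sample sizes 100 and 20 are assumed values used only to show one case that satisfies the rule and one that does not.

```python
def degrees_of_freedom(k):
    """Degrees of freedom for a goodness-of-fit test with k categories."""
    return k - 1

def large_enough(proportions, n, minimum=5):
    """Check that every expected frequency n * p is at least `minimum`."""
    return all(n * p >= minimum for p in proportions)

# Three categories, using the proportions from the example hypothesis.
h0 = [0.3, 0.2, 0.5]
df = degrees_of_freedom(len(h0))

# Assumed sample sizes: with n = 20 the second category expects only 4.
ok_100 = large_enough(h0, 100)
ok_20 = large_enough(h0, 20)
print(df, ok_100, ok_20)
```

With n = 20 the smallest expected frequency is 0.2 × 20 = 4, which falls below 5, so the Chi-Square approximation would be questionable for that sample size.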

Ex. (From the textbook, General Statistics (2000), by Chase and Brown). We look at the production of electronic instruments. Four assembly lines are used to produce the same item. Each assembly line is equivalent in theory, so each should have the same rate of items produced that need servicing under warranty.

A decision was made to look at the next 100 instruments returned as defective and see how many came from each plant.

The observed values of returned instruments for plants 1 through 4 are, respectively: 53, 18, 14, 15.

Plant 1 operates two shifts per day, while the other 3 plants operate 1 shift per day each.

Carry out the test of equivalence of the assembly lines at the 10% level of significance.

Hypothesis:

H0: p1 = 2/5 = 0.4, p2 = p3 = p4 = 1/5 = 0.2 (because plant 1 operates 2 of the 5 shifts, while each of the remaining plants operates a single shift of the five)

Ha: H0 is not true
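The null proportions come from each plant's share of the total shifts. A small sketch of that arithmetic, using exact fractions (the variable names are assumptions for illustration):

```python
from fractions import Fraction

# Shifts per day for plants 1 through 4, as stated in the problem.
shifts = [2, 1, 1, 1]
total_shifts = sum(shifts)

# Each plant's hypothesized proportion is its share of all shifts.
null_props = [Fraction(s, total_shifts) for s in shifts]
for plant, p in enumerate(null_props, start=1):
    print(f"p{plant} = {p} = {float(p)}")
```

Working with `Fraction` keeps 2/5 and 1/5 exact, so the proportions add up to exactly 1.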

We have to compare the observed frequencies (O) against the expected frequencies (E) using the Chi-Square Goodness of Fit Test:

**Expected Frequencies**

| Assembly Line | p | np = E |
|---|---|---|
| 1 | 0.4 | (100)(0.4) = 40 |
| 2 | 0.2 | (100)(0.2) = 20 |
| 3 | 0.2 | (100)(0.2) = 20 |
| 4 | 0.2 | (100)(0.2) = 20 |

**Calculation of Test Statistic**

| Assembly Line | O | E | O − E | $(O-E)^2$ | $(O-E)^2/E$ |
|---|---|---|---|---|---|
| 1 | 53 | 40 | 13 | 169 | 4.225 |
| 2 | 18 | 20 | -2 | 4 | 0.200 |
| 3 | 14 | 20 | -6 | 36 | 1.800 |
| 4 | 15 | 20 | -5 | 25 | 1.250 |
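The table can be reproduced with a short script, which is a handy way to check the arithmetic. This is a sketch using only the observed and expected counts from the example; the variable names are assumptions.

```python
# Observed and expected counts for assembly lines 1 through 4.
observed = [53, 18, 14, 15]
expected = [40, 20, 20, 20]

# Each line's contribution to the statistic is (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for line, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"line {line}: O={o}, E={e}, (O-E)^2/E={c:.3f}")
print(f"chi-square = {chi_square:.3f}")
```

Summing the last column, 4.225 + 0.200 + 1.800 + 1.250, gives 7.475.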

$\chi^2 = \sum \frac{(O-E)^2}{E}$ = 4.225 + 0.200 + 1.800 + 1.250 **= 7.475**, with df = k − 1 = 4 − 1 = **3**, where k is the number of groups. This value, $\chi^2$, is the **test statistic.**

When df = 3, $\chi^2_{0.10}$ = 6.251, which we get from a Chi-Square table or any software package. This is the **critical value.**

**Since test statistic = 7.475 > 6.251 = critical value, we reject the null hypothesis in this right-tailed test and conclude that the assembly lines are not equivalent.**
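An equivalent way to reach the decision is to compare the p-value (the right-tail area beyond the test statistic) with the significance level. The sketch below is not from the textbook: it assumes a hand-rolled helper, `chi2_sf`, that uses the standard power series for the regularized incomplete gamma function so that only Python's standard library is needed. A statistics package would normally provide this tail area directly.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a Chi-Square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

observed = [53, 18, 14, 15]
expected = [40, 20, 20, 20]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
alpha = 0.10  # 10% level of significance from the problem statement

p_value = chi2_sf(statistic, df)
reject_h0 = p_value < alpha
print(f"statistic={statistic:.3f}, df={df}, p-value={p_value:.4f}, reject H0: {reject_h0}")
```

Rejecting when the p-value is below 0.10 is the same rule as rejecting when the test statistic exceeds the critical value 6.251, because the tail area beyond 6.251 with df = 3 is 0.10.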

*NOTE: I will be taking the next few weeks off for vacation. I will be returning with my next blog post on Monday, October 18th.*

Michael, very nice tutorial.

The chi-square is a really useful statistic if the random variables are drawn from independent normal random variables. For academic discussion, it is always nice to point that out. In most real-world applications, when the sample is large, one can hypothesize that the variables are independent normal.

I very much like these plain explanations with examples. They are very useful for gaining insight into the difficult field of statistics.

My compliments for the easy language and clarity of your description.

Peace

David.

David, thank you for the compliments and for reading my posting. There are many more like it on my website http://www.mpobrien.com and I post every Monday evening here in the Northeast USA. I will be taking the next 3 weeks off, so stay tuned for my next post on October 18th.

If you have any suggested topics for future posts, just let me know.