It’s A Math, Math World (ANOVA Part1)

 

In previous weeks, we learned how to test a single hypothesis about the difference between two population means, i.e., to test whether two means μ1 and μ2 are equal. What if we have more than two populations and we want to see if the means are equal? We want to compare more than 2 population means at the same time. This process is called Analysis of Variance (ANOVA).

Note that we could conduct multiple pairwise tests of the equality of means, but this would considerably inflate the chance of making at least one Type I error (rejecting an H0 that is actually true). In the ANOVA case, we test the following hypothesis:

H0: μ1 = μ2 = μ3 = … = μk

HA: not all the means are equal
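To get a feel for why repeated pairwise tests are a problem, here is a rough Python sketch. It assumes each pairwise test is run at α = 0.05 and that the tests are independent. Tests that share samples are not independent, so treat the numbers as an approximation. The function name is just for illustration.

```python
from math import comb


def familywise_error_rate(k, alpha=0.05):
    """Approximate chance of at least one false rejection over all pairwise tests.

    Assumes independent tests, which is only a rough approximation when
    the pairwise comparisons reuse the same samples.
    """
    n_tests = comb(k, 2)  # number of distinct pairs among k populations
    return n_tests, 1 - (1 - alpha) ** n_tests


for k in (3, 4, 6):
    n_tests, rate = familywise_error_rate(k)
    print(f"k={k}: {n_tests} pairwise tests, error rate about {rate:.3f}")
```

With 4 populations there are already 6 pairwise tests, and under these assumptions the overall error rate is well above the nominal 0.05.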

The methodology is as follows (we are assuming EQUAL sample sizes in this example). This example is from the textbook General Statistics (2000) by Chase and Bown:

A large chemical company uses 4 manufacturing plants to produce the same fertilizer. The plants were built to be equivalent, so the mean output of fertilizer from each plant should be the same and have the same variability. We want to test that the weekly mean output (tons of fertilizer produced) is the same for each plant. This will of course vary week to week, but we are interested in the true mean weekly production for a plant.

H0: μ1 = μ2 = μ3 = μ4

HA: Not all means are equal (at least one is different)

Weekly Production Figures for 5 Weeks for 4 Fertilizer Plants (weekly production is in tons)

                   PLANT 1   PLANT 2   PLANT 3   PLANT 4
                     574       546       580       585
                     578       556       570       582
                     573       549       577       581
                     568       551       575       589
                     572       553       573       588
Sample mean          573       551       575       585
Sample variance      13        14.5      14.5      12.5
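The sample means and variances in the table can be checked with a few lines of Python. This is only a sketch using the standard library's statistics module. The dictionary keys and variable names are my own labels, not anything from the textbook.

```python
from statistics import mean, variance

# Weekly production in tons, copied from the table above
plants = {
    "Plant 1": [574, 578, 573, 568, 572],
    "Plant 2": [546, 556, 549, 551, 553],
    "Plant 3": [580, 570, 577, 575, 573],
    "Plant 4": [585, 582, 581, 589, 588],
}

# variance() is the sample variance (it divides by m - 1, here 4)
sample_means = {name: mean(values) for name, values in plants.items()}
sample_variances = {name: variance(values) for name, values in plants.items()}

for name in plants:
    print(name, "mean:", sample_means[name], "variance:", sample_variances[name])
```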

 

If the sample means are clustered close together, this would tend to support H0.

A great degree of variability among the sample means would suggest that not all of the population means are equal, thus supporting HA.

The key to testing for equality of several population means is to look at the variability between the sample means. A large amount of variability would suggest that not all of the population means are equal. In that case we would reject H0 in favor of HA; otherwise we would not reject H0.

“Large” is a relative term, and this variability must be measured in terms of something. We will define large as being the condition that the variability between the sample means is large in relation to the variability within the samples. When this is the case, we reject H0 and conclude that the population means are not all the same.

First we assume that the population variance, σ², is the same for all the plants, whether the means are equal or not. From our sample data, we will calculate 2 estimates:

  • The within-sample estimate of σ²
  • The between-sample estimate of σ²

Estimate #1: Within-Sample Estimate

We pool the sample variances by averaging them (a simple average works here because the sample sizes are equal):

Estimate 1 = (13+14.5+14.5+12.5)/4 = 13.625
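As a quick check of Estimate 1, the sketch below averages the four sample variances and also computes the pooled sum of squares divided by n - k. With equal sample sizes the two should agree. Variable names are mine and only for illustration.

```python
from statistics import mean, variance

# Weekly production in tons for the 4 plants
samples = [
    [574, 578, 573, 568, 572],
    [546, 556, 549, 551, 553],
    [580, 570, 577, 575, 573],
    [585, 582, 581, 589, 588],
]

# Simple average of the sample variances (valid because sample sizes are equal)
within_estimate = mean([variance(s) for s in samples])

# Same quantity written as pooled sum of squares over (n - k)
n = sum(len(s) for s in samples)
k = len(samples)
pooled = sum((len(s) - 1) * variance(s) for s in samples) / (n - k)

print(within_estimate, pooled)
```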

Estimate #2: Between-Sample Estimate

Let us assume for the moment that H0 is true. Then we can view the production figures as 4 samples of size m = 5 from the same population. The 4 sample means are values of the random variable x_bar. From the sampling distribution of the mean, we know that the standard deviation of x_bar is:

σx_bar = sqrt(σ²/m)   or   σ² = m × (σx_bar)²

We use the sample variance of the 4 values of x_bar, which I will call s²x_bar, as an estimate of (σx_bar)², the variance we have to find.

We first need to find the grand mean of the 4 sample means, which is (573 + 551 + 575 + 585)/4 = 571
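The grand mean is easy to verify. The short sketch below computes it two ways: as the mean of the 4 sample means, and as the mean of all 20 observations. The two agree here only because every plant has the same number of weeks.

```python
from statistics import mean

# Weekly production in tons for the 4 plants
samples = [
    [574, 578, 573, 568, 572],
    [546, 556, 549, 551, 553],
    [580, 570, 577, 575, 573],
    [585, 582, 581, 589, 588],
]

sample_means = [mean(s) for s in samples]
grand_mean = mean(sample_means)

# Mean of all 20 observations pooled together
overall_mean = mean([x for s in samples for x in s])

print(sample_means, grand_mean, overall_mean)
```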

We calculate the sample variance s²x_bar as follows:

Sample mean    Sample mean - grand mean    (Sample mean - grand mean)²
573                      2                              4
551                    -20                            400
575                      4                             16
585                     14                            196
Grand mean = 571                                      616 = SUM

s²x_bar = SUM/(4-1) = 616/3 = 205.333

Estimate 2 = m × (s²x_bar) = 5 × (616/3) = 1026.667

We combine the estimates as follows:

F-stat = (Estimate #2)/(Estimate #1) = 1026.667/13.625 = 75.35

If H0 is true, both estimates are estimating the same σ², so F-stat should be close to 1. If the population means differ, the between-sample estimate is inflated and F-stat becomes large.

The statistic, F-stat, follows an F distribution with df1 = k-1 and df2 = n-k degrees of freedom, where:

n = # of data values in all the samples

k = # of populations

We express the degrees of freedom as an ordered pair df = (k-1, n-k)

In our example, F-stat = 75.35, and we compare it to the F distribution at α = 0.05 and df = (3, 16).

Our critical value is 3.24 (from the F distribution tables). Since F-stat > critical value, we reject H0 and conclude that the mean weekly output is not the same for all 4 plants.
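Putting the whole procedure together, here is a minimal Python sketch that computes both estimates and the F statistic directly from the raw production figures. It assumes equal sample sizes, as in this example, and it takes the critical value 3.24 from the F table rather than computing it. All variable names are my own.

```python
from statistics import mean, variance

# Weekly production in tons for the 4 plants
samples = [
    [574, 578, 573, 568, 572],
    [546, 556, 549, 551, 553],
    [580, 570, 577, 575, 573],
    [585, 582, 581, 589, 588],
]

k = len(samples)     # number of populations
m = len(samples[0])  # common sample size (equal sizes assumed)
n = k * m            # total number of data values

# Estimate 1: average of the within-sample variances
within_estimate = mean([variance(s) for s in samples])

# Estimate 2: m times the sample variance of the k sample means
sample_means = [mean(s) for s in samples]
between_estimate = m * variance(sample_means)

f_stat = between_estimate / within_estimate
df = (k - 1, n - k)

critical_value = 3.24  # F table value for alpha = 0.05, df = (3, 16)
reject_h0 = f_stat > critical_value

print(f"within = {within_estimate:.3f}, between = {between_estimate:.3f}")
print(f"F = {f_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

If SciPy happens to be installed, scipy.stats.f_oneway(*samples) should report the same F statistic, and scipy.stats.f.ppf(0.95, 3, 16) should reproduce the tabled critical value. That is an optional cross-check, not something the textbook example relies on.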

13 Responses to “It’s A Math, Math World (ANOVA Part1)”

  • Zhongjie Sun says:

    4 sample means which is = (573 + 571 + 575 + 573)/4

    why not =(573+551+575+585)/4=571?

  • jeff daniels says:

    Michael –

    The grand mean of the sample is 571 not 573.

    Best,

    Jeff

  • admin says:

    Thanks, Jeff. I made the correction. Thanks for your eagle eye and for keeping me honest. All the best, Mike

  • admin says:

    Yes, you are correct. I made the correction. That will teach me to proofread better :)
    Thanks again.

  • Dan says:

    Very good example. Simple and relevant. Perhaps notations from one step to another are a little bit confusing (σ2 to s2x_bar), but it’s OK. Also, being a tutorial, it would be worthwhile to explain why the F statistic takes this form, what does it mean. In the end, are you sure about the number of degrees of freedom and the F value for those dfs?

  • admin says:

    Hi Dan, Thanks for the positive note. I am planning to do another post of ANOVA when sample sizes are not equal and maybe I can cover the topics you mentioned at that time. Very good points! As for the F value and the degrees of freedom, I will check it again because I transcribed this example from a text and I could have made a mistake. It would not be the first time! Take Care and thanks again.

  • Sid says:

    Hello Mr. O’Brien:

    Thank you for posting such useful material on your blog. It has been very helpful to me so far. I have a small suggestion: Could you give a suitable title (we see just the date now) to each blog post so that in the future, when you have many posts on your blog, it becomes easier to search for a topic one wants easily rather than going through all your posts. Thanks again.

    Sincerely,
    Sid

  • Dan says:

    Thank you, Mr. O’Brien, it’s perfect. Waiting for your new post, because approaches like this one really help students to better understand how to use data analysis tools.
    Dan
