Archive for November, 2010

It’s A Math, Math World (Wilcoxon Part 2)

The following example is from the textbook, General Statistics, by Chase and Brown (2000).

So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown, and we use a nonparametric test instead. An easy example of such a test is the Wilcoxon Signed-Rank test for the value of a population median. It is superior to the Sign Test (see the earlier Sign Test post) because it does not ignore the magnitude of the data, as the Sign Test does. Thus, the Wilcoxon test is more sensitive and will more often reject a false null hypothesis.

A necessary condition for this test is that the data must be continuous. Also, we assume symmetry in the distribution of the population.

Last week we looked at the Wilcoxon Signed-Rank Test for a single population median (see post Wilcoxon Part 1).  Today we will look at the difference between 2 medians (a 2 population case).  In this case, the samples are dependent – that is, the data is obtained in pairs.  Also, the data must be continuous and the 2 distributions must have similar shapes.

Example: A college professor claims that a remedial English course will help students whose English skills are deficient. Twenty-five students who failed a pre-test are given the course and then take a post-test. Is the professor’s claim justified at the 5% level of significance?

The professor claims that the pretest scores (X1) tend to be lower than the scores on the posttest (X2). This means that the values of the difference D= X1-X2 will be less than zero (i.e. Median (differences) < 0). Thus, the hypotheses are:

H0: Median (Differences) = 0

HA: Median (Differences) < 0

Level of significance: α = 0.05

Test statistic: We use W+ as a test statistic. From the table below, we see that W+ = 68.5

Critical Region: From the appropriate table, we see that the critical value for a one-tailed test with α = 0.05 and n = 25 is c = 101. Thus the critical region consists of values of W+ ≤ 101.

Conclusion: The observed value of 68.5 is in the critical region, so we reject H0. The professor’s claim appears to be correct: scores on the pretest tend to be lower than those on the posttest, which suggests that the course is effective.

Note on Tied Ranks: In this example, several values of |D| are the same. Four differences have |D| = 16, and they occupy positions 11, 12, 13 and 14 in the ordered list. In such a case, we average the ranks, which gives 12.5, and use this common rank for all 4 elements. The next rank then starts at 15 and proceeds as usual.

PRE-TEST (X1)   POST-TEST (X2)   DIFFERENCE (X1-X2)   |D|   SIGNED RANK
46              76               -30                  30    -25
27              36               -9                   9     -7
37              53               -16                  16    -12.5
34              55               -21                  21    -18
20              12               8                    8     6
38              50               -12                  12    -10
10              36               -26                  26    -22
24              18               6                    6     4
20              21               -1                   1     -1
39              57               -18                  18    -15
16              27               -11                  11    -9
20              48               -28                  28    -23
47              70               -23                  23    -19
45              25               20                   20    17
40              50               -10                  10    -8
46              39               7                    7     5
32              51               -19                  19    -16
49              33               16                   16    12.5
45              69               -24                  24    -20
49              52               -3                   3     -2
44              60               -16                  16    -12.5
45              20               25                   25    21
16              12               4                    4     3
41              70               -29                  29    -24
48              64               -16                  16    -12.5
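
The signed ranks in the table can be reproduced with a few lines of Python. This is only a sketch: the helper name average_ranks is made up for illustration, and the code assumes there are no zero differences, which is true for this data set.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


pre = [46, 27, 37, 34, 20, 38, 10, 24, 20, 39, 16, 20, 47,
       45, 40, 46, 32, 49, 45, 49, 44, 45, 16, 41, 48]
post = [76, 36, 53, 55, 12, 50, 36, 18, 21, 57, 27, 48, 70,
        25, 50, 39, 51, 33, 69, 52, 60, 20, 12, 70, 64]

diffs = [x1 - x2 for x1, x2 in zip(pre, post)]  # D = X1 - X2
ranks = average_ranks([abs(d) for d in diffs])
# Attach the sign of D to each rank (assumes no D is exactly zero).
signed_ranks = [r if d > 0 else -r for d, r in zip(diffs, ranks)]

w_plus = sum(r for r in signed_ranks if r > 0)
w_minus = -sum(r for r in signed_ranks if r < 0)
print(w_plus, w_minus)  # 68.5 256.5
```

The four differences with |D| = 16 all receive rank 12.5, and W+ comes out to 68.5, matching the value used in the test.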

Case of Zero Differences: If any of the values of D are zero, then we use the following procedure.

If there is an even number of zeros, each zero is assigned the average rank for the set, and then half of them are assigned a plus sign and the other half a minus sign.

Ex. If there are 4 zeros, then we would assign the ranks 1, 2, 3, and 4, giving us an average rank of 2.5. We would end up with the signed ranks: -2.5, -2.5, 2.5, and 2.5.

If there is an odd number of zeros, we discard one of them, reduce the sample size by 1 and proceed as in the even case.
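
The zero-difference rule can be sketched in Python as well. The function name signed_ranks_for_zeros is hypothetical, and the sketch assumes the zeros occupy the lowest ranks 1 through k, since |D| = 0 is the smallest possible magnitude.

```python
def signed_ranks_for_zeros(num_zeros):
    """Signed ranks given to the zero differences under the rule above.

    Assumes the zeros take the lowest ranks 1..k. When the count is odd,
    one zero is discarded; the caller must also reduce the sample size by 1.
    """
    k = num_zeros - (num_zeros % 2)  # drop one zero if the count is odd
    if k == 0:
        return []
    shared = (1 + k) / 2  # average of the ranks 1..k
    half = k // 2
    return [-shared] * half + [shared] * half


print(signed_ranks_for_zeros(4))  # [-2.5, -2.5, 2.5, 2.5]
print(signed_ranks_for_zeros(5))  # [-2.5, -2.5, 2.5, 2.5]
```

With 5 zeros, one is discarded and the remaining 4 are handled exactly as in the even case.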

Like what you read? Get blogs delivered right to your inbox as I post them so you can start standing out in your job and career. There is not a better way to learn or review college level stats topics than by reading, It’s A Math, Math World 


It’s A Math, Math World (Wilcoxon Signed-Rank)

The following example is from the textbook, General Statistics, by Chase and Brown (2000).

So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown, and we use a nonparametric test instead. An easy example of such a test is the Wilcoxon Signed-Rank test for the value of a population median. It is superior to the Sign Test (see last post) because it does not ignore the magnitude of the data, as the Sign Test does. Thus, the Wilcoxon test is more sensitive and will more often reject a false null hypothesis.

A necessary condition for this test is that the data must be continuous. Also, we assume symmetry in the distribution of the population.

Ex. A public school official believes that high school seniors in a large school system will tend to score higher on a test than the national median of 50.

H0: Median = 50

HA: Median > 50

Alpha = 0.05

SCORE (X)   D = DIFFERENCE (X-50)   MAGNITUDE (|D|)   SIGNED RANK
57          7                       7                 4
70          20                      20                10
42          -8                      8                 -5
48          -2                      2                 -1
77          27                      27                12
63          13                      13                8
45          -5                      5                 -3
64          14                      14                9
59          9                       9                 6
39          -11                     11                -7
73          23                      23                11
78          28                      28                13
47          -3                      3                 -2
47 -3 3 -2

Each magnitude is ranked from smallest to largest and is affixed with its corresponding sign.

W+ = sum of all positive ranks = 73

W- = sum of all negative ranks = 18

Suppose for the moment that the population median is actually 50. Since the sample is drawn from a population that is symmetric about the median (by assumption), we expect the sample itself to be roughly symmetrical about the median. Thus, if we examine the ranks of the magnitudes of the differences (D), the ranks of the data points higher than 50 should be comparable to the ranks of the data points below 50. In other words, the sum of the ranks of the data points on one side of 50 should be roughly equal to the sum of the ranks of the data points on the other side.

In our example, we have the following sums:

W+ = sum of all positive ranks = 73

W- = sum of all negative ranks = 18

If the true median is 50, we would expect W+ and W- to be of comparable size.
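
As a quick check on these sums (a standard identity, not taken from the textbook example): the ranks 1 through n always add up to n(n+1)/2, so W+ and W- must share that total, and under H0 each is expected to take about half of it.

```latex
W_+ + W_- = \frac{n(n+1)}{2} = \frac{13 \cdot 14}{2} = 91,
\qquad
E[W_+] = E[W_-] = \frac{n(n+1)}{4} = 45.5 \quad \text{under } H_0 .
```

Here 73 + 18 = 91, as required, and both sums sit well away from 45.5.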

If W+ is much smaller than W-, then this suggests that the data values are spread farther below 50 than above 50, implying Median < 50.

If W- is much smaller than W+, then this suggests that the data values are spread farther above 50 than below 50, implying Median > 50.

In our example, W- = 18 is small relative to W+ = 73, which seems to suggest that Median > 50.

Getting back to our hypothesis test, we could use W- as a test statistic. If W- is “too small” then we would reject H0 in favor of HA. Given a table of Wilcoxon Signed-Rank Test values, we can look up the alpha value and sample size and get a critical value c. When W- ≤ c, W- is too small and we reject H0.

From the table, when we use alpha=0.05 and n=13, we get c=21.

Since W- = 18 ≤ 21, we reject H0. Thus, the median does appear to be greater than 50.
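
For readers who want to check the arithmetic, here is a minimal Python sketch of the whole test. It takes a shortcut that works only because this sample has no tied magnitudes and no zero differences; the critical value 21 is simply the table value quoted above, not something the code derives.

```python
scores = [57, 70, 42, 48, 77, 63, 45, 64, 59, 39, 73, 78, 47]
hypothesized_median = 50
critical_value = 21  # table value quoted above for alpha = 0.05, n = 13

diffs = [x - hypothesized_median for x in scores]

# No ties and no zeros occur in this sample, so the rank of each |D| is
# simply its 1-based position in the sorted list of magnitudes.
sorted_magnitudes = sorted(abs(d) for d in diffs)
rank_of = {m: i + 1 for i, m in enumerate(sorted_magnitudes)}

w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
reject_h0 = w_minus <= critical_value

print(w_plus, w_minus, reject_h0)  # 73 18 True
```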

A Look Ahead: Next week, we will look at the case of tied ranks and zero differences, and also at comparing two populations using this test.


It’s A Math, Math World (The Sign Test)

The following example is from the textbook, General Statistics, by Chase and Brown (2000).

So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown, and we use a nonparametric test instead. An easy example of such a test is the sign test for the value of a population median.

Example:

A seed company wants to market a new type of seed that would reportedly produce a greater yield than the old type of seed. Thirteen farmers agree to test-grow the new seed on one acre and the old variety on another acre, and then look at the difference in wheat yield.

BUSHELS OF WHEAT FROM 2 TYPES OF SEED

FARM   NEW VARIETY (y1)   OLD VARIETY (y2)   DIFFERENCE D = y1 – y2   SIGN OF D
1      34                 27                 7                        +
2      45                 25                 20                       +
3      30                 38                 -8                       -
4      30                 42                 -12                      -
5      48                 21                 27                       +
6      35                 22                 13                       +
7      32                 37                 -5                       -
8      46                 30                 16                       +
9      41                 32                 9                        +
10     23                 38                 -15                      -
11     42                 26                 16                       +
12     43                 33                 10                       +
13     65                 68                 -3                       -

We are unsure of the distributions of the two varieties of wheat so we will use the sign test.

y1 = yield from the new variety (in bushels)

y2 = yield from the old variety (in bushels)

D = y1 – y2

Hypotheses:

H0: Median(D) = 0

Ha: Median(D) > 0

Level of significance: α =0.05

Test statistics and Observed value:

We count the number of values of D that are above zero (plus signs) in the table above.

X = number of plus signs = 8

Critical Region: We perform a right-tailed test. Under H0, the number of plus signs follows a binomial distribution with n = 13 and p = 0.5. Looking at the table of binomial probabilities, we see that if we use the values 10, 11, 12 and 13 for the critical region, then:

α = P(10) + P(11) + P(12) + P(13) = .035 + .010 + .002 + 0 = .047, which is close to .05 without exceeding it. Hence the critical region consists of the following x-values: 10, 11, 12 and 13.

Decision: The observed value (x = 8) is not in the critical region, so we do not reject the null hypothesis. There is not enough evidence to conclude that the new variety of seed produces a greater yield than the old variety.
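
The table lookup can be double-checked with Python's standard library. This is a sketch that assumes, as the sign test does under H0, that each difference is positive with probability 0.5; the helper name binom_pmf is made up for illustration.

```python
from math import comb

new = [34, 45, 30, 30, 48, 35, 32, 46, 41, 23, 42, 43, 65]
old = [27, 25, 38, 42, 21, 22, 37, 30, 32, 38, 26, 33, 68]

n = len(new)
x = sum(1 for y1, y2 in zip(new, old) if y1 - y2 > 0)  # number of plus signs


def binom_pmf(k, n, p=0.5):
    """P(X = k) for a binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


critical_region = [10, 11, 12, 13]
alpha_actual = sum(binom_pmf(k, n) for k in critical_region)
# Adding x = 9 to the region would push the tail probability past .05.
alpha_with_9 = alpha_actual + binom_pmf(9, n)

print(x)                       # 8
print(round(alpha_actual, 3))  # 0.046
print(round(alpha_with_9, 3))  # 0.133
print(x in critical_region)    # False
```

The exact tail probability is about .046; the .047 above comes from adding table entries that were already rounded to three decimals.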
