It’s A Math, Math World (Wilcoxon Part 2)
The following example is from the textbook, General Statistics, by Chase and Brown (2000).
So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown and we use a nonparametric test. An easy example of such a test is the Wilcoxon Signed-Rank test for testing the value of a median of a population. This is superior to the Sign Test (see last post) because it does not ignore the magnitude of the data as the Sign Test does. Thus, the Wilcoxon test is more sensitive and will more often reject a false null hypothesis.
A necessary condition for this test is that the data must be continuous. Also, we assume symmetry in the distribution of the population.
Last week we looked at the Wilcoxon Signed-Rank Test for a single population median (see post Wilcoxon Part 1). Today we will look at the difference between 2 medians (a 2 population case). In this case, the samples are dependent – that is, the data is obtained in pairs. Also, the data must be continuous and the 2 distributions must have similar shapes.
Example: A college professor claims that a remedial English course will help students whose English skills are deficient. Twenty-five students who failed a pre-test are given the course and then take a post-test. Is the professor’s claim justified at the 5% level of significance?
The professor claims that the pretest scores (X1) tend to be lower than the scores on the posttest (X2). This means that the values of the difference D= X1-X2 will be less than zero (i.e. Median (differences) < 0). Thus, the hypotheses are:
H0: Median (Differences) = 0
HA: Median (Differences) < 0
Level of significance: α = 0.05
Test statistic: We use W+ as a test statistic. From the table below, we see that W+ = 68.5
Critical Region: From the appropriate table, we see that the critical value for a one-tailed test with α = 0.05 and n=25 is c=101. Thus the critical region consists of values of W+ ≤ 101.
Conclusion: The observed value of 68.5 is in the critical region, so we reject H0. It appears that the professor’s claim is correct. Scores on the pretest appear to be lower than those on the posttest, which suggests that the course is effective.
Note on Tied Ranks: In this example, some values of |D| are the same. The value |D| = 16 occurs four times, and those elements would occupy positions 11, 12, 13 and 14 in the ordering. In that case, we average the ranks to get 12.5 and use this common rank for all 4 elements. The next rank starts at 15 and we proceed as usual.
PRE-TEST (X1) | POST-TEST (X2) | DIFFERENCE (X1-X2) | |D| | SIGNED RANK |
46 | 76 | -30 | 30 | -25 |
27 | 36 | -9 | 9 | -7 |
37 | 53 | -16 | 16 | -12.5 |
34 | 55 | -21 | 21 | -18 |
20 | 12 | 8 | 8 | 6 |
38 | 50 | -12 | 12 | -10 |
10 | 36 | -26 | 26 | -22 |
24 | 18 | 6 | 6 | 4 |
20 | 21 | -1 | 1 | -1 |
39 | 57 | -18 | 18 | -15 |
16 | 27 | -11 | 11 | -9 |
20 | 48 | -28 | 28 | -23 |
47 | 70 | -23 | 23 | -19 |
45 | 25 | 20 | 20 | 17 |
40 | 50 | -10 | 10 | -8 |
46 | 39 | 7 | 7 | 5 |
32 | 51 | -19 | 19 | -16 |
49 | 33 | 16 | 16 | 12.5 |
45 | 69 | -24 | 24 | -20 |
49 | 52 | -3 | 3 | -2 |
44 | 60 | -16 | 16 | -12.5 |
45 | 20 | 25 | 25 | 21 |
16 | 12 | 4 | 4 | 3 |
41 | 70 | -29 | 29 | -24 |
48 | 64 | -16 | 16 | -12.5 |
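For readers who want to check the ranking by machine, here is a minimal Python sketch that recomputes the signed ranks and W+ from the table above. The helper name `signed_ranks` is my own, not something from the textbook, and the function assumes that no difference is exactly zero, which holds for this data.

```python
def signed_ranks(diffs):
    """Rank |d| from smallest to largest, average tied ranks, then reattach signs.

    Assumes no difference is exactly zero.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of equal magnitudes (a tie group).
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + 1 + end + 1) / 2  # mean of the 1-based ranks in the group
        for position in range(start, end + 1):
            i = order[position]
            ranks[i] = average_rank if diffs[i] > 0 else -average_rank
        start = end + 1
    return ranks


pre = [46, 27, 37, 34, 20, 38, 10, 24, 20, 39, 16, 20, 47,
       45, 40, 46, 32, 49, 45, 49, 44, 45, 16, 41, 48]
post = [76, 36, 53, 55, 12, 50, 36, 18, 21, 57, 27, 48, 70,
        25, 50, 39, 51, 33, 69, 52, 60, 20, 12, 70, 64]

diffs = [x1 - x2 for x1, x2 in zip(pre, post)]  # D = X1 - X2
ranks = signed_ranks(diffs)
w_plus = sum(r for r in ranks if r > 0)     # sum of the positive ranks
w_minus = -sum(r for r in ranks if r < 0)   # magnitude of the negative ranks
print("W+ =", w_plus, " W- =", w_minus)
```

As a sanity check, W+ and W- should always add up to n(n+1)/2, which is 325 when n = 25.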
Case of Zero Differences: If any of the values of D are zero, then we use the following procedure.
If there is an even number of zeros, each zero is assigned the average rank for the set, and then half of them are assigned a plus sign and the other half a minus sign.
Ex. If there are 4 zeros, then we would assign the ranks 1, 2, 3, and 4, giving us an average rank of 2.5. We would end up with the signed ranks: -2.5, -2.5, 2.5, and 2.5.
If there is an odd number of zeros, we discard one of them, reduce the sample size by 1 and proceed as in the even case.
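The zero-difference rule can be sketched in a few lines of Python. This is only an illustration of the procedure described above; the function name `zero_signed_ranks` is hypothetical, and it covers only the ranks given to the zeros themselves (the nonzero differences would continue from the next rank).

```python
def zero_signed_ranks(num_zeros):
    """Signed ranks assigned to zero differences under the even/odd rule."""
    if num_zeros % 2 == 1:
        num_zeros -= 1          # odd count: discard one zero first
    if num_zeros == 0:
        return []
    average_rank = (1 + num_zeros) / 2   # mean of the ranks 1..num_zeros
    half = num_zeros // 2
    # Half of the zeros get a minus sign, the other half a plus sign.
    return [-average_rank] * half + [average_rank] * half


print(zero_signed_ranks(4))
print(zero_signed_ranks(5))
```

With 5 zeros, one is discarded and the remaining 4 are treated exactly as in the even case.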
Like what you read? Get blogs delivered right to your inbox as I post them so you can start standing out in your job and career. There is not a better way to learn or review college level stats topics than by reading, It’s A Math, Math World
It’s A Math, Math World (Wilcoxon Signed-Rank)
The following example is from the textbook, General Statistics, by Chase and Brown (2000).
So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown and we use a nonparametric test. An easy example of such a test is the Wilcoxon Signed-Rank test for testing the value of a median of a population. This is superior to the Sign Test (see last post) because it does not ignore the magnitude of the data as the Sign Test does. Thus, the Wilcoxon test is more sensitive and will more often reject a false null hypothesis.
A necessary condition for this test is that the data must be continuous. Also, we assume symmetry in the distribution of the population.
Ex. A public school official believes that high school seniors in a large school system will tend to score higher on a test than the national median of 50.
H0: Median = 50
HA: Median > 50
Alpha = 0.05
SCORE(X) | D= DIFFERENCE (X-50) | MAGNITUDE (|D|) | SIGNED RANK |
57 | 7 | 7 | 4 |
70 | 20 | 20 | 10 |
42 | -8 | 8 | -5 |
48 | -2 | 2 | -1 |
77 | 27 | 27 | 12 |
63 | 13 | 13 | 8 |
45 | -5 | 5 | -3 |
64 | 14 | 14 | 9 |
59 | 9 | 9 | 6 |
39 | -11 | 11 | -7 |
73 | 23 | 23 | 11 |
78 | 28 | 28 | 13 |
47 | -3 | 3 | -2 |
Each magnitude is ranked from smallest to largest and is affixed with its corresponding sign.
W+ = sum of all positive ranks = 73
W- = sum of all negative ranks = 18
Suppose for the moment that the population median is actually 50. Since the sample is drawn from a population that is symmetric about the median (by assumption), we expect the sample itself to be roughly symmetrical about the median. Thus, if we examine the ranks of the magnitudes of the differences (D), the ranks of the data points higher than 50 should be comparable to the ranks of the data points below 50. In other words, the sum of the ranks of the data points on one side of 50 should be roughly equal to the sum of the ranks of the data points on the other side.
In our example, we have the following sums:
W+ = sum of all positive ranks = 73
W- = sum of all negative ranks = 18
If the true median is 50, we would expect W+ and W- to be of comparable size.
If W+ is much smaller than W-, then this suggests that the data values are spread farther below 50 than above 50; implying Median < 50
If W- is much smaller than W+, then this suggests that the data values are spread farther above 50 than below; implying Median > 50.
In our example, W- = 18 is small relative to W+=73, which seems to suggest that Median > 50
Getting back to our hypothesis test, we could use W- as a test statistic. If W- is “too small” then we would reject H0 in favor of HA. Given a table of Wilcoxon Signed-Rank Test values, we can look up the alpha value and sample size and get a critical value c. When W- ≤ c, W- is too small and we reject the H0.
From the table, when we use alpha=0.05 and n=13, we get c=21.
Since W- ≤ 21, we reject H0. Thus, the median does appear to be greater than 50.
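Here is a short Python sketch of the same calculation, for anyone who would like to reproduce W+ and W- from the raw scores. The variable names are my own, and the critical value 21 is simply the table value quoted above, not something the code derives.

```python
scores = [57, 70, 42, 48, 77, 63, 45, 64, 59, 39, 73, 78, 47]
hypothesized_median = 50

diffs = [x - hypothesized_median for x in scores]
magnitudes = sorted(abs(d) for d in diffs)

# Map each magnitude to its average 1-based rank
# (this handles ties, although this data set has none).
rank_of = {}
for m in set(magnitudes):
    positions = [i + 1 for i, value in enumerate(magnitudes) if value == m]
    rank_of[m] = sum(positions) / len(positions)

w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)

critical_value = 21  # table value for alpha = 0.05, n = 13, as quoted in the post
reject_h0 = w_minus <= critical_value
print("W+ =", w_plus, " W- =", w_minus, " reject H0:", reject_h0)
```

The two sums should total n(n+1)/2 = 91 for n = 13, which is a quick way to catch a ranking mistake.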
A Look Ahead: Next Week, we will look at the case of tied ranks and zero differences, and also the comparing of two populations using this test.
It’s A Math, Math World (The Sign Test)
So far, every hypothesis test we have considered has required an assumption of normality or near-normality. Sometimes, though, the distribution of the population is non-normal or unknown and we use a nonparametric test. An easy example of such a test is the sign test for testing the value of a median of a population.
Example:
A seed company wants to market a new type of seed that would reportedly produce a greater yield than the old type of seed. Thirteen farmers agree to test-grow the new seed on one acre and the old variety on another acre, and to look at the difference in wheat yield.
BUSHELS OF WHEAT FROM 2 TYPES OF SEED
FARM | NEW VARIETY (y1) | OLD VARIETY (y2) | DIFFERENCE D= y1 – y2 | Sign of D |
1 | 34 | 27 | 7 | + |
2 | 45 | 25 | 20 | + |
3 | 30 | 38 | -8 | – |
4 | 30 | 42 | -12 | – |
5 | 48 | 21 | 27 | + |
6 | 35 | 22 | 13 | + |
7 | 32 | 37 | -5 | – |
8 | 46 | 30 | 16 | + |
9 | 41 | 32 | 9 | + |
10 | 23 | 38 | -15 | – |
11 | 42 | 26 | 16 | + |
12 | 43 | 33 | 10 | + |
13 | 65 | 68 | -3 | – |
We are unsure of the distributions of the two varieties of wheat so we will use the sign test.
y1 = yield from the new variety (in bushels)
y2 = yield from the old variety (in bushels)
D = y1 – y2
Hypotheses:
H0: Median(D) = 0
Ha: Median(D) > 0
Level of significance: α =0.05
Test statistics and Observed value:
We count the number of values of D that are above zero (plus signs) from above table.
X = number of plus signs = 8
Critical Region: We perform a right-tailed test. If H0 is true, the number of plus signs X follows a binomial distribution with n=13 and p=0.5. Looking at the table of Binomial probabilities, we see that if we use the values 10, 11, 12 and 13 for the critical region then:
α = P(10) + P(11) + P(12) + P(13) = .035 + .010 + .002 + 0 = .047, which is close to .05 without exceeding it; hence the critical region will consist of the following x-values: 10, 11, 12 and 13.
Decision: The observed value (x=8) is not in the critical region so we do not reject the null hypothesis and we conclude that there is not enough evidence to conclude that the new variety of seed is more effective than the old variety.
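The binomial tail probabilities behind the critical region can be checked with a few lines of Python. This is a sketch using only the standard library; the function name `binomial_pmf` is my own, and it assumes that under H0 each difference is equally likely to be positive or negative (p = 0.5).

```python
from math import comb

n = 13             # number of farms (no zero differences in this data)
observed_plus = 8  # number of plus signs


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k plus signs out of n when P(plus) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


critical_region = [10, 11, 12, 13]
alpha_actual = sum(binomial_pmf(k, n) for k in critical_region)

# One-sided p-value: probability of at least as many plus signs as observed.
p_value = sum(binomial_pmf(k, n) for k in range(observed_plus, n + 1))

print(round(alpha_actual, 4), round(p_value, 4))
```

The exact tail probability is about 0.046, slightly below the .047 obtained by adding rounded table entries, and the p-value for x = 8 is about 0.29, well above 0.05, which agrees with the decision not to reject H0.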
It’s A Math, Math World (Contingency Tables & Independence)
In this week’s post, we will be analyzing categorical data with contingency tables. We want to see if 2 or more characteristics are related (dependent) or unrelated (independent). The following examples are from the textbook, General Statistics, by Chase and Brown (2000).
What do we mean by the independence of 2 characteristics? Suppose candidate A and candidate B are running for public office and 75% of the voters favor candidate A while 25% favor candidate B. Consider two characteristics: choice of candidate and gender of voter. These characteristics are independent if the percentages of voters favoring candidate A and favoring candidate B are the same for both genders (i.e. 75% of men and 75% of women favor candidate A while 25% of men and 25% of women favor candidate B). If for some reason the percentage favoring candidate A was greater among men, then the characteristics would be related or dependent.
We can create the following contingency table of the 4 possible combinations of the 2 factors:
VOTER | FAVOR CANDIDATE A | FAVOR CANDIDATE B |
FEMALES | Female and favor candidate A | Female and favor candidate B |
MALES | Male and favor candidate A | Male and favor candidate B |
Suppose 60% of voters in this election are female.
P (A) = Probability of vote for candidate A = 0.75
P (B) = Probability of vote for candidate B = 0.25
P (F) = Probability of female voter = 0.60
P (M) = Probability of male voter = 0.40
If candidate choice and gender of voter are independent, then
P (FA) = Probability of female votes for candidate A = P (F)*P (A) =0.6*0.75 = 0.45
P (FB) = Probability of female votes for candidate B = P (F)*P (B) =0.6*0.25 = 0.15
P (MA) = Probability of male votes for candidate A = P (M)*P (A) =0.4*0.75 = 0.30
P (MB) = Probability of male votes for candidate B = P (M)*P (B) =0.4*0.25 = 0.10
Otherwise they are dependent.
Example: The following are the results of a survey of 100 college students at Framingham State College and we are testing whether their political views are independent of their views on nuclear power.
The following 2 questions were asked:
1) What label most closely describes your political views (Democrat, Republican or Independent)?
2) What is your opinion on the use of nuclear power for the production of consumer energy (Approve, Disapprove or Undecided)?
Students Political Views vs. Their Opinions on Nuclear Power
OPINION | DEMOCRAT | REPUBLICAN | INDEPENDENT | ROW TOTAL |
APPROVE | 10 | 15 | 20 | 45 |
DISAPPROVE | 9 | 2 | 16 | 27 |
UNDECIDED | 8 | 2 | 18 | 28 |
COLUMN TOTAL | 27 | 19 | 54 | 100 GRAND TOTAL |
We want to test the following hypothesis:
H_{0}: The two characteristics are independent
H_{A}: The two characteristics are related
As with the goodness of fit test we looked at in the previous post, we want to calculate the Expected frequencies (E), for each cell of the table, from the Observed frequencies (O).
E (cell) = (row total)*(column total)/ (grand total)
Table of Observed Values (Expected Values)
OPINION | DEMOCRAT | REPUBLICAN | INDEPENDENT | ROW TOTAL |
APPROVE | 10 (12.15) | 15 (8.55) | 20 (24.30) | 45 |
DISAPPROVE | 9 (7.29) | 2 (5.13) | 16 (14.58) | 27 |
UNDECIDED | 8 (7.56) | 2 (5.32) | 18 (15.12) | 28 |
COLUMN TOTAL | 27 | 19 | 54 | 100 GRAND TOTAL |
We will use the Chi-Square test of Independence which is as follows:
χ^{2} = ∑ ((O-E)^{2}/E) = 11.10
We want to test at α=0.05 level of significance. We use Chi-Square tables with
df = (# of rows – 1)*(# of columns – 1) = 2*2 = 4
χ^{2 }(0.05, df=4) = 9.488
Since test_statistic = 11.10 > 9.488 = critical_value, we reject the null hypothesis and conclude that the 2 characteristics are related.
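To make the arithmetic easy to verify, here is a small Python sketch that rebuilds the expected frequencies and the Chi-Square statistic from the observed table. The variable names are my own, and the critical value 9.488 is taken from the table lookup above, not computed by the code.

```python
# Rows: Approve, Disapprove, Undecided; columns: Democrat, Republican, Independent.
observed = [
    [10, 15, 20],
    [9, 2, 16],
    [8, 2, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E(cell) = (row total) * (column total) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 9.488  # chi-square table value for alpha = 0.05, df = 4
print(round(chi_square, 2), df, chi_square > critical_value)
```

Most of the statistic comes from the Republican column, where the observed counts are furthest from what independence would predict.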
It’s A Math, Math World (Chi-Square Goodness-of-Fit Test)
In this week’s post, we will be analyzing categorical data. When dealing with categorical data, we are concerned with frequencies of each occurrence of the values of a random variable. We will be testing hypotheses of multiple proportions (more than 2 at a time). This technique we will use, called the Chi-Square Goodness of Fit Test, works for any set of proportions that add up to 1.
This test involves taking the observed frequencies (O) given in the problem and comparing these to the expected frequencies (E) which we calculate.
We are given, for example:
H0: p1 = 0.3, p2 = 0.2, p3 = 0.5; where 0.3 + 0.2 + 0.5 = 1
There are a total of n observations, so the expected frequencies are:
0.3 x n, 0.2 x n and 0.5 x n
If the observed frequencies differ too much from the expected frequencies, we would reject the null hypothesis.
We calculate the test statistic which is Χ^{2} = ∑ ((O-E)^{2}/E) which has the Chi-Square distribution. The properties of the Chi-Square are as follows:
- There is an infinite number of Chi-Square distributions, each one associated with a number called its degrees of freedom. We calculate df = k-1 where k is the number of categories. We use this number to specify which Chi-Square distribution we are using.
- The test statistic has a Chi-square distribution if the sample size is sufficiently large such that the expected value of each category is at least 5.
Ex. (From the textbook, General Statistics (2000), by Chase and Brown). We look at the production of electronic instruments. Four assembly lines are used to produce the same item. Each assembly line is equivalent in theory, so each should have the same rate of items produced that need servicing under warranty.
A decision was made to look at the next 100 instruments returned as defective and see how many came from each plant.
The observed values of returned instruments for plants 1 through 4 are respectively: 53,18,14,15.
Plant 1 operates two shifts per day, while the other 3 plants operate 1 shift per day each.
Carry out the test of equivalence of the assembly lines at 10% level of significance.
Hypothesis:
H0: p1 = 2/5 = 0.4, p2=p3=p4=1/5 = 0.2 (Because plant 1 operates 2 of the 5 shifts while the remaining plants each operate a single shift of the five)
Ha: H0 not true
We have to compare the observed frequencies (O) against the expected frequencies (E) using the Chi-Square Goodness of Fit Test:
Expected Frequencies
Assembly Line | P | NP = E |
1 | 0.4 | (100)(0.4) = 40 |
2 | 0.2 | (100)(0.2) = 20 |
3 | 0.2 | (100)(0.2) = 20 |
4 | 0.2 | (100)(0.2) = 20 |
Calculation of Test Statistic
Assembly Line | O | E | (O-E) | (O-E)^{2} | (O-E)^{2}/E |
1 | 53 | 40 | 13 | 169 | 4.225 |
2 | 18 | 20 | -2 | 4 | 0.200 |
3 | 14 | 20 | -6 | 36 | 1.800 |
4 | 15 | 20 | -5 | 25 | 1.250 |
χ^{2} = ∑ ((O-E)^{2}/E) = 4.225 + 0.200 + 1.800 + 1.250 = 7.475, df = k-1 = 4-1 = 3 where k = number of groups. This value, χ^{2}, is the test statistic.
When df=3, χ^{2}(0.10) = 6.251 which we get from a Chi-Squared table or any software package. This is the critical value.
Since test statistic = 7.475 > 6.251 = critical value, we reject the null hypothesis, in a right tailed test, and conclude that the assembly lines are not equivalent.
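The goodness-of-fit calculation is easy to reproduce in Python. The sketch below uses only the numbers given in the example; the variable names are my own, and the critical value 6.251 is the table value quoted above.

```python
observed = [53, 18, 14, 15]   # returned instruments from lines 1 through 4
shifts = [2, 1, 1, 1]         # line 1 runs two shifts, the others one each

n = sum(observed)
total_shifts = sum(shifts)

# Under H0 each line's share of returns matches its share of shifts.
expected = [n * s / total_shifts for s in shifts]

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1

critical_value = 6.251  # chi-square table value for alpha = 0.10, df = 3
print(expected, [round(c, 3) for c in contributions], round(chi_square, 3))
```

Adding the four contributions (4.225, 0.200, 1.800 and 1.250) gives a test statistic of 7.475, which exceeds the critical value.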
NOTE: I will be taking the next few weeks off for vacation. I will be returning with my next blog post on Monday, October 18^{th}.
It’s A Math, Math World (ANOVA Part 2)
It’s A Math, Math World (ANOVA Part1)
In previous weeks, we learned how to test a single hypothesis of the difference between two population means: i.e., test whether two means u1 and u2 are equal. What if we have more than two populations and we want to see if the means are equal? We want to compare more than 2 population means at the same time. This process is called Analysis of Variance (ANOVA).
Note we could conduct multiple pair-wise tests of the equality of means, but this would inflate the overall error rate considerably, since every additional test is another chance to reject a true null hypothesis. In the ANOVA case, we test the following hypothesis:
H_{0}: u_{1} = u_{2} = u_{3} = … = u_{k}
H_{A}: not all the means are equal
The methodology is as follows (we are assuming EQUAL sample sizes in this example). This example is from the textbook General Statistics (2000) by Chase and Brown:
A large chemical company uses 4 manufacturing plants to produce the same fertilizer. The plants were built to be equivalent, so the mean output of fertilizer from each plant should be the same and have the same variability. We want to test that the weekly mean output (tons of fertilizer produced) is the same for each plant. This will of course vary week to week, but we are interested in the true mean weekly production for a plant.
H_{0}: u_{1} = u_{2} = u_{3} = u_{4}
H_{a}: Not all means are equal (at least one is different)
Weekly Production Figures for 5 weeks for 4 Fertilizer Plants (weekly production is in tons)
WEEK | PLANT 1 | PLANT 2 | PLANT 3 | PLANT 4 |
1 | 574 | 546 | 580 | 585 |
2 | 578 | 556 | 570 | 582 |
3 | 573 | 549 | 577 | 581 |
4 | 568 | 551 | 575 | 589 |
5 | 572 | 553 | 573 | 588 |
Sample mean | 573 | 551 | 575 | 585 |
Sample variance | 13 | 14.5 | 14.5 | 12.5 |
If the sample means are clustered close together, this would tend to support H0.
A great degree of variability among the sample means would suggest that not all of the population means are equal, thus supporting H_{A}.
The key to testing for equality of several population means is to look at the variability between the sample means. A large amount of variability would suggest that not all of the population means are equal. Therefore, we would reject H_{0} in favor of H_{A}; otherwise we would not reject H_{0}.
“Large” is a relative term and this variability must be measured in terms of something. We will define large as being the condition that the variability between the sample means is large in relation to the variability within the samples. When this is the case, we reject H_{0} and conclude that the population means are not all the same.
First we assume that the population variance, σ^{2}, is the same for all the plants, whether the means are equal or not. From our sample data, we will calculate 2 estimates:
- The within-sample estimate of σ^{2}
- The between sample estimate of σ^{2}
Estimate #1: Within-Sample Estimate
We pool the estimates of the sample variances by averaging them:
Estimate 1 = (13+14.5+14.5+12.5)/4 = 13.625
Estimate #2: Between-Sample Estimate
Let us assume for the moment that H0 is true; then we can view the samples of production figures as 4 samples of size m = 5 from the same population. The 4 sample means are values of the random variable x_bar. By the Central Limit Theorem, we know that the standard deviation of x_bar is:
σ_{x_bar} = sqrt (σ^{2}/m) or σ^{2} = m x (σ_{x_bar})^{2}
We use the sample variance of the 4 values of x_bar, which I will call s^{2}x_bar, as an estimate of (σ_{x_bar})^{2}.
We first need to find the grand mean of the 4 sample means, which is (573 + 551 + 575 + 585)/4
= 571
We calculate the sample variance s^{2}x_bar as follows:
Sample Mean | Sample mean – grand mean | (Sample mean – grand mean)^{2} |
573 | 2 | 4 |
551 | -20 | 400 |
575 | 4 | 16 |
585 | 14 | 196 |
Grand mean = 571 | 616 = SUM |
s^{2}x_bar = SUM/(4-1) = 616/3 = 205.333
Estimate 2 = m x (s^{2}x_bar) = 5 x (616/3) = 1026.667
We combine the estimates as follows. The between-sample estimate goes in the numerator, because it is the one that grows when the population means differ:
F-stat = (Estimate #2)/ (Estimate #1) = 1026.667/13.625 = 75.352
The statistic, F-stat, follows an F distribution with df1= k-1 and df2 = n-k degrees of freedom where:
n= # of data values in all the samples.
k = # of populations
We express the degrees of freedom as an ordered pair df = (k-1, n-k)
In our example, F-stat = 75.352, and we compare it to the F distribution at α=0.05 and df = (3, 16).
Our critical value is 3.24 (from the F distribution tables). Since F-stat > critical value, we reject H0 and conclude that the mean outputs of the 4 plants are not all the same.
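A quick way to double-check an ANOVA table is to recompute both estimates of σ^{2} directly from the raw production figures. The Python sketch below does that with the standard one-way ANOVA ratio, F = (between-sample estimate)/(within-sample estimate), for equal sample sizes. The helper names are my own, and the critical value 3.24 is the table value quoted above.

```python
plants = {
    "Plant 1": [574, 578, 573, 568, 572],
    "Plant 2": [546, 556, 549, 551, 553],
    "Plant 3": [580, 570, 577, 575, 573],
    "Plant 4": [585, 582, 581, 589, 588],
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / (len(values) - 1)


samples = list(plants.values())
k = len(samples)        # number of populations
m = len(samples[0])     # common sample size
n = k * m               # total number of data values

sample_means = [mean(s) for s in samples]

within_estimate = mean([sample_variance(s) for s in samples])
between_estimate = m * sample_variance(sample_means)

f_stat = between_estimate / within_estimate
df = (k - 1, n - k)

critical_value = 3.24   # F table value for alpha = 0.05, df = (3, 16)
print(sample_means, within_estimate, round(between_estimate, 3), round(f_stat, 3), df)
```

For these figures the within-sample estimate is 13.625, the between-sample estimate is about 1026.667, and the ratio is about 75.35, far above the critical value of 3.24.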
It’s A Math, Math World (The Substitution Principle)
This week’s blog post is a little different than the past few weeks. It is a little less math-intensive, but still very relevant. I will introduce a concept in mathematics that extends from Algebra and Geometry into higher math. It is a concept that eludes some of my best tutoring students. It is the concept of The Substitution Principle.
Basically, the substitution principle states in plain English that:
If quantities have the same value then they are interchangeable.
Sometimes the wording is different or the concept is clouded by the details of the problem, but it merely states that if two quantities are equal, in numeric or algebraic value, then one can take the place of the other in your calculation or mathematical proof.
Think about it in terms of an equation:
Example 1:
Given: x+ y = z
z=5
Conclusion: substitute 5 in for z into the top equation and get
x+y=5
Example 2:
a=b and a=f
We can conclude that b=f
Example 3 (from Geometry):
A = πr^{2}
A=18πy
We can conclude that πr^{2} =18πy and we can solve for r or y.
Example 4 (from a geometric proof):
Given: ∠1 complementary to ∠3 (i.e. m∠1 + m∠3 = 90)
∠2 complementary to ∠3 (i.e. m∠2 + m∠3 = 90)
Conclusion: m∠1 + m∠3 = m∠2 + m∠3 (both sums equal 90)
m∠1 = m∠2 (subtracting m∠3 from both sides)
As you can see, this principle has many useful applications.
It’s A Math, Math World (Hypothesis Tests)
Today’s blog post is really targeted as a “Statistics 101” for the early statistics student or non-statistician. Last week, we summarized the Central Limit Theorem and Confidence Intervals. This week we look at more inferential statistics, particularly hypothesis tests. Let me know your opinions!
Another way to see if 2 population means differ is to use a hypothesis test. To begin, we formulate a statement, or hypothesis, that both means do not differ and use the sample data to determine whether the collected data is consistent or inconsistent with this hypothesis.
We define the Null hypothesis as H0: u1 = u2 (no difference)
And the Alternative hypothesis as H1: u1 ≠ u2 (two-tailed test)
The corresponding one-tailed tests would yield H1: u1 < u2 OR H1: u1 > u2
We will consider the two tailed test in this case.
We have to determine the test statistic which is the calculated value that we will use to test the null hypothesis. In this case, we will assume the H0 is true:
t-stat = [(ybar_{1} – ybar_{2}) – (u1-u2)] /SE (ybar_{1}-ybar_{2})
We also have the significance level of the test, which is the probability of rejecting H0 when it is actually true. This significance level (0.05 in our case) determines, through a probability table, a critical value that we compare the t-stat to in order to decide whether or not to reject H0.
For example if |t-stat| > critical value, we say that the t-stat falls in the critical or rejection region and we reject the null hypothesis and conclude that the means are not equal.
However, if |t-stat| < critical value, then the t-stat is not in the critical region and we fail to reject the H0 and we conclude that there is not enough evidence to reject the H0.
Example:
In this case, we are considering two independent samples of size ≥ 30 so that we may use the standard normal distribution tables (smaller sample sizes would use the Student's t distribution).
α = level of significance = 0.05
A study was conducted on two engines and it was hoped that the new engine produced less noise than their old engine. We want to test whether there is any difference between the noise levels produced by the engines.
SAMPLE | SAMPLE SIZE | SAMPLE MEAN (DECIBELS) | SAMPLE STANDARD DEVIATION (DECIBELS) |
STANDARD MODEL | 65 | 84 | 11.6 |
NEW MODEL | 41 | 72 | 9.2 |
We define the Null hypothesis as H0: u1 = u2 (no difference)
And the Alternative hypothesis as H1: u1 ≠ u2 (two-tailed test)
Let:
xbar_{1} = 84
xbar_{2} = 72
u1 – u2 = 0 (we assume H0 is true)
s_{1 }= 11.6
s_{2 }= 9.2
n_{1} = 65
n_{2 }= 41
SE = sqrt((s_{1})^{2}/n_{1} + (s_{2})^{2}/n_{2}) = sqrt ((11.6)^{2}/65 + (9.2)^{2}/41) = sqrt (2.0702 + 2.0644) = sqrt(4.1346)
= 2.03
T-stat = [(xbar_{1 }– xbar_{2}) – (u1-u2)]/SE
= (84-72 -0)/2.03
= 5.91
The Standard normal critical value is z_{0.025 }in each tail of the distribution. This value, read from the Standard Normal Table is 1.96
Since 5.91 > 1.96, the t-statistic falls in the critical (rejection) region for the H0.
Conclusion: We reject the null hypothesis and conclude that there is a difference in noise output between the 2 engines.
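The same test can be reproduced with Python's standard library. This is a sketch under the post's large-sample assumption (standard normal critical values); the variable names are my own, and `NormalDist().inv_cdf` is used here in place of the printed Standard Normal table.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the engine noise study.
n1, mean1, sd1 = 65, 84, 11.6   # standard model
n2, mean2, sd2 = 41, 72, 9.2    # new model

alpha = 0.05
hypothesized_difference = 0     # H0: u1 - u2 = 0

standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
test_statistic = (mean1 - mean2 - hypothesized_difference) / standard_error

# Two-tailed test: put alpha/2 in each tail of the standard normal.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(test_statistic) > critical_value

print(round(standard_error, 4), round(test_statistic, 2), round(critical_value, 2), reject_h0)
```

Carrying the unrounded standard error (about 2.0334) through the division gives a statistic of about 5.90 instead of 5.91; the difference is only rounding and the conclusion is the same.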
It’s A Math, Math World (CLT and CI’s)
Today’s blog post is really targeted as a “Statistics 101” for the early statistics student or non-statistician. Last week, we summarized basic probability and the normal distribution. This week we look at inferential statistics. Let me know your opinions!
We start by examining one of the main topics of this section which is the Central Limit Theorem (CLT):
Suppose we have a population of 10,000 measurements and we take a random sample of 500 of these measurements and calculate a mean, ybar_1. We then repeat this process 299 times and get a total of 300 separate estimates of the mean and we call these ybar_1, ybar_2, …, ybar_300. We can consider these 300 measurements of the sample mean to be a sample of its own kind which has its own mean and variance. We can calculate these 2 quantities and look at the distribution of the sample mean. This is a sampling distribution.
The variability of random samples from the same population is called sampling variability. A probability distribution that characterizes some aspects of sampling variability is a sampling distribution. We make the following conclusions:
- The mean of the sampling distribution (u_x) is equal to the population mean (u)
- The standard deviation of the sampling distribution, or the standard error of the mean, is equal to the population standard deviation (sigma) divided by the square root of the size of each subsample (in this case n=500).
i.e., Standard Error (SE) = sigma/sqrt(500)
If we standardize the sample mean (for instance ybar_1), then
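(ybar_1 – u_x)/SE → N(0,1), which is standard normal,
as the sample size gets very large, for essentially any population distribution (it only needs a finite variance). Thus we can use the standard normal table to find probabilities for the sample mean, whatever the shape of the distribution we are given.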
We will use this concept in more depth when we look at the next 2 ideas.
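The repeated-sampling experiment described above is easy to simulate. The Python sketch below is only an illustration: the population is a hypothetical, deliberately skewed one generated with `random.expovariate`, the seed is arbitrary, and each sample of 500 is drawn with replacement so that the formula SE = sigma/sqrt(500) applies without a finite-population correction.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # arbitrary fixed seed so the run is repeatable

# Hypothetical skewed population of 10,000 measurements.
population = [random.expovariate(1.0) for _ in range(10_000)]
mu = mean(population)
sigma = pstdev(population)   # population standard deviation

sample_size = 500
number_of_samples = 300

# 300 sample means, each computed from a random sample of 500 measurements.
sample_means = [
    mean(random.choices(population, k=sample_size))
    for _ in range(number_of_samples)
]

theoretical_se = sigma / sample_size ** 0.5
observed_se = stdev(sample_means)

print("population mean:", mu, " mean of sample means:", mean(sample_means))
print("sigma/sqrt(n):", theoretical_se, " SD of sample means:", observed_se)
```

The two numbers on each printed line should come out close to each other, even though the underlying population is far from normal.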
The process of drawing conclusions about the population, based on observations in the sample is known as statistical inference. To make these conclusions, we have to consider how likely it is that the sample is representative – that it closely resembles – the population.
These decisions can be divided into 2 categories:
- Confidence Intervals (to be examined next)
- Hypothesis Tests (we will look at next time)
A confidence interval is based on a probability, which in simplest terms, is the likelihood of something occurring given that it is repeated a great number of times. It is also the relative frequency represented as a percentage. Probabilities have some very intuitive properties and statistical inference is based on probability. Without getting into a big dissertation regarding probability theory (see last week’s blog post), we can say that every confidence interval has a degree of uncertainty, usually set at 0.05.
The principle underlying a confidence interval is that we want to build a continuous range of values (where the parameter may lie) such that the range captures the parameter 1-0.05 = 0.95, or 95%, of the time with repeated sampling.
Let’s take an example:
Consider the population mean which we will call u
We have a collection of sample data represented by y1, y2, y3,…, y1000
We calculate the sample mean which is ybar
Now, ybar is probably not exactly equal to u but we hope it is close. The standard error of the mean, SE, is how far ybar tends to be from u.
The confidence interval: (ybar – 2*SE, ybar + 2*SE) is an approximate 95% confidence interval for u. Don’t worry about where the 2 comes from. It is a constant determined from a probability table (for the Standard Normal distribution) that is rounded up for our purposes. The interpretation of the CI is as follows:
Under repeated sampling and repeated creation of CI’s from these samples, the true population mean will be contained in 95% of these CI’s.
The error rate of 5% means that the true population mean will not be contained in the remaining 5% of the CI’s when the sampling is repeated a large number of times.
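The repeated-sampling interpretation can be seen in a small simulation. The Python sketch below is a hedged illustration, not part of the original example: the population mean and standard deviation (50 and 10), the sample size, the number of repetitions and the seed are all hypothetical choices of mine.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary fixed seed so the run is repeatable

true_mean, true_sd = 50.0, 10.0   # hypothetical population values
n = 100                           # size of each sample
repetitions = 1000                # number of samples (and CI's) created

covered = 0
for _ in range(repetitions):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    ybar = mean(sample)
    se = stdev(sample) / n ** 0.5             # estimated standard error of the mean
    lower, upper = ybar - 2 * se, ybar + 2 * se
    if lower <= true_mean <= upper:
        covered += 1                          # this CI contains the true mean

coverage = covered / repetitions
print("fraction of CI's containing the true mean:", coverage)
```

The printed fraction should land near 0.95, with the remaining intervals missing the true mean.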
Example: Confidence Interval for the difference u1 – u2 of 2 population means.
CI: (ybar_1 – ybar_2) ± 2*SE(ybar_1 – ybar_2)
The interpretation is that, if the confidence interval includes the point 0, then we cannot conclude that there is a difference between the means u1 and u2; that is, u1 = u2 remains plausible.
Examples:
(-1.023 < u1-u2 < 3.25)
Since the interval includes the point 0, we conclude that there is not enough evidence of a difference between the means u1 and u2.
(2.14 < u1-u2 <4.25)
Since this interval does not include the point 0, we conclude that the means are not equal to each other.
Next week, we continue with an examination of hypothesis tests.