Archive for October, 2010

It’s A Math, Math World (Contingency Tables & Independence)

In this week’s post, we will be analyzing categorical data with contingency tables. We want to see whether two or more characteristics are related (dependent) or unrelated (independent). The following examples are from the textbook General Statistics, by Chase and Brown (2000).

What do we mean by the independence of 2 characteristics? Suppose candidate A and candidate B are running for public office and 75% of the voters favor candidate A while 25% favor candidate B. Consider two characteristics: choice of candidate and gender of voter. These characteristics are independent if the percentages of voters favoring candidate A and favoring candidate B are the same for both genders (i.e. 75% of men and 75% of women favor candidate A while 25% of men and 25% of women favor candidate B). If for some reason the percentage favoring candidate A were greater among men, then the characteristics would be related, or dependent.

We can create the following contingency table of the 4 possible combinations of the 2 factors:

| | FAVOR CANDIDATE A | FAVOR CANDIDATE B |
| --- | --- | --- |
| FEMALES | Female and favor candidate A | Female and favor candidate B |
| MALES | Male and favor candidate A | Male and favor candidate B |

 

Suppose 60% of voters in this election are female.

P (A) = Probability of vote for candidate A = 0.75

P (B) = Probability of vote for candidate B = 0.25

P (F) = Probability of female voter = 0.60

P (M) = Probability of male voter = 0.40

If candidate choice and gender of voter are independent, then

P (FA) = Probability that a voter is female and favors candidate A = P (F)*P (A) = 0.6*0.75 = 0.45

P (FB) = Probability that a voter is female and favors candidate B = P (F)*P (B) = 0.6*0.25 = 0.15

P (MA) = Probability that a voter is male and favors candidate A = P (M)*P (A) = 0.4*0.75 = 0.30

P (MB) = Probability that a voter is male and favors candidate B = P (M)*P (B) = 0.4*0.25 = 0.10

If any of these joint probabilities differs from the product of its two marginal probabilities, the characteristics are dependent.
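The multiplication rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are made up for this example, and the probabilities are the ones assumed in the candidate scenario.

```python
# Marginal probabilities from the candidate example.
# The variable and function names are illustrative, not from the textbook.
p_gender = {"F": 0.60, "M": 0.40}
p_candidate = {"A": 0.75, "B": 0.25}


def joint_if_independent(p_rows, p_cols):
    """Return P(row and column) = P(row) * P(column) for every combination."""
    return {
        (r, c): p_r * p_c
        for r, p_r in p_rows.items()
        for c, p_c in p_cols.items()
    }


joint = joint_if_independent(p_gender, p_candidate)
for (gender, candidate), p in joint.items():
    print(f"P({gender}{candidate}) = {p:.2f}")
```

The four products fill the four cells of the contingency table, and they add up to 1 because every voter falls into exactly one cell.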

Example: The following are the results of a survey of 100 college students at Framingham State College. We want to test whether their political views are independent of their views on nuclear power.

The following 2 questions were asked:

1) What label most closely describes your political views (Democrat, Republican or Independent)?

2) What is your opinion on the use of nuclear power for the production of consumer energy (Approve, Disapprove or Undecided)?

Students’ Political Views vs. Their Opinions on Nuclear Power

| | DEMOCRAT | REPUBLICAN | INDEPENDENT | ROW TOTAL |
| --- | --- | --- | --- | --- |
| APPROVE | 10 | 15 | 20 | 45 |
| DISAPPROVE | 9 | 2 | 16 | 27 |
| UNDECIDED | 8 | 2 | 18 | 28 |
| COLUMN TOTAL | 27 | 19 | 54 | 100 (GRAND TOTAL) |

 

We want to test the following hypothesis:

H0: The two characteristics are independent

HA: The two characteristics are related

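As with the goodness-of-fit test we looked at in the previous post, we want to compare the Observed frequencies (O) in each cell of the table with the Expected frequencies (E), the counts we would expect in that cell if the null hypothesis of independence were true. The expected frequency of each cell is calculated from the row and column totals: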

E (cell) = (row total)*(column total)/ (grand total)
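As a rough sketch of this formula, the expected frequency of every cell can be computed from the observed survey counts. The nested-dictionary layout and the function name `expected_frequencies` are choices made for this illustration only.

```python
# Observed survey counts: rows are opinions on nuclear power,
# columns are political views. The data layout is an illustrative choice.
observed = {
    "Approve":    {"Democrat": 10, "Republican": 15, "Independent": 20},
    "Disapprove": {"Democrat": 9,  "Republican": 2,  "Independent": 16},
    "Undecided":  {"Democrat": 8,  "Republican": 2,  "Independent": 18},
}


def expected_frequencies(table):
    """Apply E(cell) = (row total) * (column total) / (grand total) to each cell."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, count in cells.items():
            col_totals[c] = col_totals.get(c, 0) + count
    grand_total = sum(row_totals.values())
    return {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in cells}
        for r, cells in table.items()
    }


expected = expected_frequencies(observed)
for row, cells in expected.items():
    print(row, {c: round(e, 2) for c, e in cells.items()})
```

For example, the Approve/Democrat cell works out to 45*27/100 = 12.15.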

Table of Observed Values (Expected Values)

| | DEMOCRAT | REPUBLICAN | INDEPENDENT | ROW TOTAL |
| --- | --- | --- | --- | --- |
| APPROVE | 10 (12.15) | 15 (8.55) | 20 (24.30) | 45 |
| DISAPPROVE | 9 (7.29) | 2 (5.13) | 16 (14.58) | 27 |
| UNDECIDED | 8 (7.56) | 2 (5.32) | 18 (15.12) | 28 |
| COLUMN TOTAL | 27 | 19 | 54 | 100 (GRAND TOTAL) |

 

We will use the Chi-Square test of independence, whose test statistic sums (O-E)²/E over all cells of the table:

χ² = ∑ ((O-E)²/E) = 11.10
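The sum can be checked with a short Python sketch that takes the observed counts from the survey table and computes each expected count from the row and column totals. The function name `chi_square_statistic` is an illustrative choice.

```python
# Observed counts from the survey table (rows: Approve, Disapprove, Undecided;
# columns: Democrat, Republican, Independent).
observed = [
    [10, 15, 20],
    [9, 2, 16],
    [8, 2, 18],
]


def chi_square_statistic(table):
    """Sum (O - E)^2 / E over all cells, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


chi2 = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f}")
```

The largest single contribution comes from the Approve/Republican cell, where 15 students were observed against 8.55 expected.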

We want to test at the α = 0.05 level of significance. We use Chi-Square tables with

df = (# of rows – 1)*(# of columns – 1) = (3 – 1)*(3 – 1) = 2*2 = 4

χ² (0.05, df=4) = 9.488

Since test_statistic = 11.10 > 9.488 = critical_value, we reject the null hypothesis and conclude that the 2 characteristics, political views and opinion on nuclear power, are related.
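The decision rule can also be sketched in Python using only the standard library. One addition that is not in the textbook example: for 4 degrees of freedom the upper-tail area of the Chi-Square distribution has the closed form exp(-x/2)*(1 + x/2), which lets us confirm that the table value 9.488 leaves about 0.05 in the upper tail. The function names are illustrative, and the closed form applies to df = 4 only.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (# of rows - 1) * (# of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def upper_tail_df4(x):
    """P(chi-square with 4 df > x). Closed form valid for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)


df = degrees_of_freedom(3, 3)   # 3 opinions by 3 political views
alpha = 0.05
critical_value = 9.488          # from the Chi-Square table, df = 4
test_statistic = 11.10          # computed from the survey data

# Sanity check on the table lookup: this area should be close to alpha.
tail_at_critical = upper_tail_df4(critical_value)

reject_null = test_statistic > critical_value
print(f"df = {df}, tail area at critical value = {tail_at_critical:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Comparing the test statistic with the critical value is equivalent to checking whether its upper-tail area falls below α, which is the case here.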
