Elementary Statistics

Lesson 11.3 Chi-Square Test of Independence

Test of Independence

In a chi-square hypothesis test of independence, you determine whether or not two factors are independent. The data is displayed in a contingency table.

For example, a bake shop might want to know if gender and pie preference are independent. The two factors are gender and pie preference. A sample contingency table is shown below.

                 Pie Preference
Gender     Apple   Pumpkin   Pecan
Male          40        10      30
Female        20        30      10

We generally write the null and alternate hypotheses in sentences.

Example:

Ho: Pie preference and gender are independent.

Ha: Pie preference and gender are not independent.

Notation

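The test statistic for a test of independence is:

$$\chi^2 = \sum_{(i \cdot j)} \frac{(O - E)^2}{E}$$

where the sum is taken over all i · j cells of the contingency table, and: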

O = the observed values (data).

E = the expected values (the values you would expect if the null hypothesis were true).

i = the number of rows in the contingency table.

j = the number of columns in the contingency table.

The degrees of freedom df = (i - 1)(j - 1).

The test statistic is a measure of how far the observed values (O) are from the expected values (E) and is either 0 or positive. If the test statistic is large, then the observed values are far from what we would expect if Ho were true. So, we would reject Ho.

The test of independence is right-tailed.  
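As a rough illustration, the statistic for the pie-preference table above can be computed with a few lines of Python. This is only a sketch: the function name `chi_square_statistic` is made up for this example, and each expected count is taken as row total × column total / grand total, the expected value formula used in this lesson.

```python
# Observed counts from the pie-preference table
# (rows: Male, Female; columns: Apple, Pumpkin, Pecan).
observed = [
    [40, 10, 30],
    [20, 30, 10],
]


def chi_square_statistic(table):
    """Return (statistic, df) for a test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(table):
        for c, obs in enumerate(row):
            # Expected count for this cell if the two factors were independent.
            exp = row_totals[r] * col_totals[c] / grand_total
            # Each cell contributes (O - E)^2 / E, which is never negative.
            statistic += (obs - exp) ** 2 / exp

    # df = (number of rows - 1)(number of columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 24.31, df = 2
```

Because every term in the sum is a squared difference divided by a positive expected count, the statistic can only be 0 or positive, and only large values count as evidence against Ho. That is why the test is right-tailed.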

The following formula calculates the expected value:

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$

"Row total" and "column total" represent the totals of the row and the column of a particular cell in the table.

Example: In the example above, find the expected value for being male and preferring pumpkin pie.

Row total for being male = 40 + 10 + 30 = 80

Column total for preferring pumpkin pie = 10 + 30 = 40

Total number surveyed = 40 + 10 + 30 + 20 + 30 + 10 = 140

$$E = \frac{(80)(40)}{140} \approx 22.9$$

We expect about 22.9 of the males to prefer pumpkin pie. It is OK for the answer to be a decimal because it is an expected value.
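The same arithmetic can be checked in Python. The helper name `expected_count` is hypothetical and chosen only for this sketch; the numbers are the totals worked out above.

```python
def expected_count(row_total, column_total, grand_total):
    """Expected cell count if the row and column factors are independent."""
    return row_total * column_total / grand_total


male_total = 40 + 10 + 30                 # row total for being male
pumpkin_total = 10 + 30                   # column total for preferring pumpkin pie
surveyed = 40 + 10 + 30 + 20 + 30 + 10    # total number surveyed

e_male_pumpkin = expected_count(male_total, pumpkin_total, surveyed)
print(round(e_male_pumpkin, 1))  # 22.9
```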

Hypothesis Testing Problems Using TI-83 or TI-84 calculators

Example: The following table provides data from a study of 128 students at De Anza College in Cupertino, California, USA. A researcher wanted to know if their age had anything to do with the perceived difficulty of the Elementary Statistics class at the college. The researcher asked the students if they thought that the elementary statistics course was more difficult, the same difficulty, or less difficult than other math courses they had taken.

                      Elementary Statistics is
Age (years)   More difficult   Same   Less difficult
Under 25                  22      9                4
25 to 45                  18     28                5
Over 45                   16     13               13

The researcher wants to know if age of the student and perceived difficulty of the Elementary Statistics course are independent.

Formulate the 2 hypotheses.

Ho: Age of the student and perceived difficulty of the elementary statistics class are independent.

Ha: Age of the student and perceived difficulty of the elementary statistics class are not independent.

Determine the random variable and the distribution for the test.

There are 3 rows and 3 columns.

df = (3 - 1)(3 - 1) = 4, so the test uses a chi-square distribution with 4 degrees of freedom.

Using the test statistic calculated from the data, calculate the p-value.

TI-83 calculator:

  • Press 2nd MATRX.
  • Arrow over to EDIT, press 1:[A], and press ENTER.
  • Press 3 ENTER 3 ENTER (for the 3 rows and the 3 columns).
  • Enter the data in the table by row, pressing ENTER after each entry.
  • Press 2nd QUIT.
  • Press STAT TESTS.
  • Arrow down to C:χ²-Test. You should see Observed [A] and Expected [B].
  • Arrow down to Calculate and press ENTER.
  • The test statistic is 16.56 to 2 decimal places. The p-value is 0.0023 to 4 decimal places.
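The calculator output can be cross-checked without a TI-83. The sketch below uses only the Python standard library; the function names are invented for this example, and the p-value routine relies on a closed form for the right-tail area that is valid only when df is even (df = 4 here). It is not a general-purpose chi-square routine.

```python
import math

# Observed counts from the De Anza table
# (rows: Under 25, 25 to 45, Over 45;
#  columns: More difficult, Same, Less difficult).
observed = [
    [22, 9, 4],
    [18, 28, 5],
    [16, 13, 13],
]


def chi_square_independence(table):
    """Return (statistic, df) for a chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, obs in enumerate(row):
            exp = row_totals[r] * col_totals[c] / grand_total
            statistic += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi_square_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with EVEN df.

    For even df the area equals exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    Odd df needs a different method, so it is rejected here.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only handles positive even df")
    half = x / 2
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


statistic, df = chi_square_independence(observed)
p_value = chi_square_right_tail(statistic, df)
print(f"test statistic = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
# test statistic = 16.56, df = 4, p-value = 0.0023
```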

Compare α and the p-value and make a decision.

Assume α = 0.05

Since α > p-value (0.05 > 0.0023), we reject Ho.

Because the p-value is so small, the data provide strong evidence against the null hypothesis.

Write an appropriate conclusion. 

At the 5% level of significance, we conclude that age and perceived difficulty of the Elementary Statistics class are not independent.

In this sample, how difficult students perceive the class to be is associated with their age.

Expected Value Calculation:

To calculate the expected value for students in the 25 to 45 age group who find the course more difficult, we use the expected value formula:

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$

The 25 - 45 year old age group row has a total of

18 + 28 + 5 = 51.

The more difficult column has a total of

22 + 18 + 16 = 56.

The total number surveyed is 128, so

$$E = \frac{(51)(56)}{128} = 22.3$$

to 1 decimal place. We expect about 22.3 students who are 25 to 45 to find the course more difficult.
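The same formula fills in every cell of the expected table, which is what the calculator stores in matrix [B]. The following is a minimal sketch in Python; the variable names and the printed layout are choices made for this example only.

```python
row_labels = ["Under 25", "25 to 45", "Over 45"]
col_labels = ["More difficult", "Same", "Less difficult"]

# Observed counts from the De Anza table.
observed = [
    [22, 9, 4],
    [18, 28, 5],
    [16, 13, 13],
]

row_totals = [sum(row) for row in observed]        # [35, 51, 42]
col_totals = [sum(col) for col in zip(*observed)]  # [56, 50, 22]
surveyed = sum(row_totals)                         # 128

# Expected count for each cell: (row total)(column total) / (total surveyed).
expected = [
    [r_total * c_total / surveyed for c_total in col_totals]
    for r_total in row_totals
]

print(f"{'':<10}" + "".join(f"{label:>16}" for label in col_labels))
for label, row in zip(row_labels, expected):
    print(f"{label:<10}" + "".join(f"{value:>16.1f}" for value in row))
```

The entry in the "25 to 45" row and "More difficult" column is the 22.3 found by hand above.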

Example

The next example is a hypothesis test to determine if the factors gender and hiking preference are independent.

Think About It

Do the Try-It examples in Introductory Statistics. Verify the numbers. The calculator instructions follow the example.

This is the last section of this lesson.


Content Developed by Susan Dean and Barbara Illowsky, Licensed under a Creative Commons License
Published by the Sofia Open Content Initiative
© 2004 Foothill-De Anza Community College District & The William and Flora Hewlett Foundation