Chi-Square Test

Statistics can be used to determine whether differences among groups are significant or simply the result of chance.  The statistical test most frequently used to determine whether experimentally obtained data provide a good fit, or approximation, to the expected or theoretical data is the Chi-square test.  This test can be used to determine whether deviations from the expected values are due to chance alone or to some other circumstance.  For example, consider corn seedlings resulting from an F1 cross between parents heterozygous for color.

A Punnett square of the F1 cross Gg × Gg predicts that the expected proportion of green:albino seedlings is 3:1.  Use this information to fill in the expected (e) column and the (o-e) column in the table below.

Phenotype    Genotype    # Observed (o)    Expected (e)    (o-e)
Green        GG or Gg    72
Albino       gg          12
Total                    84
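As a way to check the hand-filled table, the expected counts can be sketched in a few lines of Python. The helper name `expected_counts` is an illustrative assumption, not part of the original exercise; it simply splits the total number of seedlings according to the 3:1 ratio.

```python
def expected_counts(total, ratio):
    """Split `total` individuals according to a phenotype ratio such as (3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# Observed seedling counts from the table above.
observed = {"Green": 72, "Albino": 12}
total = sum(observed.values())

# Expected counts under the 3:1 green:albino prediction.
expected = expected_counts(total, (3, 1))

for (phenotype, o), e in zip(observed.items(), expected):
    print(f"{phenotype:<8} o={o:>3}  e={e:>5.1f}  o-e={o - e:>+6.1f}")
```

The (o-e) values always sum to zero, because the expected counts are scaled to the same total as the observed counts.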

There is a small difference between the observed and expected results, but are these data close enough that the difference can be explained by random chance or variation in the sample?

To determine whether the observed data fall within acceptable limits, a Chi-square analysis is performed to test the validity of a null hypothesis: that there is no statistically significant difference between the observed and expected data.  If the Chi-square analysis indicates that the data vary too much from the expected 3:1, the null hypothesis is rejected and an alternative hypothesis is accepted.

The formula for Chi-square is

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where $o$ = the observed number of individuals,

$e$ = the expected number of individuals, and

$\sum$ = the sum of the values (in this case, the differences, squared, divided by the number expected).
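The formula translates almost directly into code. The following is a minimal sketch, assuming the observed and expected counts are given as two lists in the same category order; the function name `chi_square` is chosen here for illustration only.

```python
def chi_square(observed, expected):
    """Return the sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Seedling data: 72 green and 12 albino observed, 63 and 21 expected (3:1 of 84).
stat = chi_square([72, 12], [63, 21])
print(round(stat, 2))  # 5.14
```

Each category contributes its own squared deviation scaled by its expected count, so a deviation of 9 seedlings matters more in the smaller (albino) class than in the larger (green) class.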

a.      This statistical test will examine the null hypothesis, which predicts that the data from the experimental cross above fit the expected 3:1 ratio.

b.      Copy the data from the above table to complete the table below.

Phenotype    Observed (o)    Expected (e)    (o-e)    (o-e)²    (o-e)²/e
Green        72
Albino       12
                                                  Chi-square =

c.      Your calculations should give a Chi-square value of 5.14.  This value is then compared to the following table.

Critical Values of the Chi-square Distribution

                   Degrees of Freedom (df)
Probability (p)    1       2       3       4       5
0.05               3.84    5.99    7.82    9.49    11.1
0.01               6.64    9.21    11.3    13.2    15.1
0.001              10.8    13.8    16.3    18.5    20.5
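For repeated use, the table can be stored as a small lookup structure. This is only a sketch: the names `CRITICAL_VALUES` and `critical_value` are assumptions made for illustration, and the numbers are copied from the table above rather than computed.

```python
# Critical values copied from the table above, keyed by probability, then df.
CRITICAL_VALUES = {
    0.05:  {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.1},
    0.01:  {1: 6.64, 2: 9.21, 3: 11.3, 4: 13.2, 5: 15.1},
    0.001: {1: 10.8, 2: 13.8, 3: 16.3, 4: 18.5, 5: 20.5},
}

def critical_value(df, p=0.05):
    """Look up the tabulated critical value for the given df and probability."""
    try:
        return CRITICAL_VALUES[p][df]
    except KeyError:
        raise ValueError(f"no tabulated value for df={df}, p={p}") from None

print(critical_value(1))        # 3.84
print(critical_value(3, 0.01))  # 11.3
```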

How to Use the Critical Values Table:

1.      Determine the degrees of freedom for your experiment.  It is the number of categories minus 1.  Since there are two possible phenotypes (green and albino), for this experiment df=1 (2 categories - 1).  If the experiment gathered data for a dihybrid cross, there would be four possible phenotypes, and therefore 3 degrees of freedom.

2.      Find the critical value.  Under the 1 df column, find the critical value in the probability (p) = 0.05 row: it is 3.84.  What does this mean?  If the calculated Chi-square value is greater than or equal to the critical value from the table, then the null hypothesis is rejected.  Here the calculated value of 5.14 exceeds 3.84, so the null hypothesis is rejected.  In other words, chance alone is unlikely to explain the deviations we observed, and there is therefore reason to doubt our original hypothesis (or to question the accuracy of our data collection).  The conventional threshold for rejecting a null hypothesis in the sciences is p=0.05.

3.      These results are said to be significant at a probability of p=0.05.  This means that if the null hypothesis were correct, you would expect to see a deviation this large less than 5% of the time; the data therefore give good reason to doubt the 3:1 ratio.

4.      If the calculated value were 7.0, the null hypothesis would still be rejected, but this time at a probability of p=0.01, because 7.0 exceeds the critical value of 6.64.  This means that less than 1% of the time would you expect to see a deviation this large if the null hypothesis were correct, which is even stronger evidence that the data do not fit the expected 3:1 ratio.

5.      Since these data do not fit the expected 3:1 ratio, you must consider reasons for this variation.  Additional experimentation would be necessary.  Perhaps the sample size is too small, or errors were made in data collection.  In this example, perhaps the albino seedlings are under-represented because they died before the counting was performed.
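The decision steps above can be sketched in code. The function names below are illustrative assumptions. The exact p value is computed with the identity p = erfc(sqrt(χ²/2)), which holds only for 1 degree of freedom; for more categories a statistics library or the table would be needed.

```python
import math

# Critical values for df = 1, copied from the table above.
CRITICAL_DF1 = {0.05: 3.84, 0.01: 6.64, 0.001: 10.8}

def strictest_level(chi2, critical_by_p):
    """Smallest tabulated p whose critical value chi2 meets or exceeds, else None."""
    passed = [p for p, crit in critical_by_p.items() if chi2 >= crit]
    return min(passed) if passed else None

def p_value_df1(chi2):
    """Exact upper-tail probability of the Chi-square distribution, df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))

for chi2 in (5.14, 7.0):
    level = strictest_level(chi2, CRITICAL_DF1)
    verdict = "not rejected" if level is None else f"rejected at p={level}"
    print(f"chi-square={chi2}: null hypothesis {verdict} "
          f"(exact p is about {p_value_df1(chi2):.3f})")
```

A value of 5.14 is rejected at the 0.05 level but not at 0.01, while 7.0 clears the stricter 0.01 threshold, matching steps 3 and 4.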

Example 2:  In a study of incomplete dominance in tobacco seedlings, the following counts were made from a cross between two heterozygous (Gg) plants:

Phenotype       Genotype    Observed
Green           GG          22
Yellow-green    Gg          50
Albino          gg          12
Total                       84

Complete the Chi-Square Test
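Once the test has been worked by hand, a short script such as the sketch below can be used to check the result. It assumes the expected ratio for this cross is 1:2:1 (GG:Gg:gg), which the original text does not state explicitly but which follows from a Gg × Gg cross with incomplete dominance. The 0.05 critical values are copied from the table earlier in this handout.

```python
# Observed counts from the Example 2 table.
observed = {"Green (GG)": 22, "Yellow-green (Gg)": 50, "Albino (gg)": 12}

# Assumption: a Gg x Gg cross with incomplete dominance predicts 1:2:1.
ratio = (1, 2, 1)

total = sum(observed.values())
expected = [total * r / sum(ratio) for r in ratio]

# Chi-square: sum of (o - e)^2 / e over the three phenotype classes.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed.values(), expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

# Critical values at p = 0.05, keyed by df (from the table in this handout).
critical_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.1}[df]
reject = chi2 >= critical_05

print(f"expected counts: {expected}")
print(f"chi-square = {chi2:.2f}, df = {df}, critical value = {critical_05}")
print("null hypothesis rejected" if reject else "null hypothesis not rejected")
```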