I. One-Sample Chi-square Test
The one-sample chi-square test compares the observed
and expected frequencies in each category of a variable to test either that all
categories contain the same proportion of values or that each category contains
a researcher-specified proportion of values.
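The chi-square formula is given by:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency in category $i$, $E_i$ is the expected frequency in category $i$, and $k$ is the number of categories.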
1.1 Example:
The opinions of a sample of 100 Catholic registered voters in Metro Manila
concerning the two-child per family bill are recorded as follows:
Opinion concerning the two-child per family bill | Frequency
Against | 75
In Favor | 15
Neutral | 10
Total | 100
Determine whether the proportions of voters who are
"against", "in favor" of, and "neutral" toward the two-child
per family bill are equal.
1.2 Answer to the Question
To answer the question, the following hypotheses are
tested at the 0.01 level of significance:
Ho: The proportions of voters who are "against", "in favor" of, and
"neutral" toward the two-child per family bill are equal.
Ha: The proportions of voters who are "against", "in favor" of, and
"neutral" toward the two-child per family bill are not equal.
1.3 SPSS Output
The SPSS outputs of the chi-square test are as
follows:
1.4 Understanding the SPSS Output
The first table presents the observed frequencies,
expected frequencies, and residuals. The
expected frequency is obtained by dividing the total frequency (N) by the number
of categories (k). In this case, N=100 and k=3; thus, the expected frequency is
100/3=33.33. The residual is simply
the observed frequency minus the expected frequency.
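The arithmetic behind the first table can be sketched in a few lines of Python. This is only an illustration using the frequencies from the example; the variable names are assumptions and are not part of the SPSS output.

```python
# Observed frequencies from the example table (100 voters, 3 categories).
observed = {"Against": 75, "In Favor": 15, "Neutral": 10}

n_total = sum(observed.values())   # N = 100
k = len(observed)                  # k = 3

# Under the null hypothesis of equal proportions, every category
# has the same expected frequency N / k.
expected = n_total / k

# Residual = observed frequency minus expected frequency.
residuals = {category: count - expected for category, count in observed.items()}

for category, count in observed.items():
    print(f"{category:<9} observed={count:>3}  expected={expected:.2f}  "
          f"residual={residuals[category]:+.2f}")
```

The residuals always sum to zero, because the observed and expected frequencies both add up to N.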
The second table presents the chi-square value,
degrees of freedom (df), and asymptotic significance (Asymp. Sig), also
called the p-value. The chi-square
value is 78.500 with df=2 and a p-value reported as 0.000 (that is, less than 0.0005).
Because the p-value is less than the 0.01 level, the null hypothesis that
the proportions of Catholic registered voters who are "against",
"in favor" of, and "neutral" toward the two-child per family bill
are equal is rejected. The
alternative hypothesis that these proportions are not equal is supported.
Thus, at the 0.01 level of significance, it can be concluded that the
proportions of Catholic registered voters who are "against", "in
favor" of, and "neutral" toward the two-child per family bill are not
equal.
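As a cross-check on the reported chi-square value, the statistic can be recomputed by hand. The sketch below uses only the Python standard library and relies on the fact that, for df=2 specifically, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2); for other degrees of freedom a statistics library would be needed. Variable names are illustrative assumptions.

```python
import math

observed = [75, 15, 10]                      # Against, In Favor, Neutral
expected = sum(observed) / len(observed)     # equal proportions: N / k

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

# Closed form valid ONLY for df = 2: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

alpha = 0.01
reject_null = p_value < alpha

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3g}")
print("Reject Ho" if reject_null else "Do not reject Ho")
```

The p-value obtained this way is far smaller than 0.0005, which is why it is displayed as 0.000 when rounded to three decimals.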
1.5 Assumptions
1. The chi-square test should be used when the data are in discrete categories and when the expected frequencies are sufficiently large.
2. When k=2, each expected value should be 5 or larger.
3. When k>2, no more than about 20% of the expected values should be smaller than 5 and none should be less than 1.
4. The chi-square test may be used with data m…
The chi-square test is insensitive to the effects of order when df>1, and
thus may not be the best test when a hypothesis assumes that the variables are
ordered.
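The expected-frequency rules in items 2 and 3 can be turned into a small check. The function below is a hypothetical helper written for illustration (it is not an SPSS feature), and it treats "about 20%" as a hard 20% cutoff, which is a simplifying assumption.

```python
def expected_counts_ok(expected):
    """Return True if the expected frequencies satisfy the rules of thumb.

    k = 2: every expected value must be at least 5.
    k > 2: at most 20% of expected values below 5, and none below 1.
    """
    k = len(expected)
    if k < 2:
        raise ValueError("at least two categories are required")
    if k == 2:
        return all(e >= 5 for e in expected)
    share_below_5 = sum(e < 5 for e in expected) / k
    return share_below_5 <= 0.20 and min(expected) >= 1


# The voter example: three categories, each with expected frequency 100/3.
print(expected_counts_ok([100 / 3] * 3))
```

For the voter example every expected frequency is 33.33, so the check passes comfortably.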
II. Binomial Test
The binomial test is used in place of the one-sample
chi-square test when there are just two categories in the classification of the
data and when the sample size is so small that the one-sample chi-square test is
inappropriate.
2.1 Example:
Suppose a sample of 8 newly graduated education students were asked to take the
civil service examination. The
results of the examination are recorded as follows:
Results | Frequency
Passed | 2
Failed | 6
Total | 8
Answer the following questions:
(1) Test the hypothesis that the proportion of passers is equal to the
proportion of non-passers. Use a 0.05 level of significance.
(2) Is the proportion of passers equal to 25%? Use a 0.05 level of
significance.
2.2 Answer to Question 1
To answer question (1), the following hypotheses are
tested at the 0.05 level of significance:
Ho: The proportion of passers is equal to the proportion
of non-passers. That is, the proportion of passers is 50%.
H1: The proportion of passers is not 50%.
2.2.1 SPSS Output for Question 1
The SPSS output of the Binomial Test is presented in
the following table:
2.2.2 Understanding the SPSS Output:
The first column shows that "Exam Results"
is the variable under investigation, and the second column shows that
"Exam Results" is categorized as passed or failed.
The third column, headed "N", presents the number of passers and the
number of failures. The fourth
column presents the observed proportions (6/8 = .75; 2/8 = .25; and 8/8 = 1.00). The fifth column, headed "Test
Prop.", gives the hypothesized proportion.
The sixth and last column, headed "Exact Sig",
gives the p-value.
Because the p-value of 0.289 is greater than the 0.05
level of significance, the null hypothesis cannot be rejected.
The sample does not provide sufficient evidence to reject the null hypothesis.
Note that failing to reject the null hypothesis does not mean that it
is accepted. A null
hypothesis can only be rejected or not rejected; it is never accepted.
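The exact p-value of 0.289 can be reproduced from the binomial distribution directly. The following is a minimal sketch using only the Python standard library; the helper name binomial_pmf is an assumption introduced for illustration. Because the hypothesized proportion is 0.5, the distribution is symmetric and the two-sided p-value is simply twice the smaller tail probability.

```python
from math import comb


def binomial_pmf(successes, n, p):
    """Probability of exactly `successes` successes in n trials."""
    return comb(n, successes) * p ** successes * (1 - p) ** (n - successes)


n = 8           # sample size
passed = 2      # observed number of passers
p0 = 0.5        # hypothesized proportion under Ho

# Tail probabilities P(X <= 2) and P(X >= 2) under Ho.
lower_tail = sum(binomial_pmf(i, n, p0) for i in range(passed + 1))
upper_tail = sum(binomial_pmf(i, n, p0) for i in range(passed, n + 1))

# Symmetric case (p0 = 0.5): double the smaller tail, capped at 1.
p_two_sided = min(1.0, 2 * min(lower_tail, upper_tail))

print(f"P(X <= {passed}) = {lower_tail:.4f}")
print(f"two-sided exact p-value = {p_two_sided:.3f}")
```

Here P(X <= 2) = 37/256, so the two-sided p-value is 74/256, which rounds to 0.289.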
2.3 Answer to Question 2
To answer question (2), the following hypotheses are
tested at 0.05 level of significance:
Ho: The proportion of passers is 25%
H1: The proportion of passers is less than 25%
2.3.1 SPSS Output for Question 2:
The SPSS output of the Binomial Test is presented in
the following table:
2.3.2 Understanding the SPSS Output:
Note that the first four columns of the SPSS output
are the same as those in the output for question 1.
This is because the data being analyzed are the same.
The hypothesized proportion for the analysis is 25% or .25.
This is shown in the fifth column of the SPSS output.
The p-value, on the other hand, is 0.679, much greater than the 0.05 level
of significance. These
results lead the researcher not to reject the null hypothesis.
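The p-value of 0.679 corresponds to the one-tailed probability of observing 2 or fewer passers out of 8 when the true proportion is 0.25, which matches the direction of H1 (less than 25%). A minimal standard-library sketch, with illustrative variable names, is:

```python
from math import comb

n = 8           # sample size
passed = 2      # observed number of passers
p0 = 0.25       # hypothesized proportion under Ho

# One-tailed exact p-value: P(X <= 2) when X ~ Binomial(8, 0.25).
p_one_tailed = sum(
    comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(passed + 1)
)

print(f"P(X <= {passed} | n = {n}, p = {p0}) = {p_one_tailed:.3f}")
```

The exact value is 44469/65536, which rounds to 0.679.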
2.3.3 Assumptions and Power Efficiency
1. The binomial test may be used with data measured on either a nominal or an ordinal scale.
2. If a continuous variable is dichotomized and the binomial test is used on the resulting data, the test may be wasteful of data. In such cases, the binomial test has a power-efficiency of 95% for a sample size of 6 (n = 6), decreasing to an asymptotic efficiency of 63% as the sample size increases.
3. If the data are basically dichotomous, even though the variable has an underlying continuous distribution, the binomial test may have no more powerful and practicable alternative.
III. Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test is used to decide whether a sample comes from a population with a specific distribution. Specifically, the Kolmogorov-Smirnov test can be used to answer the following types of questions:
· Are the data from a normal distribution?
· Are the data from an exponential distribution?
· Are the data from a Poisson distribution?
· Are the data from a uniform distribution?
· Are the data from a log-normal distribution?
· Are the data from a Weibull distribution?
· Are the data from a logistic distribution?
To answer any of the foregoing questions, the following hypotheses are tested at a specified level of significance.
Ho: The data follow a specified distribution (e.g., normal, Poisson, exponential…)
H1: The data do not follow the specified distribution.
3.1 Example 1
Consider the following sample: 12, 15, 50, 25, 23, 16, 16, 45, 50, 45, 10, 20, 30, 40, 50, 60, 70, 80, 80, 80. Do these data come from a normally distributed population?
To answer the question, the following hypotheses are tested at the 0.05 level of significance.
Ho: The data follow the normal distribution.
H1: The data do not follow the normal distribution.
3.1.2 SPSS OUTPUT
The SPSS output for K-S Test for normality is presented as follows:
3.1.3 Understanding the SPSS Output
The SPSS output for the K-S test for normality reports the sample size (N); the normal parameters (mean and standard deviation); the most extreme differences, categorized as absolute, positive, and negative; the Kolmogorov-Smirnov Z; and the Asymp. Sig (2 tailed).
The most important statistics needed to answer the question above
are the Kolmogorov-Smirnov Z and the Asymp. Sig (2 tailed) (remember that Asymp. Sig
(2 tailed) is also known as the p-value). The
Kolmogorov-Smirnov Z is .777 with an associated p-value of .590.
Because the p-value of .590 is greater than the 0.05 level of
significance, the null hypothesis cannot be rejected.
Thus, at the 0.05 level of significance, the data are consistent with having come from
a normally distributed population.
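To illustrate how a Kolmogorov-Smirnov Z is obtained, the sketch below computes the largest gap between the empirical distribution of the listed sample and a normal distribution whose mean and standard deviation are estimated from that sample, and then multiplies it by the square root of N. Estimating the parameters from the data (with an n - 1 denominator) is an assumption about how the SPSS figures were produced, and the function names are illustrative. Note also that the 20 values as listed have a mean of 40.85, whereas the Poisson parameter reported in Example 2 is 39.3500, so figures computed from the listed values need not match the SPSS output exactly.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal(data):
    """Return (mean, sd, D, Z) for a one-sample K-S test against a normal."""
    n = len(data)
    mu = sum(data) / n
    # Sample standard deviation (n - 1 denominator): an assumption.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    xs = sorted(data)
    # Most extreme positive and negative differences between the
    # empirical CDF (a step function) and the hypothesized CDF.
    d_plus = max((i + 1) / n - normal_cdf(x, mu, sigma) for i, x in enumerate(xs))
    d_minus = max(normal_cdf(x, mu, sigma) - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    return mu, sigma, d, math.sqrt(n) * d


sample = [12, 15, 50, 25, 23, 16, 16, 45, 50, 45,
          10, 20, 30, 40, 50, 60, 70, 80, 80, 80]

mean, sd, d_stat, z_stat = ks_normal(sample)
print(f"N = {len(sample)}, mean = {mean:.4f}, sd = {sd:.4f}")
print(f"D = {d_stat:.4f}, Kolmogorov-Smirnov Z = {z_stat:.4f}")
```

The Z value is then referred to the Kolmogorov distribution to obtain the asymptotic two-tailed p-value.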
3.2 Example 2
Consider the data in Example 1: 12, 15, 50, 25, 23, 16, 16, 45, 50, 45, 10, 20, 30, 40, 50, 60, 70, 80, 80, 80. Do these data come from a Poisson population?
To answer the question, the following hypotheses are tested at the 0.05 level of significance.
Ho: The data follow the Poisson distribution.
H1: The data do not follow the Poisson distribution.
3.2.2 SPSS OUTPUT
The SPSS output for K-S Test for Poisson Distribution is presented as follows:
The SPSS output for the K-S test for the Poisson distribution has the same layout as the output for the K-S test for the normal distribution, except that the second row shows the Poisson parameter. The Poisson parameter, or the mean of the Poisson distribution, is 39.3500. Theoretically, the variance of a Poisson distribution is equal to its mean; hence, it is not provided in the output.
The Kolmogorov-Smirnov Z of 1.968 is associated with a p-value of .001. Since the p-value (0.001) is less than the 0.05 level of significance, the null hypothesis that the data follow the Poisson distribution is rejected. The alternative hypothesis that the data do not follow the Poisson distribution is supported.
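The link between a Kolmogorov-Smirnov Z and its asymptotic two-tailed p-value can be sketched with the limiting Kolmogorov distribution, whose upper-tail probability is the alternating series 2 * sum over j >= 1 of (-1)^(j-1) * exp(-2 * j^2 * z^2). The function below is an illustrative standard-library implementation, not the SPSS routine; the name kolmogorov_p_value and the fixed number of series terms are assumptions, and the truncated series is only reliable for Z values of roughly 0.3 or more.

```python
import math


def kolmogorov_p_value(z, terms=100):
    """Asymptotic two-tailed p-value for a Kolmogorov-Smirnov Z.

    Uses the alternating series of the limiting Kolmogorov distribution.
    The truncation is adequate for z of about 0.3 or larger.
    """
    if z <= 0:
        return 1.0
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
        for j in range(1, terms + 1)
    )
    # Clamp to [0, 1] to guard against rounding at the extremes.
    return min(1.0, max(0.0, 2.0 * total))


z_poisson = 1.968   # Z reported for the Poisson example
p_poisson = kolmogorov_p_value(z_poisson)
print(f"Z = {z_poisson}, asymptotic p-value = {p_poisson:.3f}")
```

For Z = 1.968 the series gives a value that rounds to 0.001, in line with the rejection of the Poisson hypothesis at the 0.05 level.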