
LECTURE 1: Statistical Tests for the One Sample Case

 

·        One-Sample Chi-square Test

·        Binomial Test

·        Kolmogorov-Smirnov Test

 

_____________________________________________________________________________

 

I. One-Sample Chi-square Test

 

The one-sample chi-square test compares the observed and expected frequencies in each category of a variable to test either that all categories contain the same proportion of values or that each category contains a researcher-specified proportion of values.

 

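The chi-square formula is given by:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where O_i is the observed frequency in category i, E_i is the expected frequency in category i, and k is the number of categories.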

 

1.1 Example: The opinions of a sample of 100 Catholic registered voters in Metro Manila concerning the two-child per family bill are recorded as follows:

 

Opinion concerning the two-child per family bill     Frequency

Against                                              75

In Favor                                             15

Neutral                                              10

Total                                                100

 

Determine whether the proportions of voters who are "against", "in favor", and "neutral" regarding the two-child per family bill are equal.

 

1.2 Answer to the Question

 

To answer the question, the following hypotheses are tested at the 0.01 level of significance:

 

Ho: The proportions of voters who are "against", "in favor", and "neutral" regarding the two-child per family bill are equal.

H1: The proportions of voters who are "against", "in favor", and "neutral" regarding the two-child per family bill are not equal.

 

1.3 SPSS Output

 

The SPSS outputs of the chi-square test are as follows:

 

 

1.4 Understanding the SPSS Output

 

The first table presents the observed frequencies, expected frequencies, and residuals.  Under the null hypothesis of equal proportions, the expected frequency is obtained by dividing the total frequency (N) by the number of categories (k).  In this case, N = 100 and k = 3, so the expected frequency is 100/3 = 33.33.  The residual is the observed frequency minus the expected frequency.
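The expected frequencies and residuals described above can also be worked out by hand. The following is a minimal Python sketch under that description, not SPSS output; the variable names are chosen here purely for illustration.

```python
# Observed frequencies from the example in 1.1.
observed = {"Against": 75, "In Favor": 15, "Neutral": 10}

n = sum(observed.values())   # total frequency N
k = len(observed)            # number of categories
expected = n / k             # same expected frequency in every category under Ho

# Residual = observed frequency minus expected frequency.
residuals = {category: count - expected for category, count in observed.items()}

for category, count in observed.items():
    print(f"{category:<10} observed={count:>3}  expected={expected:.2f}  "
          f"residual={residuals[category]:+.2f}")
```

The residuals always sum to zero, which is a quick check that the expected frequencies were computed from the same total as the observed ones.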

 

The second table presents the chi-square value, the degrees of freedom (df), and the asymptotic significance (Asymp. Sig), also called the p-value.  The chi-square value is 78.500 with df = 2 and a p-value of 0.000 (that is, less than 0.0005).  Because the p-value is less than the 0.01 level, the null hypothesis that the proportions of Catholic registered voters who are "against", "in favor", and "neutral" regarding the two-child per family bill are equal is rejected, and the alternative hypothesis that these proportions are not equal is supported.  Thus, at the 0.01 level of significance, it can be concluded that the three proportions are not equal.
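The chi-square value and p-value reported above can be checked with a few lines of Python. This is only a sketch: it relies on the fact that, for df = 2 specifically, the upper-tail chi-square probability equals exp(-x/2), so it does not generalize to other degrees of freedom.

```python
import math

observed = [75, 15, 10]                     # Against, In Favor, Neutral
expected = sum(observed) / len(observed)    # equal proportions under Ho

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

# Closed form valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3g}")
```

The p-value is far smaller than 0.0005, which is why SPSS displays it as 0.000.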

 

 

1.5 Assumptions

 

1.      The chi-square test should be used when the data are in discrete categories and when the expected frequencies are sufficiently large.

2.      When k=2, each expected value should be 5 or larger.

3.      When k>2, no more than about 20% of the expected values should be smaller than 5 and none should be less than 1.

4.      The chi-square test may be used with data m…

5.      The chi-square test is insensitive to the effects of order when df > 1, and thus may not be the best test when a hypothesis assumes that the variables are ordered.
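The rules of thumb on expected frequencies listed above can be turned into a small check. The function below is a hypothetical helper written for this lecture, not part of SPSS, and it reads "about 20%" as a hard 20% cut-off, which is an assumption.

```python
def expected_frequencies_ok(expected):
    """Return True if the expected frequencies meet the rules of thumb above."""
    k = len(expected)
    small = sum(1 for e in expected if e < 5)
    if k == 2:
        # Two categories: every expected value should be 5 or larger.
        return small == 0
    # More than two categories: at most 20% below 5, and none below 1.
    return small / k <= 0.20 and min(expected) >= 1


print(expected_frequencies_ok([100 / 3] * 3))   # the example in 1.1
print(expected_frequencies_ok([4, 4]))          # 8 cases split evenly over 2 categories
```

When the check fails for a two-category variable, an exact test on the two counts is the usual alternative.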

 

II. Binomial Test

 

The binomial test is used in place of the one-sample chi-square test when there are just two categories in the classification of the data and the sample size is so small that the one-sample chi-square test is inappropriate.

 

2.1 Example: Suppose a sample of 8 newly graduated education students were asked to take the civil service examination.  The results of the examination are recorded as follows:

 

Results     Frequency

Passed      2

Failed      6

Total       8

 

Answer the following questions:

 

(1)   Test the hypothesis that the proportion of passers is equal to the proportion of non-passers. Use a 0.05 level of significance.

(2)   Is the proportion of passers equal to 25%? Use a 0.05 level of significance.

 

2.2 Answer to Question 1

 

To answer question (1), the following hypotheses are tested at the 0.05 level of significance:

 

Ho: The proportion of passers is equal to the proportion of non-passers. That is, the proportion of passers is 50%.

 

H1:  The proportion of passers is not 50%.

 

2.2.1 SPSS Output for Question 1

 

The SPSS output of the Binomial Test is presented in the following table: 

 

 

2.2.2 Understanding the SPSS Output:

 

The first column shows that "Exam Results" is the variable under investigation, and the second column shows that "Exam Results" is categorized as passed or failed.  The third column, headed "N", presents the number of passers and the number of failures.  The fourth column presents the observed proportions (6/8 = .75, 2/8 = .25, and 8/8 = 1.00).  The fifth column, headed "Test Prop.", gives the hypothesized proportion.  The sixth and last column, headed "Exact Sig", gives the p-value.

 

Because the p-value of 0.289 is greater than the 0.05 level of significance, the null hypothesis cannot be rejected.  The sample does not provide sufficient evidence against the null hypothesis.  Note that failing to reject the null hypothesis does not mean that it is accepted: a null hypothesis can be rejected or not rejected, but never accepted.
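The exact p-value of 0.289 can be reproduced from the binomial distribution directly. The sketch below doubles the lower-tail probability, which is appropriate here because the binomial distribution is symmetric when the test proportion is 0.5; the helper name `binom_pmf` is introduced only for this illustration.

```python
from math import comb

n, passed = 8, 2      # sample size and number of passers
p0 = 0.5              # hypothesized proportion of passers under Ho


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Probability of observing 2 or fewer passers if Ho is true.
lower_tail = sum(binom_pmf(x, n, p0) for x in range(passed + 1))

# Two-tailed p-value: double the tail, capped at 1.
p_two_sided = min(1.0, 2 * lower_tail)

print(f"one-tailed = {lower_tail:.3f}, two-tailed = {p_two_sided:.3f}")
```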

 

2.3 Answer to Question 2

 

To answer question (2), the following hypotheses are tested at the 0.05 level of significance:

 

            Ho: The proportion of passers is 25%

 

            H1: The proportion of passers is less than 25%

 

2.3.1 SPSS Output for Question 2:

 

The SPSS output of the Binomial Test is presented in the following table:

 

 

 

 

2.3.2 Understanding the SPSS Output:

 

Note that the first four columns of the SPSS output are the same as in the output for question 1, because the data being analyzed are the same.  The hypothesized proportion for this analysis is 25%, or .25, as shown in the fifth column of the SPSS output.  The p-value is 0.679, which is much greater than the 0.05 level of significance.  These results lead the researcher not to reject the null hypothesis.
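The p-value of 0.679 corresponds to the one-tailed probability of observing 2 or fewer passers out of 8 when the true proportion is 0.25, which matches the one-sided alternative stated in 2.3. A minimal Python sketch of that calculation, with illustrative variable names:

```python
from math import comb

n, passed = 8, 2      # sample size and number of passers
p0 = 0.25             # hypothesized proportion of passers under Ho

# One-tailed p-value: P(X <= 2) when X ~ Binomial(8, 0.25).
p_lower = sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(passed + 1))

print(f"one-tailed p-value = {p_lower:.3f}")
```

Because the observed proportion (2/8 = .25) equals the hypothesized proportion, a large p-value is to be expected.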

 

2.3.3 Assumptions and Power Efficiency

 

1.      The Binomial Test may be used with data measured in either a nominal or an ordinal scale.

2.      If a continuous variable is dichotomized and the binomial test is used on the resulting data, the test may be wasteful of data. In such cases, the binomial test has a power-efficiency of 95% for a sample size of 6 (n = 6), decreasing to an asymptotic efficiency of 63% as the sample size increases.

3.      If the data are basically dichotomous, even though the variable has an underlying continuous distribution, the binomial test may have no more powerful and practicable alternative.

 

III. Kolmogorov-Smirnov (K-S) Goodness-of-fit Test

 

The Kolmogorov-Smirnov test is used to decide whether a sample comes from a population with a specific distribution.  Specifically, the Kolmogorov-Smirnov test can be used to answer the following types of questions:

·         Are the data from a normal distribution?

·         Are the data from an exponential distribution?

·         Are the data from a Poisson distribution?

·         Are the data from a uniform distribution?

·         Are the data from a log-normal distribution?

·         Are the data from a Weibull distribution?

·         Are the data from a logistic distribution?

To answer any of the foregoing questions, the following hypotheses are to be tested at a specified level of significance.

            Ho: The data follow a specified distribution (e.g., normal, Poisson, exponential…)

            H1: The data do not follow a specified distribution.

3.1 Example 1

 

            Consider the following sample:  12, 15, 50, 25, 23, 16, 16, 45, 50, 45, 10, 20, 30, 40, 50, 60, 70, 80, 80, 80.  Do these data come from a normally distributed population?

 

3.1.1 Answer

 

To answer the question, the following hypotheses are tested at the 0.05 level of significance.

 

            Ho:  The data follow the normal distribution.

            H1: The data do not follow the normal distribution.

 

3.1.2 SPSS OUTPUT

 

The SPSS output for K-S Test for normality is presented as follows:

 

3.1.3 Understanding the SPSS Output

 

The SPSS output for the K-S test for normality reports the sample size (N); the normal parameters (mean and standard deviation); the most extreme differences, categorized as absolute, positive, and negative; the Kolmogorov-Smirnov Z; and the Asymp. Sig (2 tailed).

The statistics needed to answer the question above are the Kolmogorov-Smirnov Z and the Asymp. Sig (2 tailed) (recall that Asymp. Sig (2 tailed) is also known as the p-value).  The Kolmogorov-Smirnov Z is .777 with an associated p-value of .590.  Because the p-value of .590 is greater than the 0.05 level of significance, the null hypothesis cannot be rejected.  Thus, at the 0.05 level of significance, the data are consistent with a normally distributed population, although failing to reject the null hypothesis does not prove that the population is normal.
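To show where the "most extreme differences" and the Kolmogorov-Smirnov Z come from, the sketch below computes them for the 20 values as listed in Example 1. It assumes the normal parameters are estimated from the sample (mean and n - 1 standard deviation) and that Z is the absolute extreme difference multiplied by the square root of N; the function names are illustrative. Because the result depends on the data exactly as entered, it should not be expected to reproduce the SPSS table digit for digit.

```python
import math
import statistics

data = [12, 15, 50, 25, 23, 16, 16, 45, 50, 45,
        10, 20, 30, 40, 50, 60, 70, 80, 80, 80]


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def ks_normal(sample):
    """Return (D, Z): the largest absolute gap between the sample's cumulative
    proportions and the fitted normal curve, and Z = sqrt(n) * D."""
    xs = sorted(sample)
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)          # sample standard deviation (n - 1)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # Compare the curve with the cumulative proportion just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d, math.sqrt(n) * d


d, z = ks_normal(data)
print(f"D = {d:.3f}, Z = {z:.3f}")
```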

3.2 Example 2

 

            Consider the data in Example 1:  12, 15, 50, 25, 23, 16, 16, 45, 50, 45, 10, 20, 30, 40, 50, 60, 70, 80, 80, 80.  Do these data come from a Poisson population?

 

3.2.1 Answer

 

To answer the question, the following hypotheses are tested at the 0.05 level of significance.

 

            Ho:  The data follow the Poisson distribution.

            H1: The data do not follow the Poisson distribution.

 

3.2.2 SPSS OUTPUT

 

The SPSS output for K-S Test for Poisson Distribution is presented as follows:

 

 

The SPSS output for the K-S test for the Poisson distribution has the same layout as the output for the K-S test for the normal distribution, except that the second row reports the Poisson parameter.  The Poisson parameter, or the mean of the Poisson distribution, is 39.3500.  Theoretically, the variance of a Poisson distribution is equal to its mean; hence, the variance is not provided in the output.

 

The Kolmogorov-Smirnov Z of 1.968 is associated with a p-value of .001.  Since the p-value (0.001) is less than the 0.05 level of significance, the null hypothesis that the data follow the Poisson distribution is rejected.  The alternative hypothesis that the data do not follow the Poisson distribution is supported.
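The link between a Kolmogorov-Smirnov Z and its asymptotic two-tailed p-value can be sketched with the series for the Kolmogorov limiting distribution. The function name and the number of terms are choices made for this illustration, and the truncated series is adequate only for moderate and large Z; SPSS's own algorithm may differ in detail.

```python
import math


def ks_asymptotic_p(z, terms=100):
    """Two-tailed asymptotic p-value for a Kolmogorov-Smirnov Z.

    Uses p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * z^2), truncated at `terms`.
    """
    if z <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * total))


print(f"{ks_asymptotic_p(1.968):.3f}")   # Z from the Poisson example
```

For Z = 1.968 the series is dominated by its first term, so the p-value is essentially 2 * exp(-2 * 1.968^2).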

 

 

References

 

Sidney Siegel and N. John Castellan, Jr. Nonparametric Statistics for the Behavioral Sciences, 2nd edition, McGraw-Hill International Editions (1998).

 

Engineering Statistics Handbook. Kolmogorov-Smirnov Goodness-of-Fit Test, Available at: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm

 
