Analysis of Variance Review Questions
1. What are the 2 big “R’s” of experimental design, and why are they important?
Randomization: ensures that each treatment has an equal chance of being assigned to each experimental unit. It is needed in order to guarantee unbiased estimates of treatment effects and experimental error.
Replication: each treatment level should be applied to multiple experimental units. It is needed in order to construct an estimate of experimental error.
2. What is the between group sum of squares measuring? What is the within group sum of squares measuring?
The between group sum of squares is measuring the deviation of the treatment means from the overall mean. That is, it is measuring the “separation” between the treatment groups.
The within group sum of squares is measuring the deviation of the individual observations from their respective treatment means. It is measuring the variability within the treatment groups. That is, it provides a measure of the homogeneity of the treatment groups.
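As a small illustration of these two definitions, the sketch below computes both sums of squares for a made-up data set. The three groups and their values are hypothetical (they are not the red clover data), and the function name `sums_of_squares` is only an illustrative choice.

```python
from statistics import mean

# Hypothetical toy data: three treatment groups (not from the exercise).
groups = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0, 24.0],
    "C": [30.0, 32.0, 34.0],
}

def sums_of_squares(groups):
    """Return (between, within, total) sums of squares for a one-way layout."""
    all_obs = [y for obs in groups.values() for y in obs]
    grand_mean = mean(all_obs)
    # Between: deviation of each group mean from the overall mean,
    # counted once per observation in that group.
    between = sum(len(obs) * (mean(obs) - grand_mean) ** 2
                  for obs in groups.values())
    # Within: deviation of each observation from its own group mean.
    within = sum((y - mean(obs)) ** 2
                 for obs in groups.values() for y in obs)
    # Total: deviation of each observation from the overall mean.
    total = sum((y - grand_mean) ** 2 for y in all_obs)
    return between, within, total

ss_between, ss_within, ss_total = sums_of_squares(groups)
print(ss_between, ss_within, ss_total)
```

For these toy numbers the between and within pieces add up to the total sum of squares, which is the partition that the analysis of variance table reports.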
3. How can we use the between group and within group sums of squares to determine whether there is a difference between the treatment means for the different levels of our treatment?
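If the between group variability is large in comparison to the within group variability, that would indicate that there is a true difference between the means for the different levels of our treatment. Formally, each sum of squares is divided by its degrees of freedom to give a mean square, and the ratio of the between group mean square to the within group mean square is the F statistic. A large F value (small p-value) is evidence against equal treatment means.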
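The comparison can be sketched as a ratio of mean squares. The helper name `f_ratio` and the numbers passed to it are hypothetical. They show how the same separation between groups produces a large or a small ratio depending on the variability within groups.

```python
def f_ratio(ss_between, ss_within, n_groups, n_obs):
    """F = (between SS / (k - 1)) / (within SS / (N - k))."""
    ms_between = ss_between / (n_groups - 1)    # between group mean square
    ms_within = ss_within / (n_obs - n_groups)  # within group mean square
    return ms_between / ms_within

# Hypothetical numbers: identical between group SS, different within group SS.
tight = f_ratio(ss_between=600.0, ss_within=24.0, n_groups=3, n_obs=9)
noisy = f_ratio(ss_between=600.0, ss_within=2400.0, n_groups=3, n_obs=9)
print(tight, noisy)
```

The first call gives a ratio far above 1 and the second a ratio below 1, even though the treatment means are equally far apart in both cases.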
4. The data for this example come from Exercise 7.4.1 (p.150) in Steel, Torrie, and Dickey. I have coded the data in SAS as follows:
WEIGHT represents the average plant weight in grams of red clover
BREED represents the level of inbreeding, with 1 = no inbreeding, 2 = slight inbreeding, 3 = moderate inbreeding, 4 = strong inbreeding
I ran the following SAS code to generate the analysis of variance shown below:
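proc glm ;
  class breed ;            /* treat breed as a categorical factor */
  model weight = breed ;   /* one-way analysis of variance of weight on breed */
  /* each contrast compares level 1 (no inbreeding) with one other level */
  contrast 'contrast 1' breed 1 -1 0 0 ;
  contrast 'contrast 2' breed 1 0 -1 0 ;
  contrast 'contrast 3' breed 1 0 0 -1 ;
run ;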
The GLM Procedure

Class Level Information

Class    Levels    Values
breed         4    1 2 3 4

Number of observations    39

The GLM Procedure

Dependent Variable: weight

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3       56411.78419    18803.92806      23.06    <.0001
Error              35       28545.13889      815.57540
Corrected Total    38       84956.92308

R-Square    Coeff Var    Root MSE    weight Mean
0.664005     13.58425    28.55828       210.2308

Source    DF      Type I SS    Mean Square    F Value    Pr > F
breed      3    56411.78419    18803.92806      23.06    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
breed      3    56411.78419    18803.92806      23.06    <.0001

Contrast      DF    Contrast SS    Mean Square    F Value    Pr > F
contrast 1     1     9322.84444     9322.84444      11.43    0.0018
contrast 2     1    25844.06349    25844.06349      31.69    <.0001
contrast 3     1    54531.14683    54531.14683      66.86    <.0001
Based on this code and output, answer the following questions:
a. Report the between group sum of squares. Report the within group sum of squares.
The between group sum of squares is the model sum of squares: 56,411.78
The within group sum of squares is the error sum of squares: 28,545.14
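As a quick consistency check on these two values, the sketch below confirms that they add up to the corrected total sum of squares from the output, and that their ratio to the total reproduces the reported R-Square. The variable names are illustrative. The numbers are copied from the SAS output.

```python
# Values copied from the SAS analysis of variance table.
ss_model = 56411.78419   # between group (Model) sum of squares
ss_error = 28545.13889   # within group (Error) sum of squares
ss_total = 84956.92308   # Corrected Total sum of squares

# The between and within pieces should partition the total.
partition_gap = abs(ss_model + ss_error - ss_total)

# R-Square is the share of the total variability explained by the treatment.
r_square = ss_model / ss_total
print(round(partition_gap, 5), round(r_square, 6))
```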
b. Conduct a hypothesis test to determine whether or not the level of inbreeding has any effect on the weight.
Ho: All treatment means are the same
Ha: At least one treatment mean is different
F = 23.06 with 3 and 35 degrees of freedom, p-value < 0.0001
At the 5% significance level, since the p-value is less than 0.05, we reject the null hypothesis and conclude that there are differences between the mean weights for the different levels of inbreeding.
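The F value used in this test can be rebuilt from the sums of squares and degrees of freedom in the output. The sketch below does only that arithmetic. The p-value itself is taken from the SAS output and is not recomputed here.

```python
# Sums of squares and degrees of freedom copied from the SAS output.
ss_model, df_model = 56411.78419, 3
ss_error, df_error = 28545.13889, 35

ms_model = ss_model / df_model   # between group mean square
ms_error = ss_error / df_error   # within group mean square (MSE)
f_value = ms_model / ms_error    # overall F statistic
root_mse = ms_error ** 0.5       # "Root MSE" line of the output

print(round(ms_model, 5), round(ms_error, 5))
print(round(f_value, 2), round(root_mse, 5))
```

These agree with the Mean Square, F Value, and Root MSE entries reported by PROC GLM.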
c. The mean of treatment group 1 (no inbreeding) is 271.56, and the mean of treatment group 2 (slight inbreeding) is 220.67. Construct a 95% confidence interval for the difference between the mean weight for plants with no inbreeding and the mean weight for plants with slight inbreeding. Based on this confidence interval, does there appear to be a significant difference between the average weights for these 2 groups? Why, or why not?
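Using the error mean square (815.5754) as the pooled variance estimate, group sizes of 9 and 6, and taking 2.0315 as the t critical value for the 35 error degrees of freedom:
(271.56 - 220.67) +/- (2.0315)*sqrt{[(815.5754)/9] + [(815.5754)/6]}
= 50.89 +/- (2.0315)*sqrt(90.62 + 135.93)
= 50.89 +/- (2.0315)*(15.05)
= 50.89 +/- 30.57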
So, we are 95% confident that the true difference between the treatment means is between 20.32 and 81.46. And, since this interval does NOT contain 0, we can say that there is a significant difference between the average weights for plants with no inbreeding and plants with slight inbreeding.
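The interval arithmetic can be retraced with the short sketch below. It reuses the critical value 2.0315 from the worked answer rather than looking up a t quantile, and the variable names are illustrative. Because it carries full precision instead of rounding the standard error to 15.05, the endpoints may differ from the hand calculation by about 0.01.

```python
from math import sqrt

# Group means and sizes for no inbreeding and slight inbreeding.
mean_none, n_none = 271.56, 9
mean_slight, n_slight = 220.67, 6
mse = 815.5754    # error mean square from the ANOVA table
t_crit = 2.0315   # critical value as used in the worked answer (35 error df)

diff = mean_none - mean_slight
# Standard error of the difference, using the pooled error mean square.
std_err = sqrt(mse / n_none + mse / n_slight)
margin = t_crit * std_err
lower, upper = diff - margin, diff + margin

print(round(diff, 2), round(std_err, 2), round(margin, 2))
print(round(lower, 2), round(upper, 2))
# The interval excludes 0 exactly when both endpoints have the same sign.
print(lower > 0 or upper < 0)
```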
d. What hypothesis is being tested by contrast 1? Based on the p-value given for the F-test, what is your conclusion? How does this compare to the conclusion from part (c)?
Ho: μ(no inbreeding) - μ(slight inbreeding) = 0
Ha: μ(no inbreeding) - μ(slight inbreeding) ≠ 0
p-value = 0.0018
At the 5% significance level, we see that 0.0018 < 0.05. Therefore, we reject the null hypothesis and conclude that there is a significant difference between the mean weight for plants with no inbreeding and the mean weight for plants with slight inbreeding.
Part (c) gave us an alternative way of determining whether or not there is a significant difference between the 2 treatment means. Our conclusion was the same: there is a significant difference between the mean weights for plants with no inbreeding and plants with slight inbreeding. The agreement is expected, because the confidence interval and the contrast F-test are both built on the same error mean square.
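The link between parts (c) and (d) can be made concrete with a sketch of the usual single degree of freedom contrast formula: the squared contrast estimate divided by the sum of c²/n. The helper name `contrast_ss` is hypothetical. Only the two groups with nonzero coefficients are passed in, since groups with coefficient 0 drop out of both sums. The group means are the rounded values from part (c), so the result matches the SAS contrast sum of squares only approximately.

```python
from math import sqrt

def contrast_ss(coeffs, means, sizes):
    """Sum of squares for a single degree of freedom contrast."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = sum(c * c / n for c, n in zip(coeffs, sizes))
    return estimate ** 2 / scale

mse = 815.5754           # error mean square from the ANOVA table
coeffs = [1, -1]         # no inbreeding vs. slight inbreeding
means = [271.56, 220.67]
sizes = [9, 6]

ss = contrast_ss(coeffs, means, sizes)
f_contrast = ss / mse

# t statistic for the same difference, as in the part (c) interval.
t_stat = (means[0] - means[1]) / sqrt(mse / sizes[0] + mse / sizes[1])

print(round(ss, 1), round(f_contrast, 2), round(t_stat ** 2, 2))
```

The contrast F value equals the square of the t statistic for the difference, which is why the F-test and the confidence interval lead to the same conclusion.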
e. How would you define a contrast to test the hypothesis that the mean for plants with moderate inbreeding is the same as the mean for plants with strong inbreeding? Report the coefficients for each treatment level and state the null and alternative hypotheses for your test.
Coefficients: 0 0 1 -1
Ho: μ(moderate inbreeding) - μ(strong inbreeding) = 0
Ha: μ(moderate inbreeding) - μ(strong inbreeding) ≠ 0
Or, equivalently:
Ho: μ(moderate inbreeding) = μ(strong inbreeding)
Ha: μ(moderate inbreeding) ≠ μ(strong inbreeding)
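One way to turn these coefficients into a statement of the same form as the three contrasts in the SAS code above is sketched below. The helper `sas_contrast` and the label 'moderate vs strong' are hypothetical choices for illustration. The helper only checks that the coefficients sum to zero and formats the text; it does not run SAS.

```python
def sas_contrast(label, effect, coeffs):
    """Format a CONTRAST statement after checking the coefficients."""
    # A contrast among treatment means needs coefficients that sum to zero.
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    values = " ".join(str(c) for c in coeffs)
    return f"contrast '{label}' {effect} {values} ;"

# Coefficients in level order: none, slight, moderate, strong.
statement = sas_contrast("moderate vs strong", "breed", [0, 0, 1, -1])
print(statement)
```

The resulting line could be added to the PROC GLM step alongside the existing contrast statements.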