Analysis of Variance Review Questions

1. What are the 2 big “R’s” of experimental design, and why are they important?

Randomization – ensures that each treatment has an equal chance of being assigned to each experimental unit. Needed in order to guarantee unbiased estimates of treatment effects and experimental error.

Replication – each treatment level should be applied to multiple experimental units. Needed in order to construct an estimate of experimental error.
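To make the two ideas concrete, here is a minimal Python sketch of a completely randomized layout. The treatment labels, the number of replicates, the unit names, and the seed are all made up for illustration; they are not part of any exercise in this review.

```python
import random
from collections import Counter

treatments = ["T1", "T2", "T3", "T4"]  # hypothetical treatment labels
replicates = 3                          # hypothetical number of units per treatment
units = [f"unit{i}" for i in range(1, len(treatments) * replicates + 1)]

# Replication: every treatment appears `replicates` times in the plan.
plan = treatments * replicates

# Randomization: shuffle the plan so each unit is equally likely
# to receive each treatment.
rng = random.Random(2024)  # fixed seed only so the layout can be reproduced
rng.shuffle(plan)

assignment = dict(zip(units, plan))
counts = Counter(assignment.values())

for unit, trt in assignment.items():
    print(unit, trt)
print(counts)
```

Because the plan is built before it is shuffled, every treatment is guaranteed the same number of units, and the shuffle decides only which units those are.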

2. What is the between group sum of squares measuring? What is the within group sum of squares measuring?

The between group sum of squares is measuring the deviation of the treatment means from the overall mean. That is, it is measuring the “separation” between the treatment groups. The within group sum of squares is measuring the deviation of the individual observations from their respective treatment means. It is measuring the variability within the treatment groups. That is, it provides a measure of the homogeneity of the treatment groups.

3. How can we use the between group and within group sums of squares to determine whether there is a difference between the treatment means for the different levels of our treatment?

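If the between group variability is large in comparison to the within group variability, that is evidence of a true difference among the treatment means. Formally, we divide each sum of squares by its degrees of freedom to get a mean square, and then compare the ratio F = (between group mean square)/(within group mean square) to an F distribution. A large F (small p-value) leads us to reject the hypothesis that all treatment means are equal.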
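The calculations described in questions 2 and 3 can be sketched in a few lines of Python. The three groups and their values below are invented purely so the arithmetic is easy to follow by hand; they are not the red clover data.

```python
# Hypothetical data: three treatment groups with three observations each.
groups = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0, 24.0],
    "C": [30.0, 32.0, 34.0],
}

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Between group SS: deviations of the group means from the overall mean,
# weighted by group size.
ss_between = sum(len(ys) * (group_means[g] - grand_mean) ** 2
                 for g, ys in groups.items())

# Within group SS: deviations of each observation from its own group mean.
ss_within = sum((y - group_means[g]) ** 2
                for g, ys in groups.items() for y in ys)

df_between = len(groups) - 1            # number of groups minus 1
df_within = len(all_obs) - len(groups)  # observations minus number of groups

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(ss_between, ss_within, f_stat)  # prints 600.0 24.0 75.0
```

Here the group means (12, 22, 32) are far apart relative to the spread inside each group, so the F ratio is very large.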

4. The data for this example come from Exercise 7.4.1 (p.150) in Steel, Torrie, and Dickey. I have coded the data in SAS as follows:

WEIGHT represents the average plant weight in grams of red clover

BREED represents the level of inbreeding, with 1 = no inbreeding, 2 = slight inbreeding, 3 = moderate inbreeding, 4 = strong inbreeding

I ran the following SAS code to generate the analysis of variance shown below:

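proc glm ;                 /* no DATA= option, so SAS uses the most recently created data set */
  class breed ;            /* treat breed as a categorical factor */
  model weight = breed ;   /* one-way ANOVA of weight on breed */
  /* each contrast compares level 1 (no inbreeding) with one other level */
  contrast 'contrast 1' breed 1 -1 0 0 ;   /* level 1 vs level 2 */
  contrast 'contrast 2' breed 1 0 -1 0 ;   /* level 1 vs level 3 */
  contrast 'contrast 3' breed 1 0 0 -1 ;   /* level 1 vs level 4 */
run ;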

The GLM Procedure

Class Level Information

Class     Levels    Values
breed          4    1 2 3 4

Number of observations    39

The GLM Procedure

Dependent Variable: weight

                                Sum of
Source             DF          Squares    Mean Square    F Value    Pr > F
Model               3      56411.78419    18803.92806      23.06    <.0001
Error              35      28545.13889      815.57540
Corrected Total    38      84956.92308

R-Square    Coeff Var    Root MSE    weight Mean
0.664005     13.58425    28.55828       210.2308

Source    DF      Type I SS    Mean Square    F Value    Pr > F
breed      3    56411.78419    18803.92806      23.06    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
breed      3    56411.78419    18803.92806      23.06    <.0001

Contrast      DF    Contrast SS    Mean Square    F Value    Pr > F
contrast 1     1     9322.84444     9322.84444      11.43    0.0018
contrast 2     1    25844.06349    25844.06349      31.69    <.0001
contrast 3     1    54531.14683    54531.14683      66.86    <.0001

Based on this code and output, answer the following questions:

a. Report the Between group sum of squares. Report the Within group sum of squares.

The Between group sum of squares is the model sum of squares: 56,411.78

The Within group sum of squares is the error sum of squares: 28,545.14
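As a quick check on these two numbers, the between and within sums of squares should add up to the corrected total, and their ratio to the total should reproduce the R-Square in the output. The short Python sketch below does that arithmetic using only values copied from the SAS listing; the variable names are my own.

```python
# Values copied from the SAS output above.
ss_model = 56411.78419   # between group (model) sum of squares
ss_error = 28545.13889   # within group (error) sum of squares
ss_total = 84956.92308   # corrected total sum of squares

# The two pieces should add up to the corrected total.
ss_sum = ss_model + ss_error

# R-Square is the share of total variability explained by the treatment.
r_square = ss_model / ss_total

print(round(ss_sum, 5), round(r_square, 6))
```

About 66% of the total variability in weight is attributable to differences among the inbreeding levels.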

b. Conduct a hypothesis test to determine whether or not the level of inbreeding has any effect on the weight.

Ho: All treatment means are the same

Ha: At least one treatment mean is different

F = 23.06 (with 3 and 35 degrees of freedom), p-value < 0.0001

At the 5% significance level, since the p-value is less than 0.05, we reject the null hypothesis and conclude that there are differences between the mean weights for the different levels of inbreeding.
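The F value can be rebuilt by hand from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below is only a cross-check of that arithmetic; the p-value itself is taken from the SAS output rather than recomputed here.

```python
# Values copied from the SAS output above.
n_obs, n_groups = 39, 4
ss_model = 56411.78419
ss_error = 28545.13889

df_model = n_groups - 1       # 4 levels of breed -> 3 df
df_error = n_obs - n_groups   # 39 observations - 4 groups -> 35 df

ms_model = ss_model / df_model   # between group mean square
ms_error = ss_error / df_error   # within group mean square (MSE)
f_value = ms_model / ms_error

print(df_model, df_error, round(ms_model, 5), round(ms_error, 5), round(f_value, 2))
```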

c. The mean of treatment group 1 (no inbreeding) is 271.56, and the mean of treatment group 2 (slight inbreeding) is 220.67. Construct a 95% confidence interval for the difference between the mean weight for plants with no inbreeding and the mean weight for plants with slight inbreeding. Based on this confidence interval, does there appear to be a significant difference between the average weights for these 2 groups? Why, or why not?

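Using the error mean square (MSE = 815.5754, with 35 df) as the pooled variance estimate, group sizes of 9 and 6, and a t critical value of 2.0315:

(271.56 – 220.67) +/- (2.0315)*sqrt{[(815.5754)/9] + [(815.5754)/6]}

50.89 +/- (2.0315)*sqrt(90.62 + 135.93)

50.89 +/- (2.0315)*(15.05)

50.89 +/- 30.57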

So, we are 95% confident that the true difference between the treatment means is between 20.32 and 81.46. And, since this interval does NOT contain 0, we can say that there is a significant difference between the average weights for plants with no inbreeding and plants with slight inbreeding.
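The interval can be reproduced with a short Python sketch. It uses the group means, group sizes, MSE, and the 2.0315 multiplier exactly as they appear in the hand calculation; the variable names are my own. Because the code carries full precision instead of rounding the standard error to 15.05, its endpoints can differ from the hand-computed ones by about 0.01.

```python
import math

# Values taken from the question and the SAS output.
mean_none, mean_slight = 271.56, 220.67   # treatment means for groups 1 and 2
n_none, n_slight = 9, 6                   # group sizes used in the hand calculation
mse = 815.5754                            # error mean square (35 df)
t_crit = 2.0315                           # multiplier used in the hand calculation

diff = mean_none - mean_slight
# Standard error of a difference between two means, using the pooled MSE.
se = math.sqrt(mse / n_none + mse / n_slight)
margin = t_crit * se
lower, upper = diff - margin, diff + margin

print(f"{diff:.2f} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", lower > 0 or upper < 0)
```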

d. What hypothesis is being tested by contrast 1? Based on the p-value given for the F-test, what is your conclusion? How does this compare to the conclusion from part (c)?

Ho: M_no_inbreeding - M_slight_inbreeding = 0

Ha: M_no_inbreeding - M_slight_inbreeding not = 0

p-value = 0.0018

At the 5% significance level, we see that 0.0018 < 0.05. Therefore, we reject the null hypothesis and conclude that there is a significant difference between the mean weight for plants with no inbreeding and the mean weight for plants with slight inbreeding.

Part (c) gave us an alternative way of determining whether or not there is a significant difference between the 2 treatment means. Our conclusion was the same – there is a significant difference between the mean weights for plants with no inbreeding and plants with slight inbreeding.
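The two approaches agree because, for a contrast with 1 degree of freedom, the F statistic is the square of the t statistic built from the same difference and standard error. The Python sketch below illustrates this with the rounded means from part (c), so its contrast sum of squares matches the SAS value only approximately; the variable names are my own.

```python
import math

# Values taken from part (c) and the SAS output.
mean_none, mean_slight = 271.56, 220.67
n_none, n_slight = 9, 6
mse = 815.5754

diff = mean_none - mean_slight   # estimate of the contrast 1 -1 0 0

# Contrast SS = (estimate)^2 / sum(c_i^2 / n_i), with coefficients 1 and -1.
contrast_ss = diff ** 2 / (1 / n_none + 1 / n_slight)
f_contrast = contrast_ss / mse   # 1 df, so the mean square equals the SS

# t statistic for the same comparison.
t_stat = diff / math.sqrt(mse * (1 / n_none + 1 / n_slight))

print(round(contrast_ss, 2), round(f_contrast, 2), round(t_stat ** 2, 2))
```

The squared t statistic and the contrast F come out the same, which is why the confidence interval in part (c) and the contrast test in part (d) must lead to the same conclusion at the 5% level.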

e. How would you define a contrast to test the hypothesis that the mean for plants with moderate inbreeding is the same as the mean for plants with strong inbreeding? Report the coefficients for each treatment level and state the null and alternative hypotheses for your test.

Coefficients: 0 0 1 -1

Ho: M_moderate_inbreeding - M_strong_inbreeding = 0

Ha: M_moderate_inbreeding - M_strong_inbreeding not = 0

Or, equivalently:

Ho: M_moderate_inbreeding = M_strong_inbreeding

Ha: M_moderate_inbreeding not = M_strong_inbreeding
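One property worth checking for any set of contrast coefficients is that they sum to zero and have one coefficient per treatment level. The Python sketch below checks this for the three contrasts in the SAS code and for the moderate versus strong contrast; the helper name `is_contrast` and the label for the new contrast are hypothetical.

```python
N_LEVELS = 4  # breed has 4 levels: none, slight, moderate, strong

contrasts = {
    "contrast 1": [1, -1, 0, 0],           # none vs slight
    "contrast 2": [1, 0, -1, 0],           # none vs moderate
    "contrast 3": [1, 0, 0, -1],           # none vs strong
    "moderate vs strong": [0, 0, 1, -1],   # answer to part (e)
}

def is_contrast(coeffs):
    """Return True if coeffs has one value per level, sums to 0, and is not all zeros."""
    return (len(coeffs) == N_LEVELS
            and sum(coeffs) == 0
            and any(c != 0 for c in coeffs))

for name, coeffs in contrasts.items():
    print(name, coeffs, is_contrast(coeffs))
```

A set of coefficients such as 1 1 0 0 would fail this check, because it does not compare means against each other.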