Solutions to Selected Final Exam Review Problems

Chapters 1 & 2:

The following data relating to education were collected for each state.  A subset of the data is listed below.  The REGION is the US Census region, POP is the population of the state in thousands of people, SATV is the average verbal SAT score for the state, and TPAY is the average teacher pay for the state in thousands of dollars.

OBS  STATE  REGION    POP  SATV  TPAY

  1  AK     PAC       607   521  49.6
  2  AL     ESC      4273   565  31.3
  3  AR     WSC      2510   566  29.3
  4  AZ     MTN      4428   525  32.5
  5  CA     PAC     31878   495  43.1
  6  CO     MTN      3823   536  35.4
  7  CT     NE       3274   507  50.3
  8  DC     SA        543   489  43.7
  9  DE     SA        725   508  40.5
 10  FL     SA      14400   498  33.3
  .  .      .           .     .     .
  .  .      .           .     .     .
  .  .      .           .     .     .

Is REGION a quantitative or qualitative variable?
Qualitative (categorical)

What type of variable is TPAY?
Quantitative

What is the shape of this distribution?
Skewed to the right

Do there appear to be any outliers in the distribution?
Yes – the point around 30,000

What measure of center should be used for this distribution?
The median, because it is resistant to the skewness and the outlier.

What measure of spread would be appropriate for this distribution?
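The quartiles (and the interquartile range, IQR = Q3 – Q1), which are also resistant to skewness and outliers.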
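As a rough illustration of why resistant measures suit right-skewed data, the sketch below uses only the 10 POP values listed in the table at the top (a small subset of the states, so its numbers will not match the full data set). Quartile conventions vary between textbooks and software; Python's `statistics.quantiles` uses its `exclusive` method by default, so hand-computed quartiles may differ slightly.

```python
import statistics

# POP (thousands of people) for the 10 states listed in the table above.
pop = [607, 4273, 2510, 4428, 31878, 3823, 3274, 543, 725, 14400]

mean = statistics.mean(pop)
median = statistics.median(pop)
q1, _, q3 = statistics.quantiles(pop, n=4)  # default method="exclusive"

print(f"mean   = {mean:.1f}")    # pulled upward by the large CA and FL values
print(f"median = {median:.1f}")  # resistant to those large values
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {q3 - q1:.1f}")
```

In this subset the mean (about 6646) is nearly twice the median (3548.5), because a few very large states pull the mean to the right while the median stays near the typical state.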

In the following side-by-side box plots illustrating the distribution of verbal SAT scores, the census regions have further been collapsed into North, South, Midwest, and West.

Based on the box plot, which region (N, S, MW, or W) tends to have the highest SAT scores?
Midwest

Which region has the most consistent verbal scores on the SAT?
North (its box plot shows the smallest spread)

For the southern region, we can see that the middle 50% of SAT verbal scores lies between 495 and 565 (the first and third quartiles, Q1 and Q3).

Chapter 4:

Continuing with the education data presented above, we want to examine the relationship between verbal scores on the SAT and teacher pay in each state.  The following scatterplot illustrates the relationship between the two variables:

The correlation between teacher pay and verbal SAT score is –0.47.  Based on this information and the scatterplot, what can you say about the relationship between teacher pay and verbal SAT score (shape, strength, direction)?

Shape:  roughly linear
Strength:  fairly weak
Direction:  negative (decreasing trend)

From the scatterplot, does it appear that there are any outliers or unusual observations?
Yes. There are some unusual points with TPAY between 40 and 50 and SATV around 560, and others with TPAY between 30 and 35 and SATV between 480 and 500.

What does the intercept tell us for this regression model?
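It is the predicted average verbal SAT score (about 624) when average teacher pay is 0.  Since no state has teacher pay anywhere near 0, this value is an extrapolation and has no useful practical interpretation here.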

How would we interpret the slope for this model?
For each additional $1,000 in average teacher pay (TPAY is measured in thousands of dollars), the predicted average verbal SAT score decreases by about 2.56 points.
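A small sketch of how the slope is read, assuming the fitted line is approximately predicted SATV = 624 – 2.56 × TPAY (the intercept is only given as "around 624", so the predictions are approximate). The helper name `predicted_satv` is hypothetical.

```python
# Approximate least-squares line from the regression discussed above (assumed values).
INTERCEPT = 624.0   # predicted SATV when TPAY = 0 (an extrapolation)
SLOPE = -2.56       # change in predicted SATV per $1,000 of TPAY

def predicted_satv(tpay_thousands: float) -> float:
    """Predicted average verbal SAT score for a given average teacher pay (in $1,000s)."""
    return INTERCEPT + SLOPE * tpay_thousands

# Raising TPAY from 40 to 41 (i.e., by $1,000) changes the prediction by the slope.
print(predicted_satv(40.0))
print(predicted_satv(41.0) - predicted_satv(40.0))
```

The difference between two predictions one unit apart is always the slope, which is exactly what the interpretation above says; it describes the fitted line, not a causal effect.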

What percentage of the variability in verbal SAT scores can be explained by differences in teacher pay?
22.35% (this is r^2, the coefficient of determination)
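A quick consistency check using only numbers quoted in this section: r^2 is the square of the correlation, so taking the square root of 0.2235 and attaching the sign of the slope should recover a correlation that rounds to the reported –0.47.

```python
import math

r_squared = 0.2235   # fraction of the variability in SATV explained by TPAY
slope_sign = -1      # the fitted slope (-2.56) is negative

r = slope_sign * math.sqrt(r_squared)
print(f"r = {r:.4f}")  # agrees with the reported correlation after rounding
print(f"{r_squared:.2%} of the variability in SATV is explained by TPAY")
```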

Can we say that increasing teacher pay causes verbal SAT scores to decrease?
No – association does not imply causation.

Chapter 6:

1. Choose an employed person at random.  Government data tell us that the probability that the worker is a woman is 0.46.  The probability that any given woman will hold a managerial or professional job is 0.32.  What is the probability that a randomly selected employed person will be a woman who holds a managerial or professional job?

General Multiplication Law:

Pr(A and B) = Pr(A)*Pr(B|A) = Pr(B)*Pr(A|B)

So, Pr(woman and professional) = Pr(woman)*Pr(professional|woman)
= (0.46)(0.32) = 0.1472
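The same calculation as a tiny Python sketch; the helper name `prob_and` is a hypothetical choice for illustration, not part of any library.

```python
def prob_and(p_a: float, p_b_given_a: float) -> float:
    """General multiplication law: Pr(A and B) = Pr(A) * Pr(B | A)."""
    return p_a * p_b_given_a

p_woman = 0.46             # Pr(woman)
p_prof_given_woman = 0.32  # Pr(professional | woman)
print(round(prob_and(p_woman, p_prof_given_woman), 4))
```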

3. Consolidated Builders has bid on two large construction projects.  The company president believes that the probability of winning the first contract (A) is 0.6, that the probability of winning the second contract (B) is 0.4, and that the probability of winning both jobs (A and B) is 0.2.

a. Are the events A and B independent?
No, because Pr(A)*Pr(B) = (0.6)(0.4) = 0.24, which is not equal to Pr(A and B) = 0.2.

b. What is the probability that Consolidated Builders will win either contract A OR contract B?
Pr(A or B) = Pr(A) + Pr(B) – Pr(A and B)
= 0.6 + 0.4 – 0.2 = 0.8
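A short sketch that checks both parts of this problem with the numbers given (Pr(A) = 0.6, Pr(B) = 0.4, Pr(A and B) = 0.2). `math.isclose` is used instead of `==` because floating-point products such as 0.6 * 0.4 are not always exact.

```python
import math

p_a, p_b, p_a_and_b = 0.6, 0.4, 0.2

# Independence would require Pr(A and B) == Pr(A) * Pr(B).
independent = math.isclose(p_a_and_b, p_a * p_b)

# Addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

print(f"Pr(A)*Pr(B) = {p_a * p_b:.2f}, independent: {independent}")
print(f"Pr(A or B) = {p_a_or_b:.2f}")
```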

Chapter 9:

1. Why is sampling often preferable to conducting a census for the purpose of obtaining information about a population?
It is often impractical or impossible to conduct a census.  Taking a sample is quicker and more cost-efficient than conducting a census, and a properly selected random sample can still provide good information about the population.

2. Why do we generally expect some error when estimating a parameter (such as a population mean) by a statistic (such as a sample mean)?
Sampling variability – the value of the statistic will vary from sample to sample.

3. Explain why increasing the sample size results in a tendency for smaller sampling error when using a sample mean to estimate a population mean.
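The variance of the sampling distribution of the mean, sigma^2/n, is inversely proportional to the sample size, so its standard deviation, sigma/sqrt(n), shrinks as n grows.  Sample means from larger samples therefore tend to lie closer to the population mean.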
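A minimal simulation sketch of this idea, assuming a normal population with mean 0 and standard deviation 1 (arbitrary choices for illustration); `standard_error` and `simulated_sd_of_means` are hypothetical helper names. The spread of the simulated sample means should track sigma/sqrt(n), so quadrupling the sample size roughly halves it.

```python
import random
import statistics
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def simulated_sd_of_means(n: int, reps: int = 2000, mu: float = 0.0,
                          sigma: float = 1.0, seed: int = 1) -> float:
    """Draw `reps` samples of size n from Normal(mu, sigma) and return the
    standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 40, 160):
    print(n, round(standard_error(1.0, n), 4), round(simulated_sd_of_means(n), 4))
```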

4. Define sampling error (also called sampling variability).
Sampling error refers to the fact that the value of a statistic will vary from sample to sample.

Chapter 10:

1. What exactly is a confidence interval and why is it better to report a confidence interval instead of a single number for estimating the population mean, mu?
A confidence interval is a range of plausible values for the population parameter, computed as a point estimate plus or minus a margin of error and reported with a confidence level.  It is better to report a confidence interval rather than a point estimate because the interval accounts for sampling variability and shows how precise the estimate is.

2. Suppose that we have obtained data by taking a simple random sample from a population.  What should we do (e.g., what assumptions should we verify) before we construct a confidence interval from the sample?
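Verify that the sampling distribution of the sample mean is (at least approximately) normal: either the population is normally distributed or the sample size is large enough for the Central Limit Theorem to apply.  It is also wise to check the data for outliers, which can strongly affect the sample mean.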

3. Suppose that we have obtained data by taking a simple random sample from a population and we intend to find a confidence interval for the population mean.  We will either use a 95% confidence interval or a 99% confidence interval.  Which confidence level will give us a narrower interval for estimating the mean?
The 95% CI will give a narrower interval, because its critical value (z = 1.96) is smaller than the 99% critical value (z = 2.576), so its margin of error is smaller.

4. The Gallup Organization conducts annual national surveys on home gardening.  Results are published by the national Association for Gardening.  A random sample is taken of 25 households with vegetable gardens.  The size of vegetable gardens is normally distributed.  The mean size of the vegetable gardens from the sample was 643 sq ft.
a. Find a 90% confidence interval for the mean size of all household vegetable gardens in the United States.  Assume that sigma=247 sq ft.

xbar +/- z * sigma/sqrt(n)
643 +/- (1.645)(247)/sqrt(25)
643 +/- (1.645)(49.4)
643 +/- 81.26
(561.74, 724.26) sq ft

b. Explain in words what the confidence interval from part (a) means.
We are 90% confident that the true mean size of all household vegetable gardens in the US is contained in this interval.

5. A quality-control engineer in a bakery goods plant needs to estimate the mean weight of bags of potato chips that are packed by a machine.  He knows from experience that sigma=0.1 oz for this machine.  Weights of bags are normally distributed.  A random sample of 12 bags has a mean weight of 16.01 oz.
a. Find a 99% confidence interval for the mean weight of bags of potato chips.

xbar +/- z * sigma/sqrt(n)
16.01 +/- (2.576)(0.1)/sqrt(12)
16.01 +/- (2.576)(0.0289)
16.01 +/- 0.0744
(15.936, 16.084) oz
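Both interval calculations above follow xbar +/- z * sigma/sqrt(n) with sigma known, so one small helper covers them. This is a sketch; `z_interval` is a hypothetical name, and `NormalDist().inv_cdf` gives slightly more precise critical values than the table values 1.645 and 2.576, so the endpoints agree with hand calculations only after rounding.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float) -> tuple[float, float]:
    """Confidence interval xbar +/- z* * sigma / sqrt(n) for a mean with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

gardens = z_interval(643, 247, 25, 0.90)   # vegetable gardens, sq ft
chips = z_interval(16.01, 0.1, 12, 0.99)   # bags of potato chips, oz
print("gardens: (%.2f, %.2f)" % gardens)
print("chips:   (%.3f, %.3f)" % chips)
```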

2. The Radio Advertising Bureau of New York reports in Radio Facts that in 1994 the mean number of radios per U.S. household was 5.6.  A random sample of 45 U.S. households taken this year showed that the average number of radios owned is xbar = 5.9.  Do the data provide sufficient evidence to conclude that this year’s mean number of radios per U.S. household has changed from the 1994 mean of 5.6?  Assume that the standard deviation of this year’s number of radios per U.S. household is 1.9.  Use the following steps to answer the question.

a. State the null and alternative hypotheses.

Ho:  mu = 5.6
Ha:  mu not equal to 5.6

b. Discuss the logic of conducting the hypothesis test (e.g., how will you determine whether you have enough evidence to reject the null hypothesis).
Calculate the test statistic and the p-value.  Compare the p-value to the significance level.  If the p-value is less than the significance level, then reject the null hypothesis.  Otherwise, fail to reject the null hypothesis.

c. Identify the distribution of the variable xbar; that is, the sampling distribution of the mean for samples of size 45.
By the Central Limit Theorem (n = 45 is large), the sampling distribution of xbar is approximately normal with mean mu and standard deviation sigma/sqrt(n) = 1.9/sqrt(45) = 0.283.

d. Obtain a precise criterion for deciding whether to reject the null hypothesis in favor of the alternative hypothesis (e.g., pick out a significance level, alpha, that you will use for conducting the test.  This can be any value that you would like to use, but most of the time, a 5% significance level is used).
We’ll use 5% for this example.

Test Statistic:  z = (5.9 - 5.6) / (1.9/sqrt(45)) = 0.3/0.2832 = 1.06

p-value = 2*Pr(Z>=|z|) = 2 * Pr(Z >= 1.06) = 2 * (1 – Pr(Z<=1.06))
= 2 * (1 – 0.8554) = 0.2892

f. Apply the criterion in part (d) to the problem and state your conclusion.
Since 0.2892 > 0.05, fail to reject the null hypothesis.  The data do not provide sufficient evidence to conclude that this year's mean number of radios per U.S. household differs from the 1994 mean of 5.6.
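The steps of this test can be sketched in Python with the standard library's `NormalDist`; `one_sample_z_test` is a hypothetical helper name. Because the code does not round the standard error before computing z, its z and p-value may differ in the last digits from hand calculations that round intermediate values; the decision at the 5% level is the same.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Two-sided one-sample z-test with known sigma; returns (z, p-value)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_sample_z_test(xbar=5.9, mu0=5.6, sigma=1.9, n=45)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject Ho" if p < alpha else "fail to reject Ho")
```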

1. In 1990, the average passenger vehicle was driven 10.3 thousand miles.  A random sample of 500 passenger vehicles had a mean of 10.1 thousand miles for last year.  Assume that the standard deviation is 6.0 thousand miles.  We want to know if the average distance driven last year is different from the average distance driven in 1990.  If mu is last year’s mean distance driven:
a. Find a 95% confidence interval for mu.

xbar +/- z * sigma/sqrt(n)
10.1 +/- (1.96)(6.0)/sqrt(500)
10.1 +/- (1.96)(0.27)
10.1 +/- 0.5292
(9.57, 10.63)

b. Does the value of 10.3 thousand miles fall within your confidence interval from (a)?
Yes.

c. Use the information from (a) and (b) to determine whether the average distance driven last year is different from the average distance driven in 1990.
Since 10.3 is contained in the 95% confidence interval, we fail to reject Ho: mu = 10.3 at the 5% significance level; the average distance driven last year is not significantly different from the average distance driven in 1990.
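The reasoning in (c) relies on the fact that, with sigma known, a two-sided z-test at level alpha rejects Ho: mu = mu0 exactly when mu0 falls outside the (1 – alpha) z confidence interval. A minimal sketch that computes both for this problem and shows they agree:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, mu0 = 10.1, 6.0, 500, 10.3   # thousands of miles
confidence = 0.95
alpha = 1 - confidence

z_star = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma / sqrt(n)
lower, upper = xbar - z_star * se, xbar + z_star * se

# Equivalent two-sided z-test of Ho: mu = 10.3
z = (xbar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"95% CI: ({lower:.2f}, {upper:.2f}); contains {mu0}: {lower <= mu0 <= upper}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}; reject Ho at 5%: {p_value < alpha}")
```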

1. Each year, manufacturers perform mileage tests on new car models and submit the results to the EPA.  The EPA then tests the vehicles to determine whether the manufacturers’ claims are correct.  In 1998, one company reported that a particular model equipped with a four-speed manual transmission averaged 29 mpg on the highway.  Gas mileage is normally distributed.  Suppose the EPA tested 15 of the cars.  What decision would you make regarding the gas mileage of the car?  Perform the required hypothesis test at the 5% significance level.  (NOTE:  For this sample, xbar = 28.753 and s = 1.595).
a. State the hypotheses for the test.

Ho:  mu = 29
Ha:  mu not equal to 29

b. Calculate the test statistic.  What is the distribution of the test statistic?

t = (28.753 – 29) / (1.595 / sqrt(15)) = -0.5998  (It has a t-distribution with n – 1 = 14 degrees of freedom.)

c. If the p-value for this hypothesis test is 0.5400, what conclusion would you draw?
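Since the p-value (0.5400) is greater than 0.05, you would fail to reject the null hypothesis.  The data do not provide sufficient evidence that the car's mean highway mileage differs from the claimed 29 mpg, so the manufacturer's claim is not contradicted.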
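A standard-library sketch of the test statistic from part (b). The Python standard library has no t-distribution CDF, so instead of a p-value this compares |t| with the two-sided 5% critical value for 14 degrees of freedom, 2.145, taken from a t table (a hard-coded constant, not computed). Any difference from the hand value in the last digit comes from rounding.

```python
from math import sqrt

xbar, s, n, mu0 = 28.753, 1.595, 15, 29.0

t = (xbar - mu0) / (s / sqrt(n))
df = n - 1

# Two-sided 5% critical value for 14 degrees of freedom, from a t table.
T_CRIT_14 = 2.145

print(f"t = {t:.4f} with {df} degrees of freedom")
print("reject Ho" if abs(t) > T_CRIT_14 else "fail to reject Ho")
```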

1. A manufacturer of panel displays claims that a new manufacturing process has a higher success rate than the current process.  The success rate for the current process is 30%.  A sample of 80 panels created with the new process yielded 32 successes.  At the 1% significance level, does it appear that the new process has a higher success rate than the current process?

State the hypotheses:

Ho:  p = 0.3
Ha:  p > 0.3

Calculate the sample proportion:

phat = 32/80 = 0.4

Calculate the test statistic:

z = (0.4 – 0.3) / sqrt((0.3*0.7)/80) = 0.1/0.0512 = 1.95

Calculate the p-value:

p-value = Pr(Z>=1.95) = 1 – Pr(Z<=1.95) = 1 – 0.9744 = 0.0256

Since 0.0256 > 0.01, fail to reject the null hypothesis.  At the 1% significance level, there is not sufficient evidence that the new process has a higher success rate than the current process.
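The steps above can be sketched as a one-proportion z-test; `one_proportion_z_test` is a hypothetical helper name, and the check that n*p0 and n*(1 – p0) are at least 10 is a common rule of thumb for the normal approximation (some texts use 5). The code keeps full precision in the standard error, so its z (about 1.95) may differ from hand-rounded values; the conclusion at the 1% level does not change.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float, float]:
    """Upper-tailed one-proportion z-test of Ho: p = p0 vs Ha: p > p0.
    Returns (p_hat, z, p-value)."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation may be unreliable for this n and p0")
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)       # standard error computed under Ho
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)   # upper-tail area
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=32, n=80, p0=0.3)
alpha = 0.01
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```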