Confidence Intervals

Estimating with Confidence

A confidence interval is just an estimate +/- a margin of error.

For example, the margins of error listed along with public opinion polls are based on confidence intervals.

A level C confidence interval has 2 parts:

An interval calculated from the data, in the form estimate +/- margin of error.
A confidence level C, which gives the probability that the method produces an interval containing the true population parameter.

That is, a level C confidence interval is an interval computed from the sample such that the interval will contain the true population parameter for (100*C)% of the samples.

So, for example, a 95% confidence interval can be interpreted as:

"We are 95% confident that the true population parameter is contained in the interval."

"In 95% of all possible samples, an interval computed this way will contain the true value of the population parameter."

NOTE:

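The population parameter is a fixed, constant value. It is NOT a random variable. Therefore, it is NOT correct to interpret the confidence level as the probability that the population parameter lies in the particular interval you computed. Once the interval has been calculated, it either contains the parameter or it does not; the confidence level describes how often the method succeeds over repeated samples.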
It is also NOT correct to interpret the confidence interval as being a statement about the data. It is based on the distribution of the sample mean, not on the distribution of the data. Therefore, you CANNOT make statements like "95% of the data are contained in the confidence interval."
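
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size and the number of trials are made-up assumptions, and the function name coverage is hypothetical. It uses the known-standard-deviation interval described in the next section and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, conf, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    # z* leaves (1 - conf) / 2 of the standard normal area in the upper tail
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample from an assumed normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Assumed population: mean 50, standard deviation 10 (illustrative values only)
print(coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, trials=10_000))
```

The printed fraction should be close to 0.95. Note that the parameter mu never changes between trials; only the interval moves from sample to sample.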

Constructing a Confidence Interval for the Mean

As mentioned earlier, the confidence interval is an estimate of the population parameter +/- a margin of error. Our estimate of the population mean is the sample mean.

The margin of error will be based on sampling error and thus will be derived from the sampling distribution of the sample mean.

Confidence interval for the mean:

Suppose that we have a simple random sample (SRS) of size n from a population with mean μ and standard deviation σ. If the sampling distribution of the sample mean is normal, then a 100*C% confidence interval for μ is given by

x-bar +/- (z^*)(σ/sqrt(n))

where z^* is the number such that (1-C)/2 of the area under the standard normal curve lies to the right of z^*.
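
Instead of reading z^* from a table, it can be computed from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name critical_value is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_value(conf):
    """Return z* such that the area to the right of z* is (1 - conf) / 2."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: z* = {critical_value(conf):.3f}")
```

Rounded to three decimals, this gives 1.645, 1.960 and 2.576 for the 90%, 95% and 99% levels.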

Example:

A pharmaceutical company is examining specimens from a batch of product to verify the average concentration of the active ingredient. A random sample of 3 specimens yields an average concentration of 0.8404 g/l. The standard deviation of the specimen readings is known to be 0.0068 g/l. Give a 99% confidence interval for the true mean concentration.

x-bar = 0.8404
σ/sqrt(n) = 0.0068/sqrt(3)
z^* = 2.576 (value from p.306 for a confidence level of 99%)

So, the 99% confidence interval will be given by

0.8404 +/- (2.576)(0.0068/sqrt(3))
0.8404 +/- 0.0101

So, we can say that we are 99% confident that the true mean concentration is between 0.8303 and 0.8505 g/l.

Example:

A test for the level of potassium in a person’s blood gives measurements that vary slightly from day to day. The standard deviation for these measurements is 0.2. If 3 measurements were taken on different days and the mean of those 3 measurements was 3.2, what is a 90% confidence interval for the mean blood potassium level?

x-bar = 3.2
σ/sqrt(n) = 0.2/sqrt(3)
z^* = 1.645 (value from p.306 for a confidence level of 90%)

So, the 90% confidence interval will be given by

3.2 +/- (1.645)(0.2/sqrt(3))
3.2 +/- 0.19

So, we are 90% confident that the mean blood potassium level is between 3.01 and 3.39.
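
Both worked examples can be reproduced in a few lines. This is a sketch under the same assumptions as the formula (known population standard deviation, normal sampling distribution); the function name z_interval is hypothetical, and z^* is computed rather than taken from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf):
    """Interval x_bar +/- z* * sigma / sqrt(n) for a known population sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Pharmaceutical example: x-bar = 0.8404, sigma = 0.0068, n = 3, 99% level
low, high = z_interval(0.8404, 0.0068, 3, 0.99)
print(f"{low:.4f} to {high:.4f}")

# Potassium example: x-bar = 3.2, sigma = 0.2, n = 3, 90% level
low, high = z_interval(3.2, 0.2, 3, 0.90)
print(f"{low:.2f} to {high:.2f}")
```

The two printed intervals agree with the hand calculations: 0.8303 to 0.8505 and 3.01 to 3.39.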

Properties of confidence intervals:

Fact: The higher the confidence level, the wider the confidence interval will be.

You want a high degree of confidence but also a small margin of error. So, how can you make the margin of error smaller?

Margin of error = m = z^*σ/sqrt(n)

To decrease the margin of error, you can:

Decrease z^* - The smaller this value is, the lower the confidence level will be, so this may not be a good option.
Decrease σ - You usually have no control over the population standard deviation, so this is not really an option.
Increase n - Increasing the sample size is the best way to decrease the margin of error while still holding the confidence level at the desired level.
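
The trade-offs in this list can be seen numerically. The sketch below reuses the standard deviation from the potassium example (0.2); the sample sizes and the function name margin_of_error are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """Margin of error z* * sigma / sqrt(n) for a known population sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z_star * sigma / sqrt(n)

sigma = 0.2  # standard deviation from the potassium example

# Raising the confidence level at a fixed n widens the interval
for conf in (0.90, 0.95, 0.99):
    print(f"n = 3, {conf:.0%} level: margin = {margin_of_error(sigma, 3, conf):.3f}")

# Raising n at a fixed confidence level narrows the interval
for n in (3, 12, 48):
    print(f"n = {n}, 90% level: margin = {margin_of_error(sigma, n, 0.90):.3f}")
```

Because n appears under a square root, the sample size has to be quadrupled to cut the margin of error in half.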

To obtain a desired margin of error, m, your sample size should be

n = {(z^*σ)/m}²

Note that it may be very costly or totally impractical to take a sample of the required size.

Also note that it’s the size of the SAMPLE, not the size of the POPULATION, that determines the margin of error. As long as the population is much larger than the sample, the population size has no influence on the sample size needed.

Example:

Returning to the previous potassium level example:

How large a sample would be needed in order to estimate the mean blood potassium level to within +/-0.01 units with 90% confidence?

n = {(1.645)(0.2)/0.01}² = (32.9)² = 1082.41, so a sample of 1083 measurements is needed.

NOTE: The sample size must be a whole number, so always round up to the next integer. Rounding down would give a margin of error slightly larger than the one requested.
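
The sample-size calculation, including the round-up step, might be sketched as follows. The function name required_sample_size is hypothetical, and z^* is computed from the normal distribution rather than read from the table.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, conf):
    """Smallest n whose margin of error z* * sigma / sqrt(n) is at most margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # ceil implements the "always round up" rule
    return ceil((z_star * sigma / margin) ** 2)

# Potassium example: sigma = 0.2, desired margin 0.01, 90% confidence
print(required_sample_size(sigma=0.2, margin=0.01, conf=0.90))
```

For the potassium example this prints 1083, since the unrounded value is slightly above 1082.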

Assumptions needed for constructing confidence intervals:

The sample MUST be a simple random sample (or pretty close to it).
The formula for a CI assumes that the sampling distribution of the sample mean is at least approximately normal. This holds if (i) the population itself is normally distributed or (ii) the sample size is greater than 30, so that the Central Limit Theorem (CLT) applies.
You must know the population standard deviation.
Be aware that confidence intervals can be strongly affected by outliers in the data.

These conditions may not be fully met in practice. However, you should use them as guidelines for applying the CI formula.