Chapter 1

Statistics: a science that attempts to make sense of data and to provide information so you can make informed decisions

There are two types of statistical methods: descriptive and inferential.

Descriptive statistical methods are used to describe your data through graphical methods and numerical summaries.

Inferential statistical methods use your data to make generalizations about a large group based on a subset of that group and to assess the reliability of your inferences.

Data: the pieces of information that you gather; usually organized into rows and columns. Rows represent the individuals or experimental units being examined. Columns represent the variables or characteristics that are recorded about each unit.

Data can be split into two major types: quantitative and qualitative (categorical).

Quantitative data are numeric and can be:

Qualitative data are non-numeric and specify which of a finite number of discrete categories a unit belongs to.

When you receive a data set, you should ask yourself the following questions:

Population: the larger universe in which you're interested; all possible individuals to which you wish your conclusions to apply. You may or may not be able to enumerate the entire population.

It's often impossible, or at least impractical, to collect data from the entire population, so it's common practice to select a sample.

Sample: a representative subset selected from the population.

To ensure that the sample is representative of the entire population, probability theory is used to select a random sample. By taking a random sample, good results can be obtained from a small subset of the population, and the amount of error in the results can be quantified.
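As a minimal sketch of drawing a simple random sample, the snippet below uses Python's standard `random` module. The population of 1,000 numbered units, the sample size of 25, and the seed are all assumptions made for illustration; they do not come from the notes.

```python
import random

# Hypothetical population: ID numbers for 1,000 units (assumed for illustration).
population = list(range(1, 1001))

# A fixed seed makes the draw reproducible; any seed would do.
rng = random.Random(42)

# Simple random sample without replacement: every unit is equally likely to be chosen.
sample = rng.sample(population, k=25)

print(sorted(sample))
```

Because the units are drawn without replacement, no unit can appear in the sample twice.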

Reasons for Sampling

Engineering Applications

Chapter 2

Goal: to understand the data, that is, to describe it, summarize it, and answer questions about it.

Begin by examining each variable by itself
Examine the data graphically
Produce numerical summaries
Look at relationships between the variables

Frequency Distribution: lists the values that occur in the data set and tells you how often each value occurs.
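A frequency distribution can be tabulated in a few lines of Python. The sketch below uses `collections.Counter`; the list of paint finishes is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical qualitative data: the finish recorded for 10 units (assumed).
finishes = ["gloss", "matte", "gloss", "satin", "matte",
            "gloss", "gloss", "satin", "matte", "gloss"]

freq = Counter(finishes)   # maps each value to how often it occurs
n = len(finishes)

# Print each category with its frequency and relative frequency.
for category, count in freq.most_common():
    print(f"{category:<6} {count:>2}  {count / n:.0%}")
```

The relative frequency (count divided by the number of observations) is often reported alongside the raw count.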

Qualitative Variables

Use bar charts or pie charts to illustrate the frequency distribution graphically
The height of each bar in the bar chart represents the frequency of occurrence of its category.
The size (angle) of each slice in the pie chart represents the relative frequency of occurrence of its category.
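To make the two encodings concrete, the sketch below turns category counts into bar lengths and pie slice angles. It assumes the usual convention that a slice's angle is 360 degrees times the category's relative frequency; the counts themselves are hypothetical.

```python
# Hypothetical category counts (assumed for illustration).
counts = {"gloss": 5, "matte": 3, "satin": 2}
total = sum(counts.values())

# Pie chart: slice angle in degrees = 360 * relative frequency.
angles = {category: 360 * count / total for category, count in counts.items()}

for category, count in counts.items():
    bar = "#" * count   # bar chart: bar length equals the frequency
    print(f"{category:<6} {bar:<6} {angles[category]:5.1f} deg")
```

The slice angles always sum to 360 degrees, whereas the bar lengths sum to the number of observations.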

Quantitative Data

Use histograms to illustrate the frequency distribution graphically


Shapes:

For smaller data sets, stem-and-leaf plots are useful for describing distributions. Because they retain the individual data values, they give more information than a histogram. To construct a stem-and-leaf plot:

Sort the data from smallest to largest
Select the leading digits (stems) from the data
List those digits in numeric order in a column, listing each unique stem only once
Draw a vertical line
Write each of the final digits (leaves) for each number to the right of the appropriate stem.


Drying times for different formulations of paint

2.5  3.0  3.3  4.0  6.0  2.8  4.2  4.4  5.0  5.0  3.6  5.6  4.8  4.9  6.1  3.5  4.5  5.2  4.5  6.5

Stem and Leaf Plot for drying time data

2 | 5 8
3 | 0 3 5 6
4 | 0 2 4 5 5 8 9
5 | 0 0 2 6
6 | 0 1 5
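The construction steps can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes every value has exactly one decimal place, as the drying times do.

```python
from collections import defaultdict

drying_times = [2.5, 3.0, 3.3, 4.0, 6.0, 2.8, 4.2, 4.4, 5.0, 5.0,
                3.6, 5.6, 4.8, 4.9, 6.1, 3.5, 4.5, 5.2, 4.5, 6.5]

def stem_and_leaf(values):
    """Return {stem: [leaves]} for values with one decimal place."""
    plot = defaultdict(list)
    for v in sorted(values):              # sort from smallest to largest
        tenths = round(v * 10)            # 2.5 -> 25; rounding avoids float error
        stem, leaf = divmod(tenths, 10)   # 25 -> stem 2, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(drying_times)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

This sketch prints only stems that have at least one leaf. That is sufficient here because every stem from 2 to 6 occurs in the data.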

Once you have examined the data graphically, you may want to calculate some numerical summary measures to describe the center and spread of the data.

Sample mean: simple average of all of the observations


Mark McGwire's home run record

1987 49     1993 9
1988 32     1994 9
1989 33     1995 39
1990 39     1996 52
1991 22     1997 58
1992 42     1998 70

The mean is 37.8.
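As a quick check of this figure, the mean can be computed directly from the season totals listed above; the variable names are illustrative only.

```python
# Home runs per season, 1987-1998, from the table above.
home_runs = {1987: 49, 1988: 32, 1989: 33, 1990: 39, 1991: 22, 1992: 42,
             1993: 9, 1994: 9, 1995: 39, 1996: 52, 1997: 58, 1998: 70}

values = list(home_runs.values())
mean = sum(values) / len(values)   # sum of observations / number of observations

print(sum(values), len(values), round(mean, 1))   # 454 12 37.8
```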

** The mean is very sensitive to outliers **

So, we need another measure of center that is not so sensitive to extreme values.

Median: the midpoint of the distribution. About half of the values are at or below the median, and about half are at or above it.

To calculate the median:

Sort the observations from smallest to largest
If you have an odd number of observations, then the median is the middle observation
If you have an even number of observations, the median is the average of the 2 middle observations.

Example using the McGwire home run data:

9  9  22  32  33  39  39  42  49  52  58  70

With 12 observations, the median is the average of the 2 middle numbers (the 6th and 7th): (39 + 39) / 2 = 39.
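The median rule can be written as a short function. The sketch below also illustrates why the median resists outliers: replacing the 70 with a made-up 170 (a hypothetical value, not part of the record) changes the mean but leaves the median where it was.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

home_runs = [49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70]
print(median(home_runs))   # 39.0

# Hypothetical outlier: pretend the 70 had been 170.
inflated = [170 if x == 70 else x for x in home_runs]
print(median(inflated))    # 39.0 (unchanged)

mean_original = sum(home_runs) / len(home_runs)
mean_inflated = sum(inflated) / len(inflated)
print(round(mean_original, 1), round(mean_inflated, 1))   # 37.8 46.2
```

Python's standard library offers `statistics.median`, which follows the same rule.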