
Chapter 2 Notes

Remember the list of statistics test grades you looked at in Chapter 1? Here they are again:

92   80   64   96   96   84
76   72   76   84   92   88
72   80   44   64   80   76
56   72   72   88   60   76

It still isn’t easy to figure out how the class is doing. Now it’s time to try organizing our data to make it more meaningful. A first step is to create an ordered array of scores; this simply means arranging the scores in order from highest to lowest. Let’s try it:

96   96   92   92   88   88
84   84   80   80   80   76
76   76   76   72   72   72
72   64   64   60   56   44

That helps, doesn't it? Now we can scan the list of scores and get at least a general idea of how the class did. It's even relatively easy to see that there were quite a few high grades in comparison to the low ones, although there were some pretty low ones too. But there is much more we can do to help ourselves out.

Now is a good time to answer Comprehension Check 2.1 on page 24; answer is on page 53.

TABULAR DATA DISTRIBUTIONS

Ungrouped Frequency Distributions

You will note in the ordered array above that each individual test’s grade appears; if there were two students who got a particular score (for example, 96), then the score appears twice, once for each student. We can condense our data a little bit by using a frequency distribution.

In a frequency distribution, we use two columns. The first one, labeled X (X is one piece of raw data, in this case a test score), lists the scores that were achieved. Note that each score appears only once. The second column, labeled f (for frequency), tells how many times each score was achieved. So where 96 appears in the score column, a 2 is next to it in the frequency column, indicating that there were two tests with a score of 96. Here’s our frequency distribution for the statistics test scores:

X     f
96    2
92    2
88    2
84    2
80    3
76    4
72    4
64    2
60    1
56    1
44    1
     --
     24

When you make an ungrouped frequency distribution from a set of raw data, there's a way to check yourself to make sure you haven't missed any scores. Count the number of scores in the original set of data. This is N, the number of cases (or number of tests scored). Now total the frequency column. This total should equal N. In our frequency distribution N = 24, and our f column also adds up to 24. We must be doing something right!
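If you have a computer handy, this whole process can be sketched in a few lines of Python (my illustration, not part of the textbook). It builds the same frequency distribution and runs the N check automatically:

```python
from collections import Counter

# The raw statistics test scores from the chapter
scores = [92, 80, 64, 96, 96, 84, 76, 72, 76, 84, 92, 88,
          72, 80, 44, 64, 80, 76, 56, 72, 72, 88, 60, 76]

freq = Counter(scores)          # maps each score X to its frequency f
N = len(scores)                 # N, the number of cases

# List X and f from highest score to lowest, as the table does
for x in sorted(freq, reverse=True):
    print(x, freq[x])

# The self-check: the f column should total N
assert sum(freq.values()) == N == 24
```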

Setting up a frequency distribution puts your data into a more compact format, which is easier to handle. It also enables you to perform some simple operations which will give you more information about the data.

Try Comprehension Check 2.2 on page 26; answer is on page 53.

Ungrouped Percentage Distribution

The first of these operations is making an ungrouped percentage distribution. This is a column with the heading %, which lists the percentage of the total number of cases (N) at each score. You may already know how to find what per cent of N is represented by each frequency; if not, use the handy formula below:

% = (f / N) × 100

So, to find the per cent for a score of 96, which has a frequency (f) of 2:

% = (2 / 24) × 100 = 8.3

You complete this operation for every frequency in the frequency distribution and fill out the per cent column with your results:

           

X     f     %
96    2     8.3
92    2     8.3
88    2     8.3
84    2     8.3
80    3    12.5
76    4    16.7
72    4    16.7
64    2     8.3
60    1     4.2
56    1     4.2
44    1     4.2
     --   -----
     24   100.0

You might have noticed that this column doesn't really give us any new information. We already knew the frequency of each score, and since per cent is based directly on frequency, this might seem like a lot of trouble to go to when you don't really find out anything you didn't already know. So why go to the trouble?

Here's an example that might explain things. Suppose you wanted to compare this class's test grades with another class's grades. If the other class had 42 students, and there were three students who had 96% and 4 who had 92%, how do we know which class had more of these high grades relative to the size of the class? Here, comparing the frequencies isn't much help, since you'd expect the class of 42 to have more scores at any one grade—the question is, how many more.

Now a percentage distribution is handy. If you calculate percentages for each score in the class of 42, you can compare the per cent who got 96 or 92 or whatever grade with the per cent who got that grade in the other class. This would give you a much more usable way to make comparisons. (By the way, if you need the practice, convert these frequencies (3 for a score of 96 and 4 for a score of 92) to per cents. Answers are at the end of the next section.)
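If you'd rather let the computer do the converting, here's a small Python sketch (an illustration of the formula above, not from the textbook) that handles both classes:

```python
def pct(f, n):
    """Per cent of n cases represented by a frequency f: (f / n) * 100."""
    return round(100 * f / n, 1)

# Our class of 24:
print(pct(2, 24))    # 8.3  -- two students scoring 96

# The hypothetical class of 42 described above:
print(pct(3, 42))    # 7.1  -- three students scoring 96
print(pct(4, 42))    # 9.5  -- four students scoring 92
```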

Now take a shot at Comprehension Checks 2.3 and 2.4 on pages 26 and 27; answers are on page 53.

Ungrouped Cumulative Frequency Distribution

There is more we can do with these data. The ungrouped cumulative frequency distribution gives the number of cases scoring at and below a particular score. For example, since 96 is the highest score anyone got on the statistics test, then everyone (24 cases) scored at or below 96. This means that the entry in the cumulative frequency (fc) column for 96 will be 24. Then when we look at the next highest score, which is 92, everyone except the 2 who scored 96 scored at or below it. So the entry for 92 in the cumulative frequency column will be 24-2 or 22.

The trick to filling out this column in your ungrouped distribution is to look at the cumulative frequency for a score, subtract from it the frequency for that same score, then enter the result below it under cumulative frequency for the next lower score. And here's a handy quick way to check your work after filling out this column. Look at the cumulative frequency for the bottom score; it should match the frequency for this score. Makes sense, doesn't it? The number of cases who scored at or below this score (since it's the lowest one) is the number of cases who scored AT this score. So, for the bottom entry in the distribution, fc should equal f.
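The same subtract-as-you-go procedure is easy to sketch in Python (again, my illustration rather than the textbook's):

```python
# The f column from our distribution, highest score first
f = [2, 2, 2, 2, 3, 4, 4, 2, 1, 1, 1]
N = sum(f)                 # 24 cases

fc = []
running = N
for count in f:
    fc.append(running)     # cases scoring at or below this line's score
    running -= count       # remove this line's cases before moving down

print(fc)                  # [24, 22, 20, 18, 16, 13, 9, 5, 3, 2, 1]
assert fc[-1] == f[-1]     # the check: bottom fc should equal bottom f
```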

Here's our ungrouped frequency distribution with the cumulative frequencies finished:

X     f     %     fc
96    2     8.3   24
92    2     8.3   22
88    2     8.3   20
84    2     8.3   18
80    3    12.5   16
76    4    16.7   13
72    4    16.7    9
64    2     8.3    5
60    1     4.2    3
56    1     4.2    2
44    1     4.2    1
     --   -----
     24   100.0

It is easy to get messed up in computing cumulative frequency. This usually results from getting off by a line when subtracting (even easier to do when you have a string of frequencies all the same as we do here). It will help you if you place a sheet of paper under each line once you've filled in the cumulative frequency. Then subtract the frequency you can see (above the paper) from the cumulative frequency you can see (also above the paper—they're on the same line). Move the paper down one line to enter your answer and subtract on that line before moving the paper down again. Keeps you in order and prevents mistakes.

And don't forget to check when you're done that the last entry in the frequency column matches the last entry in the cumulative frequency column. If these match, you're probably just fine.

Answer Comprehension Check 2.5 on page 27; answer is on page 53.

(Answers to percentage problems posed for the class of 42 students: The 3 who received a score of 96 compose 7.1% of the class; and the 4 who received a score of 92 compose 9.5% of the class.)

Ungrouped Cumulative Percentage Distribution

And that's not all! You can also convert these cumulative frequencies to per cents to construct an ungrouped cumulative percentage distribution. The process is fairly straightforward if you've got the conversion to per cent thing down.

The formula used earlier for converting frequencies to per cents can be modified for use here:

%c = (fc / N) × 100

Here's our distribution with the cumulative percentage (%c) column filled out:

X     f     %     fc    %c
96    2     8.3   24   100.0
92    2     8.3   22    91.7
88    2     8.3   20    83.3
84    2     8.3   18    75.0
80    3    12.5   16    66.7
76    4    16.7   13    54.2
72    4    16.7    9    37.5
64    2     8.3    5    20.8
60    1     4.2    3    12.5
56    1     4.2    2     8.3
44    1     4.2    1     4.2
     --   -----
     24   100.0

Note that a good way to check your work here is to be sure your last entry in the cumulative percentage column matches your last entry in the per cent column. If they match, you've probably done things right.

You might wonder why you can't do cumulative per cent the same way as you did cumulative frequency, that is, start with 100, then subtract the per cent entry on each line. This works great and is a good way to check your work ONLY IF the per cents come out to whole numbers without the need to round a decimal. If you get fractions of a per cent (as you do in this distribution), subtracting introduces error because the per cents you're subtracting are not themselves perfectly accurate.
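Here's a Python sketch of the safer approach just described: compute cumulative frequencies first, then convert each one to a per cent of N (values rounded to one decimal):

```python
f = [2, 2, 2, 2, 3, 4, 4, 2, 1, 1, 1]   # highest score first
N = sum(f)

# Cumulative frequency first...
fc, running = [], N
for count in f:
    fc.append(running)
    running -= count

# ...then convert each fc to a per cent of N (avoids the subtraction error)
pc = [round(100 * c / N, 1) for c in fc]
print(pc)   # [100.0, 91.7, 83.3, 75.0, 66.7, 54.2, 37.5, 20.8, 12.5, 8.3, 4.2]
```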

Do Comprehension Check 2.6 on page 28; answer is on page 53.

Percentile Rank. Another name for cumulative per cent is percentile rank. A percentile rank is the per cent of the group who scored at or below a particular score; and that's a good definition for cumulative per cent as well. This comes up because it is common for scores on standardized tests to be reported as percentile ranks. If you took the Iowa Test of Educational Development or Stanford Test in school, the scores you received were percentile ranks.

Unfortunately most parents, when little Johnny brings home the piece of paper with his test scores, don't understand what they're seeing. If Johnny received a 60 in reading, they're probably thinking Johnny had only 60% of the questions right, barely passed, and should study much harder than he does. Now it's hard to say whether Johnny needs to study more, but that 60 means his score was as good as or better than 60% of kids in his grade all over the country. That's not bad at all! So the folks don't have to worry about Johnny as much as they thought. And they probably shouldn't feel too bad about misunderstanding percentile ranks either, because many of Johnny's teachers don't really understand them either.

When you're looking at a frequency distribution you can easily identify the percentile rank of a particular score simply by looking through the cumulative percentage column for the information.

Let's look back at our frequency distribution:

X     f     %     fc    %c
96    2     8.3   24   100.0
92    2     8.3   22    91.7
88    2     8.3   20    83.3
84    2     8.3   18    75.0
80    3    12.5   16    66.7
76    4    16.7   13    54.2
72    4    16.7    9    37.5
64    2     8.3    5    20.8
60    1     4.2    3    12.5
56    1     4.2    2     8.3
44    1     4.2    1     4.2
     --   -----
     24   100.0

If you want to figure out the percentile rank for one of the listed scores, you simply look it up in the cumulative percentage (%c) column. So the percentile rank for a score of 80 is 66.7, and the percentile rank for a score of 56 is 8.3.

What do you suppose we do when we need the percentile rank for a score that doesn't appear on our distribution? For example, what is the percentile rank for a score of 62? To figure this out, you need to imagine that all the scores from 60 to 64 were filled in on the distribution. Try that; your work should look like this:

X     f     %     fc    %c
64    2     8.3    5    20.8
63    0     0      3    12.5
62    0     0      3    12.5
61    0     0      3    12.5
60    1     4.2    3    12.5

Now it's easy to see that the percentile rank for a score not listed in the distribution is the percentile rank of the first score listed below it. That's because you need to take out those who scored at 64 in order to figure out the per cent scoring 63 or less.
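This look-up rule can be put into a short Python sketch (an illustration; the (score, %c) pairs are the ones from our table):

```python
# (score, cumulative per cent) pairs from our distribution, top down
table = [(96, 100.0), (92, 91.7), (88, 83.3), (84, 75.0), (80, 66.7),
         (76, 54.2), (72, 37.5), (64, 20.8), (60, 12.5), (56, 8.3), (44, 4.2)]

def percentile_rank(score):
    """%c of the first listed score at or below the given score."""
    for x, pc in table:
        if x <= score:
            return pc
    return 0.0               # below every listed score

print(percentile_rank(80))   # 66.7  (a listed score)
print(percentile_rank(62))   # 12.5  (drops to the next lower listed score, 60)
```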

Now try Comprehension Check 2.15 on page 35. The answer is on page 55.

Percentiles. Sometimes what you need to know is what score has a certain percentile rank. That score is a percentile. This is simply a matter of looking the percentile rank up on the cumulative percentage distribution and identifying the score at that percentile rank.

Now what often happens is we're interested in a percentile rank that isn't listed in our cumulative percentage distribution. For example, let's say we want to know what score is at the 85th percentile (meaning it has a percentile rank of 85) in our distribution. 85 is not listed in the cumulative percentage column; we have 83.3 and 91.7. When this happens, we use the score at the next higher percentile rank. To find the 85th percentile in our distribution, we look at the cumulative per cent of 91.7 and use the score associated with it, which is 92. This means that 92 is the 85th percentile. (You can show yourself why this is so by listing all the missing scores again and filling in the cumulative percentage column for each one. It will quickly become evident why we look to the next higher score to find the percentile.)
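And here's the mirror-image look-up for percentiles, again as an illustrative Python sketch using our table's (score, %c) pairs:

```python
table = [(96, 100.0), (92, 91.7), (88, 83.3), (84, 75.0), (80, 66.7),
         (76, 54.2), (72, 37.5), (64, 20.8), (60, 12.5), (56, 8.3), (44, 4.2)]

def percentile(pr):
    """Score at a given percentile rank; if pr isn't listed,
    use the score at the next HIGHER cumulative per cent."""
    candidates = [(pc, x) for x, pc in table if pc >= pr]
    return min(candidates)[1]   # smallest %c still at or above pr

print(percentile(85))    # 92  (next higher %c in the table)
print(percentile(54.2))  # 76  (an exact match)
```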

Now you're ready to answer Comprehension Check 2.17 on page 37. Answer is on page 55.

Grouped Data Distributions

Sometimes, even with a frequency distribution, you still have a whole lot of individual pieces of information—too many to make sense of. The distribution we've been working on borders on too much, but now picture an ungrouped frequency distribution for data which contains 97 possible scores. You'd fill up a pretty large sheet of paper and a bunch of time making an entry for each of these scores, then filling out all the columns for each one. Worse yet, with that many details, you just can't wrap your brain around all that information. One way to reduce the details to manageable proportions is to group your data.

Class Intervals. This is done by organizing scores into categories and computing all those numbers (frequency, per cent, cumulative frequency, cumulative per cent) for each category instead of for each individual score. Each category or group is called a class interval. Now there are some basic rules for constructing class intervals.

Interval Width. First, all class intervals must be the same size, called interval width (i). Interval width is simply the number of possible scores included in the class interval. So if you build a class interval for scores from 21 to 26, i = 6, the number of scores included.

Please note, you cannot find the interval width by subtracting the lowest score from the highest score: 26 - 21 = 5, but the number of scores (count them) in the interval (21, 22, 23, 24, 25, 26) is 6. Instead, interval width is computed by subtracting the lowest score from the highest score, then adding 1:

i = (highest score - lowest score) + 1

This accounts for the fact that the scores on both ends of the class interval are included in interval width. When you subtract, one of them gets excluded. We're just adding that excluded score back in.

Do Comprehension Check 2.11 on page 31; the answer is on page 54.

Setting up Class Intervals. Another rule is that class intervals are always listed from highest to lowest, just as scores in a frequency distribution are.

How do you decide how big your class intervals should be? If they're really narrow, then you won't have simplified the data much. This won't really help in reducing the amount of detail to a manageable level. On the other hand, if they're super-wide, then you'll lose so much detail that you can't tell much about your cases any more. Let's go back to the statistics test scores for a couple of examples.

Note that the range of possible scores on the statistics test is 0-100, so if you choose an interval width of 2, then your class intervals would be 99-100, 97-98, 95-96, etc., all the way down to 1-2. There would be 50 of them, most with a frequency of 0 (since no one scored 20 or 21 or 22 or a lot of other scores).

Even if you restricted yourself to scores people actually got, giving you a range from 44-96, your intervals would then run 95-96, 93-94, 91-92, etc., all the way down to 43-44. You would have 27 class intervals, still with a lot of frequencies of 0. This wouldn't help much, would it?

If you instead chose a really big interval width, say i=25, you'd have only 4 class intervals with frequencies of 15, 8, 1, and 0. Now it's pretty easy to see that knowing 15 students scored between 76 and 100, 8 scored between 51 and 75, 1 scored between 26 and 50, and none scored between 1 and 25 isn't very useful information. In fact it's not worth the time you took to gather the data.

So what we need to do is find a happy medium—an interval width big enough to reduce the detail in our data, but not so big that we don't have any information left. The usual guideline is that about 10 class intervals is a good number. So one way to do things is to divide up the scores into about 10 groups and go from there. Another helpful guideline is that you should try not to have too many intervals with frequencies of 0.

Now on things like test scores, there are generally few scores below a certain grade, maybe 50 or so. So if you list all the scores that it is possible to get, you'll probably get quite a few zeroes. But if you concentrate on the range between 50 and 100 where the scores actually fell, you probably won't see too many zeroes if you use right-sized intervals.

Traditionally grade distributions are divided up into interval widths of 10. This just happens to produce 10 class intervals; but we won't bother listing the ones below the lowest score; they all have frequencies of 0. Let's try that:

X (i=10)    f     %     fc    %c
90-99       4    16.7   24   100.0
80-89       7    29.2   20    83.3
70-79       8    33.3   13    54.2
60-69       3    12.5    5    20.8
50-59       1     4.2    2     8.3
40-49       1     4.2    1     4.2
           --   -----
           24   100.1

A couple of things deserve mention here. One is that the per cent column may not always add up to exactly 100%; this is because many of the figures in the column were rounded, and the rounding produces small inaccuracies. As long as it adds up to a number close to 100, you're probably all right.

Another is that you'll notice the scores go only to 99. If we were to include 100, our class intervals would be 91-100, 81-90, 71-80, etc. Since this breaks grades in a way we're not used to (generally As go down to 90, Bs go down to 80, and so on), the intervals we used look more familiar. As long as no one scored 100 on the test, this works fine. (You should also note that when grades are assigned on the 90-80-70-60 grading scale, class intervals are NOT all the same size. The As contain 11 scores (90-100), while the other grade categories down to D contain only 10. Then, of course, Fs contain lots more than 10 possible scores.)

One other thing to note about our grouped frequency distribution is that you have lost some information from the original test scores in constructing the grouped distribution. You can no longer say exactly how many students scored 92 or 76 or 60. This information has been lost by compiling the scores in each of these class intervals, and now all we can say for sure is how many scores are in the group (class interval) which includes 92 or 76 or 60. In statistics you pay a price for everything you do to make things easier; in this case the cost of simplifying and organizing data is lost information.

If we put the discussion of ideal interval width in the perspective of this "cost of doing statistical business" concept, the issues related to interval width become clearer. What we are looking for when we set up a grouped distribution is an interval width small enough that not too much information is lost (so our cost isn't too high), but big enough to simplify the raw data (so that what we're buying is worth the cost).
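As a quick illustration (mine, not the textbook's), Python can do the grouping with interval width 10 and reproduce the frequencies in our table:

```python
from collections import Counter

scores = [92, 80, 64, 96, 96, 84, 76, 72, 76, 84, 92, 88,
          72, 80, 44, 64, 80, 76, 56, 72, 72, 88, 60, 76]

# Group into class intervals of width 10: 40-49, 50-59, ..., 90-99
freq = Counter((s // 10) * 10 for s in scores)

# Class intervals are listed from highest to lowest, as always
for low in sorted(freq, reverse=True):
    print(f"{low}-{low + 9}: {freq[low]}")
```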

Now look at Comprehension Checks 2.7 and 2.8 on page 29. Also try 2.12 and 2.13 on page 32. Answers for all of these are on page 54.

Apparent and Real Limits. There are a couple of other things to examine about grouped distributions. First is the issue of limits to the class intervals. These limits are supposed to serve as guidelines for putting scores into intervals correctly. For example, if you have a class interval defined by 43-45, then you know what to do with a score of 43 (It goes in this interval.) or a score of 47 (It goes in the interval above this one.).

So the limits of the class interval look like 43 at the bottom and 45 at the top. Because this is what the limits look like, we call 43 the lower apparent limit and 45 the upper apparent limit. Apparent means "what it looks like."

Now test scores generally come in whole numbers, but test scores aren't the only thing a statistician might be interested in. Suppose what we're measuring is average age of customers at different stores. When you average things, sometimes you don't get whole numbers. So what would you do with an average age of 45.3? Our apparent limits don't help us at all with this because this class interval ends at 45 and the next one doesn't start until 46.

Now most of you would say, "I would round down from 45.3 to 45 and put this into the 43-45 class interval." That's a good choice, but it means the real limits of the interval are not the numbers you see in the class interval definition, 43 and 45. The real limits go up a half-unit from 45 (That's why 45.3 is included in this interval.) and down a half-unit from 43 to 42.5.

So class intervals, in addition to the apparent limits we already defined, have real limits, which give us more information about how to place data into them. The upper real limit for our 43-45 class interval is a half-unit above 45, or 45.5. The lower real limit for this interval is a half-unit below 43, or 42.5.

Real limits give us specific instructions about what to do with those in-between numbers like 45.3. Knowing the upper real limit to our class interval is 45.5 makes it easy to decide what to do with 45.3.

One note about real limits. The upper real limit of a class interval is always the same as the lower real limit of the next higher class interval. Here's an example:

46-48: lower real limit = 45.5

43-45: upper real limit = 45.5

So you might have wondered what we do with a score of 45.5. The general rule here is the same one you learned in elementary school when you learned to round: when you're exactly at the half-way point, round up. So 45.5 goes into the higher class interval; everything even slightly below it (say, 45.499) goes into the lower class interval.
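Here's a tiny Python sketch of the real-limit rules, using the hypothetical 43-45 and 46-48 intervals from above:

```python
def place(value):
    """Assign a measurement to the 43-45 or 46-48 interval by real limits.
    Real limits run a half-unit past the apparent limits, and a value
    exactly on a boundary (45.5) goes UP, per the usual rounding rule."""
    if 42.5 <= value < 45.5:
        return "43-45"
    elif 45.5 <= value < 48.5:
        return "46-48"
    return "outside both intervals"

print(place(45.3))    # 43-45  (under the upper real limit of 45.5)
print(place(45.5))    # 46-48  (exactly on the boundary rounds up)
print(place(45.499))  # 43-45
```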

Try Comprehension Check 2.9 on page 30; the answer is on page 54.

Midpoints. One of the drawbacks of grouped frequency distributions is that it's hard to know what to do when you need a single score that represents the class interval. An interval of 90-99 contains 10 scores, not just one. When you need a single representative score, you use the midpoint of the class interval. The midpoint is the score exactly in the middle of the interval. It's easy to find with the following formula:

midpoint = (UL + LL) / 2

UL is the upper limit; LL is the lower limit. You'll find that it doesn't matter whether you use apparent or real limits to find the midpoint; both will give you the same answer as long as you use the same kind for both limits. In other words, you may use the apparent upper and apparent lower limits OR you may use the real upper and real lower limits in finding midpoint. You may NOT use the apparent lower and real upper limits; that would cause trouble. I prefer to use apparent limits for this, just because the math tends to be easier.
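A two-line Python sketch shows that apparent and real limits give the same midpoint (using the hypothetical 43-45 interval from earlier):

```python
def midpoint(lower, upper):
    """Midpoint of a class interval: (LL + UL) / 2."""
    return (lower + upper) / 2

# Apparent limits of the 43-45 interval...
print(midpoint(43, 45))        # 44.0
# ...and its real limits give the same midpoint:
print(midpoint(42.5, 45.5))    # 44.0
```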

Now try Comprehension Check 2.10 on page 30; the answer is on page 54. You may want to try it using apparent limits, then repeat the process using real limits, just to prove to yourself that it doesn't matter which you use.

Percentile Ranks. It takes a little more work to find percentile ranks from a grouped distribution. That's because we no longer know exactly how many cases scored each score. This means that the best we can do is to estimate percentile ranks from the information we have.

Look once more at our grouped frequency distribution:

X (i=10)    f     %     fc    %c
90-99       4    16.7   24   100.0
80-89       7    29.2   20    83.3
70-79       8    33.3   13    54.2
60-69       3    12.5    5    20.8
50-59       1     4.2    2     8.3
40-49       1     4.2    1     4.2
           --   -----
           24   100.1

What is the percentile rank of a score of 75? Well, we know the score is in the class interval 70-79 (called the critical interval) and we know the percentile rank of the top of the interval (54.2). What we don't know is how to figure out exactly where between 54.2 and 20.8 (cumulative percent of next lower interval) the percentile rank is.

Eyeballing things, I'd say the percentile rank of 75 is about halfway through the interval, but just where is hard to say. The method for estimating is based on an assumption, which may or may not be close to right; the assumption is that the cases in the critical interval are evenly distributed among all the possible scores. Since we have no way to know if this assumption is right, we know that the best we will be able to do is an estimate. The formula for estimating is as follows:

PR = %cb + [(X - LRL) / i] × %

All the numbers for this formula come from the critical interval except %cb. This represents the cumulative percent for the interval BELOW the critical one. So you can look at the distribution to fill in the blanks in the formula. %c for the interval below the critical interval (60-69) is 20.8. X is the score we're trying to figure out percentile rank for; this number has to appear in the original question. In this case it is 75. LRL is the lower real limit for the critical interval; LRL=69.5. Interval width (i) = 10. % = 33.3. So to estimate percentile rank, substitute these numbers for the symbols in the formula:

PR = 20.8 + [(75 - 69.5) / 10] × 33.3 = 20.8 + (0.55)(33.3) = 20.8 + 18.3 = 39.1

A good way to get an idea whether your answer is even in the ballpark is to see whether your answer fits within the limits you figured out earlier. We decided that the percentile rank for a score of 75 had to be between 20.8 and 54.2. Since our answer fits within those boundaries, we seem to be on the right track.
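The estimation procedure is easy to sketch in Python (my naming: pct stands for %, pcb for %cb):

```python
def percentile_rank_grouped(x, lrl, i, pct, pcb):
    """Estimate PR = %cb + ((X - LRL) / i) * % for the critical interval."""
    return pcb + ((x - lrl) / i) * pct

# Score of 75; critical interval 70-79 (LRL = 69.5, i = 10, % = 33.3, %cb = 20.8)
pr = percentile_rank_grouped(x=75, lrl=69.5, i=10, pct=33.3, pcb=20.8)
print(round(pr, 1))   # 39.1
```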

Now you're ready for Comprehension Check 2.16 on page 35; answer is on page 55.

Percentiles. Percentiles are also more work from a grouped distribution. The same kind of problem presents itself here because we don't know just how many cases scored at any particular score. The method for solving the problem is also similar; we find the critical interval which contains the percentile we want and estimate based on the assumption that cases are evenly distributed among all the possible scores. The formula used is simply a variation on the one we just learned:

X = LRL + [(PR - %cb) / %] × i

To find the 85th percentile, we first must identify the critical interval. To do this we use the same sort of procedure used to find percentiles in an ungrouped distribution. We look in the cumulative per cent column for 85; if 85 doesn't appear, we use the next higher cumulative per cent. This method identifies the critical interval as 90-99. Now we can fill in the formula with numbers and solve. PR=85; %cb=83.3; i=10; %=16.7; LRL=89.5.

X = 89.5 + [(85 - 83.3) / 16.7] × 10 = 89.5 + (0.102)(10) = 89.5 + 1.0 = 90.5

So the 85th percentile is approximately 90.5.
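The percentile estimate can be sketched in Python too (my naming: pct stands for %, pcb for %cb):

```python
def percentile_grouped(pr, lrl, i, pct, pcb):
    """Estimate X = LRL + ((PR - %cb) / %) * i for the critical interval."""
    return lrl + ((pr - pcb) / pct) * i

# 85th percentile; critical interval 90-99 (LRL = 89.5, i = 10, % = 16.7, %cb = 83.3)
x = percentile_grouped(pr=85, lrl=89.5, i=10, pct=16.7, pcb=83.3)
print(round(x, 1))    # 90.5
```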

Time for Comprehension Check 2.18 on page 39; answer is on page 55.

Distributions for Noncontinuous Data

Noncontinuous data may be nominal or ordinal scale. Since there are some things you can do with these and others you can't, let's look at a quick run-down of the possibilities.

Nominal Scale Data. Let's look at an example. Suppose you recorded brand preference for cola drinkers; here are your data:

Favorite Brand        f      %
CocaCola             30    42.9
Pepsi Cola           32    45.7
RC Cola               1     1.4
Sam's Choice Cola     4     5.7
Other                 3     4.3
                     --   -----
                     70   100.0

So here we have a frequency distribution and a percentage distribution. To make a cumulative frequency and cumulative percentage distribution, we'd have to decide which brand is highest, second highest, etc. so that we could figure out the number of cases at or below each one. Can't do it. Because nominal scale categories aren't numerical, they don't have a numerical relationship; this means cumulative distributions would be silly. So we've done about all we can here.

Note too that grouped distributions don't make sense for nominal scale data. You wouldn't put two or more of the colas into a group because there's no logical basis for grouping them, except that it is common to combine categories with very low frequencies into a category called "Other" or "Miscellaneous."

The order in which you list nominal scale categories is not terribly important, but there are some common practices. Note that the categories here are in alphabetical order; another possibility is to place them in order from highest frequency to lowest, in which case Pepsi would be first, then Coke, etc. Whichever method you use, it is usual to put categories like Other, None of the Above, Non-Affiliated last.

Ordinal Scale Variables. Ordinal scale data comes in two varieties. Rank-order data is one. We do not make frequency distributions with rank-order data because it doesn't provide us with any useful information. For example, think about a very small graduating class with just 6 students:

Class Rank    f      %
1             1    16.7
2             1    16.7
3             1    16.7
4             1    16.7
5             1    16.7
6             1    16.7
             --   -----
              6   100.2

Anything helpful here? Because with rank-order data, there is just one at each rank, a frequency distribution simply wastes your time and paper. Don't bother.

It does make sense to organize other kinds of ordinal scale data with frequency distributions. Suppose we have the numbers of students in each class at East High School:

Class          f      %     fc     %c
Senior        105   23.3   450   100.0
Junior        112   24.8   345    76.7
Sophomore     110   24.4   233    51.8
Freshman      123   27.3   123    27.3
              ---   ----
              450   99.8

Here, because there is a numerical relationship among our categories (that is, freshmen are at a lower level than sophomores, and juniors are at a higher level than either one), we can make cumulative frequency and percentage distributions too. We can even group the categories if we wish. A common way to group this kind of data follows:

Class             f      %     fc     %c
Upperclassmen    217   48.1   450   100.0
Underclassmen    233   51.7   233    51.8
                 ---   ----
                 450   99.8

Try Comprehension Check 2.14 on page 34; answer is on page 55.

GRAPHING DATA

For many people it's easier to make sense of data if we can see a picture rather than just a table. For this reason, it is usual to use graphs to show the overall idea our data give us. Think about the daily newspaper; often a graph is used to show us trends in rainfall or growth in the gross national product or decreases in flu cases. They're especially helpful in showing overall trends, change over time, or comparisons among various groups.

Bar Graphs

Bar graphs are used to show frequency information for nominal scale data. For each nominal scale category the graph will show a bar whose height (or length) corresponds to the frequency of the category. Thus longer bars show higher frequencies, shorter ones show lower frequencies.

Histograms

Histograms look just like bar graphs except the bars for the categories are continuous, not separate as they are in a bar graph. They are used for ordinal and continuous scale data. This is meant to indicate that nominal scale categories are independent and non-continuous, whereas other categories are not independent, but numerically related.

Frequency Polygon

Frequency polygons use a line that moves from one category to the next to show frequency. Its general shape resembles that of the histogram for the same data.

Graphs: General Information

Note that I've not shown you examples of each type of graph; examples appear in your textbook on pages 40-42. The scores listed across the bottom of a graph should be listed in ascending order, in other words, the lowest scores should be to the left, highest ones to the right.

After examining the data presented in a graph, you can draw some very general conclusions based on the overall shape of the graph. Each of these general conclusions can be made more specific with particular mathematical procedures you'll be learning over the course of the semester, but the general conclusions help to put us on track in the first place.

Central Tendency. This enables us to describe the typical score in a distribution. Central tendency is indicated in a general way on a graph by where you note a frequency peak.

Variability. This enables us to describe the degree of spread seen in scores. Variability is indicated in a general way on a graph by the width of the graph.

Skew. A normal distribution is perfectly symmetrical with a peak in the center and a specific degree of taper to each side. A normal distribution is drawn on page 45 in your textbook. Skew is the degree of tilt to one side or the other seen in non-normal distributions. Some distributions are pretty nearly normal; others are far off.

Normal distributions have a skew of zero. Those whose peaks run to the right of a normal one, meaning there are more high scores and fewer low ones than in a normal distribution, are said to be negatively skewed. Those whose peaks run to the left of a normal one, meaning there are more low scores and fewer high ones than in a normal distribution, are said to be positively skewed.

Modality. A mode is a most frequent score. These are easy to spot on graphs because they show as peaks. A graph with only one peak is called unimodal; one with two or more peaks is called bimodal or multimodal. Finding more than one peak may alert us to some characteristic of the group under study which is important to our data gathering effort. It's always good to eyeball the data on a graph with an idea of finding any unusual or important tidbits of information like these.

Do Comprehension Checks 2.19, 2.20, and 2.21 on pages 39-41; answers are on page 56.

CONCLUSION

You've finished Chapter 2. I hope you've done the Comprehension Checks as you went along, either while reading the chapter or while working through this lesson. The next thing to do is to work your way through the Review Exercises found at the end of the chapter on pages 52 and 53. There's no substitute for practice. When you feel ready, request the Chapter 2 Worksheet. It will provide additional practice and will give you an opportunity to preview the sorts of questions you'll be asked to answer on the test.

When you've finished the worksheet and checked your answers, you may request the test.