
How to Organize and Categorize Your Data

Data can be placed into two separate categories: discrete and continuous. This section explains how to categorize data into one of these two categories, and then how to organize your data once you have determined which type it is.

Simple Discrete Data

Simple discrete data is data whose values are distinct and separate, such as values that can be counted (1, 2, 3, ...). This data can be organized into frequency tables, frequency polygons, and histograms. When this data is separated into intervals, it is known as grouped discrete data.

Frequency Tables

To create a frequency table, first find the range of the data. Break this range into groups of equal size that together span it. These groups make up the cells in the first column of the table. Then sort each data value into the group it belongs to. The number of data values that fall within each group goes in the second column of the table.

Here is an example of a frequency table for the data set {1,2,3,4,5,6,7,8,9}:

Group Frequency
1-2 2
3-4 2
5-6 2
7-8 2
9-10 1
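
The counting step can be sketched in Python. This is only an illustration: the function name frequency_table and its parameters start, width, and groups are made up for this example, and it assumes the data are whole numbers sorted into equal-width groups.

```python
def frequency_table(data, start, width, groups):
    """Count how many whole-number values fall in each equal-width group.

    Group k covers start + k*width through start + (k+1)*width - 1, inclusive.
    """
    table = []
    for k in range(groups):
        low = start + k * width
        high = low + width - 1
        # Count the data values that belong to this group.
        count = sum(1 for value in data if low <= value <= high)
        table.append((f"{low}-{high}", count))
    return table


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for label, count in frequency_table(data, start=1, width=2, groups=5):
    print(label, count)
```

Run on the data set above with groups of width 2 starting at 1, this prints the same five rows as the example table.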

Histograms

Simple discrete data can also be organized using a histogram. Histograms are very similar to bar graphs. The categories used to organize the data in your frequency table form the scale on the x-axis, while the number of data entries in each category is measured along the y-axis. The bars of the graph are then drawn to the appropriate height, as in a bar graph.
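
As a rough illustration, a histogram can be drawn as text from a frequency table. The function name text_histogram is invented for this sketch, and the bars run sideways (one row per category) rather than upward, simply because that is easier to print.

```python
def text_histogram(table):
    """Return one line per category, drawing each bar as a row of '#' marks."""
    lines = []
    for label, count in table:
        # Right-align the label so the bars start in the same column.
        lines.append(f"{label:>5} | {'#' * count}")
    return lines


# Frequency table for the data set {1,2,3,4,5,6,7,8,9} in groups of two.
table = [("1-2", 2), ("3-4", 2), ("5-6", 2), ("7-8", 2), ("9-10", 1)]
print("\n".join(text_histogram(table)))
```

The length of each bar plays the role that bar height plays in a drawn histogram.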

Continuous Data

Data is continuous when it may take on any value within a finite or infinite interval. This data can be ordered and measured, but it cannot be counted off one value at a time the way discrete data can. It can be organized into frequency density tables and frequency density histograms.

Frequency Density Tables

A frequency density table is similar to a frequency table, except that its intervals do not have to be equal. The frequency density of an interval is its frequency divided by its width. Otherwise, the process for making one of these tables is the same as for a regular frequency table.
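
The sketch below shows one way to compute a frequency density table in Python, taking frequency density to be frequency divided by interval width. The function name and the example intervals are invented for illustration and do not come from a real data set.

```python
def frequency_density_table(intervals):
    """intervals: list of (low, high, frequency); widths may be unequal."""
    rows = []
    for low, high, frequency in intervals:
        width = high - low
        # Frequency density = frequency / interval width.
        rows.append((low, high, frequency, frequency / width))
    return rows


# Invented example intervals with unequal widths.
example = [(0, 5, 10), (5, 15, 10), (15, 20, 5)]
for low, high, frequency, density in frequency_density_table(example):
    print(f"{low}-{high}  frequency={frequency}  density={density}")
```

The first two intervals hold the same number of data points, but the wider one has half the density.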

Frequency Density Histograms

A frequency density histogram is very similar to a regular histogram. However, unlike a regular histogram, the width of each interval is not necessarily equal. The width that each interval occupies on the x-axis is based on its size. Larger intervals take up proportionally more space than smaller intervals. The frequency is then shown using squares on the graph. For example, if the scale on the y-axis is 1, and there are 3 data points in a given interval, that interval must have 3 squares in it. Because the width of the intervals varies, these squares do not have to be one on top of the other. They may be placed next to each other horizontally, or stacked on top of each other vertically, but they must all be within the interval.

Cumulative Frequency

Frequency can be measured cumulatively. This is relatively simple to do. Beginning with the first interval, and continuing consecutively through the rest of the table, the number of data points in an interval is added to the sum of the data points in all of the intervals before it. This new number is the cumulative frequency of the interval. Cumulative frequency can be used to make cumulative frequency tables, cumulative frequency histograms, and more complex cumulative frequency curves. Cumulative frequency curves are graphed on x/y-coordinate axes. The cumulative frequency is graphed on the y-axis, and the intervals are on the x-axis. This creates a curve that allows you to see trends in the data.
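
The running total described above can be sketched in a few lines of Python. The function name cumulative_frequencies is made up for this example; itertools.accumulate is part of the standard library.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Add each interval's frequency to the total of all intervals before it."""
    return list(accumulate(frequencies))


# Frequencies for the groups 1-2, 3-4, 5-6, 7-8, 9-10 of the set {1,...,9}.
frequencies = [2, 2, 2, 2, 1]
print(cumulative_frequencies(frequencies))  # [2, 4, 6, 8, 9]
```

The last cumulative frequency always equals the total number of data points, which is a quick way to check the table.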

Measurements of Central Tendency

Simple discrete data has several central measurements. These include the mean (average), the median (the value halfway through the ordered data set), quartiles 1 and 3 (the values occurring 1/4 of the way and 3/4 of the way through the ordered set, respectively), and the mode (the most frequently occurring value in the data set). These measurements are needed for many statistics equations and formulas.

To calculate the mean of a data set, find the sum of the data points and divide by the number of data points in the set. Example: The mean of the set {1,2,3,4,5,6,7,8,9} is 5. This is calculated by adding 1+2+3+4+5+6+7+8+9=45, and then dividing 45/9=5.

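To calculate the median of a set, arrange the set in order and find the middle term. If the set has an even number of terms, the median is taken as the mean of the two middle terms. Example: The median of the set {1,2,3,4,5,6,7,8,9} is 5, because it is the middle term.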
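
Both calculations can be sketched in Python. The function names are chosen for this example only. Averaging the two middle terms when the set has an even number of values is the usual convention and is treated as an assumption here.

```python
def mean(data):
    """Sum of the data points divided by the number of data points."""
    return sum(data) / len(data)


def median(data):
    """Middle term of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Assumption: with an even count, average the two middle terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mean(data))    # 5.0
print(median(data))  # 5
```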

To calculate the quartiles, the data must first be arranged in ascending order. Then use the following formula: the jth quartile Qj, where j=1, 2, or 3, is the ((j(n+1))/4)th value, where n is the number of data points. If this position is not a whole number, you must interpolate between the two neighboring values. Example: To find the 1st quartile of the set {1,2,3,4,5,6,7,8,9}, calculate the position (1(9+1))/4=2.5. This means that Q1 is the 2.5th term of the set. Because this position falls between two terms, it is necessary to interpolate between them: Q1 is halfway between the 2nd and 3rd values, 2 and 3. To calculate this value, divide the difference between the two terms by 2, and add that number to 2, the second value. This gives you 2.5 as Q1.
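
The quartile rule above can be sketched in Python as follows. The function name quartile is chosen for this example. The check for at least three values is an added assumption, made so that every position falls inside the set.

```python
def quartile(data, j):
    """Return the jth quartile (j = 1, 2, or 3) using position j*(n+1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        # Assumption: with fewer than 3 values the position can fall outside the set.
        raise ValueError("need at least 3 data points")
    position = j * (n + 1) / 4      # 1-based position in the ordered set
    whole = int(position)           # term just below (or at) the position
    fraction = position - whole     # how far toward the next term
    if fraction == 0:
        return ordered[whole - 1]
    below = ordered[whole - 1]
    above = ordered[whole]
    # Interpolate between the two neighboring terms.
    return below + fraction * (above - below)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartile(data, 1))  # 2.5
print(quartile(data, 2))  # 5
print(quartile(data, 3))  # 7.5
```

Other textbooks and calculators use slightly different quartile rules, so results for some data sets may not match this one.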

The central measures of grouped discrete and continuous data include the approximate mean (average group), the modal group (the most frequently occurring group), and the 50th percentile (the data point that half of the data is greater than and half of the data is less than).

To calculate the approximate mean of a grouped data set, multiply the midpoint of each group by the number of data points in that group, add these products together, and divide by the total number of data points. The group that contains the result is the group in which the approximate mean falls. Example: To find the approximate mean of this set of data:

Group Frequency
0-5 2
5-10 3
10-15 2

The midpoints of the groups are 2.5, 7.5, and 12.5, so the sum of the products is (2.5)(2)+(7.5)(3)+(12.5)(2)=52.5. The total number of data points is 2+3+2=7, and 52.5/7=7.5. This value falls in the second group, so the approximate mean lies in the group 5-10.
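
A common way to estimate the mean of grouped data is to treat every data point as if it sat at the midpoint of its group. The Python sketch below uses that standard midpoint estimate. The function name approximate_mean and the (low, high, frequency) layout are chosen for this example.

```python
def approximate_mean(groups):
    """groups: list of (low, high, frequency). Returns the midpoint-weighted mean."""
    total = sum(frequency for _, _, frequency in groups)
    # Treat each data point as sitting at the midpoint of its group.
    weighted = sum((low + high) / 2 * frequency for low, high, frequency in groups)
    return weighted / total


groups = [(0, 5, 2), (5, 10, 3), (10, 15, 2)]
estimate = approximate_mean(groups)
print(estimate)  # 7.5

# Find the group that contains the estimate.
for low, high, _ in groups:
    if low <= estimate < high:
        print(f"{low}-{high}")  # 5-10
```

For the table above, the estimate falls in the group 5-10.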

Finding the modal group is simple. The modal group is the group with the most terms in it. In the previous example, the modal group is the group 5-10, because it has 3 terms in it.
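
Picking out the modal group can be sketched in one line of Python. The function name modal_group is made up for this example. If two groups tie for the most terms, this sketch returns whichever comes first, which is a choice made here and not a rule from the text.

```python
def modal_group(groups):
    """groups: list of (label, frequency). Returns the entry with the most terms."""
    # max() keeps the first entry it meets when frequencies tie.
    return max(groups, key=lambda group: group[1])


groups = [("0-5", 2), ("5-10", 3), ("10-15", 2)]
print(modal_group(groups))  # ('5-10', 3)
```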

To find the 50th percentile, calculate the second quartile using the method described above.

Measures of Dispersion

There are several values that measure the dispersion of data. These include the range, the interquartile range, and the standard deviation.

The range is found by calculating the difference between the greatest number of a data set and the least number of a data set. Example: The range of the set {1,2,3,4,5,6,7,8,9} is 8; found by 9-1=8.

The interquartile range (IQR) is found by subtracting quartile 1 from quartile 3. Example: The IQR of the set {1,2,3,4,5,6,7,8,9} is 5, because Q3=7.5 and Q1=2.5, so the difference is 5.
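
The range and the interquartile range can both be checked with a short Python sketch. The helper quartile below uses the ((j(n+1))/4)th-value rule with interpolation and assumes at least three data points. All function names are chosen for this example.

```python
def quartile(data, j):
    """Value at 1-based position j*(n+1)/4, interpolating between neighbors.

    Assumes the data set has at least 3 values.
    """
    ordered = sorted(data)
    position = j * (len(ordered) + 1) / 4
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def data_range(data):
    """Greatest value minus least value."""
    return max(data) - min(data)


def interquartile_range(data):
    """Quartile 3 minus quartile 1."""
    return quartile(data, 3) - quartile(data, 1)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(data_range(data))           # 8
print(interquartile_range(data))  # 5.0
```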

Standard deviation can be found using a more involved formula, but it is easiest to calculate with a graphing calculator.
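
For readers without a graphing calculator, the usual textbook formula can be sketched in Python: take the square root of the average squared distance from the mean. The page does not say whether the population form (divide by n) or the sample form (divide by n-1) is intended, so the sample flag below is an assumption that lets you pick either. The function name is chosen for this example.

```python
import math


def standard_deviation(data, sample=False):
    """Square root of the average squared distance from the mean.

    sample=False divides by n (population form);
    sample=True divides by n - 1 (sample form).
    """
    n = len(data)
    mean = sum(data) / n
    squared_distances = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_distances / divisor)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round(standard_deviation(data), 3))               # 2.582
print(round(standard_deviation(data, sample=True), 3))  # 2.739
```

Graphing calculators typically report both versions, so it is worth checking which one a given problem expects.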


Andrew Nelson created this page as well.