* =======================================================================
* File: Sampling_Distribution_Demo.SPS .
* Date: 7-Sep-2007 .
* Author: Bruce Weaver, bweaver@lakeheadu.ca .
* ======================================================================= .

* Draw a large number of samples of a fixed size from a population,
* compute the mean & SD for each sample, and write the sample
* statistics to a data file.  Then generate descriptive stats
* and a plot of the sample means.  If the number of samples
* is sufficiently large, this should give a pretty good
* approximation of the sampling distribution of the mean.
* ----------------------------------------------------------- .

* The routine for drawing k random samples of size SAMPSIZE
* from a population of size POPSIZE is based on a routine written
* by David Marso, and posted to usenet by David Nichols, former
* Senior Support Statistician at SPSS.  His post can be viewed
* at the link given below.
* http://groups.google.com/group/sci.stat.consult/msg/710ea4ab83ddf24a?dmode=source .
* ----------------------------------------------------------- .

* Use macros to define the input file and variable of interest .
define datafile ()'C:\Program Files\SPSS\1991 U.S. General Social Survey.sav' !enddefine.
define myvar ()age !enddefine.  /* the variable of interest .

* Use macros to create constants giving the number
* of samples and the sample size.
define nsamples () 10000 !enddefine.  /* the number of samples .
define sampsize () 16 !enddefine.  /* the sample size .
define temp ()'C:\temp\' !enddefine.  /* Temporary folder - change if necessary .
define bootdata () temp + 'bootdata.sav' !enddefine.

* The population size (i.e., number of cases in the file) is
* also needed, but it will be computed and defined with a
* macro below.

* Open the data file, keep only cases with a valid value on
* the variable of interest, and number those cases 1 to N.
GET FILE= datafile .
select if not missing(myvar).
exe.
COMPUTE ID=$CASENUM .
exe.
SAVE OUTFILE = bootdata.

* Plot the population data, and get the parameters.
graph histogram myvar.
means myvar / cells = count mean min max var stddev.

* ------------------------------------------------------- .
* NOTE: Bear in mind that SPSS always assumes one has a
* sample, not a population; so the SD reported above is
* not the population SD.  It was computed with N-1 in the
* denominator rather than N, so it is slightly too large.
* To obtain the population SD, square the reported SD and
* then multiply by (N-1).  This gives the sum of squares (SS).
* Divide the SS by N to get the population variance, then
* take the square root to get the population SD.
* Equivalently, multiply the reported SD by SQRT((N-1)/N).
* ------------------------------------------------------- .

* Here is a bit of trickery from Raynald's SPSS Tools website
* to get the number of cases in the file and define it in a macro.
* RANK with /N stores the total number of cases in a new
* variable N; the first case then writes a one-line DEFINE
* command to a syntax file, and INCLUDE runs that file to
* create the macro popsize.
RANK VARIABLES=ID (A) /N INTO N.
EXECUTE.
DO IF $casenum=1.
- WRITE OUTFILE='C:\Temp\N_macro.sps' /'DEFINE popsize()'N '!ENDDEFINE.'.
END IF.
EXECUTE.
INCLUDE FILE='C:\Temp\N_macro.sps' .

* Now create the k random samples (with replacement).
* Each case gets a sample number (SAMP) and an ID drawn at
* random from 1 to popsize.  The IDs are drawn independently,
* so the same case can appear more than once in a sample.
INPUT PROGRAM .
LOOP SAMP=1 to nsamples.
LOOP V = 1 to sampsize.
COMPUTE ID=TRUNC(UNIFORM(popsize)) + 1.
END CASE.
LEAVE SAMP.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM .
SORT CASES BY ID .
exe.

* Now bring in records from the BOOTDATA file.  TABLE does
* a keyed lookup, so an ID drawn more than once gets the
* same record each time.
MATCH FILES / FILE * / TABLE BOOTDATA / BY ID .
exe.
SORT CASES BY SAMP.

* Use OMS to write the sample statistics to a file .
* First OMS command just suppresses Viewer output.
OMS /DESTINATION VIEWER=NO /TAG='suppressall'.
OMS
 /SELECT TABLES
 /IF COMMANDS = ["Means"] SUBTYPES = ["Report"]
 /DESTINATION FORMAT = SAV NUMBERED = TableNumber_
   viewer = no
   OUTFILE = "C:\temp\sample means.sav".
means myvar by samp / cells = count mean min max var stddev.
OMSEND.

* Now open the file of sample statistics, and drop the
* Total row that MEANS adds to the bottom of the table.
Get FILE = "C:\temp\sample means.sav".
select if (VAR1 NE "Total").
exe.
rename var (Std.Deviation = SD).
var lab mean 'Sample mean' SD 'Sample SD' N 'Sample size' .
graph histogram(normal) mean / title = "Distribution of Sample Means".
means mean / cells = count mean min max var stddev.

* Note that the mean of the sample means is very close to
* the mean of the original population.  If we drew all
* possible samples (with replacement) of a given size,
* the mean of the sample means would be exactly equal to
* the original population mean.
* Notice too that the SD of the distribution of sample
* means is approximately equal to the population SD
* (computed with N in the denominator) divided by the
* square root of the sample size.  If we had all possible
* samples of that size, the match would be exact.
graph histogram SD.
means SD / cells = count min max mean median .

* Notice that SD varies considerably from sample to sample.
* But the mean and median of this distribution of sample SDs
* are close to the SD of the original population.  The mean
* will tend to be slightly smaller than the population SD,
* because the sample SD is a slightly biased estimator of it.
* ======================================================================= .
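
* -----------------------------------------------------------
* Optional check, added here as a sketch (not part of the
* original routine): compare the simulation with what theory
* predicts.  This assumes the OMS file names the variance
* column Variance, and that bootdata.sav was saved earlier in
* this job.  The variable names const, pop_N, pop_mean, s_sd,
* pop_sd, pop_var and se_theory are hypothetical.
* ----------------------------------------------------------- .
* 1. With sampling with replacement, the mean of the sample
* variances should be close to the population variance
* (N in the denominator).
means Variance / cells = count mean.
* 2. Population parameters, and the theoretical SD of the
* sampling distribution of the mean (population SD divided
* by the square root of the sample size).
GET FILE = bootdata.
COMPUTE const = 1.
AGGREGATE
  /OUTFILE = *
  /BREAK = const
  /pop_N = N
  /pop_mean = MEAN(myvar)
  /s_sd = SD(myvar).
* Rescale the N-1 based SD to the population SD.
COMPUTE pop_sd = s_sd * SQRT((pop_N - 1) / pop_N).
COMPUTE pop_var = pop_sd**2.
COMPUTE se_theory = pop_sd / SQRT(sampsize).
LIST VARIABLES = pop_N pop_mean pop_sd pop_var se_theory.
* Compare pop_mean and se_theory with the mean and SD of the
* sample means, and pop_var with the mean of Variance above.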