* ======================================================================= .
*  File:    Sampling_Distribution_Demo.SPS .
*  Date:    7-Sep-2007 .
*  Author:  Bruce Weaver, bweaver@lakeheadu.ca .
* ======================================================================= .

* Draw a large number of samples of a fixed size from a population,
* compute the mean & SD for each sample, and write the sample
* statistics to a data file.  Then generate descriptive stats
* and a plot of the sample means.  If the number of samples
* is sufficiently large, this should give a pretty good 
* approximation of the sampling distribution of the mean.

* ----------------------------------------------------------- .
* The routine for drawing k random samples of size SAMPSIZE
* from a population of POPSIZE is based on a routine written
* by David Marso, and posted to Usenet by David Nichols, former
* Senior Support Statistician at SPSS.  His post can be viewed
* at the link given below.

* http://groups.google.com/group/sci.stat.consult/msg/710ea4ab83ddf24a?dmode=source .

* ----------------------------------------------------------- .

* Use macros to define the input file and variable of interest .

define datafile ()'C:\Program Files\SPSS\1991 U.S. General Social Survey.sav'  !enddefine.
define myvar ()age !enddefine.		/* the variable of interest .

* Use macros to create constants giving the number
* of samples and the sample size.

define nsamples () 10000 !enddefine.	/* the number of samples .
define sampsize ()    16 !enddefine.	/* the sample size .

define temp ()'C:\temp\' !enddefine.	/* Temporary folder - change if necessary .
define bootdata () temp + 'bootdata.sav' !enddefine.

* NOTE:  The files N_macro.sps and sample means.sav are written via
* hard-coded C:\temp paths further down, so edit those paths as well
* if you change the temporary folder above .

* The population size (i.e., number of cases in the file) is
* also needed, but it will be worked out and defined with a 
* macro below.

* Open the data file .

GET FILE= datafile .
select if not missing(myvar).
exe.

COMPUTE ID=$CASENUM .
exe.
SAVE OUTFILE = bootdata.

* Plot the population data, and get the parameters.

graph histogram myvar.
means myvar / cells = count mean min max var stddev.

* ------------------------------------------------------- .
* NOTE:  Bear in mind that SPSS always assumes one has a
* sample, not a population; so the SD reported above is 
* not quite correct.  It was computed with n-1 in the 
* denominator rather than N, so it will be slightly
* too large.  To obtain the correct population SD, square
* the reported SD and then multiply by (N-1).  This will 
* give you the sum of squares. Divide the SS by N to 
* get the population variance, then take the square
* root to get the population SD.
* ------------------------------------------------------- .
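
* A minimal sketch of the correction described in the NOTE above (not
* part of the original routine).  The variable names const, popN,
* sampSD and popSD are hypothetical, and AGGREGATE with
* MODE=ADDVARIABLES is assumed to be available in your SPSS version.
* The helper variables are added to the active file only; the saved
* BOOTDATA file is not changed .

COMPUTE const = 1.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=const
  /popN = N(myvar)
  /sampSD = SD(myvar).

* SS = SD**2 * (N-1);  population variance = SS / N .
COMPUTE popSD = SQRT(sampSD**2 * (popN - 1) / popN).
EXECUTE.
means popN sampSD popSD / cells = mean.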

* Here is a bit of trickery from Raynald's SPSS Tools website 
* to get the number of cases in the file, and define it in a macro .

RANK VARIABLES=ID  (A) /N INTO N.
EXECUTE.
DO IF $casenum=1.
- WRITE OUTFILE='C:\Temp\N_macro.sps'  /'DEFINE popsize()'N '!ENDDEFINE.'.
END IF.
EXECUTE.
INCLUDE FILE='C:\Temp\N_macro.sps' .

* Now create the k random samples (with replacement) .
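
* Optional sketch (not part of the original routine):  fix the random
* number seed so that the same samples are drawn on every run.  The
* seed value is arbitrary; delete the command to get a new set of
* samples each time .

SET SEED = 20070907.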

INPUT PROGRAM .
LOOP SAMP=1 to nsamples.
LOOP V = 1 to sampsize.
COMPUTE ID=TRUNC(UNIFORM(popsize)) + 1.
END CASE.
LEAVE SAMP.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM .
SORT CASES BY ID .
exe.

* Now bring in records from the BOOTDATA file .

MATCH FILES /FILE=* /TABLE=bootdata /BY ID.
exe.
SORT CASES BY SAMP.

* Use OMS to write the sample means to a file .
* First OMS command just suppresses Viewer output.

OMS /DESTINATION VIEWER=NO /TAG='suppressall'.

OMS
 /SELECT TABLES
 /IF COMMANDS = ["Means"]
     SUBTYPES = ["Report"]
 /DESTINATION FORMAT = SAV NUMBERED = TableNumber_ viewer = no
  OUTFILE = "C:\temp\sample means.sav".

means myvar by samp / cells = count mean min max var stddev.


OMSEND.

* Now open file of sample means .

Get FILE = "C:\temp\sample means.sav".

select if (VAR1 NE "Total").
exe.

rename var (Std.Deviation = SD).
var lab
 mean	'Sample mean'
 SD	'Sample SD'
 N	'Sample size'
.

graph histogram(normal) mean /
 title = "Distribution of Sample Means".
means mean / cells = count mean min max var stddev.

* Note that the mean of the sample means is very close to
* the mean of the original population.  If we drew all
* possible samples (with replacement) of a given size,
* the mean of the sample means would be exactly equal to
* the original population mean.

* Notice too that the SD of the distribution of sample
* means (i.e., the standard error of the mean) is
* approximately equal to the population SD divided by
* the square root of the sample size.  If we had all
* possible samples of that size, the match would be exact.

graph histogram SD.
means SD / cells = count min max mean median  .

* Notice that SD varies considerably from sample to sample.
* But the mean and median values from this distribution of
* sample SDs are close to the SD of the original population
* (the sample SD tends to underestimate the population SD slightly).

* ======================================================================= .