* ======================================================================= .
*  File:    number_records.SPS .
*  Date:    17-Mar-2010 .
*  Author:  Bruce Weaver, bweaver@lakeheadu.ca .
*  Notes:   How to number the records for each subject in a file,
*           and how to store each subject's total record count.
* ======================================================================= .

* Thank you to Richard Ristow for giving me the nudge I needed to
* update this file.  The original version was written before 
* AGGREGATE was capable of writing new summary variables to
* the working data file.

new file.
dataset close all.

DATA LIST LIST /id (f5.0) date(date11) y(f5.0).
BEGIN DATA.
1 8-Aug-2002 56
1 9-Aug-2002 59
2 26-Jan-2002 67
2 18-May-2002 75
2 21-Aug-2002 63
3 12-Aug-2002 88
4 1-Apr-2002 45
4 4-Jul-2002 55
4 31-Oct-2002 53
4 11-Nov-2002 49
4 25-Dec-2002 57
4 31-Dec-2002 46
4 1-Jan-2003 52
END DATA.

variable width date (11).

* Use MATCH FILES to flag the FIRST and LAST records for each subject.

sort cases by id date(a).
match files
 file = * /
 by id /
 first = rec1 /
 last = lastrec.
exe.

list.

* Variable REC1 = 1 on the first record for an ID, 0 otherwise.
* Variable LASTREC = 1 on the last record for an ID, 0 otherwise.

numeric recnum(f5.0).
do if rec1.
-  compute recnum = 1.
else.
-  compute recnum = lag(recnum) + 1.
end if.
exe.

list.
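
* A possible one-step alternative, offered only as a sketch: RANK
* can number the records within each ID directly.  RECNUM_RANK is a
* hypothetical variable name, not part of the original method.  It
* should match RECNUM only when no two records for an ID share the
* same DATE, because tied dates receive the same rank.

rank variables = date (a) by id
 /rank into recnum_rank
 /ties = low
 /print = no.
formats recnum_rank (f5.0).

list id date recnum recnum_rank.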

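* There are (at least) 2 ways to record the total number of records
* per ID on each row for that ID.  One way is to sort by ID and
* descending RECNUM, so that the last record for each ID (which
* holds the highest RECNUM) comes first, and then use the LAG
* function to copy that value down the remaining rows, as follows.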

sort cases by id recnum(d).
numeric numrecs1 (f5.0).
do if lastrec.
-  compute numrecs1 = recnum.
else.
-  compute numrecs1 = lag(numrecs1).
end if.
exe.

* Restore the original order (ascending RECNUM within ID).

sort cases by id recnum(a).
list.

* Another way, which may be better for LARGE files because it
* avoids the descending sort (and so may be a bit faster), is
* to use AGGREGATE.

AGGREGATE
  OUTFILE=* MODE=ADDVARIABLES /
  BREAK=id /
  numrecs2 = MAX(recnum).

* In older versions of SPSS, AGGREGATE could not write a new
* variable to the working data file like this, so one had to
* write the new variable (NUMRECS3) out to another data file,
* and then merge the files via MATCH FILES.  Code to do that
* is shown below, but is commented out.

*aggregate outfile = 'c:\max recnum.sav'
 /presorted
 /break = id
 /numrecs3 = max(recnum).

*match files file = *
 /table = 'c:\max recnum.sav'
 /by id.
*exe.
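
* If only the count of records per ID is needed, the N function of
* AGGREGATE should give the same total without relying on RECNUM.
* This is a sketch; NUMRECS4 is a hypothetical variable name, and
* the count assumes that case weighting is off.

aggregate
 outfile = * mode = addvariables
 /break = id
 /numrecs4 = n.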

variable labels
 numrecs1 'Total # of records [LAG method]'
 numrecs2 'Total # of records [AGGREGATE method]'.

list.

* Check for agreement of the 2 methods.

crosstabs numrecs1 by numrecs2 .
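
* A more direct check, offered as a sketch: flag each record on which
* the two totals agree, then tabulate the flag.  AGREE is a
* hypothetical variable name; every record should show AGREE = 1.

compute agree = (numrecs1 eq numrecs2).
formats agree (f1.0).
frequencies variables = agree.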


* ======================================================================= .