## Statistics notes

1. prob_hyp.pdf. Probability & hypothesis testing.
2. nonpar.pdf. Nonparametric (mostly rank-based) tests. A further note covers the calculation of T for the Wilcoxon signed ranks test.
3. categorical.pdf. Tests for categorical variables (e.g., chi-square and Fisher exact tests). A separate note summarizes the assumptions for chi-square tests.
4. z_and_t_tests.pdf. Notes on z- and t-tests. See also Assumptions for t-tests and the Unequal Variances t-test.
5. anova1.pdf. One-way ANOVA. (Updated on 7-Jul-2003.)
7. anova2.pdf. Two-way ANOVA. This shorter version is recommended to students in the BHSc stats class.
8. linreg.pdf. Simple linear regression. See also a brief note on the coefficient of determination.
9. multreg.pdf. Multiple regression. See also a note on rules of thumb for how many variables you may safely include in a regression model.
10. anova_regression.pdf. Similarities between one-way ANOVA and linear regression.
11. ancova.pdf. Analysis of Covariance.
12. pcafa.pdf. Principal components analysis and factor analysis.
13. Note on odds ratios in multinomial logistic regression.
14. A t-test for the difference between two non-independent Pearson correlations.
15. Errata and clarifications for Biostatistics: The Bare Essentials, 2nd Edition. Note that in more recent printings of the 2nd Edition than I have, some of these problems may have been fixed.

## Notes by other folks

Newsgroup posts & responses to questions.
• Flavours of mean. In this newsgroup post, Donald Burrill demystifies some of the less well understood means statisticians sometimes report (e.g., the geometric and harmonic means).
• The 95% confidence interval for the mean. Includes a nice example given by David Howell in his book Statistical Methods for Psychology.
• ANCOVA versus analysis of change scores, posted to newsgroup sci.stat.edu by Dave Krantz back in 1997.
• Normality and regression: a series of posts that appeared in sci.stat.consult and sci.stat.edu in May 2000.
• How to check for departure from linearity in linear models: A series of newsgroup posts.
• Stepwise regression.
• Three comments on stepwise regression from Rich Ulrich's Stats FAQ. Stepwise regression (or discriminant function, or logistic, also, all-possible-subsets) has been beaten to death in discussions on the .stats Usenet groups. A lot of posts are strongly negative, from multiple points of view. This comes as a surprise to users whose introduction has come partly from stat-packages which make the options seem easy and appealing. Included here are three posts that cover a good range of objections, and offer pertinent references.
• Stepwise regression (along with other topics) is also covered in Mike Babyak's excellent article on overfitting regression models (Psychosomatic Medicine 66:411–421, 2004).
• Jerry Dallal's notes on Using the Bootstrap to Simplify a Multiple Regression Equation. Dallal finds (via simulations) that "Whatever peculiarities in the dataset that led [a particular set of variables] to be the chosen ones in the stepwise regressions also make them the favorites in the bootstrap samples." He concludes, therefore, that using bootstrap methods does not solve the problems with stepwise regression.
• A rank-based alternative to between-within ANOVA.
• Commentary on the hypothesis testing controversy, by Alan McLean. In April 2000, Alan McLean posted to one of the stats newsgroups a couple of messages concerning the hypothesis testing controversy. I find his commentary to be very sensible and refreshing, which cannot be said for much of what one reads concerning this topic.
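
As a small illustration of the "Flavours of mean" item above, here is a minimal Python sketch of the arithmetic, geometric, and harmonic means using their standard textbook definitions. It is not taken from Donald Burrill's post; the function names and the sample data are hypothetical, and the geometric and harmonic versions assume strictly positive values.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product, computed on the log scale.

    Assumes every value is strictly positive.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Count divided by the sum of reciprocals.

    Assumes every value is strictly positive.
    """
    return len(xs) / sum(1.0 / x for x in xs)


# Hypothetical sample data, chosen only for illustration.
data = [1.0, 2.0, 4.0]

am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)

print(f"arithmetic: {am:.4f}")
print(f"geometric:  {gm:.4f}")
print(f"harmonic:   {hm:.4f}")
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, so comparing the three printed values is a quick sanity check.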