Introduction and Results of the Model

The statistical analyses on the following pages are an attempt to take us beyond the realm of any single grower's anecdotal experience or subjective perception, and to add some scientific rigor to the analysis of the significant influences on crop yield. There are many fields or variables contained in the YOR database, and we are applying powerful statistical analytic techniques to try to tease apart these possible influences on crop yield to determine what really matters and what doesn't.

To accomplish this, we have undertaken a series of statistical analyses, beginning with some basic, preliminary, exploratory analyses and concluding with a final, definitive model that shows what matters most in determining crop yield. One might ask why we didn't just skip the preliminary analyses and go directly to a definitive model. There are two reasons. First, there are some statistical principles of good analysis that we must follow. One of these principles says that a valid and reliable model cannot be built unless we have at least 10 times as many data points (grow reports) as we have variables being tested in the model. This causes an immediate problem for us, because there are potentially dozens of candidate predictors of crop yield that exist in or can be derived from the YOR database, including various aspects of lighting, growing medium/method, nutrient type and delivery method, plant spacing, plant support method (e.g., trellis, ScrOG), and so forth. But we have only about 160 grow reports. So we may have as many candidate predictor variables as we have grow reports, far more than the roughly 16 predictors that 160 reports can support under the 10-to-1 rule.
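The arithmetic behind the 10-to-1 rule can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the count of 160 reports is the approximate figure given above, and the figure of 40 candidate predictors is a made-up example, not a count taken from the YOR database.

```python
def max_predictors(n_reports: int, ratio: int = 10) -> int:
    """Largest number of predictors allowed under a ratio-to-1 rule of thumb."""
    return n_reports // ratio


def satisfies_rule(n_reports: int, n_predictors: int, ratio: int = 10) -> bool:
    """True if there are at least `ratio` reports for every predictor tested."""
    return n_reports >= ratio * n_predictors


n_reports = 160  # approximate number of grow reports stated in the text

print(max_predictors(n_reports))      # 16
print(satisfies_rule(n_reports, 40))  # False: 40 predictors would need 400 reports
print(satisfies_rule(n_reports, 16))  # True: 16 predictors need exactly 160 reports
```

With about 160 reports, any single model can test roughly 16 predictors at most, which is why the candidates have to be examined in smaller groups.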

So we have to go through a sort of "elimination tournament" not unlike an NCAA sports tournament. First we examine small numbers of possible influences on crop yield in separate analyses. We eliminate candidates in these preliminary analyses that appear to have nothing to do with crop yield. Then we take the candidates which survive these preliminary tests and appear to have some influence on crop yield, and we test them against one another in a final round of modeling, always making sure to have at least a 10:1 ratio of grow reports to potential predictors at each stage of our analysis.

So in the early rounds of analysis, we compute simple correlations between crop yield and one predictor at a time, to see whether that predictor seems to have any influence on crop yield. We run a large number of these correlations, one for each predictor, and remove from further consideration any variable that shows only a very weak correlation with crop yield. After this first elimination round, we take small batches of the surviving candidates and force them to "fight it out" with one another in a series of preliminary regression models. Only a handful of variables survive this round of modeling, and these remaining candidates are then tested against one another in a final regression model, from which only four variables ultimately survive as statistically significant predictors of crop yield.
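The two-stage idea described above (screen each predictor by its correlation with yield, then put the survivors into a regression together) can be sketched roughly as follows. This is not the procedure or software actually used for these analyses. The data are synthetic, the variable names and the 0.2 correlation cutoff are assumptions chosen for illustration, and the sketch fits a plain least-squares model without the significance tests that the real analysis relies on.

```python
import numpy as np


def screen_by_correlation(X, y, names, min_abs_r=0.2):
    """Keep predictors whose absolute Pearson correlation with y is at least min_abs_r."""
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= min_abs_r:
            kept.append(name)
    return kept


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic stand-in data: two predictors that truly drive yield, two that are pure noise.
rng = np.random.default_rng(0)
n = 160  # matches the approximate number of grow reports
names = ["watts", "plant_count", "noise_a", "noise_b"]  # hypothetical variable names
X = rng.normal(size=(n, len(names)))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Round 1: one-at-a-time correlations with yield.
survivors = screen_by_correlation(X, y, names)

# Check the 10:1 ratio of reports to predictors before modeling.
assert n >= 10 * len(survivors)

# Final round: survivors compete in a single regression model.
cols = [names.index(s) for s in survivors]
coef = fit_ols(X[:, cols], y)

print("survivors:", survivors)
print("coefficients (intercept first):", np.round(coef, 2))
```

A real analysis would also examine the significance of each coefficient in the final model, for example with a dedicated statistics package, before declaring a variable a useful predictor.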

The second reason we first do all the individual correlations is that they may show that some things many growers assume are important influences actually are not. Thus, even before we get to the regression modeling phase, we can perhaps debunk a few myths, as you will see later on when we review these simple correlations.

Moon Doggie's Analyses
	Introduction and Results of the Model
	Original & Created Fields and Variables
	Correlations (analyses of field variables)
	Final Analysis & Conclusions
	Statistical Glossary