Statistical Calculations Applied to Genetics
Richard Cryberg
June 2, 2009
There is a great variety of statistical calculations available online. Most of these are not really applicable to genetics problems, and figuring out which one to use can be very difficult for the nonstatistician. Historically, genetics people have tended to use relatively few statistical calculations. More often than not the only thing used is chi-squared. The main advantage of chi-squared is that it is a very simple calculation. The main disadvantage is that it gives you relatively little information and, even worse, is often misused in genetics applications, even by professional genetics people. Most other types of statistical calculations require considerably more arithmetic than chi-squared. All of this arithmetic is not really a problem these days because the calculators are available online, and all you have to do to use them is plug in a few numbers and then punch calculate.
The purpose of this explanation is to provide directions on how to use binomial statistical calculators. But, before you can use such calculators, you need to know a few definitions.
Probability: In just about any type of statistical calculation you are likely to encounter probability. Probability, when the topic is statistics, is always symbolized as p. When you see p in a statistical equation it must always have a value from 0 to 1. It is always stated as the decimal form. For example, let's say some event has a one in four chance of happening. Stated as a fraction this would be 1/4. Stated as a probability (p) it would be 0.25. Likewise, for an event with a one in eight chance of happening p would be 0.125. I think it is obvious why p cannot have a value less than zero as zero means it never happens. It’s hard for anything to happen less often than never. Even winning the lottery has a p of greater than 0; just not very much greater than 0. Perhaps, it is less obvious why it can't have a value greater than one. The easiest way to understand the maximum value of one is to realize that no event can happen more often than the number of times the test was performed.
Number of Tests: This is also a very common input requirement for statistical calculations. It is invariably symbolized as n. For example, let's say you're doing some genetic test and raised a total of 10 birds and you wish to do a statistical calculation on the results. In this case n would equal 10.
Number of Positive Results: In any experiment that we run we get either zero or some number of positive results. For instance, you might be running some tests on recessive red, and three of the young birds you raised came out recessive red. These three young birds would be the positive results from the experiment. So, if you raised 20 birds and 3 came out red n would be 20 and the positive results would be 3.
Using a Binomial Calculator: My current favorite binomial calculator is at the following web address:
http://www.stat.tamu.edu/~west/applets/binomialdemo.html
This particular calculator not only gives you number results but also gives you a bar graph output that allows you to easily visualize results. Any bar which is red is part or all of the number presented as the answer. It also allows you to calculate several different kinds of results depending on your needs. When you first log into this calculator there is a drop down box with five choices and exactly will be the choice showing. I will talk about when to use some of these various choices in the examples below. It is always possible to use only exactly for any calculation. But for some types of calculations you would have to run a lot of exactly calculations and then sum the results manually. The other options save you the trouble of having to do this by automatic summing. 
Example One: Let's start with a really simple test case. Let’s say you have a hen you suspect of being heterozygous recessive red. So you mate that test hen to a homozygous recessive red cock and raise young. How many consecutive blues do you need to raise before you can reasonably conclude that test hen is not heterozygous recessive red? Obviously if you raise one red young in the first nest the test is over because you have proven the test bird is heterozygous recessive red. But if all you are raising is blues when can you stop?
Well, p = 0.5. That is because for each egg the hen produces there is one chance in two that the egg will get the red gene and one chance in two it will get the wild type gene. But, every sperm is going to carry the recessive red gene. So we take the probability of the egg getting the red gene (0.5) and multiply it times the probability of the sperm getting the recessive red gene (1.0). The product of this multiplication is the probability of getting a red young bird (0.5).
The number of positive results is going to be 0. If you got one red young the test is over so all that is left is 0. Unfortunately, this particular statistical calculator does not have the box well labeled where you have to enter this number. It is the box in the center of the bottom line of the calculator.
That leaves n as an unknown number. But you do have to put something into the calculator for n or it will not work. So you make some reasonable guess at n. Just to get things going let’s try putting n in as 10 and see what happens. This is a good point to mention that drop down box with the choices I mentioned above. This time you are interested in exactly how often you would get 0 positive results out of 10 tries, so you do not have to do a thing about the box. You just left click on calculate and an answer appears. The answer is 0.001, which is the probability of getting zero positive results out of 10 tests. You can convert this answer to a % by simply multiplying by 100. So 0.1% of the time you would get zero reds out of 10 tries. Is this good or bad?
In science you never prove anything to 100%. Not even things like the Theory of Relativity are considered proven to the 100% confidence level. Yes, every experiment ever done fits that theory within experimental error. But that does not mean someone will not do an experiment tomorrow that does not fit. So, you always have to make some judgment as to how confident you need to be to be comfortable with yourself and make others comfortable. The standard that is fairly accepted in genetics testing is when you have proven something to the 95% confidence level you are on ok ground. Depending a bit on the particulars of the test I generally shoot for something more like 98%.
In the above example we calculated that the chance of seeing zero positive results out of 10 tests was 0.1%. This means we are 99.9% confident that our original assumption that the hen was heterozygous recessive red is wrong. Well, 99.9% confident is more confident than we need to be. So let’s try some smaller number of tests.
Four tests are not quite enough as you can only be 93.7% (100% - 6.3%) confident that you are right. That is a bit below the 95% confidence line so to stop there is thin ice. But 5 tests puts you well over 95% and 6 puts you clear up to 98.4%. So the calculation says you can safely stop at five or six.

Now, this is a really simple case that is easy to calculate on any hand calculator. That is if you remember how to do the calculation. The beauty of the binomial calculator is it will give you exactly the same answer and you do not have to remember how to do the math.
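For anyone who does want to see the hand calculation, here is a minimal sketch of it in Python. The chance of raising n blues in a row from a truly heterozygous hen is just (1 - p) multiplied by itself n times. The function name chance_of_no_reds is my own invention for illustration and is not part of any calculator.

```python
# Chance that a truly heterozygous hen produces n young in a row with no reds.
P_RED = 0.5  # chance of a red youngster per egg if the hen is heterozygous


def chance_of_no_reds(n, p=P_RED):
    """Probability of zero positive results in n tests (hypothetical helper)."""
    return (1 - p) ** n


for n in range(1, 11):
    miss = chance_of_no_reds(n)
    confidence = 100 * (1 - miss)
    print(f"n = {n:2d}   chance of no reds = {miss:.4f}   confidence = {confidence:.1f}%")

# Smallest number of consecutive blues that reaches the 95% confidence level.
first_n = next(n for n in range(1, 100) if 1 - chance_of_no_reds(n) >= 0.95)
print("Smallest n for 95% confidence:", first_n)
```

The table this prints should agree with the figures above: four young fall short of 95% and five clear it.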
Example Two:
Let’s take the same pair of birds above and say you got some red youngsters. Now we all know how recessive red works. It is a simple autosomal recessive so in the above test you should have gotten about half reds and half blues. But just pretend for the moment you do not know how recessive red works. So, you raised 10 young birds from the mating and you got three red birds and seven blue birds. The first thing you need to do is to make up some model for this genetic situation to use to test the data. The question you want answered is "is this result statistically reasonable?" If you do not know how recessive red works you have to guess a model and then let the calculator tell you if the data fits your guess within the 95% confidence limits. So, let’s guess that 1/2 of the gametes from the blue bar should carry the recessive red gene, while all gametes from the recessive red bird should carry the recessive red gene. Thus, p would equal 1/2 times 1 or 0.5. We now have both p (0.5) and n (10).
You also have to enter the number of positive results. In this case we raised three red birds so the number of positive results is 3.
If we just put our p and n from this example into the calculator and don't change the word exactly in that drop down box it will calculate the probability of getting exactly 3 positive results out of 10 total test results when we believe p = 0.5. I suggest you stop here and go through the exercise of entering this data and clicking the calculate button on the calculator so that you can better visualize what I am saying. In the graphical portion of the output, you will see a bar graph which follows a bell shaped curve with the tallest bar at 5. The bar at 3 should be in red. And just to the left of the compute button your answer will appear which should be 0.1172. This 0.1172 is the chance of getting exactly 3 positives out of 10. Perhaps this will make more sense if you convert it to a percent by multiplying by 100. This amounts to an 11.72% chance. You can also look at the black bars and determine how often other results would be likely to happen. For example, you have about a 25% chance of getting exactly 5 red birds.
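If you would rather check the exactly figures without the web page, the sketch below does the same arithmetic using the standard binomial formula. The function name exactly is my own label, chosen to match the drop down option, and the row of # marks is only a crude stand-in for the bar graph.

```python
from math import comb


def exactly(k, n, p):
    """Probability of exactly k positive results in n tests (binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
for k in range(n + 1):
    chance = exactly(k, n, p)
    bar = "#" * round(100 * chance)  # one mark per percentage point
    print(f"{k:2d} reds  {chance:.4f}  {bar}")
```

The rows for 3 and 5 reds are the ones to compare against the 0.1172 and roughly 25% quoted above.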
Most of the time the probability of getting exactly some particular result is not very interesting. Often you are more interested in knowing if the result you got is reasonable based on an assumed model of the genetics involved. For many genetics results the most interesting option is the at most option. If you click at most on the dropdown box and use n = 10, p = 0.5 and three positive results then click calculate you will get 0.1719. This gives you the total probability of getting 0, 1, 2 or 3 positive results; in this case 17.19%. In English what this means is that if you ran this test many times, always producing 10 young in each test, 17.19% of the time you would get three or fewer red youngsters. The reason you are interested in the at most number is because you raised 0, 1, 2 and 3 red birds. You had to raise 0, 1 and 2 red birds or you could never have raised 3 birds. So you need to sum the probabilities of raising 0, 1, 2 and 3 birds to get the probability you are really interested in. And by clicking at most that is what the calculator calculates. A usual criterion is that you do not want to reject the guessed model unless you are outside the 95% confidence interval that it is correct. The 95% confidence interval extends from 2.5% to 97.5%. So any result within that range says the model is acceptable. Here 17.19% is well inside that range, so the guessed model stands.
On the other hand, if you only got two young recessive reds you are right on the borderline of being able to reject the idea, and if you got 1 or 0 you should reject the idea and try some other idea for how the genetics is working. However, I would caution you that anytime you have fewer than 5 results in a class and a 1 result change in the data would change the conclusion, you are best off to raise a few more birds. This is particularly true if you have only raised a small number of total birds. It is less of an issue if you have 3 in a class selected from 100 birds raised than if you have 3 in a class out of 10 birds raised. At any rate it is always easy to check. Just enter one bird less or one bird more in the positive results and see if that changes the conclusion. If it does the answer is raise more birds. If not you can be comfortable you are not in too much trouble.
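The one bird more, one bird less check is easy to run in a few lines. This is only a sketch under the assumptions already stated: the at most number is the sum of the exactly numbers from 0 up to the count you raised, and the model is kept when that sum lands between 2.5% and 97.5%. The names at_most and fits_model are made up for this illustration.

```python
from math import comb


def at_most(k, n, p):
    """Sum of the exactly probabilities for 0, 1, ..., k positive results."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def fits_model(k, n, p, low=0.025, high=0.975):
    """True when the at most probability falls inside the 2.5% to 97.5% band."""
    return low <= at_most(k, n, p) <= high


n, p = 10, 0.5
for reds in (1, 2, 3, 4):
    chance = at_most(reds, n, p)
    print(f"{reds} reds out of {n}: at most = {chance:.4f}, model kept: {fits_model(reds, n, p)}")
```

If the answer in the last column flips between neighboring counts, that is the signal to raise more birds before drawing a conclusion.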
It is a bit hard to understand why you need to add the probabilities of all possible smaller results so it is worth spending a little more time on this aspect. This might make more sense if we talked about very large test numbers. Say the number of test cases is 10,000. If p = 0.5 the chance of getting exactly 4999 positive results is quite small at 0.8%. Yet 4999 is really very close to the expected 5000 and if you did this test you would be quite elated to be so close. So the exactly answer does not tell you much at all. But if you change that exactly to at most you sum the chances of getting all results smaller than or equal to 4999 and find the answer is 0.496 or 49.6%. Obviously 49.6% is not nearly small enough to reject the idea and is in fact nearly dead on the best possible result of 50%.
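The large number case can be checked the same way. The sketch below leans on one assumption worth stating: with p = 0.5 every one of the 2 to the power n possible sequences of results is equally likely, so the probabilities are just counts of sequences divided by that total. Python's whole numbers have no size limit, so the counting is exact. All variable names are my own.

```python
n = 10_000            # number of tests
outcomes = 2**n       # with p = 0.5 all 2**n sequences of results are equally likely

coef = 1              # C(n, 0): number of sequences with 0 positives
ways_at_most = 0      # running count of sequences with 0..k positives
ways_exactly = 0
for k in range(5000):                 # k = 0, 1, ..., 4999
    ways_at_most += coef
    ways_exactly = coef               # after the loop this holds C(n, 4999)
    coef = coef * (n - k) // (k + 1)  # step from C(n, k) to C(n, k + 1)

exactly_4999 = ways_exactly / outcomes
at_most_4999 = ways_at_most / outcomes

print(f"exactly 4999: {exactly_4999:.4f}")
print(f"at most 4999: {at_most_4999:.4f}")
```

The two printed values should round to the 0.8% and 49.6% given above.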
Example Three:
In this example I will illustrate how to try to get some approximate count on the number of genes involved in some complex trait. Let’s consider the number of tail feathers on a show type Fantail as an example. You take a decent quality Fantail that has say 32 tail feathers and mate it to wild type. The young range from 14 to 18 tail feathers. Now what do you do? Well, one very old rule of thumb is to mate that F1 back to the parent it least resembles. In fact that is one of the very few rules Hollander included in his thesis. As 14 to 18 tail feathers is a whole lot closer to the count on wild type than the count for a show Fan the next mating according to this rule would be to mate some of the F1s back to show Fans. Now I am going to make up some data for this mating out of little more than pure guess so anyone who has actual data please bear with me for the purpose of illustration.
Let’s say you raised 25 young from this mating. The number of tail feathers was as follows: 14, 16, 16, 18, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 23, 24, 24, 24, 26, 26, 26, 26, 26, 29, and 33. Here you are interested in the extreme values that are most like either the F1 or the Fantail parents. For purposes of this analysis all those values in the middle are important only in terms of number of tests. To make matters worse, in this case how you decide which values are like the parents is also subject to some argument. But, as I am writing this thing I will say there are four cases that came out like the F1s and two cases that came out like the Fantail. Yes, this choice is a bit arbitrary. If you happen to be off by 1 bird in the count of either of the end classes it will fairly seldom impact the results from a good model. Remember the purpose is to get an approximate count on the minimum number of genes. Such a count can give great guidance in how to set up future tests. It can also tell you some trait is so complex you simply want to stop now as it is more than you wish to tackle.
To calculate a statistical model we have to make some guesses again. So let us guess that all the genes that give supernumerary tail feathers are codominants. This may be a wrong guess and only a lot of further testing of alternate models, and perhaps more breeding experiments would tell you if it is true or not. As a first guess we all know a Fantail’s tail count is fairly complex genetically. So assume that four genes are involved. Every one of the offspring is going to get a full complement of all four of these genes from the Fantail parent. But the offspring only have a 0.5 probability chance of getting each of the mutant genes from the F1 parent. So the probability of getting all four mutant genes from the F1 parent is 0.5 X 0.5 X 0.5 X 0.5 = 0.0625. So we can put p = 0.0625 and n = 25 and number of positive results = 2 into the calculator for the young birds that came out with a Fantail count. The result we want to know is the sum of 0, 1 and 2 positive results so we put at most in the drop down box and click calculate. The answer we get is 0.7968 or 79.7% of the time using the model we picked we would get the observed result or less. 
So let’s try it with the ones that came out like the F1s. We need to change the number of positive results from 2 to 4. When we click calculate we get 0.9823. In other words 98.2% of the time with the model we picked we would get at most 4 young like the F1s. This high a percentage says the model is not fitting the data very well.
Maybe a three gene model will work better? If you check a 3 gene model with the above data you get 0.3796 and 0.8047 for the probability of each of the above results. This is a much better statistical fit with the data as all events are well within the 95% confidence range.
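The comparison of gene counts can be laid out side by side with a short script. This is a sketch of the model as guessed above, not a general tool: it assumes every gene is a codominant, that each has a 0.5 chance of coming through from the F1 parent, and that the end classes are the 2 Fantail-like and 4 F1-like birds out of 25. The function names are hypothetical.

```python
from math import comb


def at_most(k, n, p):
    """Sum of the exactly probabilities for 0, 1, ..., k positive results."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def end_class_chance(genes):
    """Chance a backcross youngster gets the same allele at every locus from the F1."""
    return 0.5**genes


n = 25            # young raised from F1 x Fantail
fantail_like = 2  # birds counted as matching the Fantail parent
f1_like = 4       # birds counted as matching the F1 parent

for genes in (3, 4):
    p = end_class_chance(genes)
    print(
        f"{genes} genes: p = {p:.4f}, "
        f"at most {fantail_like} = {at_most(fantail_like, n, p):.4f}, "
        f"at most {f1_like} = {at_most(f1_like, n, p):.4f}"
    )
```

Changing the tuple of gene counts lets you try other models against the same data.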
There is another trick you can pull that can add some light to the interpretation. When you have a class of results with fewer than five examples you are on a bit of thin ice. The reason is a change of only one in the number of positive results is a significant change. In the calculations above both of the positive classes had fewer than five cases. One way to beat this problem would be to raise a lot more birds. But that takes a lot of time and feed. Another way is to simply combine two small classes. You combine them by adding both numbers of positive examples and the probabilities of each event together. That gives us a new class of 6 positive results (2 + 4) with a probability of 0.25 (0.125 + 0.125) for the three gene model. If you put these numbers in the calculator you find that getting at most 6 positive results should be expected with a probability of 0.5611 or 56% of the time. This is quite close to 50% so three is a decent guess on the number of genes involved. If you try the summing trick with the four gene model, where the combined probability is 0.125 (0.0625 + 0.0625), you will find that the probability of getting at most 6 positive results is 0.9703, which is uncomfortably high.
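The summing trick is a small change to the same arithmetic: add the two counts, add the two probabilities, and ask for the at most number again. The sketch below assumes, as in the codominant model above, that both end classes have the same chance of 0.5 raised to the number of genes. The name combined_end_classes is made up for illustration.

```python
from math import comb


def at_most(k, n, p):
    """Sum of the exactly probabilities for 0, 1, ..., k positive results."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def combined_end_classes(genes, n, positives):
    """At most probability after pooling the Fantail-like and F1-like classes."""
    p_each = 0.5**genes          # chance of landing in either one end class
    return at_most(positives, n, 2 * p_each)


n = 25
positives = 2 + 4  # Fantail-like plus F1-like birds
for genes in (3, 4):
    print(f"{genes} genes: at most {positives} = {combined_end_classes(genes, n, positives):.4f}")
```

Run as written this gives about 0.5611 for the three gene model and 0.9703 for the four gene model.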
You should also try a two gene model and see if it fits. I will leave that test to the reader to do on his own. If it does give a decent fit then you cannot make a choice based on this data. All you can say is that either a two or a three gene process fits this data.
In the above cases I made the guess that all the important genes in tail feather count are either weak codominants or recessives. It is pretty obvious that most are unlikely to be pure dominants based on the pretty low feather count on F1s. But it is entirely possible that one might be a pure dominant that only adds two or three feathers by itself and the rest are recessive. So you could also try models where you used two genes, one of which is recessive and one dominant, and see if that gives any hints. Any change from dominant to recessive in your assumed model is going to change the probabilities of seeing those birds most like the parental types. You can work out those probabilities either with math or with a Punnett square as you wish. In fact if you fool with enough models sooner or later you will come up with one that is a perfect fit for the data. I am sure I could find a model that is a near perfect fit for that made up data. I can be sure because you always can find a model that will fit. In such a case how do you decide which you should use for further tests?
In science there is a principle named Occam’s Razor. Occam’s Razor says you should pick the simplest model that works as your first try. Nonscientists (and not infrequently scientists as well) call this the KISS principle. KISS stands for “keep it simple stupid.” Basically KISS is identical to what Occam’s Razor says. So in either case your first pick to design further testing should be based on the simplest model that gives a decent statistical fit. Now, as you do further testing that simple model may fail and force you to go to a more complex model. But it is in general easier to design tests for the simplest model. So it is efficient to start there and only get more complex when the data forces you in that direction. Of course you could raise a lot more birds so the statistical calculations would be more precise. But frankly that is often close to a waste of time if you have 25 or more already from an F1 mated back to a parent type or 50 from F1 X F1 matings. The reason is it is usually faster to use a crude model to help design the next tests you do. Unfortunately in the above data not a lot stands out that allows me to suggest a great next test. There might be merit in picking some of the intermediate back crosses and mating to Fantails. For instance there seems to be a cluster at about 22 tail feathers. That cluster may represent a particular genotype. If you do that kind of mating you really need to raise a bunch of birds from each mating so you can treat the data of individual pairs. The only practical way to do this with pigeons is to foster eggs.
Another test you could do to help understand the situation would be to mate an F1 back to wild type. This test will ignore all pure recessives so only give a count on dominants and codominants. Some combination of two or three different tests can often help a great deal in defining the overall problem. If that mating of F1 X wild type says there is only one dominant or strongly codominant gene involved you might consider cutting that one gene out by itself as dominants or strong codominants are pretty easy to isolate. If you cut out such a dominant and made it homozygous you could then mate it to a Fantail to make a new class of F1s that are heterozygous for all the tail feather genes except the dominant. Now mate those new F1s back to pure Fantails. This mating would get rid of that dominant from the mix and allow you to better count the number of weak codominants and recessives.
I picked this tail feather count example for a reason. Any number of people have done breed crosses between Fantails and other breeds to introduce a new gene into Fantails. So the kind of data that would be interesting has been generated over and over. Yet, to my knowledge not one person has ever reported more than superficial results! We generate a ton of genetic data every year on pigeons and do not record it and keep it. So the knowledge gets lost for everyone. Would it help in making a better Fantail if we knew more about the genetics of the tail? Maybe or maybe not, but as long as the data is getting generated anyhow what harm is there in writing it down and preserving it? The knowledge sure would not hurt anyone. It would also be interesting to know something about the fan shape. Is the fan shape mainly due to crowding due to all the feathers present? Or are there, as I suspect, genes that promote the fan shape even with much lower than wanted numbers of tail feathers? And before the Fantail guys accuse me of picking on them, the exact same situation exists with any number of other breeds and their particular traits. The net result is 99% of all data generated each year that could be used to understand genetics is lost or at least not shared.
Summary:
I have heard people say you needed to raise some large number of birds to prove anything. Numbers as big as 100 from a single test. Well, that is nonsense. A lot fewer birds than that can quite often tell you a whole lot. Sometimes one bird is enough. In the first example if the first bird you raised came out recessive red you are done. And you sure do not need to raise 100 consecutive non reds to prove the bird under test is not hetero recessive red. Statistics can be a great help in knowing when you have enough data. It can also be a great help in understanding what the data is telling you about probable multigene models and give you some guidance in when you should pull the plug on a project and move on to something where you can be more productive. The advantage you gain from statistical analysis is it can minimize the total number of birds you have to raise to prove some point. The other big advantage is it allows you to assign a number that shows how confident you are of the truth of the model. The models you get from such analyses are not always going to be right. Sometimes the simplest model will suggest fewer genes than are actually involved due to Occam’s razor being wrong. So you can be comfortable that the particular situation is at least as complex as the simplest model you can find. 