More Monkey Business

or "Answers to questions about mathematics and monkeys that may well have been asked."

A constrained rant by The Famous Brett Watson, 28-Jul-2000.

"Ford!" he said, "there's an infinite number of monkeys outside who want to talk to us about this script for Hamlet they've worked out."

-- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

Introduction

Be warned, this document gets into some mathematics that can be intimidating to those who balk at fractions and algebra! I do try to make the practical implications of all my mathematics clear, and explain the formulae by analogy where I can. Some of the numbers involved in calculations about monkeys can be mind-numbingly big, so in addition to writing them out in full I write them in scientific notation. Scientific notation is a means for writing very big or small numbers. For instance, the number 123,456,789 can be written in scientific notation as 1e8, where the "1" represents the first digit of the number, and the "e8" means that eight other digits follow. With really big numbers like this, the magnitude of the number, which is the number of digits it has, is usually more important than the digits themselves. You can be more accurate than this with scientific notation, but I normally reserve that trick for the very small numbers. For example, the number 0.00000012345 can be written as 1.2345e-7, where the "e-7" part means that the decimal place has to be shifted 7 places to the left. I could also have written it 1e-7, or 1.23e-7, because these mean the same thing with slightly less accuracy.

Infinite Monkeys

In the Hitchhiker's Guide to the Galaxy quote with which I opened this rant, Arthur Dent tells Ford Prefect that an infinite number of monkeys want to discuss a script for Hamlet they've worked out. Has Douglas Adams provided a profound and vivid insight into monkey math, or has he just flipped his lid? Quite possibly both, I'd warrant. Although the universe is big -- really big -- it's not infinite. It was thought, at one time, that the universe was infinite. These days we are pretty sure that it's finite both in size and age, so there's no possibility of having to actually deal with an infinite number of monkeys and typewriters. You can speculate that there are an infinite number of universes if you like, but there's no supporting evidence for that view. This infinity stuff is strictly a mathematical conjecture (as if the previous "experiment" wasn't).

The concept of infinity introduces all sorts of counterintuitive oddities into mathematics, and we should be grateful that we don't have to deal with infinities in our day to day work -- unless you happen to be a mathematician, that is. Mathematicians can get into all sorts of obscure arguments about different kinds of infinity. Since we'll be discussing discrete objects such as "monkeys" and "documents", we'll be dealing with a class of infinity that mathematicians give the odd name countable infinities. When dealing with infinity, you must keep in mind that infinity is a concept, not a number: you can't expect mathematical operations on infinity to behave like mathematical operations on numbers. Asking "what is infinity times two?" is a bit like asking "what is Wednesday times two?", or "what is purple times two?". Bear this in mind as I reveal the following brain-twisting facts.

So what happens if you have an infinite number of monkeys typing away? Do we get a script for Hamlet as Mr Adams suggests? Yes, we do! In point of fact, we get every combination of letters possible with the given typewriter, and that in infinite quantities. So not only do we get Hamlet, we get Shakespeare's complete works, The Hitchhiker's Guide to the Galaxy, this document, and incomprehensibly vast quantities of random garbage. (Note that this document may also qualify as garbage, but I object to it being described as "random".) An infinite number of monkeys typing randomly will rapidly produce every possible written work.

If the monkeys are typing at a reasonable rate, then it will take a while to produce Hamlet by simple merit of the fact that it takes a while to type, but a microscopically small percentage of your infinite monkeys will manage to produce it without errors on the first attempt. This is where we start to realise that infinity is a bit strange, because any percentage of infinity greater than zero is also infinity. To demonstrate, let's say I have an infinite number of jelly beans laid out in a neat row. Because I'm a generous kind of guy, I say you can have some of my jelly beans -- after all, I have more than I need. I don't want to give you the whole lot, however, so I say you can have every fifth jelly bean. So you take the fifth, the tenth, the fifteenth, the twentieth, and so on until you start to realise that this is never going to end and you have an infinite number of jelly beans too, despite the fact that you're only taking one in five. What's even weirder is that despite the fact we both have an infinite number of jelly beans, I still have four times as many as you, in the sense that my jelly beans outnumber yours four to one along any finite stretch of the row (although a mathematician would insist that our two infinite collections are nonetheless exactly the same size). This disparity needn't bother either of us, because we could both share our jelly beans with everyone else in exactly the same way, and they would wind up with an infinite number of jelly beans too. Like I said, infinity is weird, and we should be glad that we don't have to deal with it on a regular basis.

The infinite copies of Hamlet that we produce using infinite monkeys are like these jelly beans, but rather than one in every five being Hamlet, the ratio is slimmer. So much slimmer, in fact, that I couldn't possibly give you an analogy, so let's make it a little easier. Instead of a neat row of jelly beans, imagine that we are standing on the shore of a great ocean of jelly beans that stretches to infinity. You pick up a jelly bean and note that it has some letters written on it in very small print -- 41 characters, to be precise. I then tell you that the jelly beans have been labeled by monkeys, and each jelly bean has a random typewritten phrase on it. Further, I offer you all the jelly beans that have the phrase "TO BE OR NOT TO BE, THAT IS THE QUESTION." written on them (complete with its full stop), and truthfully assure you that there are an infinite number of them. After some wading through jelly beans looking for one that you can claim as your own, you start to realise that I've played a little trick on you. There may be infinitely many jelly beans with the right phrase written on them, but finding them is going to be something like looking for a needle in a haystack the size of the galaxy.

That's probably the most interesting and useful thing to know about the infinite monkeys scenario: you'll get any output you care to look for, but the ratio of useful to non-useful material remains a problem. In some sort of "practical" sense, having an infinite number of monkeys doesn't really help even though it theoretically gets results. It's therefore probably no great loss that the universe is finite, and we have to settle for finite numbers of monkeys.

Monkey Recruitment Advice

Let's say that you have a document you want to type, and you've decided to employ monkeys to do it. For some reason or other, you aren't worried about the quantity of useless monkey-output that you'll get along with this document; your only concern is that at least one of the monkeys types the document within a reasonable time frame. Maybe you're doing it for a bet. This poses an interesting problem: given a particular target document, how many monkeys do we need to have a reasonable chance of producing the document in a given time frame? Monkeys may work for peanuts (or bananas), but you still want to minimise your costs, and so employ the minimum number of monkeys.

Our first step in solving this problem is to measure the document itself. For the sake of convenience, we'll be so kind as to rig up the typewriters so that they automatically partition the monkey-output into documents which are the same size as the one we want. If we let the monkeys choose a random length in addition to pressing random keys, it would make the problem that much harder -- and believe me, it's hard enough already. So let's say that this document is L characters long, and it can be typed on our standard simplified 32-key monkey typewriter. The typewriter, as you'll recall from the "to be or not to be" experiment, has the 26 letters of the English alphabet, a space bar, and five punctuation marks (period, comma, question, exclamation, and apostrophe, if you must know).

The number of such documents that can be produced is simply 32 to the power of L, and only one of those is the one we want, assuming we will tolerate no errors. But how many monkeys do we need? Well, monkeys and time can be traded off against each other: half as many monkeys working twice as long will produce the same quantity of output. Instead of talking about the number of monkeys, let's talk about the number of trials we need to ensure a certain chance of success, where one trial is simply one document that a monkey has produced at random. You can then compute the number of monkeys you need based on the amount of time you have and how long it takes a monkey to perform one trial.

The relationship between the chances of success, the number of trials, and the number of possible documents follows an interesting function curve. We can best understand this function curve by looking at some simpler problems first. The simplest of all probability problems is the humble coin toss. Imagine you have one coin, and you want it to turn up heads when tossed. How many times do you have to toss it to have a good chance that it comes up heads at least once? The easiest way to compute this is by calculating the chances of repeated failure. There's a 50% chance that you'll fail to get heads on each attempt, so your chances of continued failure diminish by half on each attempt. Putting it in the positive, your chances of success in one attempt are 50%, 75% in two attempts, 87.5% in three attempts, and so on. You can never be 100% sure that you'll toss a head in any number of attempts, but you can get arbitrarily close.

Let's make matters a little more complex and imagine that we want to get several coins to land heads simultaneously. In point of fact, we want ten coins to land heads when we toss them together. Each coin can land one of two ways, and there are ten coins, thus a total of two to the power of ten ways the coins can land, which equals 1024. Only one of these combinations is the one we want. The chances of repeated failure are much higher here, since there is a roughly 99.9% chance of failing on any given attempt. Let's tabulate some possible results here.

Chances of ten coins all landing "heads" when tossed
Number of trials     256      512      1024     2048     4096
Chance of success    22.13%   39.36%   63.23%   86.48%   98.17%

It comes as no particular surprise that your chances of success increase with the number of attempts, but perhaps the figure associated with 1024 is interesting. If there are 1024 different possible outcomes and you have 1024 trials, what are the chances that you'll see any one particular combination? Or equivalently, what percentage of the 1024 distinct combinations are you likely to see during those 1024 trials? According to our calculations here, the answer is 63.23%. This function is not centred around 50%, and it is not symmetric.
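
If you fancy checking the table for yourself, here is the repeated-failure calculation sketched in Python:

    # Chance of ten coins all landing heads at least once, computed as
    # one minus the chance of failing on every single trial.
    fail = 1 - 1 / 1024                # 1023 of the 1024 outcomes are failures
    for trials in (256, 512, 1024, 2048, 4096):
        print(trials, f"{1 - fail ** trials:.2%}")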

To cut a long story short, it turns out that for large numbers this graph very closely approximates the formula P = 1-(1/(e^R)), where P is the probability of success, R is the ratio of number of trials to number of possible outcomes, and e is a popular mathematical constant having a value of about 2.71828. I call this the Magic Monkey Formula, and variations on it provide just about every answer you'd want to know about large numbers of monkeys hammering randomly on typewriters for long periods of time. Memorise this formula and impress people at parties with your vast knowledge of the subject! (Note: this author is not actually known for being the life and soul of a party, so take the advice about party conversation with a grain of salt.)

Using this formula, we would compute a probability of 63.21% for the 1024/1024 example above, so you can see that it's quite accurate even for numbers as small as a thousand. Working backwards from probability to ratio, we obtain the formula R = ln(1/(1-P)), where ln is the natural logarithm function (and P is greater than or equal to zero, and less than one). Let's make use of this new-found knowledge to find some key ratio/probability pairs.

Chance of success for particular trial/outcome ratios
Chance of success      0.1%    1%     5%     20%    50%    80%    95%    99%    99.9%
Trial/outcome ratio    0.001   0.01   0.05   0.22   0.69   1.61   3.00   4.61   6.91

Note a very interesting property of this function: for low trial/outcome ratios, the trial/outcome ratio approximates the probability of success. That is to say, if your number of possible outcomes is large (say greater than a thousand), and the number of trials is a small percentage of the number of outcomes (say five percent or less), then the probability of success is approximately equal to that percentage figure. This makes sense if you think about it: it's like randomly putting dots on a blank page. When you first start the page is mostly blank and each dot you put on the page is 100% effective at colouring in its part of the page. As the page starts to fill up, each new dot has an increased chance that it will overlap some previous dot, and therefore be at least partly wasted. That is the effect we see here.
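
Before we return to the monkeys, here are the Magic Monkey Formula and its inverse sketched in Python, for anyone who wants to reproduce the table above:

    import math

    def success_chance(trials, outcomes):
        # The Magic Monkey Formula: P = 1 - (1/(e^R)), where R = trials/outcomes
        return 1 - math.exp(-trials / outcomes)

    def ratio_needed(p):
        # The inverse: R = ln(1/(1-P)), for 0 <= P < 1
        return math.log(1 / (1 - p))

    print(success_chance(1024, 1024))  # about 0.6321, versus the exact 0.6323
    print(ratio_needed(0.5))           # about 0.69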

Returning now to the monkeys problem, we can see exactly how many trials we will have to perform in order to have a particular chance of success. If we want to have a 50% chance of success, then we will need to have a trial/outcome ratio of 0.69. In other words, to have a 50% chance of typing a document that is L keystrokes long, our monkeys will have to perform 0.69*(32^L) trials.

Example: "Watson, come quickly!"

For the sake of demonstration, let us assume that our target document is the phrase "WATSON, COME QUICKLY!" -- a mere 21 keystrokes -- and that a trial only takes one second. Using our new-found knowledge, how many monkeys will we need to have a 50% chance of getting this within twenty-four hours? Well, there are 40,564,819,207,303,340,847,894,502,572,032 (4e31) possible documents that our monkeys could type of length 21 (computed by 32^21), which we multiply by 0.69 (more precisely, by ln 2 = 0.693147, the exact trial/outcome ratio for a 50% chance) to compute the number of trials we need, giving us a mere 28,117,390,063,466,213,804,330,352,586,381 (3e31) trials. At one trial per second, one monkey can perform 86,400 (9e4) trials in a day, so we will need 325,432,755,364,192,289,401,971,673 (3e26) monkeys.
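
Python's arbitrary-precision integers make the big numbers in this example easy to check. A quick sketch:

    import math

    combos = 32 ** 21                  # 4e31 possible 21-keystroke documents
    trials = combos * math.log(2)      # ratio ln 2 gives a 50% chance
    per_day = 24 * 60 * 60             # one trial per second, for a day
    print(f"{trials:.4e}")             # about 2.8117e31 trials
    print(f"{trials / per_day:.4e}")   # about 3.2543e26 monkeys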

Hiring this many monkeys will pose logistical problems, as the surface area of the earth is only about 510,000,000,000,000 (5e14) square metres, meaning we will have to fit approximately 638,103,441,890 (6e11) monkeys per square metre (never mind the fact that most of the Earth's surface is ocean). And it doesn't really matter how much you skimp on the banana budget: the total cost is sure to make the US national debt ($6e12 at time of writing) look like peanuts. One gram of banana mash per monkey would translate to a bowl of banana mash several times bigger than the moon.

General Formulae for Monkey Projects

The Magic Monkey Formula that was revealed in the previous section can be generalised to answer a wide range of questions. In the original "to be or not to be" scenario, we took a problem and then started guessing how many monkeys we needed, followed by a calculation as to what our chances of success were. Now, as shown in the "Watson, come quickly!" example, we can choose an arbitrary chance of success and determine the number of monkeys we need to achieve it.

Another question we might want to ask is, "given a particular number of monkeys, what size of problem can we solve in a given time frame?" This is an interesting question, because we can take the largest imaginable number of monkeys, make them work at the fastest imaginable speed, and measure the size of the problem that they could feasibly solve. This figure can then act as an upper limit to the size of problem we can talk about solving with a random approach. Anything harder than that would involve more resources than we can reasonably hypothesise, and hence becomes a strictly mathematical question rather than a practical one.

In the previous section I introduced a variation of the Magic Monkey Formula R = ln(1/(1-P)), where R is the ratio of the number of trials to the number of possible outcomes, and P, the probability of success, was constrained to be zero or more, and less than one. Let's change our terminology slightly and use "combinations" rather than "possible outcomes", and denote this by C, and use T to denote the number of trials. Hence, this formula can also be expressed T/C = ln(1/(1-P)).

In order to answer the "size of problem" question, we need to express the formula in terms of C, which is what really encapsulates the size of the problem. Reworking the formula yet again, we can write C = T / ln(1/(1-P)), but we have to place another restriction on P. Whereas before it was fair for P to be equal to zero, we now require that it must be strictly greater than zero, and strictly less than one. You can see this for yourself if you perform the steps necessary to transform the formula, but it's also obvious if you think about it in terms of the question we are asking: if you want a zero percent chance of success, then the only way to do it is not to try!

Note that with this arrangement of the formula, you get your answer in terms of the number of possible combinations that could be tried. If you want to convert that figure back down to a "document length" figure, then you need to use a logarithm function. Specifically, if there are K distinct keys on the typewriter, then the length (L) of the document is given by L = log(C) / log(K). In this formula, it doesn't matter if the log function is log to the base ten or the natural logarithm: the answer will still be the same. This is one of the interesting properties of the logarithm function. Note that this formula is equivalent to "log base K of C", but in most practical situations you'll only have access to natural logarithm and/or log base ten, so the formula that I've given is the one you're most likely to actually use.
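
Here are these two formulae sketched as Python functions; the one-monkey-year example in the final line is purely my own illustration:

    import math

    def problem_size(trials, p):
        # C = T / ln(1/(1-P)), for 0 < P < 1
        return trials / math.log(1 / (1 - p))

    def document_length(combinations, keys=32):
        # L = log(C) / log(K); any logarithm base works here
        return math.log(combinations) / math.log(keys)

    # One monkey-year at one trial per second, wanting a 50% chance:
    print(document_length(problem_size(365 * 86400, 0.5)))  # about 5 keystrokes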

Example: a Universe of Monkeys

According to Mr Sunspot, the number of atoms in the universe could be as high as 6e79. In my "to be or not to be" problem, I used a figure of 17 billion years for the age of the universe, and it's an eminently fair figure as far as popular scientific theory goes, but this time I'll pad my estimate a little bit and say twenty billion years (2e10) as the upper limit, citing Dr Richard Muller as a reference. The hard part here is coming up with an estimate of how quickly we can realistically churn through our trials, but for the sake of argument I'll use a unit of Planck time (1e-43 seconds) as the smallest conceivable time to execute a trial. Planck time is not something you're likely to hear about outside the realm of quantum physics, and it's hard to appreciate how small it is: there are roughly as many Planck times in a picosecond as there are picoseconds in two hundred billion years, if that helps. A picosecond is one "trillionth" (1e-12) of a second. I can't picture a Planck time, myself: I just use numbers and forget that it's supposed to be representative of anything real.

So with that introduction to astronomical figures out of the way, let's imagine that every atom in the universe is acting as a monkey with a typewriter on our behalf, able to produce a trial document in one Planck time, and having twenty billion years in which to operate. On top of that, let's say that we only want to have a 0.001% chance of success: really slim odds, but it could happen. The number of Planck times in twenty billion years is about 5e60, and multiplying by the number of atoms in the universe gives us a total of 3e140 trials. We can take a shortcut here and invoke the "small probability rule" mentioned in the previous section: when the probability is small, we can use the probability itself instead of the ln(1/(1-P)) formula, because the difference is negligible. Thus, in this case, we divide the number of trials directly by our probability of success, giving a final C value of 3e145. Thus, we can expect a 0.001% success rate at attempting to solve a problem of this magnitude, given every atom in the universe churning through trials at a rate of one per Planck time over the course of twenty billion years.

Moving along to our second formula, we translate the number of combinations back into a document length. Taking our 32-key monkey-typewriter again, what length of document is this number of combinations equivalent to? Referring to the formula, we see that the length is given by log(3e145)/log(32). If you use a base-ten log function here (recall that we can use any base, so long as we use the same one to compute both parts), then there's a short-cut to get a rough answer for the log(3e145) part: the value of log(1eN) is N, so log(3e145) will be somewhere between 145 and 146. Considering the second part of the formula, log(32) is about 1.5, so a quick back-of-an-envelope calculation gives us a document length of 97 keystrokes. If you take the long way around and be as precise as you can, then the answer works out to be approximately 96.633 keystrokes. Not that you can have a partial keystroke: it simply means that 96 keystrokes is "within budget", so to speak, and 97 is "over budget".
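
The whole calculation, sketched in Python (floating point is plenty accurate here, since we only care about magnitudes):

    import math

    atoms = 6e79                       # upper estimate of atoms in the universe
    seconds = 2e10 * 365.25 * 86400    # twenty billion years
    trials = atoms * seconds / 1e-43   # one trial per Planck time, per atom
    c = trials / 1e-5                  # small-probability shortcut, P = 0.001%
    print(math.log(c) / math.log(32))  # a little under 97 keystrokes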

From this example, it's clear that any speculation involving something more complex than a 100-keystroke document (as produced by our 32-key monkey-typewriter) arising purely as the result of random processes is utterly ludicrous. In such a case, one must hypothesise some additional factor, be it an appeal to miracle, intelligence, an infinite number of universes, or some special directing mechanism (like a ratchet) that lets us break the problem down into smaller steps.

Allowing for Error

So far we've maintained the fairly rigorous requirement that our document be produced without errors. A single typo was enough to disqualify the entire document. For some purposes this is a reasonable requirement, but in other cases we should relax it a bit. Allowing a bit of leeway in our target document will increase the odds that we get a satisfactory result, and it's pretty obvious that we need every break we can get if we're to have any success with these monkeys.

Allowing for errors is simply a matter of adjusting our formulae to allow for multiple target documents instead of just one. We've previously defined R = T/C, and this definition holds where we have precisely one correct alternative among all the possible combinations. To make the "one" explicit, we could write it R = T * 1/C. If we introduce A as the number of acceptable target documents, then the formula becomes R = T * A/C.

This is a very minor change to the overall formulation, and is precisely equivalent to multiplying the number of trials by the number of acceptable documents, but bear in mind that this is an approximation that only works under specific conditions. The formula is only accurate when there are a large number of trials, and a large number of possible outcomes. When multiple target documents are introduced, this should be considered the same as reducing the number of possible outcomes. That is to say both T and C/A should be large (I suggest a thousand or more). If C/A gets too small, then the problem is becoming trivial and it should be solved by the traditional approach of computing the chance of repeated failure, not the Magic Monkey Formula.

With that caveat out of the way, we must make a decision about how many documents should be considered acceptable. The range of acceptable variation will depend very much on the nature of the task and who is making the requirements. Also, some kinds of errors will be more severe than other kinds of errors: all typos are not created equal. But rather than get terribly specific about certain kinds of errors, let us use a general model of errors which, whilst lacking some important detail about errors under specific circumstances, gives us a good feel for how allowing errors affects the entire problem. To this end, I propose to model errors as a percentage of the document. For example, a 1% (or one in one hundred) error rate means that up to 1% of the total keystrokes in the document can be wrong. We don't care how wrong that 1% of keystrokes is, nor do we care which 1% of the keystrokes are wrong, and we also include all documents that have an error rate less than our limit.

The mathematics for computing this kind of situation is somewhat more complex than we've encountered so far. In the case of our 32 key typewriter, any particular keystroke can be wrong in one of 31 ways. If the document has exactly one error, then we multiply the number of keystrokes by 31 to produce the number of possible documents with exactly one error. To compute the number of documents that have exactly two errors, we take the number of ways we can choose two keystrokes (independent of the order) and multiply it by 31 to the power of two. In general, the number of possible documents that contain exactly E errors is the number of different ways you can choose E letters multiplied by 31 to the power of E.

Fortunately for us, this idea of "the number of different ways you can choose r items from a group of n" has already been figured out by someone else: it is called the combination function. The formula for the number of ways that r items can be chosen from a set of n is given by C(n,r) = n! / ((n-r)! * r!). The exclamation marks denote the factorial function, n!, which is equal to one when n equals zero, and equal to the product of all the numbers between one and n for values of n greater than zero.
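
In fact, recent versions of Python provide the combination function ready-made as math.comb, but it can be sketched from the factorial formula just as easily:

    import math

    def choose(n, r):
        # C(n,r) = n! / ((n-r)! * r!)
        return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

    print(choose(21, 2))     # 210
    print(math.comb(21, 2))  # the built-in version gives the same answer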

I'd love to describe these functions in more detail for you, because they are most fascinating and useful, but it would take far too long. Maybe some other time. For now, let's apply the knowledge we've just acquired to solve a monkey problem with errors.

Example: a Bit of Leeway

Let's suppose that we want to get our monkeys to produce the phrase "WATSON, COME QUICKLY!", (21 keystrokes) as in a previous example, but this time we're going to be a bit more relaxed about the output: we'll allow up to three typing errors (an error rate of one in seven, or about 14%). How much does this improve our chances of getting the desired output, given other conditions are the same as before?

Well, rather than one target document, we now have several, and we've seen a formula for this, so the question we need to answer first is "exactly how many acceptable target documents are there?" Given that we allow up to three typing errors, we must compute the number of documents that have exactly none, exactly one, exactly two, and exactly three typing errors, then add these together for our total.

Exactly none is simple: there is only one such document. Exactly one is not hard: there are 21 keystrokes to choose from, and each one of these can be wrong in one of 31 ways, so there are 21 * 31 = 651 such documents. Exactly two is a little harder: we must first compute the number of ways we can select two keystrokes from our set of 21. This is done with the function C(21,2). Using a calculator, this gives me the answer 210. Each of the keystrokes can be wrong in one of 31 ways, so there are 31 * 31 = 961 possible ways that each of these combinations can be wrong. Thus, there are a total of 210 * 961 = 201,810 documents that have exactly two typing errors. The formula for exactly three is similar: C(21,3) = 1330, and there are 31 * 31 * 31 = 29,791 ways in which each combination can be wrong, for a total of 1330 * 29,791 = 39,622,030 possible documents with exactly three errors. Adding these together gives 1 + 651 + 201,810 + 39,622,030 = 39,824,492 possible documents with up to three errors.
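
As a check on my arithmetic, the same count sketched in Python:

    import math

    keystrokes, wrong_ways = 21, 31
    # Sum over documents with exactly 0, 1, 2 and 3 errors.
    acceptable = sum(math.comb(keystrokes, e) * wrong_ways ** e
                     for e in range(4))
    print(acceptable)        # 39,824,492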

The original "Watson, come quickly!" problem called for a 50% chance of success, and we determined that the trial/outcome ratio had to be 0.69 in order to achieve this. That is, R = 0.69. Using our new formula for R, R = T * A/C, we can see that we need to find the number of trials, T, when the formula is 0.69 = T * 39,622,030 / 40,564,819,207,303,340,847,894,502,572,032. Note that the value of C/A is still huge, so the technique is still valid. Solving for T gives us T = 706,418,254,012,712,250,862,644 (7e23).

This has made our original problem substantially easier. Now we will only need approximately 8,134,571,000,000,000,000 (8e18) monkeys, reducing our monkey accommodation requirements to roughly 15,950 monkeys per square metre of the Earth's surface.
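
And a sketch of the last two steps:

    combos = 32 ** 21
    acceptable = 39_824_492              # up to three errors, counted above
    trials = 0.69 * combos / acceptable  # about 7.028e23
    monkeys = trials / 86_400            # one trial per second, for a day
    print(f"{trials:.4e}", f"{monkeys:.4e}")
    print(monkeys / 5.1e14)              # about 16,000 per square metre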

In case you want to experiment with the effect of increasing the error rate further, here is a table of values that I've pre-computed. The E column gives the maximum number of errors that appear in the document, and the A/C column gives the corresponding value for A/C. This applies only to a document with 21 keystrokes, as for the above example.

A/C Values by Max. Error Count for a 21 Keystroke Document
E   A/C                E    A/C                E    A/C
0   2.465190329e-32     7   8.006727193e-17   14   2.298221608e-6
1   1.607304094e-29     8   4.358523348e-15   15   3.368795788e-5
2   4.991073644e-27     9   1.959382788e-13   16   3.985936420e-4
3   9.817495253e-25    10   7.322705180e-12   17   3.725674880e-3
4   1.372395535e-22    11   2.282524791e-10   18   2.664556785e-2
5   1.449881210e-20    12   5.935604972e-9    19   1.388324124e-1
6   1.201722142e-18    13   1.284241700e-7    20   4.866116305e-1

Bear in mind that the values for error rates of seventeen or more will not be well suited to the monkey-formula, since the value of C/A becomes less than a thousand, and the approximation method starts to break down.

The Normal Distribution

There's another way we can model this problem which may give us some further insight. Think in terms of typing the document as being a test, and let us give each outcome a score. For each correct keystroke we award one point, and for each incorrect keystroke we award nothing. From this, we can set an acceptable minimum score: an error rate of 1% as discussed previously becomes a score of at least 99%.

In order to compute the number of acceptable documents, now we need only compute the number of documents in the set of all possible documents that reach our minimum acceptable score. Fortunately, this kind of computation has already been well addressed in the field of statistics, and the answer can be approximated using the normal distribution, also known as a bell curve because of its roughly bell-like shape.

In order to use the normal distribution, we need to perform some statistical analysis of our data. In particular, we need to know the mean, and standard deviation of our data. I don't want to explain these terms in any depth right here -- an introduction to statistics is beyond the scope of this document -- but the mean is just the average value, and the standard deviation is a measure of the spread of values. For lots of good, detailed information about all this kind of thing (including how to use the normal curve in the way I am doing here), see Statistics by Freedman, Pisani, and Purves (ISBN 0-393-09076-0).

If we are treating our keystrokes as having a score of either one (for a correct keystroke) or zero (for an incorrect), then in each case there will be thirty-one incorrect possibilities and one correct one. Thus, the average score per keystroke when struck at random will be 1/32. Similarly, the standard deviation (for which I won't give the formula) will be about 0.1740.

The mean and standard deviation describe the kinds of scores that we will get from a monkey on our typewriter in a completely general way. From these two values, we can compute two other important values which will be specific to a document with a particular number of keystrokes: the expected value, and the standard error. The expected value is simply the mean multiplied by the number of keystrokes; the standard error is the standard deviation multiplied by the square root of the number of keystrokes.
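
Sketched in Python, with the standard deviation computed by the usual formula for a score-one-or-nothing event, sqrt(p*(1-p)), which I spared you above:

    import math

    def expected_and_error(keystrokes, keys=32):
        p = 1 / keys                     # mean score per random keystroke
        sd = math.sqrt(p * (1 - p))      # about 0.1740 for a 32-key typewriter
        return p * keystrokes, sd * math.sqrt(keystrokes)

    print(expected_and_error(100))       # (3.125, about 1.740)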

The expected value and standard error tell us how our scores will be distributed. The bell curve will be centred around the expected value, and the standard error will determine how spread out it is. The mathematics for computing the area under a normal curve is extremely nasty, so it is usually done by looking up a normal table (where someone else has done the nasty math already). The normal table will be expressed in terms of standard deviations above (or below) the mean, and this corresponds in our case to standard errors above (or below) the expected value. For example, the normal table tells us that the area two standard deviations either side of the mean covers about 95% of the area under the curve, so we can expect 95% of all possible outcomes to be within two standard errors of the expected value.

It's a bit hard to explain the process any more clearly than this, so let's run through an example to get more acquainted with it. Bear in mind that this is an approximation method, and it's more accurate when there is a large number of events, such as more than 100, especially on a very skewed set like this (with one success to thirty-one failures). Hence it is not necessarily the best approach for monkeys and typewriters where the documents rarely get any bigger than one sentence. Our example will therefore consider a 100 keystroke document to give it a chance to be moderately accurate.

Example: Finding an Error Rate With the Normal Distribution

The normal distribution is most readily applied to problems where we have a large known number of keystrokes, such as a hundred or more. We've already seen that it's absurd to talk about such events happening in practice when there's only one acceptable outcome, but if we allow a large margin for error, it comes back into the realms of possibility. So let's examine some of the properties of a one hundred keystroke document "with errors" using the normal distribution.

The first thing we have to do is compute the expected value and standard error. When there are 32 possible keystrokes, one of which scores one point and the rest of which score nothing, we would expect an average score of 3.125 in a one hundred keystroke document. That is, the "average" document would have 3.125 correct keystrokes (pretty poor show, huh?). There's no such thing as a fraction of a keystroke, but there's also no such thing as a perfectly average document: we're talking about the combined properties of all possible documents. The standard error for this situation is the square root of 100 times the standard deviation of the set (which we previously determined to be 0.1740). Thus, the standard error is 1.740.

We can now use a normal table to determine the acceptable outcome to possible outcome ratio for any particular error rate. First, you must determine how many standard errors above the expected value your required score lies. If you want 50 of the keystrokes correct (that is, a score of at least 50), then this is approximately 27 standard errors above the expected value. The ratio of acceptable documents to possible documents is then given by the area underneath the curve from that point to infinity. This value is the A/C portion of our R = T * A/C equation.

For instance, let's say that we can afford one thousand trial attempts -- one monkey working for long enough to produce one thousand attempts. Given a document one hundred keystrokes in length, the chances that the monkey will produce the entire document are basically nil, but how good will the best attempt be (to a confidence level of 50%)? The confidence level of 50% tells us that R is 0.69, and we are given T equal to one thousand. Thus, the value of A/C must be 0.00069. Looking up our local friendly normal table, we find that this value corresponds to 3.2 standard deviations above (or below) the mean. We choose the value 3.2 standard errors above our expected value, which is 8.693, as this represents the best outcome.

Thus, there is a 50% chance that our monkey will achieve a score of 8.693 or better on the best of one thousand attempts. To put it another way, if you took a large number of monkeys, and let them all type a thousand documents of one hundred keystrokes each, then about half of them would achieve a personal best score greater than 8.693.
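
If you don't have a normal table handy, Python's statistics module can stand in for it. A sketch of this lookup:

    from statistics import NormalDist

    expected, std_error = 3.125, 1.740      # the hundred-keystroke figures
    a_over_c = 0.69 / 1000                  # rearranging R = T * A/C with T = 1000
    z = NormalDist().inv_cdf(1 - a_over_c)  # about 3.2 standard errors
    print(expected + z * std_error)         # about 8.69 correct keystrokes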

If we performed this test using the formulae discussed previously rather than the normal curve method, we would reach an answer of between nine and ten, so you can see that the normal approach is close, but not perfect. This is partly because our data is so skewed: it would work much better if our monkey were simply tossing a coin rather than being given a choice of 32 keys to hit. The accuracy of the technique would also improve with a longer document, but increasing the document length has its own problems.

By the way, the typical normal table only goes up to about five standard deviations above the mean, but if we want to throw billions of monkeys at a problem for billions of years, then the typical normal table simply isn't precise enough for our needs. For example, the problem of getting fifty out of a hundred keystrokes right involves computing the value twenty-seven standard deviations above the mean.

As I mentioned earlier, the formula for computing probabilities on the normal curve is a veritable mathematical horror story, but for your convenience, I have attempted to compute a table of Probabilities for Insanely Large Numbers of Standard Deviations Above the Mean. I really can't vouch for the integrity of these numbers, although I've done all my computations to a thousand significant figures and believe that I have the formula right. Caveat lector! In the following table, the #SD column gives the number of standard deviations above the mean, and the Area column gives the area under the normal curve above this point (which we use as the value A/C). I only take it up to 29 SDs above the mean because at that level you've exceeded the complexity of the "universe of monkeys" problem, and you might as well call success at that level a miracle.

Probabilities for Insanely Large Numbers of Standard Deviations Above the Mean
#SD  Area               #SD  Area               #SD  Area
  0  5.000000000e-1      10  7.619853024e-24     20  2.753624119e-89
  1  1.586552539e-1      11  1.910659574e-28     21  3.279278018e-98
  2  2.275013195e-2      12  1.776482112e-33     22  1.439892435e-107
  3  1.349898032e-3      13  6.117164400e-39     23  2.330637006e-117
  4  3.167124183e-5      14  7.793536819e-45     24  1.390392119e-127
  5  2.866515719e-7      15  3.670966199e-51     25  3.056696706e-138
  6  9.865876450e-10     16  6.388754401e-58     26  2.476063316e-149
  7  1.279812544e-12     17  4.105996202e-65     27  7.389481007e-161
  8  6.220960574e-16     18  9.740948919e-73     28  8.123869470e-173
  9  1.128588406e-19     19  8.527223953e-80     29  3.289785267e-185
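
If you wish to verify or extend this table, the area can be computed from the complementary error function, which Python's math module provides and which remains accurate even this far above the mean:

    import math

    # Upper-tail area of the normal curve, z standard deviations above the mean.
    for z in range(30):
        area = 0.5 * math.erfc(z / math.sqrt(2))
        print(z, f"{area:.9e}")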

Lessons From the Normal Curve

The normal distribution tells us a very important thing about the nature of our problem: the expected value increases in direct proportion to the number of keystrokes, but the standard error increases in proportion to the square root of the number of keystrokes. The important upshot of this fact is that the larger the number of keystrokes in the document you wish to type, the greater the likelihood it will be relatively close to the expected value.

For example, about 99.99% of all values fall within plus or minus four standard errors of the expected value. In the case of a one hundred keystroke document, that covers the range from zero correct keystrokes to ten correct keystrokes, which means that 99.99% of the possible outcomes fall in only 10% of the possible scores. If the document is ten thousand keystrokes (still much smaller than this document), then the expected number of correct keystrokes is 312.5, and 99.99% of all the possible outcomes will be between 243 correct keystrokes and 382 correct keystrokes, which accounts for only about 1.4% of all the possible scores. Increased document lengths result in an increased chance of getting average-looking results, and a decreased chance of getting interesting-looking results.

This needn't be all bad news. If you actually want an extremely error-ridden document, or are happy with error rates of thirty-one in thirty-two, then you will get exactly what you want! Alas, in the real world such high error rates are rarely if ever useful: a document with such a high error rate is effectively impossible to distinguish from any other document; when a document is that riddled with errors, you can't tell whether it's supposed to be Hamlet or Macbeth. Thus, the shorter documents are likely to be the more "interesting" ones by merit of the fact that they are likely to deviate further from the average.

The other point to note about the normal curve is that it works both for typewriters and coin tosses. The number of keys on the typewriter has an effect on the range of outcomes, but it doesn't change the nature of the problem. There's no fundamental difference between a typewriter with two keys, thirty-two keys, or a PC keyboard with 104 keys: we would use the same kind of formula to compute the outcomes in all cases, and the normal curve is a good approximation for all of them.

Conclusion

Let's summarise the key points of the mathematics discussed in this document.

The Magic Monkey Formula, P = 1-(1/(e^R)), relates your chance of success P to the ratio R of trials performed to possible outcomes. Rearranged as R = ln(1/(1-P)), it tells you how many trials you need for a given chance of success; rearranged as C = T / ln(1/(1-P)), it tells you the size of problem you can attack with T trials. The formula L = log(C) / log(K) converts a number of combinations C back into a document length L on a K-key typewriter, and allowing for errors is simply a matter of counting the acceptable documents A and using R = T * A/C, so long as both T and C/A remain large. Even on the most absurdly generous assumptions (every atom in the universe typing one document per Planck time for twenty billion years), anything much beyond a 100-keystroke document remains out of reach of blind chance.

The normal curve can also be used to find approximate answers to monkey typing problems, but in general it is neither easy to use, nor as accurate as the magic monkey formula. The properties of the normal curve do tell us several important things about the nature of the problem, however.


Author: The Famous Brett Watson mailto:famous@nutters.org
Revision: 2
Date: 2000-07-28
IPRs: This document is Public Domain (P) 2000. Other works quoted in this document retain their original copyrights, and are included here only on a "fair use" basis.
Acknowledgements: the author wishes to thank those people who took the time to ask interesting questions and make thoughtful comments about the original "Mathematics of Monkeys and Shakespeare" essay.