Site hosted by Angelfire.com: Build your free website today!
Evaluation order and tail recursion
ML uses "call-by-value" evaluation order. The general form of an application is exp1 exp2. To
evaluate such an expression the system performs the following steps:
  1. exp1 is evaluated to produce a function. In practice, exp1 is often the name of a defined
      function, such as fact, and so there is no evaluation to do.
  2. exp2 is then evaluated to produce a value.
  3. The function body is then evaluated after performing pattern matching on the argument
      value.
The important thing to note is that the argument is always evaluated before the function body.
   Using call-by-value, it is easy, if tedious, to expand out the application of a function to
investigate the run-time behaviour. For example, given the definition of fact from Practical 1,
we can simulate the execution of fact 3 as follows:
      fact(3)   3 * fact(3 - 1)   3 * fact(2)
                      3 * (2 * fact(2 - 1))   3 * (2 * fact(1))
                      3 * (2 * (1 * fact(1 - 1)))   3 * (2 * (1 * fact(0)))
                      3 * (2 * (1 * 1))                      3 * (2 * 1)
                      3 * 2                      6
Note how the multiplications get delayed until the recursion "bottoms out". We need to store
these pending calculations somewhere (e.g. on the stack) and so the amount of storage required
to evaluate fact n will be proportional to n. We can make the function more efficient (but
more obscure) by exploiting the associativity of  . For example, we know that
                    3 * (2 * fact(1)) = (3 * 2) * fact(1) = 6 * fact(1)
If we perform this rearrangement at each step then we won't need to use the stack to save the
pending multiplications. The following version of the factorial function performs this optimi-
sation:      fun fact n =
           let fun facti(n,p) = if n = 0 then p else facti(n-1, n*p)
                in facti(n,1)           end
Constrast the evaluation of this version of fact with the original:
                              fact(3)   facti(3, 1)
                                           facti(3 - 1, 3 * 1)
                                           facti(2, 3)
                                           facti(2 - 1, 2 * 3)
                                           facti(1, 6)
                                           facti(1 - 1, 1 * 6)
                                           facti(0, 6)
                                           6
This recursion is of a special form. When each recursive call terminates the caller terminates
as well. These are known as tail-recursive, or iterative calls. In the original version of factorial
the caller had to perform a multiplication after each call before it could terminate.
   Tail-recursive function calls can be handled very efficiently. For example, a function tail-
recursively calling itself can be compiled into a simple jump, with the new arguments overwriting
the old ones on the stack. In some cases it is important to make a function tail-recursive (e.g.
by adding another argument as we did in the factorial case) as the depth of recursion, and
hence stack size, may grow very large otherwise. However, in most cases this is not an issue,
and the clarity of a direct solution is to be preferred. Also, sometimes, a clever compiler may be
able to automatically convert a function into a more efficient tail-recursive version for us. So,
fortunately, it is not often necessary for programmers to have to think up obscure tail-recursive
definitions themselves.
   You may find it helpful to expand out the evaluation of fastfib and gcd from Practical 1
on some small examples to get a better feel for what the system is doing in these cases. Are
either of these functions tail-recursive?Proving properties by induction
Mathematical induction is used to prove that a property  (n) holds for all natural numbers
n. We prove that the base case holds, namely  (0) is true. We then prove that  (k) implies
 (k + 1) for all k. This is known as the inductive step. If we can prove both of these then we
can deduce that  (n) is true for all n. The inductive step often looks as if we are cheating.
However, think of it as a kind of macro. Suppose we want a proof that  (5) is true. We know
that  (0) is true from the base case. We can then "macro expand" our inductive proof step for
the case where k = 0 to get a proof that  (1) holds. Using the inductive step again, this time
with k = 1, gives us a proof that  (2) holds. We can obviously repeat this process until we get
a proof of  (5). Generalising this argument, we can expand out the inductive proof to produce
a non-inductive proof of  (n) for any given n.
   Because induction may seem a bit confusing at first sight, it is worth stating explicitly the
general pattern for proofs by induction. In general, to prove by induction that  (n) holds for
all natural numbers n, one must:   1. (Base case.) Prove that  (0) holds.
   2. (Step case.) Prove that, for all k,  (k) implies  (k + 1).
      To do this:
          * First, assume that  (k) holds (where k is a name chosen to stand for an arbitrary
             unknown number, it is irrelevant what name is used). This assumption is called the
             induction hypothesis.
          * Then prove that  (k+1) holds (using only known facts and the induction hypothesis).
This pattern will be followed closely in the example below.
    Mathematical induction is an extremely useful proof technique. In this course we will use it
for two main purposes: to prove the correctness of ML functions, and to verify the solution of
recurrence equations. Recurrence equations will appear later in the course. Here we illustrate
the use of mathematical induction in correctness proofs, by proving that the optimized factorial
function defined earlier in this note does indeed compute factorial.
    The first thing to do is to identify the property to prove by induction. The function fact is
itself defined in terms of the auxiliary function facti, by fact(n) = facti(n,1). So to prove
that fact is correct we must show that facti does the right job. In fact what we must do is
prove that, for all n and p, it holds that facti(n, p) = p   n! (n.b. the implicit bracketing here
is p   (n!)). This we do by mathematical induction on n. Thus the property,  (n), that we
prove by induction is: for all p, facti(n, p) = p   n!. As always, we must show that both the
base case and the step case hold.
   1. (Base case.) We must show that, for all p, it holds that facti(0, p) = p   0!.
      However, 0! = 1 so p   0! = p and indeed we have the definition facti(0, p) = p.
   2. (Step case.) Assume that, for all p, it holds that facti(k, p) = p k! (this is the induction
      hypothesis). We must show that for all p, facti(k + 1, p) = p   (k + 1)!.
      Take any p. By the definition of facti, we have that facti(k+1, p) =facti(k, (k+1) p).
      However, it follows from the induction hypothesis that facti(k, (k+1) p) = (k+1) p k!.1
      So far we have calculated facti(k + 1, p) = (k + 1)   p   k!. But clearly (k + 1)   p   k! =
      p   (k + 1)   k! = p   (k + 1)!, which is what we wanted to show.
We have proved that, for all n, p, it holds that facti(k, p) = p   k!. It is now immediate that
fact is correct, as we have fact(n) = facti(n,1) = 1   n! = n!.
    In one respect, the above example is quite tricky: some ingenuity is required in finding the
appropriate statement  (n) to prove by induction. In most examples you will encounter in this
course, the choice of  (n) will be more obvious. However, in every other way, the example is
typical. In particular, observe how closely the proof fits into the general template given earlier.
   1Note that it does not matter that we are using (k + 1)   p where p was written in the induction hypothesis,
because the induction hypothesis is stated for all values p. If you find this confusing then convince yourself that
we could have written the induction hypothesis equivalently as: for all q we have facti(k, q) = q   k!.
Complete induction
The form of induction used above is the most basic method for proving properties of functions
on natural numbers. However, it is often too basic, as it only applies to verify properties  
in which the definition of  (n) is in terms of  (n - 1) (with a separate case for 0). Often the
definition of  (n) is in terms of  (m) where m is smaller than n, but not necessarily exactly one
smaller. A little thought should convince you that the macro-expansion argument still works
if our inductive step only requires us to prove that  (k) holds under the stronger assumption
that  (m) is true for all m < k. Indeed in this version we don't need an explicit base case, as
instantiating k to 0 requires us to prove  (0) under the assumption that  (m) is true for all
m < 0, which is vacuous as there are no natural numbers less than zero. Therefore the case of
 (0) is proved as one instance of the inductive case.
   We refer to this variant of mathematical induction as complete induction. Thus to prove
by complete induction that  (n) holds for all natural numbers n, one proves that for all k, if,
for all m < k,  (m) holds then  (k) holds. A template for such proofs is:
   * First, assume that, for all m < k,  (m) holds (where k again stands for an arbitrary
      unknown number). This assumption is again called the induction hypothesis.
   * Then prove that  (k) holds (using only known facts and the induction hypothesis).
   Let's try using this technique on an ML example. Consider the fastpower example from
Practical 1:2      fun fastpower (x,0) = 1
       | fastpower (x,n) = if n mod 2 = 0
                            then fastpower (x*x, n div 2)
                            else x * fastpower (x*x, n div 2);
We use complete induction to prove that for every natural number n, it holds that, for all x,
fastpower(x, n) = xn. Complete induction is necessary as the recursive calls break down the
problem by dividing n by two, so the simple form of mathematical induction is not directly
applicable.
   To begin the proof, we assume the induction hypothesis: for all m < k, it holds that, for all
x, fastpower(x, m) = xm. We must show that, for all x, fastpower(x, k) = xk. Because there
are 3 clauses in the ML definition of fastpower(x, k), the proof now splits into 3 cases.
  1. If k = 0 we must prove that fastpower(x, 0) = x0. But x0 = 1 and fastpower(x, 0) = 1
      as required. (Note that using complete induction doesn't stop us considering the case
      where k = 0 separately from the case where k > 0.)
  2. If k > 0 and k is even. Then fastpower(x, k) = fastpower(x x, k/2). However k/2 < m,
      so, by the induction hypothesis, fastpower(x   x, k/2) = (x   x)k/2 = x2 k/2 = xk, as
      required.  3. If k is odd. Similar to the case for k even.
  2We ignore the problem of dealing with large integers.