 |
Problem
Solutions
Summary

| Problem |
|
Suppose all we know is how far apart the species are as measured by the number of differences in their characters, that is, the entries in the difference matrix. How can we reconstruct the phylogenetic tree?
First, we can convert the differences to times. The conversion is given by the formula

as explained in the page on mutations.
Of course, we might not be able to reconstruct the actual tree, since the mutations are random and need not reflect the actual distances between species. So a better question is: how can we reconstruct the most likely phylogenetic tree? This question could be made precise using statistical theory, but there is no guarantee that the problem is tractable. We can come up with some promising algorithms that should give trees that aren't too far away from the most likely tree.
|

| Solutions |
|
A solution: the "minimum" reconstruction method.
It seems reasonable that the two species that share the greatest number of characters are the most closely related. That is, the smallest entry in the mutation matrix indicates which two species diverged most recently. Also, the next smallest entry should indicate which two species diverged just before that. And so forth. That's the idea of the algorithm, but it needs a little clarification. Suppose species 1 and 2 are closest with 4 differences in their characters, and species 1 and 3 are next closest with 6 differences. Then since we conclude that species 1 and 2 diverged most recently, it won't be species 1 and 3 that diverged just before that, rather it will be the ancestral species of 1 and 2 that diverged from species 3 just before that.
Two other solutions: the "average" and "maximum" methods.
There are two algorithms that are very similar to the minimum reconstruction method. They start out exactly the same by joining the two species that share the most characters. To explain these methods, suppose that species 1 and 2 are closest. Name their ancestral species as species 6. With the minimum method, we effectively determined that the distance between species 6 and any other species such as species 3 was the minimum of the distance from 1 to 3 and the distance from 2 to 3. With the maximum method, instead take the distance from species 6 to species 3 to be the maximum of these two distances. And, of course, for the average method, take the average of those two distances.
These three algorithms go by various names. The one using averages is often called "weighted pair group method using arithmetic averages," WPGMS. (There is a closely related algorithm called the "unweighted pair group method using arithmetic averages," UPGMS.) The algorithm using minimum distance is called a "single linkage method," and the one using maximum distance is called a "complete linkage method." All three algorithms use a nearest-neighbor technique to construct a tree.
You can select which of these three reconstruction methods to use in the control panel. It's surprising how similar the trees resulting from these three algorithms are. You can see for yourself by selecting a different algorithm.
|

| Summary |
|
Here you have the actual phylogenetic tree, a distance matrix based on a set of mutations, and the reconstructed tree. When the two trees are isomorphic, then they will be painted all black. Differences are painted in red. Of course, some differences don't count. For instance, whether a descendant species is drawn to the left or the right is irrelevant. Distances are ignored, too.
A node (dot) in the actual tree corresponds to a node in the reconstructed tree if they have exactly the same set of extant species as descendants. Those are the nodes that are drawn in black. All others are drawn in red.
|

 |
|
|