Indo-European Resources: the Comparative Method

I found the following on a page here at UT (apparently written by one Kathleen Hubbard), which I think is a fairly concise and well-worded summary of the basis for Indo-European linguistics (and, for that matter, linguistics in general). I reprint it here for that reason, among others.

Okay, in 1786 Sir William Jones announced to the Asiatick Society of Calcutta that Sanskrit had to be related to Greek and Latin, touching off what would come to be known as the Neogrammarian move from philology (the comparison of texts) to what we now consider linguistics.

If you were to see a whole huge raft of cognates like the following, you might come to the same conclusion (Avestan is an ancient Iranian language, a close relative of the ancestor of Persian; it's the language of the Zoroastrian texts):

Sanskrit  Avestan  Greek     Latin   Gothic    English     [notes]
pita               patēr     pater   fadar     father
padam              poda      pedem   fotu      foot
bhratar            phrater   frater  broþar    brother     þ = Eng. "th"
bharami   barami   phero     fero    baira     bear        'I carry'
jivah     jivo               wiwos   qius      quick       'living'
sanah     hano     henee     senex   sinista   [senile]    Eng. "senile" is not technically a cognate
virah     viro               wir     wair      were(wolf)  'man'
                   tris      tres    þri       three
                   deka      dekem   taihun    ten
          satem    he-katon  kentum  hund(ra)  hundred
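
One way to see why a table like this is persuasive is to line up the first sound of each word, language by language. The Python sketch below does that for the 'father' and 'foot' rows only. The dictionary layout and the function names are made up for illustration, and taking just the first letter is a crude stand-in for a proper alignment of the words.

```python
# Forms copied from the 'father' and 'foot' rows of the cognate table.
COGNATES = {
    "father": {"Sanskrit": "pita", "Latin": "pater",
               "Gothic": "fadar", "English": "father"},
    "foot": {"Sanskrit": "padam", "Latin": "pedem",
             "Gothic": "fotu", "English": "foot"},
}


def initial_sounds(cognate_sets):
    """For each meaning, record the first letter of the word in each language."""
    return {
        meaning: {lang: form[0] for lang, form in forms.items()}
        for meaning, forms in cognate_sets.items()
    }


def same_pattern_everywhere(cognate_sets):
    """True when every meaning shows the same language-to-sound pattern."""
    patterns = list(initial_sounds(cognate_sets).values())
    return all(pattern == patterns[0] for pattern in patterns)


for meaning, pattern in initial_sounds(COGNATES).items():
    print(meaning, pattern)
print("recurring correspondence:", same_pattern_everywhere(COGNATES))
```

Both rows give p in Sanskrit and Latin against f in Gothic and English. It is the recurrence of the same pattern across unrelated meanings, not the resemblance of any one pair of words, that makes common ancestry the likely explanation.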

Now, cognates are a pair or set of words descended from a common ancestor, not just words that happen to look like each other -- e.g. "coffee" is not a cognate of kaffe, kahawa, cafe, etc.; that's an instance of lots of borrowing of the same word by various languages. What we're talking about here are historically related words. When we know we've got cognates, we can talk about reconstruction.

Reconstruction revolves around the notion that sound change is mechanical and exceptionless. If a proto-/p/ becomes /f/ in a daughter language, it does so in regular fashion (that's the heuristic you have to use). If there are exceptions, there must be some other conditioning factor. Using this assumption, we can conclude that some common ancestor produced Sanskrit /bh/, Avestan /b/, Greek /ph/ (which is NOT /f/, it's aspirated /p/ at the stage we're talking about), Latin /f/, and Germanic /b/. Now the question is, what was that common ancestor?
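
To make "mechanical and exceptionless" concrete, here is a minimal Python sketch that treats a sound change as a lookup table applied blindly to every word. The table is a simplified version of the Germanic consonant shift usually called Grimm's Law, which the text above does not name; it is brought in here as an illustration. The function name and the use of "th" for Gothic þ are assumptions of the sketch.

```python
# A simplified version of Grimm's Law (the Germanic consonant shift).
# "th" stands in for the Gothic letter thorn. Illustrative, not complete.
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",   # voiced aspirates lose their aspiration
    "p": "f", "t": "th", "k": "h",     # voiceless stops become fricatives
    "b": "p", "d": "t", "g": "k",      # plain voiced stops become voiceless
}


def apply_sound_change(form, rules):
    """Rewrite every matching sound in `form` once, left to right.

    Longer keys are tried first so that "bh" is treated as one sound
    rather than "b" followed by "h". Each sound is rewritten only once,
    so the output of one rule is never fed into another.
    """
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(form):
        for key in keys:
            if form.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            out.append(form[i])  # sound not covered by any rule: keep it
            i += 1
    return "".join(out)


for root in ["ped", "bher", "trei", "dekm", "p@ter"]:
    print(root, "->", apply_sound_change(root, GRIMM))
```

The consonants of the results (fet, ber, threi, tehm) can be set beside Gothic fotu, baira, English three and Gothic taihun; vowels are a separate story. For p@ter the rule gives f@ther, whereas Gothic has fadar. That mismatch is the kind of apparent exception that calls for some other conditioning factor, and this particular one is known as Verner's Law.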

The way we decide what segment must have been there in the proto- language involves things we know independently about how sounds behave, based partly on how sounds alternate synchronically in languages (i.e. rules that operate to change one sound to another in different contexts during a single stage of a language), partly on what we know about acoustics and articulation of speech sounds (which tells us what directionality is more or less likely), and partly on experience. Pure gold for the historical linguist is ATTESTED (written) ancient forms.
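
The role of directionality can be sketched as a toy scoring exercise in Python. Take the correspondence at the start of the 'father' and 'foot' words (p in Sanskrit, Greek and Latin, f in Gothic) and ask which proto-sound needs the fewest and most plausible changes. The cost numbers below are invented for illustration, the function names are hypothetical, and each language is counted separately, which ignores subgrouping; real reconstruction weighs far more evidence than this.

```python
# Hypothetical costs: lower means the change is considered more natural.
# These numbers are made up for illustration, not measured values.
CHANGE_COST = {
    ("p", "f"): 1,  # a stop weakening to a fricative: commonly observed
    ("f", "p"): 3,  # a fricative hardening to a stop: much rarer
}

# First sound of the 'father' and 'foot' words in four of the languages.
REFLEXES = {"Sanskrit": "p", "Greek": "p", "Latin": "p", "Gothic": "f"}


def total_cost(proto, reflexes, costs, default=5):
    """Add up the cost of getting from `proto` to each daughter sound."""
    return sum(
        0 if sound == proto else costs.get((proto, sound), default)
        for sound in reflexes.values()
    )


def best_proto(reflexes, costs):
    """Pick the attested sound that explains the daughters most cheaply."""
    candidates = sorted(set(reflexes.values()))
    return min(candidates, key=lambda c: total_cost(c, reflexes, costs))


for candidate in sorted(set(REFLEXES.values())):
    print(candidate, total_cost(candidate, REFLEXES, CHANGE_COST))
print("preferred proto-sound:", best_proto(REFLEXES, CHANGE_COST))
```

Under these assumed costs, proto-p needs one cheap change (in Gothic), while proto-f would need three expensive ones, so p wins. That agrees with the *p reconstructed in the words for 'father' and 'foot'.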

For instance, we know that the modern Romance languages (French, Italian, Spanish, Portuguese, Romansh, Romanian, etc.) are descended from Latin. And we have lots of attested Latin to work with -- so we have clear, unambiguous examples of how some sound changes have worked. Likewise in other language families where ancient texts are preserved (e.g. ancient religious texts in Semitic etc.). So we have some real-life models on which to build our guesses.

So anyway, you reconstruct Proto-Indo-Iranian, and Proto-Germanic, and Proto-Balto-Slavic, and Proto-Celtic, and ultimately you have a pretty good idea of what -- on the basis of very rigorous analysis -- must have been the forms of certain words/roots in Proto-Indo-European, before it split up. Now, this method does NOT yield reliable results further back than about 10,000 years, because beyond that, too much change has occurred for there to be any recognizable remnants (that we can be sure about anyway) in attested languages. (Pace Greenberg et al., who get lots of popular press.)

One real triumph of this method of reconstruction was the Laryngeal Hypothesis: it was known that there were some troublesome places in Indo-European where the sound changes seemed not to be behaving in their usual regular way; things were happening to vowels and sometimes consonants that couldn't be easily explained based on what we saw in the attested languages. Ferdinand de Saussure in the late 19th century said that there had to be a set of segments in the proto-language that had not survived in any of the daughter languages then known -- he was fairly conservative about claiming what they must have been (the name "laryngeals", and the usual count of three, came from later scholars), but he pointed out the precise locations where they must have occurred. Many years later, when a bunch of texts in Turkey were finally decoded and we knew we were looking at the ancient Anatolian language Hittite, the oldest attested Indo-European language -- voilà: there was a laryngeal consonant, written h, right in places where Saussure had predicted one must be just on the basis of careful reconstruction.

There are other wrinkles, like you can do internal reconstruction under some circumstances, and there are things other than sounds that point to common ancestry (morphology, syntax, etc.). And semantic change is a really neat thing to trace, though much slipperier than sound change. But the general answer to your question is, we know what we know about Proto-Indo-European because of the Comparative Method, which arose in the 19th century and gives us a rigorous way to compare sounds in daughter languages and determine what the antecedent sounds must have been.

Oh, and the PIE reconstructions for the above words are (always preceded by a star to show they're unattested, followed by a hyphen if they're roots that get suffixed, and with hedges if a vowel or something is uncertain -- consonants are much easier to reconstruct than vowels -- oh yes and @ stands for schwa here):

*p@ter-         father
*ped-           foot
*bhrater-       brother
*bher-          carry
*gwei-          live
*sen-           old
*wi-ro-         man     (derived from *wei@- vital force)
*trei-          three
*dekm-          ten
*dkm-tom-       hundred (derived from *dekm- ten)
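
The notation conventions just described (leading star, trailing hyphen, @ for schwa) are regular enough to read mechanically. The Python sketch below is one way to do it; the class and function names are hypothetical and it handles only the three conventions mentioned here.

```python
from dataclasses import dataclass

SCHWA = "\u0259"  # the schwa character that @ stands for in the list


@dataclass
class Reconstruction:
    form: str         # the form itself, with @ replaced by schwa
    unattested: bool  # True when written with a leading star
    is_root: bool     # True when written with a trailing hyphen


def parse_reconstruction(text):
    """Split a form like '*p@ter-' into its notation flags and bare form."""
    text = text.strip()
    unattested = text.startswith("*")
    is_root = text.endswith("-")
    # Remove only the outer markers; hyphens inside the form are kept.
    core = text.lstrip("*").rstrip("-")
    return Reconstruction(core.replace("@", SCHWA), unattested, is_root)


for item in ["*p@ter-", "*wi-ro-", "pater"]:
    print(parse_reconstruction(item))
```

Note that the internal hyphen in *wi-ro- survives, since it marks a boundary inside the form rather than an open slot for a suffix, and that an attested word like Latin pater comes back with both flags false.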
