How to create a language
This section takes some grammar issues and develops them, showing with examples, when possible, how natural languages handle them and what you can do about them. You can't have a language without a grammar; if you don't think about it, you'll probably copy the structures of your own language, and the whole thing will be an exercise in translating single words.
The classic categorization is that languages can be inflecting, agglutinating, or isolating. This categorization has proven to be too limited, but I'll explain it, because it's a good starting point for understanding the differences.
An inflecting language uses inflections, which are affixes used, for example, to conjugate verbs, decline nouns, and perform other tasks. Some languages use suffixes for these purposes, while others use prefixes or even infixes. Examples from English are the -s used for pluralizing nouns, the -ed used to form the past tense of regular verbs, and affixes like the negative in- or the adverbial -ly (strictly speaking, these last two are derivational rather than inflectional). Another type of inflection (a "purer" one, if you like) is a change in the root form of the word. Examples are the strong verbs of English, like sing-sang-sung, which are inflected forms of a root concept "sing". Inflection by vowel change is quite usual in certain languages. Consonant change does exist, but it's rarer. Curious examples in English are the pairs breath-breathe (voiceless th becomes voiced, besides the vowel change) and house (noun) vs. to house (verb) (voiceless s becomes voiced z). Inflection includes some other devices, like changing the tone, stress, pitch or length of a vowel, or repeating a part of the root.
The main thing about inflections, however, is that a single inflection can carry more than one meaning at the same time. For example, in Spanish viví "I lived", the inflection -í shows that the verb is in the past tense, first person singular, indicative mood. Examples of inflecting languages are English, Spanish, German, Latin, Greek, and in general all Indo-European languages.
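If you keep your language's morphology in a small program, the key property of an inflecting system (one ending, several meanings at once) can be sketched as a lookup table. This is only an illustration in Python: the names PRETERITE_ENDINGS and inflect are made up, and only the ending -í comes from the text above; the endings -iste and -ió (from the same Spanish paradigm) are added for contrast.

```python
# Hypothetical table: each fused ending maps to a whole bundle of meanings.
# Only "í" is taken from the text; "iste" and "ió" are added for contrast.
PRETERITE_ENDINGS = {
    "í": {"tense": "past", "person": 1, "number": "singular", "mood": "indicative"},
    "iste": {"tense": "past", "person": 2, "number": "singular", "mood": "indicative"},
    "ió": {"tense": "past", "person": 3, "number": "singular", "mood": "indicative"},
}


def inflect(stem, ending):
    """Attach one fused ending; return the word and every meaning it carries."""
    return stem + ending, PRETERITE_ENDINGS[ending]


form, features = inflect("viv", "í")
print(form, features)
```

Note that there is no way to split -í into a "past" piece and a "first person" piece; the whole bundle comes with the single ending.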
An agglutinating language uses suffixes or prefixes that each carry a single meaning, and which are concatenated one after another. Some well-known agglutinating languages are Quechua and many other American languages, Turkish, Finnish, and Hungarian. For example (supplied by Mark Rosenfelder), in the Quechua word wasikunapi "in the houses", the plural suffix -kuna is separate from the locative case suffix -pi. In Finnish, huoneissansakaan means "(not) even in their rooms", and it consists of five agglutinated morphemes, "room-s-in-their-even" (thanks to Jarkko Hietaniemi for this one).
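By contrast, agglutination can be sketched as plain concatenation of one-meaning pieces. The function name agglutinate is made up for this illustration, and the segmentation of the Finnish word (huone-i-ssa-nsa-kaan) is my own reading of the gloss given above, so take it as an assumption.

```python
def agglutinate(root, suffixes):
    """Join a (form, meaning) root with (form, meaning) suffixes.

    Returns the finished word and a hyphenated gloss, one meaning per piece.
    """
    word = root[0] + "".join(form for form, _ in suffixes)
    gloss = "-".join([root[1]] + [meaning for _, meaning in suffixes])
    return word, gloss


quechua = agglutinate(("wasi", "house"), [("kuna", "PLURAL"), ("pi", "in")])
finnish = agglutinate(
    ("huone", "room"),
    [("i", "s"), ("ssa", "in"), ("nsa", "their"), ("kaan", "even")],
)
print(quechua)
print(finnish)
```

Each suffix can be added or dropped independently of the others, which is exactly what a fused ending does not allow.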
An isolating language doesn't use affixes or root modifications at all. Each word is invariable, and meanings have to be modified by inserting additional words, or understood from context. The best-known example of an isolating language is Chinese. In Chinese, a noun by itself is neither singular nor plural, and a verb has no tense or person; these distinctions are made by adding quantifiers, adverbs, or pronouns. In effect you say "books" by saying "several book".
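The isolating strategy can be sketched the same way: the word never changes, and number is a separate word that appears only when you need it. The names QUANTIFIERS and noun_phrase are hypothetical, and English stand-ins are used instead of real Chinese words.

```python
# Hypothetical lexicon of separate words, using English stand-ins.
QUANTIFIERS = {"plural": "several"}


def noun_phrase(noun, number=None):
    """Return the noun unchanged, preceded by a separate word if number is given."""
    if number is None:
        return noun  # number is simply left to context
    return f"{QUANTIFIERS[number]} {noun}"


print(noun_phrase("book"))
print(noun_phrase("book", "plural"))
```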
The modern classification of language grammars is a continuous scale which goes from analytic to synthetic. The more analytic a language, the less meaning words carry by themselves, so to speak, and the more important context becomes. The more synthetic a language, the more self-contained its words. The most analytic languages rely on context and word order to show meaning, while synthetic languages tend to inflect words.
The scale is meant to be taken as a reference; there are no extreme points, but you can compare two languages and say that one is more synthetic than the other. Chinese is very analytic; a Chinese word by itself can mean a lot of different things, because no distinctions are marked on it: you don't know if it's a verb, a noun or an adjective, if it's past or future tense, or plural or singular, or anything else; you only have the root concept. Some Native American languages like Nootka or Chinook are at the other end, so synthetic that they have been called polysynthetic. They inflect words in such ways that a single word can mean "the many little fires been lit in the house in the past" (I'm not making this up; the word is inikwihl'minih'isit, and by the way, it's not properly a verb or a noun; it needs verbal or noun prefixes...). In the middle, we have Japanese (quite analytic except for verbs); English (quite analytic too, as it barely distinguishes noun case or verbal person); Spanish, French and Italian (of the ones I know a bit of); German (already with many inflections); all the agglutinating languages, which are in fact a subset of inflecting languages; and Latin, Greek, Sanskrit...
So you'll have to pick a point on the scale and stay there. This is probably the most important decision in the process. Each kind of grammar has its own pros and cons.
There's another classification of languages, which is far more complex, and was created by Edward Sapir in the 1920s. This divides concepts into four classes:
I. Basic (concrete) concepts (objects, actions, qualities): normally expressed by independent words or radical elements; they don't include any kind of relationship with other words.
II. Derivative concepts (generally less concrete than those in group I): normally expressed by affixation of non-radical elements to radicals, or by internal modification of these. They denote ideas that don't have to do with the proposition (sentence) itself, but give the radical element a certain particular twist of meaning and are therefore intimately related to it in a concrete fashion. For example, English prefixes pre-, for-, un- and suffixes -less, -ly.
III. Concrete relationship concepts (yet more abstract): normally expressed by affixation or internal modification, but commonly in a less intimate fashion than group-II elements. They indicate relationships that go beyond the word itself. For example, English -s for plural nouns.
IV. Pure relationship concepts (totally abstract): expressed by affixation or internal modification of radical elements, or by independent words, or by word order within the sentence. They connect the concrete elements of the proposition, giving them a definite syntactic form. For example, the modifications of English him, her from he, she indicating accusative case; the prepositions to, for; the position of "the dog" in I see the dog indicating that it's the object of the verb, etc.
The classification of languages according to these classes is as follows:
A. Languages which only express concepts of groups I and IV, so that they have no means of modifying the meaning of the radical element by means of affixes or internal changes. For example, Chinese.
B. Languages which express concepts of groups I, II and IV, preserving pure syntactic relationships and being able to modify the meaning of radical elements by affixation or internal change.
C. Languages which express concepts of groups I and III, where syntactic relationships are expressed in necessary connection to barely concrete concepts, but they can't change the radical elements by affixation or internal change.
D. Languages which express concepts of groups I, II and III, i.e. where syntactic relationships are expressed in mixed ways, as in C, and which can also modify the meaning of radical elements by affixation or internal change. To this group belong most of the "flexional" languages with which we are familiar, as well as many "agglutinating" languages.
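If it helps to see the four types side by side, the list above boils down to a lookup from "which concept groups does the language express" to a letter. This is just a sketch of the list as given here; the names SAPIR_TYPES and sapir_type are made up for the illustration.

```python
# Concept groups expressed by each type, exactly as listed above.
SAPIR_TYPES = {
    frozenset({"I", "IV"}): "A",
    frozenset({"I", "II", "IV"}): "B",
    frozenset({"I", "III"}): "C",
    frozenset({"I", "II", "III"}): "D",
}


def sapir_type(groups):
    """Return the type letter for a set of concept groups, or None if unlisted."""
    return SAPIR_TYPES.get(frozenset(groups))


print(sapir_type({"I", "IV"}))         # the combination given for Chinese
print(sapir_type({"I", "II", "III"}))  # most familiar "flexional" languages
```

When sketching your own language, you can list the groups it expresses and check which type it lands in; a combination that returns None is one the list above doesn't cover.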
Each one of the groups A, B, C, D can be subdivided into agglutinating, fusional and symbolic. Agglutination means the things added to the radical element are just juxtaposed (put together); fusional means they are sometimes merged; symbolism roughly means internal change. Group A also has an isolating subtype.
The method (agglutinating, fusional, or symbolic) for a certain group of concepts needn't be identical to the method for a different group. The classification uses a compound term, the first part referring to the method for group II concepts, and the second part to the method for concepts in groups III and IV. A language doesn't have to stick to a single method, either; English uses them all. For example, goodness from good is agglutination; books from book is regular fusion; depth from deep is irregular fusion; and geese from goose is symbolic fusion or symbolism.
All this rant is just about one thing: you shouldn't expect everything to be in its "proper" place in your language (the proper place being that of English). English number (singular vs. plural) is a group III concept, quite abstract and forming part of the very core of words; we can't conceive of an English noun without number. In Tibetan, number is an optional feature and it's not grammaticalized as in English; it's not an abstract thing that belongs to the word, but a concrete thing: the idea of plurality, "several" or "many", is expressed by a radical element which is a separate full-fledged word, a group I concept. It's not syntactic and can therefore be omitted when not needed.
Think hard about this! After you place your language on the scale, you have to decide which word classes you'll use, and how they'll link to one another.