How to create a language
English (and many other Western languages) has a set of word classes composed of verbs, nouns, pronouns, adjectives, articles, conjunctions, adverbs and perhaps some strange ill-defined classes. These classes are not at all universal. Many languages do not have articles. Some hardly use adjectives (except for a few pairs like 'big/small', etc.). The only distinction that looks impossible to ignore, the verb-noun distinction, is not morphologically marked in some languages, where a given form can be either of the two, unspecified, and still be used as a full word. (The English ride, walk, lunch and many others used as nouns and verbs are not in the same situation; they are either a verb or a noun, although you can't tell them apart if they're alone.)
So let's see what you can do about word classes.
A noun is difficult to define, but you know one when you see it. Nouns can be almost invariable words, as in Chinese or Japanese, or they can be heavily inflected. Languages can make distinctions in nouns by:
Some distinctions that languages make in verbs are:
Some very common verbs in English aren't found in other languages, like 'to have'. Many languages rephrase 'I have a book' by 'A book is to me', or 'with me' or something to that effect.
Plus, some verbs can be used as grammatical words beyond their original status. For example, I'm told that in Khmer you use the verb 'to give' as the preposition 'to', to mark the indirect object of verbs. And in Ainu, the conjugated forms of the verb 'to have' are used as possessive markers. For example [thanks Mathias Lasailly for this one]:
kukor kunupe kunukar rusuy
1s.have 1s.brother 1s.see want
'I want to see my brother'
(Note that the 1st person singular prefix ku-, glossed 1s, is placed before verbs and nouns.) Given this, it's not impossible to think of a language where possessive pronouns don't exist, nor are they formed from personal pronouns, but are instead subordinate clauses, consisting of conjugated forms of 'to have'.
With adjectives, we enter the land of possibilities. You can choose to have adjectives (as a separate word class), or not. Adjectives can be an entirely different word class, as in English; or they can be a subset of nouns (considering morphology and behaviour), as in Spanish or Latin; or they can behave like verbs (as some do in Japanese). Let's examine these alternatives.
If adjectives are a completely different word class, then they don't have to behave like anything else; they can have their own rules of inflection, or not inflect at all. English adjectives are an example of this: they are invariable words (except for the comparative and superlative forms).
If adjectives are like nouns, or a subset of nouns, then they behave like nouns. In Spanish, where nouns have gender and number, adjectives have them too, and they must agree with their head noun. Sometimes they can become nouns without any change; rojas means both 'red' (feminine and plural) and 'red ones' (when preceded by an article). Curiously, nouns can become adjectives, in colloquial sentences like ¡Es tan payaso! 'He's so (much of a) clown!'. In Latin, adjectives agree with their head noun even in case. But the distinction between nouns and adjectives is usually well-defined in these languages; some other languages may choose not to make it.
In Japanese, adjectives of a particular class (na-adjectives) behave like nouns; they are placed before the noun they modify, followed by na, which is the relative form of the copula 'to be'. For example: kirei na kimono 'beautiful kimono' -- the nominal adjective (or qualitative noun, as some people call it) kirei means 'beauty' or 'beautiful', and the phrase could be translated as 'kimono which is beautiful / which has beauty'. You can add tense to the adjective by marking tense on the copula: kirei datta kimono 'kimono which was beautiful'.
If adjectives are like verbs, then they conjugate like verbs. Another class of Japanese adjectives (i-adjectives, because they end in -i) work this way; adjectives are usually a kind of participial form of verbs, or a single-word relative clause (relative clauses in Japanese come before the noun phrase they modify, the same as adjectives and demonstratives do). You can think of Japanese adjectives as a combination of an English adjective + the copula 'to be', though Japanese adjectives can and do take the copula sometimes. But the tense is still on the adjective, not on the copula. For example [thanks Donald Patrick Michael Goodman III for this one, and "Reena D." for the correction]: Kakkoii desu 'He is cute' (polite form); Kakkoikatta desu 'He was cute'. Here kakkoi- is the root, while -i is the suffix for adjectives in present tense, -katta is for past tense, and desu is the polite present tense form of the copula. As you see, the tense in this class goes directly on the adjective, not on the copula, which can be omitted sometimes.
In my own language Draseléq, adjectives do not exist as such. There are verbs that mean 'to be big', 'to be yellow', and even 'to be four'. You say 'a tall tree' by saying 'talling/talled tree', using a short participle. You say 'the tree is tall' by using the third person singular present tense of the verb 'to be tall' with 'the tree' as the subject: 'the tree *talls'. The best thing about this is that you merge two word classes into one, and you can use whatever devices you invented for one on the other. In Draseléq, you can express the equivalent of 'make/cause to be four' in one word.
Many adjectives may not exist at all in any form (although every language has some words that act like adjectives). The ideas of qualifying can be expressed in other ways. Tibetan uses abstract nouns instead of adjectives; you don't have the adjective 'large', but the noun 'magnitude, largeness', and you can express 'a large room' by saying 'a room of magnitude'. This is not ridiculous in English. 'A room of magnitude' is rare but possible, and 'a disaster of biblical proportions' (which follows the same structure) is common.
Nik Taylor informs me that in some languages, the adjectives form a closed word class; there are a certain number of them (pairs like 'big/small', and the colours) and others can't be formed.
If you have a morphologically separate word class for adjectives, you should also invent some affixes to colour their meaning, to negate them, and to transform them into other word classes. Also think of comparatives and superlatives. It's not an obligation to have them, but a language should be able to express such ideas as something being taller, or redder, or uglier, than something else.
As an extra, you can read a compilation of a thread in the Conlang list, started by a question by Fredrik Ekman: are there languages without adjectives?
Conjunctions are words which put together different parts of a sentence. Common English conjunctions are and, or, if, but, etc. Conjunctions can be present or not. It's possible to include some distinctions in conjunctions which aren't made in English; for example, the difference between exclusive and inclusive or. In Latin, you can say vel X vel Y (X or Y, or both) or aut X aut Y (X or Y, but not both). Conjunctions can sometimes be transformed into other things; in Latin, while you have et 'and', you can also use a postposed particle -que to join two nouns: Senatus Populusque Romanus 'the Senate and the Roman People'. Some languages do not have conjunctions at all; they simply put things together. 'X Y' (perhaps with a pause between them) means 'X and Y' (or even 'X or Y', depending on intonation and context). You can also use a case ending to join things, saying 'X together-with-Y' for 'X and Y'. Or you can replace conjunctions by adverbs: 'I tried but I couldn't' gives 'I tried, however, I couldn't'.
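The difference between the two kinds of 'or' is easy to pin down as a truth table. Here is a minimal Python sketch; the function names vel and aut are just labels borrowed from the Latin words, and the mapping to strict logical operators is a simplification of how the words are actually used.

```python
def vel(x: bool, y: bool) -> bool:
    """Inclusive 'or': true if X holds, Y holds, or both do."""
    return x or y


def aut(x: bool, y: bool) -> bool:
    """Exclusive 'or': true if exactly one of X and Y holds."""
    return x != y


# Print the full truth table for both conjunctions.
for x in (False, True):
    for y in (False, True):
        print(f"X={x!s:5} Y={y!s:5} vel={vel(x, y)!s:5} aut={aut(x, y)!s:5}")
```

The two only disagree in the last row, where both X and Y are true; a language that marks the distinction is telling the hearer which of those readings is meant.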
Do you have articles? English has two, a and the. Spanish has four in the singular, two indefinite and two definite; in each pair one is feminine and one is masculine, and each also has a plural form. If your language has grammatical gender, then perhaps the articles should agree with their nouns. In Greek, articles agree not only in gender, but also in number and case, with their head noun. Scandinavian languages place the definite article at the end of words, attached to them as an inflection (for example, in Swedish en bok 'a book', boken 'the book', böcker 'books', böckerna 'the books'). Many languages do not have articles. In most cases, you can paraphrase articles by using adjectives, quantifiers (like some, all), or demonstratives (that, this). Articles are often unstressed and joined to the following words, perhaps with elision of vowels and other simplifications. In French, you say la voiture 'the car' but l'avion 'the plane'. In Italian and Portuguese, the articles fuse with certain prepositions that come before them.
What else? Particles in general are little words that modify the meaning of other words, or the sentence. They are impossible to characterize more clearly. Japanese has lots of particles, which are used to mark nouns (the same function as case endings), or to add emphasis. For example anata no 'your' uses the genitive particle no; the particle wa signals a new subject or new information in the sentence, which will be omitted and understood in the next sentences. There's even an 'exclamation particle', yo, used to add force to statements; and an 'interrogative particle', ka, which signals a question (taberu ka 'shall we eat?').
Prepositions deserve to be named as a different kind of particle, perhaps, but I won't go further. A language can have prepositions or postpositions, or neither. Whether a language is pre- or postpositional depends mainly on the position of the parts of speech (especially the verb arguments) in a sentence. As a general rule, SOV languages are postpositional, and VSO languages are prepositional; SVO languages can go either way (see Word order below).
The most common adpositions can be adequately replaced by case, and perhaps adverbs. Japanese shows many relationships with postposed particles which don't have a real meaning, but only functions, and in that way differ from what we would call postpositions. In some cases, when it needs to use the equivalent to a prepositional statement, it uses two nouns joined by the genitive particle: heya no naka '[the] room (genitive) interior-side', 'the room's inside, inside the room'. So in fact some of our prepositions are rendered by nouns.
The various components of a sentence often appear in a fixed order. The more analytic the language, generally the more fixed the word order is. Chinese sentences are so strictly ordered that the misplacement of any word can alter the meaning completely. The more synthetic the language, probably the freer the word order, because synthetic, very inflected words, can stand on their own, and they don't depend so much on context. For example, in Latin Petrus amat Paulum 'Peter loves Paul', the subject and the object are perfectly determined by case endings, and their place can be changed with no change of the meaning of the phrase: you can say Paulum Petrus amat or amat Petrus Paulum and it's OK. But in English, 'Peter loves Paul' and 'Paul loves Peter' mean different things, because word order serves the function of distinguishing subject and object; and 'loves Peter Paul' or 'Paul Peter loves' are impossible or ridiculous.
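To see why case endings free up the word order, here is a minimal Python sketch. The tiny LEXICON and the interpret function are hypothetical, made up for this illustration only: each Latin word form carries its own role, so the interpreter never looks at position.

```python
from itertools import permutations

# Hypothetical mini-lexicon: each word form announces its own role.
LEXICON = {
    "Petrus": ("subject", "Peter"),  # nominative ending -us
    "Paulum": ("object", "Paul"),    # accusative ending -um
    "amat": ("verb", "loves"),
}


def interpret(sentence: str) -> str:
    """Read roles off the word forms, ignoring word order entirely."""
    roles = {}
    for word in sentence.split():
        role, meaning = LEXICON[word]
        roles[role] = meaning
    return f"{roles['subject']} {roles['verb']} {roles['object']}"


# Try all six orderings of the three words and collect the readings.
readings = {interpret(" ".join(order)) for order in permutations(LEXICON)}
print(readings)
```

All six orderings collapse into a single reading. An English-style interpreter would instead have to assign roles by position, which is exactly why 'Paul loves Peter' means something else.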
A synthetic language may have a free word order not only by resorting to case endings. It's been pointed out to me that other grammatical devices, such as agreement (between verbs and nouns, nouns and adjectives, etc.), may also serve this purpose.
Your language might have a strict order, or a free one. But whatever it is, you must decide what it will be like. The main structure of a complete sentence includes subject, object, and verb. These can of course be ordered in only six different ways: SVO, SOV, VSO, OVS, OSV, VOS. English almost always uses SVO, although sometimes it lets out an OSV (in sentences like 'this I don't know' or 'to thee I will sing'). Spanish is a bit looser: usually SVO, VSO as an alternative for most verbs, SOV or OVS when the object is a pronoun, etc. Perhaps certain verbs of your language can use one form, and others use a different one; or perhaps you could use one form for short sentences and another one for longer complex sentences. But there is always an unmarked word order, that is, a particular order that doesn't convey any extra information (such as emphasis), and is therefore 'neutral' for the hearer. For example, English unmarked word order is SVO. The examples of OSV order I gave are marked; they make you focus on the object.
Some orders are more common than others. According to surveys, SVO and SOV languages each comprise about 40% of the world's languages. VSO languages are relatively frequent too, 15%. The other word orders (where the object is before the subject) comprise about 5%. So if your language is intended to be average, use SVO or SOV; if you want it to be exotic and weird, try OVS, OSV or VOS.
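If you want to try the six orders on your own vocabulary, a few lines of Python will do. This is only a sketch: the arrange function and the sample words are assumptions for illustration, and a real language would of course also inflect the words.

```python
from itertools import permutations

# All possible orderings of Subject, Verb and Object.
ORDERS = ["".join(p) for p in permutations("SVO")]


def arrange(order: str, subject: str, verb: str, obj: str) -> str:
    """Lay out the three constituents following an order such as 'SOV'."""
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[slot] for slot in order)


for order in ORDERS:
    print(order, "->", arrange(order, "Peter", "loves", "Paul"))
```

Feeding it your own subject, verb and object is a quick way to hear how each order sounds before you commit to one as the unmarked order.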
Every part of the sentence can be divided into a head and zero or more modifiers. In a noun phrase like 'the little red cottage', the head is 'cottage' and the modifiers are the article and the two adjectives. Your language should have a fixed order for these too. English generally places modifiers before the head here (it's head-final). Spanish places articles before nouns, but adjectives generally after them. English places adverbs before verbs, but adverbial phrases (such as 'in the park') after the verb. Japanese places everything before the corresponding heads, even subordinate clauses ('the man that she saw' is '(that) she saw the man'; the subordinate clause acts as an adjective).
There are general tendencies about word order and other features of languages. For example, SOV languages almost always place modifiers (for example, adjectives) before heads (for example, nouns), and use postpositions, while VSO languages tend to do the opposite (heads first, and prepositions). SVO languages can go either way. SOV languages usually mark the subject somehow, since it could get confused with the object that follows; SVO languages don't need that marking (though many of them use it), because the verb itself separates subject and object.
In V2 languages, there is room for one and only one constituent before the verb. If something has to be emphasized, it usually comes to the front of the sentence (this is called focus fronting and happens in many languages). If the language is V2, however, this means that something else will have to move to the other side of the verb. For example, in German you can say (the verb, or actually the auxiliary, since the complete verb phrase is hat geschenkt, is in UPPERCASE):
Zum Geburtstag HAT sie ihm ein Buch geschenkt.
For his birthday she has given him a book. (lit. 'For his birthday has she him a book given.')
Ein Buch HAT sie ihm zum Geburtstag geschenkt.
She has given him a book for his birthday. (lit. 'A book has she him for his birthday given.')
Geschenkt HAT sie ihm zum Geburtstag ein Buch.
She has given him a book for his birthday. (lit. 'Given has she him for his birthday a book.')
Of course, German has case, so the subject and objects don't get as confused as in the English literal gloss.
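The V2 rule itself is mechanical enough to sketch in Python. The function v2_clause below is hypothetical and deliberately naive: it assumes a neutral order for the remaining constituents (with the participle last) and simply moves the fronted one ahead of the finite verb. Real German has further ordering preferences that this ignores.

```python
def v2_clause(fronted: str, finite_verb: str, constituents: list[str]) -> str:
    """Put one constituent first, the finite verb second, the rest after."""
    if fronted not in constituents:
        raise ValueError(f"{fronted!r} is not a constituent of this clause")
    rest = [c for c in constituents if c != fronted]
    # The finite verb is uppercased only to make it stand out.
    sentence = " ".join([fronted, finite_verb.upper(), *rest])
    return sentence[0].upper() + sentence[1:] + "."


# Assumed neutral order of everything except the finite verb.
CLAUSE = ["sie", "ihm", "zum Geburtstag", "ein Buch", "geschenkt"]

for focus in ("zum Geburtstag", "ein Buch", "geschenkt"):
    print(v2_clause(focus, "hat", CLAUSE))
```

Run on the three focused constituents, this reproduces the three German sentences above: whatever is fronted, exactly one constituent precedes the verb and everything else shifts to the other side.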
English is a Germanic language too, and though it has lost compulsory V2 order, it has kept some traces. You can see it in the way questions are asked (*'Who you saw?' is 'Who did you see?' because the auxiliary occupies the second position), in the use of auxiliaries in general, in phrases like 'There is', 'Here is', etc., and notably in seemingly 'inverted' sentences like 'Never had I seen such a thing'.
This topic is maybe a bit outside the idea of this section, but I felt it was worth including. The word order classification I've been talking about presumes that there will be a subject, a verb and an object, and that they'll be differentiable by the word order itself and/or by case marks.
There's a different system, which is used in Malagasy and most Filipino languages, like Tagalog, in which subject, object and other modifiers may appear in different orders, and they're not marked in traditional ways. It's called a trigger system.
The trigger is the part of the sentence over which emphasis is placed (I'd call it the topic, but I'm not so sure about this). The trigger can be the 'subject' of the sentence according to our view, but also the object, or a location, or the verb (predicate) itself. The trigger is marked as such (by a particle or inflection, or by word order), but you only state 'this is the trigger', not its function. Other parts of the sentence are marked differently. Then the verb is marked to show the relationship of the action to the trigger. The 'case' of the trigger is not marked on the trigger but on the verb.
In order to illustrate this, I'll just transcribe part of a post to the Conlang list, by Kristian Jensen, who was kind enough to repost it when I asked for an explanation about the subject. Here it is:
In Tagalog, there are only three markings for case: the Trigger, the Genitive, and the Oblique. This is exactly like most (if not all) the Philippine languages. Furthermore, much like many Western Austronesian languages, there is a large inventory of affixes used to create different nuances in the verbs, notably the verbal trigger. When the trigger plays the role of the agent, an agent-trigger affix is used with the verb. When the trigger plays the role of the patient, a patient-trigger affix is used with the verb. When the trigger plays the role of location, then a location-trigger affix is used with the verb. Etc., etc. A particularly noteworthy feature of this system is that non-triggered (unfocused) core arguments are marked as the genitive. As a result, "I am buying" and "the buying (of something) of mine" (or "my buying (of something)") have identical structures. Verbal constructions appear to be identical with nominal constructions by the use of genitives. One theory has it that the verbal affixes are actually nominalizing affixes. Examples always help. Take the sentence "The man cut some wood in the forest". With three different arguments, three trigger forms are possible. Below are parsing examples of the way a Filipino language would translate the sentence.
I have refrained from using real language examples at this point hoping that it would be easier to understand how the _grammatical system_ (_not_ the morphological system) works:
AGENT Trigger:
AT-cut GEN-wood OBL-forest TRG-man
"[cutting-agent] [of wood] [at forest] = [man]"
lit.: "The wood's cutter in the forest is the man"
transl.: "The man, he cut some wood in the forest"
PATIENT Trigger:
PT-cut GEN-man OBL-forest TRG-wood
"[cutting-patient] [of man] [at forest] = [wood]"
lit.: "The man's cutting-patient in the forest is the wood"
transl.: "The wood, the man cut it in the forest"
LOCATION Trigger:
LT-cut GEN-man GEN-wood TRG-forest
"[cutting-location] [of man] [of wood] = [forest]"
lit.: "The man's cutting-location of wood is the forest"
transl.: "The forest, the man cut some wood in it"
Note how I have nominalized the verbs in the transcription. Thus, the verb for cutting has been nominalized as an agent, a patient, or a location depending on what role the trigger plays. There are other verbal trigger forms too including benefactor and instrument. My own theory is that trigger languages only have one core argument. Such being the case, trigger languages resort to nominalizing verbs. This might also explain why passive constructions do not exist in trigger languages since the valency of the verb is not changed (cannot change) with different triggers.
In a language using a trigger system, it's not useful to talk about subject, object, etc., and word order may greatly vary. In Tagalog, the predicate (the nominalized verb) is the first word in the sentence, and the trigger is last. Other languages might be different. It's equally useless to talk of transitive or intransitive verbs, or of voice (active, passive, middle).
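The marking rules in Kristian's example are regular enough to write down as a small Python sketch. Everything here (the function trigger_gloss, the affix labels, the treatment of location as the only non-core role) is an assumption that covers just the three glosses quoted above; it is not a model of Tagalog or of any real language.

```python
# Verb affix chosen by the role the trigger plays (labels from the example).
TRIGGER_AFFIX = {"agent": "AT", "patient": "PT", "location": "LT"}
# Non-trigger core arguments take the genitive; anything else is oblique.
CORE_ROLES = ("agent", "patient")


def trigger_gloss(verb: str, arguments: dict[str, str], trigger: str) -> str:
    """Build a gloss line: marked verb first, trigger last."""
    parts = [f"{TRIGGER_AFFIX[trigger]}-{verb}"]
    for role, noun in arguments.items():
        if role == trigger:
            continue  # the trigger is held back for the end
        case = "GEN" if role in CORE_ROLES else "OBL"
        parts.append(f"{case}-{noun}")
    parts.append(f"TRG-{arguments[trigger]}")
    return " ".join(parts)


ARGS = {"agent": "man", "patient": "wood", "location": "forest"}

for role in ARGS:
    print(f"{role:8}: {trigger_gloss('cut', ARGS, role)}")
```

Note that the trigger noun itself only ever gets TRG; what it is doing in the sentence is read entirely off the affix on the verb, which is the point of the system.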
This is just to show you how things can be really different, and still understandable. See if you can imagine something else!