Paris to Istanbul! (and back to somewhere in between): Month 13.2. Words count.

This is the first of two blog entries about the English language. This entry concerns a lexical characteristic of English. The second entry – which I'm particularly excited about – concerns a phonetic characteristic of English.

One of my trusty blog-followers was pleased to tell me about an interesting assertion he had read about English: that it has by far the most words of any language, with something like 600 000 words, compared to German in second place with 250 000. (Someone can correct me if I'm wrong on the exact numbers.) I'm equally pleased to give a more detailed response at this time.

The first reason to doubt the assertion is methodological. It's possible that the author of the assertion derived his facts by counting the entries in various dictionaries. One problem arises immediately: many languages simply don't have dictionaries. Another problem is that standard English dictionaries are the result of centuries of compilation, whereas the dictionaries of other languages may be relatively new. Much more research has therefore been done on "finding" words in English than in other languages, and consequently many more words have been found. It also follows that English dictionaries list many archaic words alongside their modern variants, whereas more recently compiled dictionaries of other languages tend to stick to words currently in use.

(Aside: Yes, people actually had to do research finding words in order to compile the first English dictionary. A great read about the creation of the English dictionary is "The Professor and the Madman.")

But suppose our friendly asserter got his data online. There are plenty of sites that track "new words" in English. Add these up and bingo! Tons of words to add to the English dictionary. Like "Belieber" and "screenager." The question any shrewd reader should now ask is, "Well, so how does something become a 'new word'?" The answer is a totally arbitrary criterion: something counts as a new word if it has been used at least x times by separate writers. That's it. (I can't remember what the number is: 25 or 82 or 103. The point is that it's arbitrary, and actually pretty low.)
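To make the criterion concrete, here is a minimal sketch of how such a site might decide what counts. Everything in it is my own assumption for illustration: the function name, the (writer, word) data layout, and especially the threshold of 3, which stands in for the real number I can't remember.

```python
from collections import defaultdict

# Hypothetical cutoff. The real number is unknown to me; the point is
# only that some arbitrary, fairly low threshold is chosen.
MIN_WRITERS = 3


def attested_words(usages, min_writers=MIN_WRITERS):
    """Return the words used by at least `min_writers` distinct writers.

    `usages` is an iterable of (writer, word) pairs. Repeated uses by
    the same writer count only once, and case is ignored.
    """
    writers_by_word = defaultdict(set)
    for writer, word in usages:
        writers_by_word[word.lower()].add(writer)
    return {
        word
        for word, writers in writers_by_word.items()
        if len(writers) >= min_writers
    }


# Made-up usage data: three writers use "screenager", two use "belieber".
usages = [
    ("ann", "screenager"), ("bob", "screenager"), ("cat", "screenager"),
    ("ann", "Belieber"), ("ann", "belieber"), ("bob", "belieber"),
]

print(sorted(attested_words(usages)))                 # ['screenager']
print(sorted(attested_words(usages, min_writers=2)))  # ['belieber', 'screenager']
```

Notice that nudging the threshold from 3 to 2 changes the size of the "lexicon," which is exactly why a count built this way says more about the cutoff than about the language.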

I can think of three problems with this arbitrary criterion for online word-counting. Each problem may lead to the inclusion of more words in the English lexicon than there ought to be. First, many coinages vanish as soon as their referents do. (At least I hope that Justin Bieber will vanish soon.) So it's not clear whether a word that was used x times last year, but only x - 1 times this year, should still be counted as a word. Second, many coinages are technical jargon. So if cyclomethicone, the second ingredient in my facial cream, is not a legitimate word in English, then "xenozoonosis" shouldn't be one either just because x people used it on the internet. Third, many of these coinages result from incorrect usage of a "real" English word. So just because a bunch of people have mistakenly used "irregardless" to mean "regardless," does "irregardless" become a legitimate word? Whether we count words online or in dictionaries, there is no uncontroversial methodology for summing up the English lexicon.

This brings me to the second reason for doubting that English has more words than any other language: there is no clear way to define the word "word." The problem is morphological. Morphology is the branch of linguistics that studies how words are built up from smaller units of meaning. A morpheme is the smallest chunk of speech that bears meaning. Take the word "count." It consists of a single morpheme meaning "add up." Now we can add the morpheme -able, which says, "It is possible to execute the action denoted by the verb I attach to." We get "countable." Now we can add the morpheme un-, which negates the adjective it precedes. We get "uncountable." And so on.
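The count / countable / uncountable build-up can be sketched as a toy model in which a word is just a tuple of morphemes plus a part of speech. The class and function names are hypothetical, and the model ignores real English spelling changes (it would happily produce "loveable" from "love"); it only shows that each affix checks what it attaches to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    morphemes: tuple   # e.g. ("un-", "count", "-able")
    category: str      # "verb" or "adjective" in this toy model

    @property
    def spelling(self):
        # Drop the hyphens that mark where an affix attaches.
        return "".join(m.strip("-") for m in self.morphemes)


def add_able(word):
    """Suffix -able: turns a verb into an adjective."""
    if word.category != "verb":
        raise ValueError("-able attaches to verbs")
    return Word(word.morphemes + ("-able",), "adjective")


def add_un(word):
    """Prefix un-: negates an adjective."""
    if word.category != "adjective":
        raise ValueError("un- (in this sense) attaches to adjectives")
    return Word(("un-",) + word.morphemes, "adjective")


count = Word(("count",), "verb")
countable = add_able(count)
uncountable = add_un(countable)

print(countable.spelling)    # countable
print(uncountable.spelling)  # uncountable
print(uncountable.morphemes) # ('un-', 'count', '-able')
```

One dictionary entry ("count") and two rules already yield three words, so whether a tally should list one item or three depends entirely on what we decide a "word" is.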

English isn't particularly agglutinative. That means that one word usually corresponds to one morpheme (e.g. all the words in "He is happy to be here" are monomorphemic). But some languages can create huge words just by attaching lots of morphemes together. A famous example is the German Donaudampfschifffahrtsgesellschaftskapitän, which breaks down to the noun "Captain-of-the-society-for-the-passage-of-steamships-on-the-Danube." And that's just a noun. In other languages, whole sentences are piled into a single word. So we can make really long words by adding a lot of morphemes together.

It turns out that infinitely many sentences can be constructed in any sufficiently rich language (I actually had to prove this in my intermediate logic class – there are orders of infinity, and the number of possible sentences corresponds to one of the orders of infinity). For our purposes, all human languages are "sufficiently rich." So there are infinitely many possible sentences in each language. Here's the key deduction: if there are languages in which words can be sentences, and all languages have infinitely many possible sentences, then there are languages with infinitely many possible words. So there, English.
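Here is a small sketch of the "infinitely many possible words" idea, assuming a language that lets you glue any morphemes together freely. The three morphemes are invented and belong to no real language. The generator never finishes, yet every possible word turns up after finitely many steps, which (as far as I remember from that logic class) is what makes this the smallest, "countable" order of infinity.

```python
from itertools import count, islice, product

# Invented morphemes for illustration; not taken from any real language.
MORPHEMES = ("ka", "ti", "mu")


def all_words(morphemes=MORPHEMES):
    """Yield every finite concatenation of morphemes, shortest first.

    The loop over lengths never ends, so the generator is infinite,
    but any given word is reached after finitely many steps.
    """
    for length in count(1):
        for combo in product(morphemes, repeat=length):
            yield "".join(combo)


# The first 12 words: 3 of one morpheme, then 9 of two morphemes.
first = list(islice(all_words(), 12))
print(first)

# However many we ask for, we keep getting words we have not seen before.
sample = list(islice(all_words(), 1000))
print(len(set(sample)))  # 1000
```

No dictionary could ever list the output of even this three-morpheme toy, so a head-to-head word count against such a language is lost before it starts.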

I know what you're going to say. If we wrote English in such a way as to make words out of sentences, then English would have infinitely many words too. E.g. Asentencewouldlooklikethis – just one word. Now write every sentence as just one word, and suddenly we have a bazillion more words in English. It's not so easy. There are reasons why "word" and "sentence" are separate notions. Linguists have identified many criteria that distinguish them.

One of the main criteria for distinguishing words from sentences is their prosody. Individual words have particular stress patterns. Each language has a different way of assigning stress to words. For example, in French, stress falls on the last syllable of a word. Within any particular language, the rules governing the intonation of whole sentences are different from the rules governing the intonation of a single word. For example, the same sentence can be a question or a statement in English ("You're coming home" vs. "You're coming home?"). But that doesn't mean that every word within a question has the same rising pitch as the overall question. So, words and sentences have different prosodies. If, in a language, a particular sentence has the same prosodic pattern as a word, we can say that the sentence is actually a single word.

There are additional criteria for distinguishing sentences from words. Sentences can be broken up into morphemes that can stand on their own (e.g. the word "go" in the sentence "Go home!" could stand alone). Words cannot be broken up into morphemes (meaningful parts) that all stand alone. That's why un- isn't a word, even if it bears some kind of meaning. So, words and sentences contain different types of morphemes. If, in a language, a sentence cannot be broken up into smaller parts that can stand on their own, then that sentence is a word.

What I hope to have demonstrated is that the definition of a "word" is a complex issue. Whether something is a word depends not just on its usage, but on a variety of linguistic "tests" (such as prosody and the ability to be broken up into free-standing morphemes). Given these linguistic tests, there are plenty of "words" that cannot simply be looked up in the dictionary. There are languages that form new words every time a sentence is spoken. The grand conclusion is that we should be skeptical when anyone tries to "count" the number of words in a language, because such counts misleadingly suggest that some languages are richer than others.

(An addendum might help to clarify the intuition that English does have a huge vocabulary. The reason is that English is a Germanic language, but received many loans from French after the Norman invasion in 1066. The result is that for many words with Germanic origins, there is a word having roughly the same meaning but with Latin origins. E.g. dumb (G) – stupid (L), wonderful (G) – excellent (L), hunger (G) – famine (L), speech (G) – language (L).)

Thursday, January 13, 2011
