Computers in Linguistics
Computational linguistics is a subfield of linguistics dealing with the capacity of computers to process human language. It covers automatic machine translation from one language to another, the development of computer languages, computer processing of texts, and the role of human language in creating artificial intelligence.
Text processing. Computers have proven useful in detecting regularities in language. Text processing has been used to determine the relative frequency of words and sounds, and to compare word frequency in written and spoken language. The sum total of the texts used for such a study (whether written or spoken) is called a corpus. Computational analysis has also been used to determine the authorship of anonymous or disputed texts (cf. the controversy surrounding authorship of the Russian novel Quiet Flows the Don).
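A word-frequency count of the kind described above can be sketched in a few lines of Python. This is only a minimal illustration: the sample text is invented, the function names are hypothetical, and the tokenizer (runs of letters and apostrophes) is a crude assumption that a real corpus study would replace with something more careful.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs in a text (case-insensitive)."""
    # Crude tokenizer (an assumption): a word is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def relative_frequencies(text):
    """Return each word's share of all the word tokens in the text."""
    counts = word_frequencies(text)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Invented sample standing in for a real corpus.
sample = "The cat saw the dog. The dog saw the cat too."
counts = word_frequencies(sample)
for word, n in counts.most_common():
    print(word, n)
```

Running the same count over a written corpus and a spoken one, and comparing the relative frequencies, is the sort of comparison the paragraph above mentions.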
Computers can also be used to produce a lexical cross-reference to a literary text, called a concordance. There are large numbers of concordances to the Bible. There is also a large, 4-volume concordance to the works of the great 19th-century Russian poet Pushkin. Computers can perform such tedious mechanical tasks quickly and accurately.
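As a rough sketch of how a concordance can be produced mechanically, the Python below lists every occurrence of a word together with a few words of context on each side. The sample text and the names concordance and width are assumptions made for illustration, not a description of how any published concordance was compiled.

```python
import re

def concordance(text, keyword, width=3):
    """List each occurrence of keyword with `width` words of context per side."""
    words = re.findall(r"[A-Za-z']+", text)
    entries = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Record the word's position so the reader can find it in the text.
            entries.append((i, left, word, right))
    return entries

# Invented sample text.
sample = "The cat saw the dog. The dog chased the cat. The cat ran."
for position, left, word, right in concordance(sample, "cat", width=2):
    print(f"{position:>3}  {left:>12} [{word}] {right}")
```

A full concordance simply repeats this lookup for every distinct word in the text and sorts the entries alphabetically.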
Talking computers. Some progress has been made in speech recognition, so that computers can recognize certain vocal signals. Here we are dealing with a one-to-one correspondence between a signal and a stored command: the computer cannot recognize truly novel commands. Nor can it follow speech that deviates even slightly from what it expects (an accent, or a fast-speech contraction such as whatcha for what are you). More success has come from attempts to record sounds using computers. A sound spectrograph can translate the vibrations that speech produces in the air into a visual readout called a sound spectrogram. (See textbook p. 476.)
Computers have also been programmed to produce certain basic sounds. One phonetics computer program can reproduce all the sounds of the International Phonetic Alphabet (IPA) with relative accuracy.
Automatic Machine Translation. The idea was first conceived in the 1940s, around the time of World War Two. A passage written in a source language (the language to be translated) is fed into the computer, which returns a translation in the target language (the language translated into). Huge efforts have been spent with little practical result. Limited word lists have been compiled, with a one-to-one correspondence between source and target language.
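The word-list approach can be illustrated with a short Python sketch. The tiny Spanish-to-English list and the function name below are invented for illustration; they are assumptions, not a reconstruction of any historical system.

```python
# Hypothetical one-to-one word list (Spanish to English), invented for this example.
LEXICON = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "come": "eats",
}

def translate_word_for_word(sentence, lexicon):
    """Replace each source word with its single listed target word."""
    output = []
    for word in sentence.lower().split():
        # Words missing from the list are passed through in angle brackets.
        output.append(lexicon.get(word, f"<{word}>"))
    return " ".join(output)

print(translate_word_for_word("El gato negro come", LEXICON))
# prints: the cat black eats
```

The output keeps Spanish word order ("the cat black eats"), which shows one reason a one-to-one word list gives so little practical result: it knows nothing about the grammar of either language.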
It has been much more difficult to teach computers to generate meaningful language in a creative way. All computer "talking" is really only a set of fixed responses to a limited set of stimuli (closer to the discredited behaviorist view of language than to true human language). Computer production of language is thus necessarily limited.
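A stimulus-and-response "talking" program of the kind just described might look like the following Python sketch. The stimuli, the responses and the fallback line are all invented for illustration.

```python
# Hypothetical table of stimuli and their fixed responses.
RESPONSES = {
    "hello": "Hello. How are you?",
    "what is your name": "I am a computer.",
    "goodbye": "Goodbye.",
}
FALLBACK = "I do not understand."

def respond(utterance):
    """Return the stored response for a known stimulus, else a fixed fallback."""
    # Normalize case and trim surrounding spaces and end punctuation.
    key = utterance.lower().strip(" .!?")
    return RESPONSES.get(key, FALLBACK)

print(respond("Hello!"))
print(respond("Whatcha doing?"))
```

Anything outside the stored table, however ordinary, receives the same fallback; nothing new is ever generated.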
Computers have also been taught to parse sentences, that is, to break them down into syntactically well-formed constituents. Parsing computers have the same problem as humans when they meet ambiguous sentences.
Try to parse these sentences word by word:
The student forgot (that) the solution was in the back of the book.
Fat people eat accumulates.
Example of a sentence made confusing by a multiplicity of homonyms: Buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.
That is: The buffalo that Buffalo buffalo buffalo also buffalo other Buffalo buffalo.
In other words: The buffalo that buffalo from the city of Buffalo deceive also deceive (other) buffalo from the city of Buffalo.
Some sentences are truly ambiguous in their structure and require real world knowledge or linguistic context to determine the intended meaning.
Flying planes can be dangerous.
Your son has grown another foot.
Two cars reported stolen by the Bellingham police yesterday.
Tonight Dr. Ruth Westheimer discusses sex with David Letterman.
Mary had a little lamb (with mint sauce).
A computer could list both readings of each sentence, yet would lack the common-sense ability to choose the most likely one.
Even the most powerful text processing programs cannot easily tell likely meanings from unlikely ones. In fact, computers often come up with alternative (and unintended) meanings for sentences that humans don't consider ambiguous, such as Time flies like an arrow.
A supercomputer generated these five meanings:
1) Time proceeds as quickly as an arrow proceeds (the intended meaning).
2) Measure the speed of flies in the same way you measure the speed of an arrow.
3) Measure the speed of flies in the same way that an arrow measures the speed of flies.
4) Measure the speed of flies which look like an arrow.
5) Flies of a particular kind, time-flies, are fond of an arrow.
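How a program arrives at several readings can be sketched with a toy parser that counts the structures a small grammar assigns to a sentence. Everything in the Python below is an assumption made for illustration: the word classes, the phrase-structure rules and the function name count_parses are invented, and this is not the program that produced the list above.

```python
from functools import lru_cache

# Toy lexicon: each word may belong to more than one word class.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"P", "V"},
    "an": {"Det"},
    "arrow": {"N"},
}
# One-child rules, e.g. a bare noun is a noun phrase, a bare verb phrase is a command.
UNARY = {"S": ["VP"], "NP": ["N"], "VP": ["V"]}
# Two-child rules.
BINARY = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("N", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}

def count_parses(sentence, start="S"):
    """Count the distinct trees the toy grammar assigns to the sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        total = 0
        # A single word can be labeled with any of its word classes.
        if j - i == 1 and symbol in LEXICON.get(words[i], ()):
            total += 1
        # One-child rules cover the same stretch of words.
        for child in UNARY.get(symbol, ()):
            total += count(child, i, j)
        # Two-child rules: try every point at which to split the words.
        for left, right in BINARY.get(symbol, ()):
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))

print(count_parses("time flies like an arrow"))
```

With these particular rules the sentence receives four structures. Readings 2 and 3 above share a single structure here, since they differ only in how the comparison is understood and not in how the words are grouped. Nothing in the program marks any one of the structures as the sensible one.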
Humans, but not computers, can disregard absurd meanings without even being aware of them. This is because we humans bring an enormous amount of world knowledge to a conversation; this diverse knowledge allows us to speak cryptically or elliptically and yet still be understood.
Despite being programmed with enormous numbers of facts, rules and words, a computer would probably not come up with the intended meaning of many simple sentences.