There have been high hopes for Natural Language Processing. Natural Language Processing, also known simply as NLP, is part of the broader field of Artificial Intelligence, the effort to make machines think. Computers may appear intelligent as they crunch numbers and process information with blazing speed. In truth, a computer understands nothing but on and off and is limited to following exact instructions. But since the invention of the computer, scientists have been attempting to make computers not only appear intelligent but actually be intelligent.
A truly intelligent computer would not be limited to rigid computer-language commands, but would instead be able to process and understand the English language. This is the concept behind Natural Language Processing. The phases a message goes through during NLP are message, syntax, semantics, pragmatics, and intended meaning (M. A. Fischer, 1987). Syntax is the grammatical structure. Semantics is the literal meaning. Pragmatics is world knowledge, knowledge of the context, and a model of the sender. Only when syntax, semantics, and pragmatics are all applied can a program move accurately from a message to its intended meaning.
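These phases can be pictured as a simple pipeline. The sketch below is only an illustration of that idea; the function names and the toy analyses they produce are invented for this example and come from none of the systems discussed in this paper.

    # A minimal sketch of the message -> syntax -> semantics -> pragmatics
    # pipeline described above. Every name and every "analysis" below is
    # invented for illustration; no real NLP system works this simply.

    def parse_syntax(message: str) -> list[str]:
        """Syntax: recover grammatical structure (here, just a crude token list)."""
        return message.rstrip(".!?").split()

    def interpret_semantics(tokens: list[str]) -> dict:
        """Semantics: a crude literal reading, assuming determiner-noun, verb, rest."""
        return {"subject": " ".join(tokens[:2]),
                "verb": tokens[2],
                "object": " ".join(tokens[3:])}

    def apply_pragmatics(literal: dict, context: dict) -> dict:
        """Pragmatics: refine the literal meaning with world and context knowledge."""
        intended = dict(literal)
        if literal["object"].endswith("strike") and context.get("topic") == "baseball":
            intended["object"] = "a pitch that was swung at and missed"
        return intended

    message = "The umpire called a strike."
    literal = interpret_semantics(parse_syntax(message))
    print(apply_pragmatics(literal, context={"topic": "baseball"}))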
Alan Turing made a prediction relevant to NLP in 1950 (Daniel Crevier, 1994, page 9): “I believe that in about fifty years’ time it will be possible to program computers … to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.” But in 1950, computer technology was limited. Because of these limitations, NLP programs of that day focused on exploiting the strengths computers did have. For example, a program called SYNTHEX tried to determine the meaning of sentences by looking up each word in its encyclopedia. Another early approach was Noam Chomsky’s at MIT.
He believed that language could be analyzed without any reference to semantics or pragmatics, simply by looking at the syntax. Neither of these techniques worked. Scientists realized that their Artificial Intelligence programs did not think the way people do, and since people are far more capable than those programs were, they decided to make their programs think more like a person would. So in the late 1950s, scientists shifted from trying to exploit the capabilities of computers to trying to emulate the human brain (Daniel Crevier, 1994).
Ross Quillian at Carnegie Mellon wanted to program the associative aspects of human memory to create better NLP programs (Daniel Crevier, 1994). Quillian’s idea was to determine the meaning of a word from the words around it. For example, look at these sentences: “After the strike, the president sent him away.” “After the strike, the umpire sent him away.” Even though these sentences are identical except for one word, they have very different meanings because of the meaning of the word “strike”. Quillian said the meaning of “strike” should be determined by looking at the subject. In the first sentence, the word “president” makes the word “strike” mean a labor dispute.
In the second sentence, the word “umpire” makes the word “strike” mean that a batter has swung at a baseball and missed. A small sketch of this idea appears after the quotation below. In 1958, Joseph Weizenbaum took a different approach to Artificial Intelligence, which he describes in this quote (Daniel Crevier, 1994, page 133): “Around 1958, I published my first paper, in the commercial magazine Datamation. I had written a program that could play a game called ‘five in a row.’ It’s like ticktacktoe, except you need rows of five exes or noughts to win. It’s also played on an unbounded board; ordinary coordinate paper will do. The program used a ridiculously simple strategy with no look ahead, but it could beat anyone who played at the same naive level.
Since most people had never played the game before, that included just about everybody. Significantly, the paper was entitled ‘How to Make a Computer Appear Intelligent,’ with appear emphasized. In a way, that was a forerunner to my later ELIZA, to establish my status as a charlatan or con man. But the other side of the coin was that I freely stated it. The idea was to create the powerful illusion that the computer was intelligent. I went to considerable trouble in the paper to explain that there wasn’t much behind the scenes, that the machine wasn’t thinking. I explained the strategy well enough that anybody could write that program, which is the same thing I did with ELIZA.”
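Returning to Quillian’s idea for a moment, here is that small sketch of choosing a word sense by looking at the subject. The tiny sense table and function name are invented for this illustration and are not taken from Quillian’s program.

    # Toy illustration of Quillian's idea: pick a sense for an ambiguous word
    # by looking at the words around it (here, just the subject).
    # The sense table below is invented for this sketch.

    STRIKE_SENSES = {
        "president": "a labor dispute",
        "umpire": "a swing and a miss at a pitched baseball",
    }

    def sense_of_strike(subject: str) -> str:
        return STRIKE_SENSES.get(subject, "unknown sense")

    # "After the strike, the president sent him away."
    print(sense_of_strike("president"))  # a labor dispute
    # "After the strike, the umpire sent him away."
    print(sense_of_strike("umpire"))     # a swing and a miss at a pitched baseball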
ELIZA was a program written by Joseph Weizenbaum that communicated with its user while impersonating a psychotherapist. Weizenbaum wrote the program to demonstrate that simple tricks could stand in for genuine analysis of syntax, semantics, or pragmatics. One of ELIZA’s tricks was mirroring sentences back at the user. Another was to pick a sentence from earlier in the dialogue and, at random intervals, return it attached to a leading phrase. ELIZA would also watch for a list of key words, transform the sentence containing one in some way, and return it attached to a leading sentence.
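A rough sketch of those tricks, written in modern Python rather than anything resembling Weizenbaum’s original code, might look like the following. The keyword list, the response templates, and the probabilities are invented for the illustration.

    import random
    import re

    # Toy illustration of ELIZA-style tricks: keyword spotting, pronoun
    # "mirroring", and recycling an earlier user sentence behind a leading
    # phrase. Nothing here is Weizenbaum's actual code.

    MIRROR = {"i": "you", "my": "your", "me": "you", "am": "are", "you": "I", "your": "my"}
    KEYWORD_RESPONSES = {
        "mother": "Tell me more about your family.",
        "dream": "What does that dream suggest to you?",
    }
    LEADING_PHRASES = ["Earlier you said", "Let's go back to when you said"]

    history: list[str] = []

    def mirror(sentence: str) -> str:
        words = re.findall(r"[\w']+", sentence.lower())
        return " ".join(MIRROR.get(w, w) for w in words)

    def respond(sentence: str) -> str:
        history.append(sentence)
        # Trick 1: watch for key words and answer with a canned leading sentence.
        for keyword, reply in KEYWORD_RESPONSES.items():
            if keyword in sentence.lower():
                return reply
        # Trick 2: at random intervals, recycle an earlier sentence behind a leading phrase.
        if len(history) > 2 and random.random() < 0.3:
            return f'{random.choice(LEADING_PHRASES)}: "{history[0]}"'
        # Trick 3: mirror the sentence back as a question.
        return f"Why do you say {mirror(sentence)}?"

    print(respond("I am unhappy with my job."))
    print(respond("I keep thinking about my mother."))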
These tricks worked well in the context of a psychiatrist, who encourages patients to talk about their problems and answers their questions with other questions. However, the same tricks do not work well in other situations. In 1970, William Woods, an AI researcher at Bolt Beranek and Newman, described an NLP method called the Augmented Transition Network, or ATN (Daniel Crevier, 1994). The idea, building on Charles Fillmore’s case grammar, was to look at the case each word fills: agent (the instigator of an event), instrument (the stimulus or immediate physical cause of an event), and experiencer (the entity that undergoes the effect of the action).
To tell the cases apart, Fillmore put restrictions on them, such as requiring that an agent be animate. For example, in “The heat is baking the cake”, the cake is inanimate and therefore the experiencer, and the heat would be the instrument. An ATN could mix syntax rules with semantic props such as knowing that a cake is inanimate. This worked out better than any other NLP technique to date, and ATNs are still used in most modern NLP systems. A small sketch of these case restrictions appears after the quotation below.

Roger Schank, then a researcher at Stanford, took yet another approach (Daniel Crevier, 1994, page 167): “Our aim was to write programs that would concentrate on crucial differences in meaning, not on issues of grammatical structure … We used whatever grammatical rules were necessary in our quest to extract meanings from sentences but, to our surprise, little grammar proved to be relevant for translating sentences into a system of conceptual representations.”
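Here is that sketch of Fillmore-style case restrictions. The word lists and role assignments are invented for the illustration and are not taken from Woods’s ATN implementation.

    # Toy illustration of Fillmore-style case restrictions: an agent must be
    # animate, so an inanimate cause is treated as an instrument instead.
    # The tiny word list and role labels below are invented for this sketch.

    ANIMATE = {"boy", "umpire", "president", "waitress", "baker"}

    def assign_cases(subject: str, obj: str) -> dict[str, str]:
        roles = {}
        # Restriction: only an animate noun may fill the agent role.
        roles[subject] = "agent" if subject in ANIMATE else "instrument"
        # The noun acted upon; the text above calls it the experiencer.
        roles[obj] = "experiencer"
        return roles

    # "The heat is baking the cake" -> heat is the instrument, cake the experiencer.
    print(assign_cases("heat", "cake"))
    # "The baker is baking the cake" -> baker is the agent.
    print(assign_cases("baker", "cake"))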
Schank reduced all verbs to 11 basic acts. Some of them are ATRANS (to transfer an abstract relationship), PTRANS (to transfer the physical location of an object), PROPEL (to apply physical force to an object), MOVE (for its owner to move a body part), MTRANS (to transfer mental information), and MBUILD (to build new information out of old information). Schank called these basic acts semantic primitives.
When his program saw in a sentence words usually relating to the transfer of possession (such as give, buy, sell, or donate), it would search for the normal props of ATRANS: the object being transferred, its receiver and original owner, the means of transfer, and so on. If the program didn’t find these props, it would try another possible meaning of the verb. After successfully determining the meaning of the verb, the program would make inferences associated with the semantic primitive. For example, an ATRANS rule might be that if someone gets something they want, they may be happy about it and may use it (Daniel Crevier, 1994).
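A minimal sketch of this prop-checking idea might look like the following. The frames, verb lists, and inference strings are invented for the illustration and only loosely echo Schank’s conceptual dependency notation.

    # Toy illustration of mapping a verb to a semantic primitive and checking
    # for its expected "props", roughly in the spirit of the ATRANS example
    # above. The frames and inferences below are invented for this sketch.

    PRIMITIVE_FRAMES = {
        "ATRANS": {"verbs": {"give", "buy", "sell", "donate"},
                   "props": {"object", "receiver", "original_owner"}},
        "PTRANS": {"verbs": {"go", "walk", "move"},
                   "props": {"object", "destination"}},
    }

    INFERENCES = {
        "ATRANS": ["the receiver may be happy about the object",
                   "the receiver may use the object"],
    }

    def classify(verb: str, found_props: set) -> str | None:
        """Return the first primitive whose verb list matches and whose props were all found."""
        for name, frame in PRIMITIVE_FRAMES.items():
            if verb in frame["verbs"] and frame["props"] <= found_props:
                return name
        return None  # no frame fits: try another possible meaning of the verb

    # "John gave Mary a book": the object, receiver, and original owner are all present.
    primitive = classify("give", {"object", "receiver", "original_owner"})
    print(primitive)                      # ATRANS
    print(INFERENCES.get(primitive, []))  # the inferences tied to that primitive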
Schank implemented his idea of conceptual dependency in a program called MARGIE (memory, analysis, response generation in English). MARGIE analyzed English sentences, turned them into semantic representations, and generated inferences from them. Take for example: “John went to a restaurant. He ordered a hamburger. It was cold when the waitress brought it. He left her a very small tip.” MARGIE didn’t work. Schank and his colleagues found that “any single sentence lends itself to so many plausible inferences that it was impossible to isolate those pertinent to the next sentence.”
For example, from “It was cold when the waitress brought it”, MARGIE might say “The hamburger’s temperature was between 75 and 90 degrees,” “The waitress brought the hamburger on a plate,” “She put the plate on a table,” and so on. The inference that cold food makes people unhappy would be so far down the list that it would never be looked at, and as a result MARGIE would not have understood the story well enough to answer the question, “Why did John leave a small tip?” While MARGIE handled syntax and semantics, it neglected pragmatics. To solve this problem, Schank moved to Yale and teamed up with Professor of Psychology Robert Abelson.
They realized that most of our everyday activities are linked together in chains, which they called “scripts” (Daniel Crevier, 1994). In 1975, SAM (Script Applier Mechanism), written by Richard Cullingford, used an automobile-accident script to make sense of newspaper reports of such accidents. SAM built internal representations of the articles using semantic primitives. SAM was the first working natural language processing program. It successfully went from message to intended meaning because it implemented the steps in between: syntax, semantics, and pragmatics. A minimal sketch of the script idea follows the quotation below. Despite the success of SAM, Schank said “real understanding requires the ability to establish connections between pieces of information for which no prescribed set of rules, or scripts, exist” (Daniel Crevier, 1994, page 167).
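Here is that sketch of a script. The restaurant script and its steps are invented for this illustration; SAM’s actual internal representations were far richer.

    # Toy illustration of a "script": a stereotyped chain of events that lets
    # a program fill in steps a story never states. The restaurant script
    # below is invented for this sketch.

    RESTAURANT_SCRIPT = [
        "customer enters restaurant",
        "customer orders food",
        "waitress brings food",
        "customer eats food",
        "customer pays and leaves a tip",
        "customer leaves",
    ]

    def fill_in(story_events: list) -> list:
        """Return the full script, marking which steps the story actually stated."""
        return [f"{step} [stated]" if step in story_events else f"{step} [inferred]"
                for step in RESTAURANT_SCRIPT]

    story = ["customer enters restaurant", "customer orders food", "customer leaves"]
    for line in fill_in(story):
        print(line)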
In response, Robert Wilensky created PAM (Plan Applier Mechanism). PAM interpreted stories by linking sentences together through a character’s goals and plans. Here is an example of PAM at work (Daniel Crevier, 1994): “John wanted money. He got a gun and walked into a liquor store. He told the owner he wanted some money. The owner gave John the money and John left.” In the process of understanding the story, PAM put itself in the shoes of the participants. From John’s point of view:
“I needed to get some dough. So I got myself this gun, and I walked down to the liquor store. I told the shopkeeper that if he didn’t let me have the money then I would shoot him. So he handed it over. Then I left.” From the store owner’s point of view: “I was minding the store when a man entered. He threatened me with a gun and demanded all the cash receipts. Well, I didn’t want to get hurt, so I gave him the money. Then he escaped.”

A newer idea, from MIT, is to grab bits and parts of speech and ask the user for more details, so that the program can understand what it could not before and better understand what it could (G. McWilliams, 1993).
In IBM’s current NLP programs, instead of having hand-written rules for determining context and meaning, the program derives its own rules from the relationships between words in its input. For example, the program could add a new definition to the word “bad” once it realized that it is slang for “incredible.” IBM also uses statistical probability to determine the meaning of a word. IBM’s NLP programs also use a sentence-charting technique. For example, charting the sentence “The boy has left” and storing “the boy” as a noun phrase allows the computer to see the subject of a following sentence beginning with “He” as “the boy.”
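A minimal sketch of that charting idea follows. The chart structure and the crude noun-phrase guess are invented for this illustration and are not IBM’s actual technique.

    # Toy illustration of sentence charting for pronoun resolution: store the
    # noun phrase found in one sentence and reuse it when the next sentence
    # starts with a pronoun. The chart format is invented for this sketch.

    chart: dict[str, str] = {}

    def chart_sentence(sentence: str) -> str:
        words = sentence.rstrip(".").split()
        if words[0].lower() in {"he", "she", "it", "they"}:
            # Resolve the pronoun against the most recent noun phrase on the chart.
            subject = chart.get("subject", words[0])
            return f"{subject} {' '.join(words[1:])}"
        # Crude noun-phrase guess: determiner + noun at the start of the sentence.
        if words[0].lower() in {"the", "a", "an"} and len(words) > 1:
            chart["subject"] = " ".join(words[:2]).lower()
        return sentence.rstrip(".")

    print(chart_sentence("The boy has left."))  # charts "the boy" as the subject
    print(chart_sentence("He went home."))      # -> "the boy went home"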