Sunday, November 30, 2014

Outthinking Computers, or When Commonsense Trumps Grammar

The "standard interpretation" of the Turing Test: player C, the interrogator, tries to determine which player, A or B, is a computer and which is a human, using only their responses to written questions. (Photo credit: Wikipedia)

Ever since 1950, one of the most popular yardsticks for artificial intelligence has been the Turing test, named after mathematician Alan Turing. The idea is that a program with some kind of artificial intelligence should be able to use text-based chatting to convince more than 30 percent of the people judging it that it's a human being. In June 2014, researchers claimed that a chatbot named Eugene Goostman, which poses as a 13-year-old boy, did just that.
Nowadays, however, many experts are questioning whether the Turing test is really the best test. A computer tricking people into thinking that it's a 13-year-old is definitely an achievement, but it's not necessarily the ideal display of true, humanlike thought.
So what would be a better test for artificial intelligence? One front-runner is an exam that relies on common sense. Specifically, the test is built from what are called Winograd schemas. Because Winograd schemas rely on cultural background knowledge, they're easy for people and difficult for computers.

How to test computers for common sense

The test would take the form of a multiple-choice quiz of reading comprehension. But the text itself would have some very specific features. It would consist of Winograd schemas: pairs of sentences whose intended meaning can be flipped by changing just one word. They generally involve unclear pronouns or possessives. A famous example comes from Stanford computer scientist Terry Winograd:
  • "The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?"
    1) The city councilmen
    2) The demonstrators
And:
  • "The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence?"
    1) The city councilmen
    2) The demonstrators
Most human beings can easily answer these questions. We use our common sense to figure out what "they" is supposed to refer to in each case. And that common sense basically combines extensive cultural background knowledge with analytical skills. (In the first question, we can deduce that the city councilmen feared violence. In the second, the demonstrators advocated violence.)
For computers, however, these questions can be quite difficult. From a grammatical standpoint, the "they" in the sentences is technically unclear. In both questions, "they" could be either the councilmen or the demonstrators.
A computer could have access to all of Google and still not really be able to grasp that city councilmen are probably less likely to advocate violence than demonstrators. It's simply less culturally appropriate for councilmen to do so. But you're not going to find that in the dictionary under "city councilmen."
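To make the difficulty concrete, here is a minimal Python sketch. The names `Question`, `first_candidate_resolver`, `last_candidate_resolver`, and `accuracy` are hypothetical, made up for this illustration rather than taken from any real Winograd schema toolkit. It shows why sentence structure alone can't win: a resolver that ignores the one changed word has to give the same answer to both halves of the pair, so it gets exactly one of the two right.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Question:
    sentence: str
    candidates: Tuple[str, str]  # the two possible referents of "they"
    answer: str                  # the referent a human reader would pick


# The two halves of Winograd's pair; only one word differs between them.
CANDIDATES = ("the city councilmen", "the demonstrators")
PAIR = [
    Question(
        "The city councilmen refused the demonstrators a permit "
        "because they feared violence.",
        CANDIDATES,
        "the city councilmen",
    ),
    Question(
        "The city councilmen refused the demonstrators a permit "
        "because they advocated violence.",
        CANDIDATES,
        "the demonstrators",
    ),
]


def first_candidate_resolver(q: Question) -> str:
    """Positional heuristic (illustrative): always pick the first-mentioned candidate."""
    return q.candidates[0]


def last_candidate_resolver(q: Question) -> str:
    """Positional heuristic (illustrative): always pick the later-mentioned candidate."""
    return q.candidates[1]


def accuracy(resolver: Callable[[Question], str], questions: List[Question]) -> float:
    """Fraction of questions where the resolver matches the human answer."""
    correct = sum(resolver(q) == q.answer for q in questions)
    return correct / len(questions)


if __name__ == "__main__":
    for resolver in (first_candidate_resolver, last_candidate_resolver):
        print(resolver.__name__, accuracy(resolver, PAIR))
```

Both heuristics land on 0.5, which is no better than flipping a coin. Getting both halves right takes some notion of who is more likely to fear violence and who is more likely to advocate it, and that is exactly the background knowledge the test is after.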
Here are some more Winograd schemas, from a growing, open collection of more than 100:
  • The trophy doesn't fit into the brown suitcase because it's too [small/large]. What is too [small/large]? Answers: The suitcase/the trophy.
  • Jane gave Joan candy because she [was/wasn't] hungry. Who [was/wasn't] hungry? Answers: Joan/Jane.
  • The woman held the girl against her [chest/will]. Whose [chest/will]? Answers: The woman's/the girl's.
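The bracket notation in these examples maps naturally onto a small data structure. What follows is only a sketch, assuming a hypothetical `WinogradSchema` class and a `{word}` placeholder convention invented here for illustration (the open collection itself may store its schemas differently). It expands each schema into its two quiz questions, each paired with the expected answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    sentence: str             # "{word}" marks where the special word goes
    question: str             # may also contain "{word}"
    words: Tuple[str, str]    # the two alternatives for the special word
    answers: Tuple[str, str]  # answers[i] is correct when words[i] is used

    def expand(self) -> List[Tuple[str, str]]:
        """Return both (question text, expected answer) variants of the schema."""
        return [
            (f"{self.sentence.format(word=w)} {self.question.format(word=w)}", a)
            for w, a in zip(self.words, self.answers)
        ]


# The three schemas listed above, written in the assumed placeholder format.
SCHEMAS = [
    WinogradSchema(
        "The trophy doesn't fit into the brown suitcase because it's too {word}.",
        "What is too {word}?",
        ("small", "large"),
        ("the suitcase", "the trophy"),
    ),
    WinogradSchema(
        "Jane gave Joan candy because she {word} hungry.",
        "Who {word} hungry?",
        ("was", "wasn't"),
        ("Joan", "Jane"),
    ),
    WinogradSchema(
        "The woman held the girl against her {word}.",
        "Whose {word}?",
        ("chest", "will"),
        ("the woman's", "the girl's"),
    ),
]

if __name__ == "__main__":
    for schema in SCHEMAS:
        for text, answer in schema.expand():
            print(f"{text} -> {answer}")
```

Storing the pair as one template plus two words keeps the defining property visible: the two questions differ by a single word, yet the correct answer flips.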
