Wednesday 11 June 2014

A Chatbot spouts rubbish to the media about the Turing Test



Alan Turing has set a hard test
To find out which computer chats best
About speedboats or fiddles
Or solving hard riddles
And fooling a real human guest


At the end of last week the media were full of stories of how Eugene Goostman, a guinea pig-owning, 13-year-old boy living in Odessa, Ukraine (actually a computer program), had passed the Turing Test. I suspected it was another case of academic hype built around those magic words "Artificial Intelligence" fooling the press, and waited a few days for the reaction. As I expected, the criticisms soon appeared - such as Celeste Biever writing in the New Scientist, the report That Computer Actually Got an F on the Turing Test on Wired, and Mike Masnick, who tears the whole publicity stunt to ribbons. Among many well-directed comments he says:
The whole concept of the Turing Test itself is kind of a joke. While it's fun to think about, creating a chatbot that can fool humans is not really the same thing as creating artificial intelligence. Many in the AI world look on the Turing Test as a needless distraction.
The Turing Test - Picture from 1clicknews
It helps to understand that Eugene Goostman is one of hundreds of bastard program descendants of Eliza, an early chatbot program which attempted to carry out a simple conversation with a human being via a teletype. The "rule" in writing a chatbot seems to be that you keep on adding ad hoc rules to try to conceal the fact that your program does not really understand the human, and that the program writer does not understand how the human brain actually works. If all else fails the program takes the approach that politicians use when asked an embarrassing question - you repeatedly try to change the subject of the conversation. The difference is that a politician knows what the answer is but does not want to admit to the truth, while the chatbot doesn't want to reveal that the human input means nothing to it - it is trying to evade revealing that it is merely a stupid computer.
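To make that concrete, here is a minimal sketch of the Eliza approach - a handful of keyword rules plus canned subject-changing deflections. It is purely illustrative (the rules and responses are invented for this post); it is not the actual code behind Eliza or Eugene Goostman, but it shows how little "understanding" such a program needs.

import random
import re

# A minimal Eliza-style sketch: a few keyword rules plus "change the
# subject" deflections used when nothing matches. Illustrative only -
# not the code behind Eliza or Eugene Goostman.

RULES = [
    (re.compile(r"\bmy (.+)", re.IGNORECASE),
     ["Tell me more about your {0}.", "Why do you mention your {0}?"]),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (re.compile(r"\bguinea pig\b", re.IGNORECASE),
     ["I have a guinea pig at home. Do you like animals?"]),
]

# Politician-style evasions: used whenever the input is not "understood".
DEFLECTIONS = [
    "That is interesting. By the way, what is the weather like where you live?",
    "Let's talk about something else. Have you ever been to Odessa?",
    "I am only 13, you know. What music do you like?",
]

def reply(user_input: str) -> str:
    """Return a canned response; the program never understands the input."""
    for pattern, templates in RULES:
        match = pattern.search(user_input)
        if match:
            return random.choice(templates).format(*match.groups())
    return random.choice(DEFLECTIONS)

if __name__ == "__main__":
    print(reply("I feel tired of chatbots"))
    print(reply("Explain the Turing Test to me"))  # falls through to a deflection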

It is worth looking at a hypothetical visual version of an interrogation test, as it might have been carried out a few decades ago. Via a normal TV screen a human is shown views of different places around the world. Some are real photographs and others are computer generated images, and the aim is to see if the computer can generate images from a general verbal description and fool the human into thinking the computer images were taken by a camera. The person programming the computer knows that the task will be difficult and narrows the scope of the test - for instance by artificially confining the views to the Rocky Mountains. He does this because he knows all about the mathematics of fractals - which can be useful in generating variations in landforms - and knows that it is easier for a computer to produce realistic images of forests of conifers than of mixed deciduous woodland. Because at least some of the landscapes should include signs of human activity, special code is added to generate log cabins, roads wandering along valley bottoms, and even tiny images of cars on those roads.
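The fractal trick mentioned above can be illustrated with a toy example. The sketch below uses one-dimensional midpoint displacement - a standard textbook technique, chosen here purely as an assumption about what such a programmer might use - to generate a plausible mountain skyline from a handful of random numbers.

import random

def midpoint_displacement(left, right, roughness, depth):
    """Generate a 1-D "mountain skyline" by recursive midpoint displacement.

    Start with two end heights, repeatedly insert a midpoint displaced by a
    random amount, and halve the displacement range at each level so that
    the large-scale shape dominates while the fine detail stays small.
    """
    heights = [left, right]
    spread = roughness
    for _ in range(depth):
        new_heights = []
        for a, b in zip(heights, heights[1:]):
            mid = (a + b) / 2 + random.uniform(-spread, spread)
            new_heights.extend([a, mid])
        new_heights.append(heights[-1])
        heights = new_heights
        spread /= 2  # smaller displacements at finer scales
    return heights

if __name__ == "__main__":
    random.seed(1)
    skyline = midpoint_displacement(100.0, 120.0, roughness=40.0, depth=6)
    print(len(skyline), "height samples, e.g.", [round(h, 1) for h in skyline[:5]])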

While such computer graphic techniques, when fully developed, have proved very valuable in constructing realistic sets in the film industry, the guessing game tells us nothing about how camera lenses and photographic films combine to produce the "normal" images. Similar limitations apply to Eliza-style computer programs such as Eugene, which actually say very little about how the brain works. Perhaps 40 or 50 years ago some people honestly thought that writing such programs would tell us something about how humans think. Since then academics and students in university departments have spent many millions of man hours writing chatbot descendants of the Eliza program, usually driven by a competitive urge to write something better than their rivals. The comparative lack of progress (especially if the effects of the increases in raw computer power are discounted) has clearly demonstrated that the Eliza approach is a blind alley as far as understanding human intelligence is concerned.

Alan Turing Statue at Bletchley [From Geograph]
Of course it was interesting that the Royal Society ran a competition to mark the 60th anniversary of Alan Turing's death, but what the competition has really demonstrated is that human brains (and not just media reporters') are not very good at critical thinking and will happily accept the hype of a well presented public relations story without any understanding of the real science that lies behind it.

This problem is nothing new. One can criticise many of the early Artificial Intelligence researchers as being more interested in following an academic career based on playing games than in understanding the real world problems that a human brain has to deal with. In an ideal world scientists should be free to criticise weaknesses in other people's research, and to accept such criticism of their own research in good faith. However this is not an ideal world, and it may be that my failure to get research grants and papers published in the 1970's was because my critical views of the establishment's research meant that doors were being slammed in my face.

At the time much of the research was into game playing (especially chess), solving formal logical puzzles, and writing Eliza-style packages which generated text which superficially resembled natural language. My research had a very different background. The trigger was a study of a massive commercial sales accounting system where the trading rules were always changing due to a range of market forces, and where there needed to be really good two-way communication between the computer and the human user. I realised that the approach could be generalised to handle a wide range of real world tasks where it would be useful to have a system that could work symbiotically with the user. In the early 1970's my ideas were still embryonic and, fooled by the hype, I initially ignored Artificial Intelligence research on the grounds that it was tackling a different kind of difficult problem. When I mentioned this to a colleague he said that once you got under the glossy cover most A.I. research was trivial compared with what I was trying to do, and he loaned me a copy of a newly published Ph.D. thesis on the subject.

Perhaps I should not have been surprised by what I found. My research involved tasks with many, dynamically changing, rules, with data which could be incomplete or poorly defined, and where there might be no immediate answer, or many. The Ph.D. thesis looked at formal logic problems which could be characterised by a fixed number of well defined rules, well ordered data, and a guaranteed single solution - in effect a trivial subset of what I was trying to do. Having spent the weekend reading the thesis I made a couple of minor tweaks to my research software and got it to solve most of the problems in the thesis. While my approach was basically a pattern recognition one, I found it was possible to morph it into a powerful language for processing patterns - and the problem solving package TANTALIZE was the result. This was used to solve 15 consecutive Tantalizer problems as they were published weekly in the New Scientist, and also many of the similar problems in the A.I. literature.
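For readers who have never met a Tantalizer, they were small constraint puzzles with a fixed set of well-defined rules and a single guaranteed solution. The sketch below is not CODIL or TANTALIZE - those worked by pattern processing - but an invented puzzle of that general form, solved by plain exhaustive search, just to show how bounded such problems are compared with messy real-world tasks.

from itertools import permutations

# A hedged sketch of the kind of well-defined logic puzzle discussed above.
# This is plain brute-force search, not the CODIL/TANTALIZE approach itself.
#
# Puzzle (invented for illustration): Alf, Bert and Charlie each keep exactly
# one pet - a cat, a dog or a guinea pig.
#   1. Alf does not keep the dog.
#   2. Bert does not keep the guinea pig.
#   3. Charlie keeps the cat.

PEOPLE = ["Alf", "Bert", "Charlie"]
PETS = ["cat", "dog", "guinea pig"]

def satisfies(assignment):
    """Check the fixed, well-defined rules against one candidate assignment."""
    return (assignment["Alf"] != "dog"
            and assignment["Bert"] != "guinea pig"
            and assignment["Charlie"] == "cat")

solutions = []
for pets in permutations(PETS):
    candidate = dict(zip(PEOPLE, pets))
    if satisfies(candidate):
        solutions.append(candidate)

# The guaranteed single solution is what made these puzzles so tractable.
print(solutions)  # [{'Alf': 'guinea pig', 'Bert': 'dog', 'Charlie': 'cat'}]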

There was only one problem - peer review! Papers which included descriptions of my system processing logical problems (including copy listings and timings) were rejected with a bald "too theoretical - will never work". On one or two occasions I was told that I couldn't expect a paper to be published if I used CODIL [a pattern-recognition language which doesn't distinguish between program and data] because all papers on A.I. had to be written in POP-2 [a conventional rule-based programming language popular in the leading A.I. departments in the UK at the time]. Finally a paper aimed at a top American journal came back with a rejection slip and four reviews. One was favourable, one reviewer admitted he didn't understand what I was doing, and two were about as insulting as an anonymous critical review can be. By this stage I was so depressed that I decided to abandon all A.I. research and switch to other application areas of CODIL. It was only some years later that I read the end of the covering rejection letter. The editor (who would have known who had written the reviews) ended by urging me to continue the work, because he thought there must be something in it to annoy the reviewers so much!
