A Voice for Every Thing

A Technology Review article writes: “Until recently, the idea of holding a conversation with a computer seemed pure science fiction. If you asked a computer to “open the pod bay doors”—well, that was only in movies.”

Having taken a course in natural language processing, I find that the advances reported here make perfect sense. In a few years we may have a spate of books about the social effects of “machines who speak,” just as we have been inundated with books about the effects of connectedness, the Net, or Facebook.

Language will be the third step forward in the last half-century of human-machine interaction. The first phase was the command line, where anyone who wanted to get a computer to do something had to learn a complex, ornery language, full of strange loops, if-then statements, and go-to commands – and then painstakingly debug every line until it ran correctly. The second phase was the graphical user interface, which reached the mainstream during the eighties and nineties. That brought computers out of laboratories and into everyday life through clickable icons and menus – which still require a significant learning period, but nothing like the arcana of the command line.

The third phase, however, is going to blow us out of the water. Green screens and icons are mysterious; language is our native land. The spoken word is natural: every person is immersed in language from birth, we learn it before we learn to walk, and it underlies every aspect of culture, from poetry to diplomacy, scholarship to puns. As our devices slowly become capable of listening to us (and responding in kind), everything will change.

The article notes that one word can accomplish wonders: “a single spoken command can accomplish tasks that would normally require a multitude of swipes and presses.” Underlying this optimistic (but realistic) statement is a deep science of natural language that has become practical only in the last few years – the result of a few key insights, success with probabilistic methods, and the availability of enormous amounts of machine-readable text. As MIT’s Glass says, “there has been a long-time saying in the speech-recognition community: ‘There’s no data like more data.’”

Machines who speak will muddy the social waters. Why? We learn language from every conversation, article, sign, book, advertisement, and commercial. As things begin to speak, they will have a grammar all their own; they will shape our understanding and use of language as much as the words we read and hear from other sources.

In fact, they have already started swimming about in the world’s language stream, along with several billion people who use English in a similarly non-standard fashion. There are about ten million Siris out there that people are speaking to – in effect, millions of times a day people are listening to and imbibing (learning) Siri’s form of English. Just as a billion people are mingling and mangling English and Chinese, and another billion are doing so with English and Hindi, we now have English/Sirian (Singlish?) entering the mainstream. It may be a rivulet at the moment, but the ocean of language can expand to take all comers.

Since the new language technology (like Siri) is cloud-based, many of our conversations with machines will go straight to a corporate server, where they can be parsed, analysed, and used to improve the system. But they may also be mined for other purposes – one reason that IBM recently banned the use of Siri within the company, to prevent the inadvertent leaking of secrets. Loose lips are more likely to sink corporate battleships when every word is recorded and available to curious algorithms.

Reference: Technology Review

