How machines have come to speak human natural language

We all know the huge progress made by neural networks in understanding natural language. But we know less about how coders have taught machines to speak our language.

This revolution in compilation made possible the many programming languages that we know today. By translating human-readable instructions into binary machine code, compilers have made coding accessible to almost everyone.

According to Nick Polson and James Scott in AIQ, this revolution led to a second one: the invention of natural language learning and recognition models. Here is how we got from one to the other.

The invention of the compiler

The IBM Harvard Mark I, PhotoQuest / Archive Photos / Getty Images

When Grace Hopper, a brilliant mathematician, was asked in 1944 to work on the Harvard Mark I, one of the very first programmable computers, she quickly grew frustrated.

Although she was used to calculating very long sequences of operations, programming the machine instruction by instruction was far too repetitive and tiring. Because she had to build very complex mathematical models for the US Navy's ballistic tables, she had to break her calculations down into the smallest possible arithmetic operations, describing precisely which 0s and 1s the computer should manipulate for even the most elementary steps.

But Hopper, who kept reusing the same mathematical formulas and subroutines, found a way to speed up the task. Realizing how easily computers repeat operations at scale, she had the idea of building a library of operations: each time the coder wrote a short call code, the computer would replay exactly the corresponding subroutine. This idea gave birth to the first compiler, Hopper's A-0 system, and later to FLOW-MATIC, an English-like language that allowed company employees to process business data such as inventory for the first time.
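Hopper's idea of a numbered operation library can be pictured with a minimal sketch. Everything here (the routines, the call codes) is a hypothetical illustration, not her actual system: the point is only that a coder invokes a stored subroutine by a short code instead of re-entering raw machine operations.

```python
import math

# Hypothetical library of reusable subroutines, each named by a call code.
library = {
    1: lambda x: math.sin(x),
    2: lambda x: math.sqrt(x),
    3: lambda x: x * x,
}

def run(program, value):
    """Apply each library routine named in the program, in order."""
    for code in program:
        value = library[code](value)
    return value

# "Program" = a list of call codes: square, then square root.
print(run([3, 2], 5.0))  # -> 5.0
```

The program itself is now just a short list of codes, which is exactly the economy of effort Hopper was after.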

With such a device, coders could compile and use programs that were written in advance and were sure to work. They could use real words from the dictionary to code, and make the computer speak their language.

This essential invention in computer science was the basis for the construction of all modern programming languages (C, C++, Python, Java…). It was also the first invention to make the language of machines accessible to all users.

The limitations of the top-down approach

IBM's Shoebox, an early speech-recognition computer

The invention of the compiler raised for the first time the question of whether machines could speak our language, and be reprogrammed according to our wishes, and perhaps even our voices.

During the 1960s, computer scientists wondered if they could apply the same compiler logic to make machines understand human language. Starting from a top-down approach, with very precise sets of commands and very defined constraints, they tried to elucidate the logical and grammatical rules behind human languages.

A machine like the “Shoebox” presented by IBM in 1962 was thus able to recognize a small vocabulary of spoken words (sixteen, including the ten digits) with poor but promising accuracy. With a few strings of commands and a limited vocabulary, computers could already identify a sentence, as long as every little exception was specified by hand. One might think that with ever larger rule sets, machines could handle greater semantic complexity.

But the structure of human language gradually turned out to be very different from deterministic programming-language systems. The grammar of each language rests on a very large number of exceptions, and meaning depends on subtle semantic distinctions.

A machine would, for example, find it difficult to cover every linguistic exception with if/then statements. Sentences are also full of ambiguities of meaning that can only be resolved by referring to context. Furthermore, acoustic differences in pronunciation, accents, or simply the environment make this complexity multidimensional.
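The brittleness of if/then rules is easy to demonstrate on even a toy problem. The sketch below (an illustrative example, not any historical system) tries to pluralize English nouns with a few hand-written rules, and immediately stumbles on an irregular noun:

```python
def pluralize(noun):
    # A few hand-written if/then rules for English plurals.
    if noun.endswith(("s", "sh", "ch", "x")):
        return noun + "es"
    if noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

print(pluralize("box"))    # -> boxes
print(pluralize("city"))   # -> cities
print(pluralize("child"))  # -> childs (wrong: every irregular noun
                           #    needs yet another hand-written rule)
```

Scale this up from plurals to full grammar, pronunciation, and context, and the rule set explodes, which is exactly why the top-down approach stalled.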

So how did computer scientists proceed? They turned to vector-based natural language models, which capture semantic differences across many dimensions at once.

Predicting word occurrences

word2vec vectors

Realizing that human languages exhibit numerous statistical regularities, researchers turned to a more probabilistic, bottom-up approach to word recognition.

To resolve ambiguities such as “weather report” versus “wether report”, the new models looked at how frequently these words occurred (and concluded that “weather report” is far more frequent than “wether report”). Rather than trying to understand natural language, they used statistical data to predict what sounds most natural to speakers.

But these techniques were still not accurate enough to handle more complex semantic units, such as phrases or whole sentences. Only in the 2010s did researchers find smarter models for learning language. With vector-based linguistic models like Word2vec, researchers represented the meaning of words through their position relative to other words. Multidimensional vector spaces allowed much more accurate word recognition and translation.

Neural networks learn from this by drawing constant analogies between semantic expressions (man is to woman as father is to mother). This lets them infer the meaning of the sentences they hear, such as “I prefer to give it to my ****** rather than to my mother” (working out that the most likely missing word is “father”). With recurrent neural network models, which carry memory of previous outputs across a sentence, deep learning technologies are making great progress in the recognition, translation, and intelligent understanding of expressions. And this opens many opportunities for the human/machine interface.
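The analogy trick (man is to woman as father is to mother) comes down to vector arithmetic. The sketch below uses tiny hand-crafted two-dimensional vectors purely for illustration; real Word2vec embeddings have hundreds of dimensions and are learned from billions of words, but the arithmetic is the same:

```python
import math

# Hand-crafted toy "embeddings": one gender axis, one parenthood axis.
vectors = {
    "man":    [ 1.0,  0.0],
    "woman":  [-1.0,  0.0],
    "father": [ 1.0,  1.0],
    "mother": [-1.0,  1.0],
    "dog":    [ 0.0, -1.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vector arithmetic: c - a + b."""
    target = [vc - va + vb
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("man", "woman", "father"))  # -> mother
```

Subtracting "man" from "father" isolates a parenthood direction; adding "woman" lands near "mother". This is the geometry that lets a model guess the censored word in the sentence above.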

How machines will speak their user’s language

All these advances in natural language learning give us hope for further progress in coding accessibility. Speech recognition and semantic understanding technologies not only make it possible to understand the user; they could also let users communicate more easily with machines. One can even imagine computers reprogramming themselves at their user's spoken command.
