A smooth talker, like an AI

Artificial intelligences learn to speak thanks to “language models”. The simplest models power the auto-complete feature on smartphones: they suggest the next word. But the skill and progress of the most recent language models, such as GPT-3, LaMDA, PaLM or ChatGPT, are breathtaking: computer programs can now write in the style of a given poet, simulate deceased people, explain jokes, translate languages, and even produce and correct computer code – all of which would have been unthinkable just a few months ago. To do this, these language models rely on increasingly complex neural networks.
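
To give an idea of the principle behind “suggest the next word”, here is a deliberately minimal sketch: a toy bigram model, far simpler than GPT-3 or ChatGPT, that counts which word most often follows which in a tiny invented corpus and proposes the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy corpus; any text would do.
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count, for each word, which word follows it and how often.
successors = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    successors[current_word][next_word] += 1

def suggest(word):
    """Suggest the most frequent word observed after `word`."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

print(suggest("the"))  # -> 'cat' (seen twice after 'the' in the toy corpus)
```

Modern language models replace these simple counts with neural networks trained on enormous amounts of text, but the task is still, at heart, predicting a plausible continuation.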

When artificial intelligences say just about anything

That said, these models are shallower than such examples lead us to believe. We compared stories generated by language models with stories written by humans and found the generated stories to be less coherent and less surprising than the human-written ones, though still engaging.

More importantly, we can show that current language models have problems even with simple reasoning tasks. For example, when we ask:

“The lawyer visited the doctor; did the doctor visit the lawyer?”

… simple language models tend to say yes. GPT-3 even replies that the lawyer did not visit the doctor. One possible reason, which we are investigating, is that these language models encode word positions symmetrically and therefore do not distinguish “before the verb” from “after the verb”, which makes it harder to tell the subject of a sentence from its object.
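
A deliberately caricatural illustration of the issue (it says nothing about the real internals of these models): if a representation ignores word order entirely, here reduced to a simple bag of words, the two sentences become strictly indistinguishable, and the question of who visited whom cannot even be posed.

```python
from collections import Counter

sentence_1 = "the lawyer visited the doctor"
sentence_2 = "the doctor visited the lawyer"

# A bag-of-words representation keeps only word counts, not positions.
bag_1 = Counter(sentence_1.split())
bag_2 = Counter(sentence_2.split())

print(bag_1 == bag_2)  # True: subject and object can no longer be told apart
```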

In addition, theoretical limitations of “transformer”-based language models mean that they cannot distinguish an even from an odd number of occurrences of a particular element when these occurrences are interspersed with another element. In practice, this means that such models cannot solve a task we call the “pizza task”, a simple puzzle of the form:

“The light is off. I press the light switch. I eat a pizza. I press the light switch. Is the light on?”

Here, an even number of presses on the switch means the light is off, but a BERT model fails to learn this. The most powerful models at the moment (GPT-3 and ChatGPT) categorically refuse to conclude that the light is off.
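
Written as a program, the “pizza task” is trivial: keep a boolean for the light and flip it each time the switch is pressed, ignoring everything else. The events below are those of the puzzle.

```python
def light_is_on(events, initially_on=False):
    """Track the state of the light through a sequence of events."""
    on = initially_on
    for event in events:
        if event == "press the light switch":
            on = not on  # each press toggles the light
        # any other event ("eat a pizza", ...) leaves the light unchanged
    return on

events = ["press the light switch", "eat a pizza", "press the light switch"]
print(light_is_on(events))  # False: an even number of presses leaves the light off
```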

Today’s language models also have difficulty with negation and generally perform poorly on reasoning tasks as these become more complex. For example, consider the following puzzle from China’s National Civil Service Examination:

“David knows Mr. Zhang’s friend, Jack, and Jack knows David’s friend, Mrs. Lin. Everyone who knows Jack has a master’s degree, and everyone who knows Mrs. Lin is from Shanghai. Who is from Shanghai and has a master’s degree?”

Current models answer correctly only 45% of the time, and ChatGPT refuses to answer… while the best human performance is 96%.
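
To make the reasoning explicit, here is a sketch of the deduction as a human (or a symbolic reasoner, discussed below) would carry it out. We assume, as the puzzle intends, that “X’s friend Y” implies that X knows Y; the answer then follows mechanically.

```python
# Facts stated in the puzzle, written as "who knows whom" pairs.
knows = {
    ("David", "Jack"),      # David knows Mr. Zhang's friend, Jack
    ("Jack", "Mrs. Lin"),   # Jack knows David's friend, Mrs. Lin
    ("David", "Mrs. Lin"),  # Mrs. Lin is David's friend, so David knows her
}

# Rules: everyone who knows Jack has a master's degree;
#        everyone who knows Mrs. Lin is from Shanghai.
has_masters = {x for (x, y) in knows if y == "Jack"}
from_shanghai = {x for (x, y) in knows if y == "Mrs. Lin"}

print(has_masters & from_shanghai)  # {'David'}
```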

The model doesn’t know that it doesn’t know

Most likely, the main drawback of current language models is that they are built to draw analogies, not to accurately memorize very large sets of facts. For example, they cannot remember a list of a million customers or a set of ten thousand proteins. The model will, at its own discretion, forget some of them. An even bigger problem is that it can also, again at its own discretion, invent new things (a process aptly called “hallucination”). This is a fundamental consequence of the machine-learning techniques behind language models, which are optimized for interpolation and generalization, not memorization.

The problem is that it is impossible to tell from the output of a language model whether an answer was remembered or invented. For example, when GPT-3 is asked where Alan Turing died, the model deadpans “Warrington” instead of the correct answer (Wilmslow). ChatGPT answers this question correctly, but when asked about lesser-known researchers, such as the authors of this article, it makes up facts. And when asked more complicated questions, it gives elegant but false answers:

“Is the sum of two odd numbers odd?”

… the model politely replies:

“Yes, when you add two odd numbers, the result is always an odd number. This is because an odd number can be expressed as 2n+1, where n is an integer. When you add two numbers in this form, the result is (2n+1)+(2m+1) = 2(n+m)+2, which is also of the form 2p+1, where p is an integer. This means that the result is an odd number.”

In its presentation, the answer is very convincing, but it is false.
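
For comparison, the correct argument fits in one line: the sum of two odd numbers is always even, since

$$(2n+1) + (2m+1) = 2n + 2m + 2 = 2(n+m+1),$$

which is twice an integer and therefore even.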

With these examples, we want to show that it is currently unwise to rely on a language model to reason or to make decisions. Models are getting better over time, know more things, and are increasingly able to decline to answer when they lack the information. But beyond simple questions, a language model can easily invent an answer, along with an equally invented and approximate explanation or proof.

Other methods excel at reasoning about exact facts

All this is not to say that language models are not amazing tools with impressive properties. Nor is it to say that language models can never overcome these challenges, or that other deep-learning methods will not be developed for this purpose. Rather, it is to say that, at the time of writing in 2022, language models are not the tool of choice for reasoning or for storing exact data.

For these functions, the tool of choice at the moment is “symbolic representations”: databases, knowledge bases and logic. These representations store data not implicitly but as sets of entities (such as people, commercial products or proteins) and relationships between these entities (who bought what, what contains what, etc.). Logical rules or constraints are then used to reason about these relationships in a way that is provably correct (although usually disregarding probabilistic information). Such reasoning was used, for example, in 2011 by the computer Watson during the game show Jeopardy! to answer questions such as:

“Who is the Spanish king whose portrait, painted by Titian, was stolen at gunpoint from an Argentine museum in 1987?”

The question can in fact be translated into logical rules over a knowledge base, and only King Philip II matches. Language models currently cannot answer this question, probably because they cannot memorize and manipulate enough knowledge (links between known entities).
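
As a rough sketch of the idea, with a deliberately tiny, hand-written knowledge base (real ones such as Wikidata contain millions of entities), the data is stored as explicit subject–relation–object triples that simply restate the facts as the article frames them, and a query selects the entities satisfying every stated constraint:

```python
# A miniature knowledge base: (subject, relation, object) triples.
triples = {
    ("Philip II", "is_a", "Spanish king"),
    ("Philip II", "portrait_painted_by", "Titian"),
    ("Philip II", "portrait_stolen_from", "Argentine museum, 1987"),
    ("Charles V", "portrait_painted_by", "Titian"),
}

def entities_matching(*constraints):
    """Return the entities that satisfy every (relation, object) constraint."""
    candidates = {s for (s, r, o) in triples}
    for relation, obj in constraints:
        candidates &= {s for (s, r, o) in triples if r == relation and o == obj}
    return candidates

print(entities_matching(
    ("is_a", "Spanish king"),
    ("portrait_painted_by", "Titian"),
    ("portrait_stolen_from", "Argentine museum, 1987"),
))  # {'Philip II'}
```

Unlike a language model, such a system never forgets or invents a triple: it only returns what is stored and what logically follows from it.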

A very simple example of a “knowledge graph”. These objects make it possible to connect concepts and entities. They are widely used by search engines and social networks. Fuzheado/Wikidata, CC BY-SA

It is probably no coincidence that the same large companies that build some of the most powerful language models (Google, Facebook, IBM) also build some of the largest knowledge bases. Today, these symbolic representations are often constructed by extracting information from natural-language text: an algorithm tries to build a knowledge base by analyzing press articles or an encyclopedia. The methods used for this extraction are, precisely, language models. Here, language models are not the end goal but a means to build knowledge bases. They are well suited to the task because they are highly resistant to noise, both in their training data and in their input, and can therefore handle the ambiguity that is ubiquitous in human language.
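
A caricature of this extraction step, using a hand-written pattern rather than the language models that real systems rely on: the program turns a sentence into a triple that could then be added to a knowledge base. The sentence and relation name are invented for the example.

```python
import re

# Toy pattern: "<painting> was painted by <painter>".
# Real extraction systems use language models rather than hand-written patterns,
# precisely because natural language is ambiguous and noisy.
sentence = "The Portrait of Philip II was painted by Titian."
match = re.match(r"(?:The )?(.+) was painted by (.+)\.", sentence)
if match:
    painting, painter = match.groups()
    print((painting, "painted_by", painter))
    # ('Portrait of Philip II', 'painted_by', 'Titian')
```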

Language models and symbolic representations are complementary: language models excel at analyzing and generating natural-language text, while symbolic methods are the tool of choice for storing exact objects and reasoning about them. An analogy with the human brain can be instructive: some tasks are easy enough that the brain performs them unconsciously and intuitively in a few milliseconds (reading simple words, or computing “2 + 2”), whereas abstract operations require painstaking, conscious, logical thinking (for example, remembering telephone numbers, solving equations or comparing the price/quality ratio of two washing machines).

Daniel Kahneman has dichotomized this spectrum into “System 1” for subconscious reasoning and “System 2” for effortful reasoning. With current technology, language models seem to solve “System 1” problems. Symbolic representations, on the other hand, are suitable for “System 2” problems. At least for the time being, therefore, it appears that both approaches have their raison d’être. Moreover, a whole spectrum between the two remains to be explored. Researchers are already exploring the link between language models and databases, and some see the future in merging neural and symbolic models into “neurosymbolic” approaches.

The original version of this article was published on The Conversation, a news site dedicated to the exchange of ideas between academic experts and the general public.

Gaël Varoquaux has received support from the National Research Agency (LearnI chair), BPI France and the European Union.
