Chatbots and the Pursuit of Artificial General Intelligence

Since the release of ChatGPT and other large language models (LLMs) in the fall of 2022, the hype and hysteria about artificial intelligence have been dialed up to 11. ChatGPT, Bard, and other LLMs (hereinafter, “chatbots”) have been seen as significant advances toward the holy grail of artificial general intelligence (AGI).

Chatbots are the result of artificial neural nets of unprecedented size, processing enormous samples of natural language—some say essentially the entire World Wide Web (or, at least, all of Wikipedia). At least up to a certain date:

Me: What is today’s date?

ChatGPT-3 [on 7/18/2023]: Today’s date is November 8, 2021.

ChatGPT-3.5 knows today’s date, but is still time-limited:

Me: What happened yesterday?

ChatGPT-3.5: I’m sorry for any confusion, but as of my last knowledge update in January 2022, I don’t have real-time information, and I’m unable to provide details about events that occurred after that date. To find out what happened yesterday, I recommend checking the latest news updates, online news websites, or other reliable sources for the most recent information.

Chatbots produce astonishingly convincing simulacra of natural language. Their outputs, however, are the result purely of pattern-matching. Chatbots are autocomplete on steroids. Chatbots “understand” nothing, even as they fluently respond to user prompts. It used to be said that natural language computer programs had syntax but not semantics. Chatbots have neither syntax nor semantics, only statistics.
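
To make the “autocomplete on steroids” point concrete, here is a deliberately tiny caricature in Python (my own illustration; real chatbots use enormous neural networks rather than a lookup table, but the underlying principle of emitting a statistically likely next word is the same):

    from collections import defaultdict

    # A toy "language model": count which word follows which in a tiny corpus,
    # then emit whichever continuation is most common. No grammar, no meaning.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_word(prev):
        followers = counts[prev]
        return max(followers, key=followers.get) if followers else None

    word, sentence = "the", ["the"]
    for _ in range(4):
        word = next_word(word)
        if word is None:
            break
        sentence.append(word)

    print(" ".join(sentence))  # "the cat sat on the"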

• • • • • •

The notion that large language models might in some way resemble or parallel what’s actually going on in a real human brain is perhaps suggested by the phrase “artificial neural net[work].”

The idea that so-called artificial neural networks bear any resemblance to natural neural networks (in plain words, actual brains) is almost laughable. In artificial neural nets (including those in large language models), the “neurons” and the “synapses” are each represented by just one or two parameters, and the connection between any pair of neurons is captured by a single number specifying its strength. Detailed biological models of actual neurons and synapses can require hundreds or even thousands of values for each neuron and each synapse—specifying, for example, the firing rate, the refractory period, and the strength of each neurotransmitter emitted—and all artificial neural nets entirely omit any attempt to model the activity of the glial cells, which in actual brains influence neurons hormonally in ways that are scarcely understood.
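
To make vivid just how spare the artificial “neuron” is, here is one in its entirety, written out in Python (a generic textbook formulation of my own, not the code of any particular LLM): each incoming connection is a single number, the “cell” adds a single bias, and the output is one number passed through a fixed squashing function.

    import math

    def artificial_neuron(inputs, weights, bias):
        # The entire "neuron": a weighted sum of its inputs plus a bias,
        # squashed through a fixed activation function. Nothing here models
        # firing rates, refractory periods, neurotransmitters, or glial cells.
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

    print(artificial_neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2))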

In other words, although the latest LLMs are impressively large and their output impressively fluent, their models are oversimplified by two, three, or perhaps even more orders of magnitude. In the 1940s, artificial networks were said to be “modeled” on actual brains. Nowadays, they are only said to be “inspired” by natural neural networks—a word so vague as to disclaim any accuracy at all.

To the extent that anyone attempts to model language beyond the level of neurons and synapses, they would somehow have to map the semantics of thought onto the syntax of speech (and written language). But, as we’ve seen, large language models have neither syntax nor semantics. We have no idea how it all works, and we have no idea how to model it, either.

• • • • • •

That chatbots respond to prompts in natural language with natural language is the foundation of the illusion that there’s anything akin to comprehension occurring. But consider the psychological effect of reading this string:

100 000 010 000 111 000 101 000 110 000 011 001

Does this mean anything to you? Me neither. Now apply the following table:

[blank] 000
[period] 001
am 010
body 011
I 100
just 101
my 110
not 111

The string above can now be interpreted as:

I am not just my body.

Reading this sentence has quite a different effect. But from the point of view of the programming of a chatbot, they are exactly equivalent.
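
From the machine’s side, the “translation” is nothing more than a table lookup. A few lines of Python make the equivalence explicit (my own illustration; real chatbots use numeric token IDs rather than bit strings, but the principle is identical):

    # The code-to-word table given above. To the program, the words are
    # just arbitrary labels attached to bit patterns.
    codebook = {
        "000": " ", "001": ".", "010": "am", "011": "body",
        "100": "I", "101": "just", "110": "my", "111": "not",
    }

    bits = "100 000 010 000 111 000 101 000 110 000 011 001"
    print("".join(codebook[code] for code in bits.split()))
    # Output: I am not just my body.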

The training of a chatbot doesn’t even attempt to address what any of those words mean—that is, their referents in the real world. Back when AIs were programmed with some attempt to incorporate how sentences were constructed and interpreted, this was known as the syntactic bubble. As an example, let’s try to understand what “body” means solely by following Merriam-Webster. (We’ll ignore the fact that, whenever we look a word up, we have to determine the appropriate sense to choose the appropriate definition, which is sometimes unclear.) Our paper chase begins like this:

  • Body: the organized physical substance [of an animal or plant either living or dead].
  • Organized: having a formal organization to coordinate and carry out activities.
  • Formal: belonging to or constituting the form or essence of a thing; relating to or involving the outward form, structure, relationships, or arrangement of elements rather than content.
  • Organization: the condition or manner of being organized.

Well, that didn’t take long, did it? We’re caught in a loop from which we can never escape. (This is just the first loop encountered in chasing down “body.” Another loop goes from “physical” to “material” and back again. Nor was “body” chosen because it was expected to be particularly difficult; you can conduct the same exercise with any word at all.)
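
The loop is easy to demonstrate mechanically. Here is a sketch in Python over a toy dictionary whose entries are abridged from the definitions quoted above; following every defined word in each definition soon leads back to a word we have already had to look up:

    # A toy dictionary: every word is "defined" only in terms of other words.
    definitions = {
        "body": "the organized physical substance of an animal or plant",
        "organized": "having a formal organization to coordinate activities",
        "formal": "constituting the form or essence of a thing",
        "organization": "the condition or manner of being organized",
    }

    def find_loop(start):
        # Breadth-first walk of the definition graph; stop at the first word
        # that leads back to a word we have already looked up.
        seen, frontier = [start], [start]
        while frontier:
            word = frontier.pop(0)
            for token in definitions.get(word, "").split():
                if token in definitions:
                    if token in seen:
                        return seen + [token]  # caught in the loop
                    seen.append(token)
                    frontier.append(token)
        return seen

    print(" -> ".join(find_loop("body")))
    # body -> organized -> formal -> organization -> organized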

Nowadays, this problem might be dubbed the chatbot bubble. Chatbots no more understand the prompts to which they respond or the outputs they produce than the printer that disgorges your store receipt is genuinely appreciative when it thanks you for your purchase.

Why do we find chatbots so convincing that some people assert that they have already achieved sentience and others actually believe that they share a love with their automated interlocutors?

Well, we’re easily fooled. Have you ever answered a phone call and, thinking you were listening to a real person, tried to reply to a recording? I know I have.

Almost immediately after birth, people normally develop what’s known, awkwardly and unfortunately, as theory of mind. (Hereinafter, I’ll refer to this instead as a mind model.) This is the irresistible sense or intuition that other people (and most animals) experience, well, experience, just as we ourselves do—thoughts and feelings, pleasure and pain, hunger and thirst, sensation and perception. We make this inferential leap long before we have any intellectual or philosophical justification for it (or the chance to dispute it—as do eliminative materialists).

Since the invention of written language, people have made the same inference on much narrower grounds—we infer sentience from sentences alone. (Even if we don’t believe that the cash register itself is sentient, we know that someone once wrote the words “THANK YOU” that the cash register mindlessly spits out on the slip of paper you get.) Very recently, we’ve been engaging in email and text message dialogs with people we’ve never met or even seen, and we inevitably infer that our conversations are also with actual people. Now, when we engage in a dialogue with a chatbot and it replies with uncanny plausibility, we irresistibly make the same inference.

This effect is not entirely accidental. ChatGPT, for example, was designed to reply in the first person as if it had both consciousness and intentionality. For example:

I am a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. I am trained on a massive amount of text data, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics or create stories.

Let’s see how much less compelling it becomes in the third person:

ChatGPT is a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. ChatGPT is trained on a massive amount of text data, and is able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, ChatGPT can provide summaries of factual topics or create stories.

Now, this reads as not coming from ChatGPT itself but from its designers—who of course were its real source. But the designers of ChatGPT have, even in their canned disclaimers, themselves attributed to ChatGPT awareness and intentionality, and invite us to do the same:

I am always learning. I try my best to provide accurate and informative answers to your questions, but I may sometimes make mistakes. If you ever notice that I have provided a false or misleading answer, please let me know and I will do my best to correct it.

I am aware [of the phenomenon of chatbot hallucination], and I am working to improve my ability to detect and avoid it.

The intentionality attributed to ChatGPT is, in short, intentional:

My use of language, such as saying “I apologize,” is part of the programmed response to provide a more human-like interaction and to acknowledge and address user concerns.

• • • • • •

When we watch a magician (or, as some performers more properly refer to themselves, an illusionist) perform, we don’t believe that we’re actually seeing real magic. In fact, the delight and astonishment come from the paradoxical incongruity of what we have just seen with what we know of how the world works. (When an illusionist demonstrates and explains how an effect is achieved, the usual response is not a feeling of relief but of disappointment.) No one believes that, as illusions become increasingly elaborate and convincing, we’re getting any closer to achieving actual magic. Why, then, would we believe that as chatbots become more fluent and convincing we’re getting any closer to actual intelligence?

• • • • • •

A recurrent claim is that, as chatbots incorporate larger and larger neural nets with more and more layers, digest greater and greater amounts of text, and receive more and more feedback from users and others, they will somehow become “better.” To the extent that “better” means more convincing, improvements are likely to be imperceptible—since their output is already convincing enough, thank you very much. But little or no attempt is actually being made to verify the factuality of the source material or of what is emitted. ChatGPT contradicts itself about this (notice the use of the passive voice to avoid the assumption of responsibility):

…[E]fforts have been made to ensure the accuracy of the information…provide[d.] …[T]he selection of books, articles, and websites is typically based on a combination of factors, including considerations of reliability, diversity, and relevance. OpenAI aims to include sources that are widely recognized as reputable and credible within their respective fields.

But:

…[T]he training process typically involves using as much text data as possible rather than a manual selection process.

• • • • • •

Whatever intelligence is displayed by a chatbot is the intelligence embodied in the source texts that it has ingested. Imagine a chatbot entirely based on Wikipedia. As a recent article makes clear,1 Wikipedia is ChatGPT’s largest single source, and the relationship between Wikipedia and chatbots such as ChatGPT is more intimate and complicated than perhaps has been recognized. Wikipedia provides an application programming interface (API) that allows programmers to retrieve any particular entry automatically, whether for digestion and incorporation into a chatbot’s knowledge base or just for on-the-fly summarization or paraphrase.
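
For readers curious what such automated retrieval looks like, here is a minimal sketch using Wikipedia’s public REST “page summary” endpoint and the third-party requests library (both choices are mine for illustration; the actual pipelines used to feed chatbots are not public):

    import requests

    def wikipedia_summary(title):
        # Fetch the introductory summary of a Wikipedia entry as plain text.
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
        response = requests.get(url, headers={"User-Agent": "skeptic-demo"})
        response.raise_for_status()
        return response.json().get("extract", "")

    print(wikipedia_summary("Large_language_model"))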

In this case, why would anyone attribute intelligence to such a chatbot, rather than simply to the authors of the encyclopedia itself? And if a chatbot digested two such encyclopedias? Three? Three thousand? At what point should the attribution of intelligence shift from the authors of the sources to the chatbot itself?

Another major source is The New York Times (as well as other newspapers and magazines). In fact, the Times recently sued OpenAI (the creator of ChatGPT) and Microsoft (the creator of Bing and Bing Chat) for, in effect, plagiarism.2

ChatGPT, however, argues with equal conviction both sides of the question of whether it displays genuine intelligence. The following are excerpts from much longer and more detailed text responses to prompts.

ChatGPT displays artificial intelligence (AI) because it is an advanced language model that can understand and generate human-like text responses.

But:

…[W]hile ChatGPT is an impressive language model capable of generating human-like text responses, it falls short of displaying true artificial intelligence due to its lack of understanding, limited domain, contextual limitations, inability to learn, and potential biases and errors.

ChatGPT will agreeably argue both sides of almost any dispute you choose. Try it yourself with your own favorite controversy.

• • • • • •

In the natural world, we usually recognize intelligence only in animals with a brain and, of course, in the products of such intelligence, such as the good design of a tool. The lives of animals (hereinafter meaning animals with a brain) display two fundamental attributes:

  • Embodiment: animals have bodies equipped with senses to perceive and mechanisms to respond and behave. Their engagement with the world is consistent with the body they actually have. An octopus, for example, uses all eight of its arms, but a chimpanzee doesn’t try to make use of the four limbs it doesn’t have.
  • Embeddedness: animals are alive in the world, with which they engage and interact.

In addition to a mind model, animals behave as if they have a model of the world, based on the following basic attributes of reality:

  • Tridimensionality: Reality subtends three spatial dimensions.
  • Extensionality: Objects and animals have extent in each of the three spatial dimensions (even if one or even two dimensions may be minimal, as in the case of a sheet of paper or a hair).
  • Time: Reality passes continuously in time from the past to the present and into the future.
  • Continuity: Over time, and to a greater or lesser extent, things and processes persist from one moment to the next, with gradual change the norm and abrupt change the exception.
  • Regularity and Periodicity: A great many natural phenomena exhibit regular and often periodic change, such as the day/night cycle; waking and sleeping; needs such as hunger and thirst alternating with satiety following satisfaction; the lunar cycle; the seasons; birth, growth, maturity, and death; and the like.
  • Uniformity: To a greater or lesser extent, things and processes are more or less the same everywhere.
  • Consistency: To a greater or lesser extent, things and processes do not exhibit paradoxical properties.
  • Causality: Events in the present are contingent upon and caused by events in the past; events in the future will be contingent upon and caused by previous events.

Other basic attributes of an animal’s reality include, among others, the following:

  • Affordance: Everything in the world, natural and artificial, has properties and features that can be used to accomplish one or more goals. For example, a door handle can be used to open a door, and it can also be used to hang your hat.
  • Choice: To a greater or lesser degree, an animal, having agency, can choose its behavior, based on its own goals and its knowledge of its situation.

Further, animals become familiar with certain ubiquitous characteristics of their environments, including such features as gravity, light, sound, solidity, liquidity, gaseousness, the flow of liquids (rivers, waves, tides, rain) and gases (winds), and so on.

All of these, and much more, are reflected in a world model that underlies an animal’s implicit understanding of reality. The mind model is a fundamental part of the world model. This world model is much deeper and more wide-ranging than any enumerable collection of facts could possibly be. Our world model, and our ability to engage with the world based upon it, arise almost immediately after birth and may even have been built into the brain by evolution, without the need for any experience at all to trigger them into being. In the case of humans, this implicit world model almost always comes to include language, explicit education, and training. To date, all attempts to incorporate a world model into an AI have failed.

To give an example of implicit knowledge: you intuitively know that a person can’t touch their left elbow with their left hand. What does ChatGPT-3 think?

Yes, a person can touch their left elbow with their left hand by crossing their left arm over their chest and reaching their left hand towards their left elbow.

ChatGPT-3.5 does a good deal better:

No, it is physically impossible for a person to touch their left elbow with their left hand. The range of motion of the human arm does not allow for the hand to reach the same-side elbow without external assistance or manipulation.

• • • • • •

If a chatbot—necessarily disembodied and unembedded—can’t achieve actual intelligence, what about a robot? At least some AI researchers believe that actual computerized electromechanical robots can supply the two fundamental attributes of embodiment and embeddedness. How’s that been going?

Boston Dynamics appears to have the most advanced robots. In one demonstration,3 the robot runs and jumps when it can simply walk, seeming to be more concerned with demonstrating its parkour (the sport of traversing obstacles in urban environments by running, climbing, and jumping athletically) skills than anything else.

Perhaps the most conspicuous deficiency in such robots is their silence. No attempt has been made to incorporate a chatbot into one—and it’s entirely unclear how that could be done. In the demonstration of the Boston Dynamics robot cited earlier, a person uses an app to give the robot its instructions, and the app’s screen is pointedly never shown.

By its very nature, a chatbot—being unembodied and unembedded—has a timeless view from nowhere and everywhere, whereas a robot is necessarily in a particular place and a particular position at a particular moment in time.

• • • • • •

We intuitively picture intelligence as a kind of spotlight (or, lately, a laser) that can, with equal facility, illuminate and penetrate any subject to which attention is directed. This is much like the historical misconception of vision as a force originating in the eyes, like Superman’s X-ray vision. But intelligence is no such faculty. We’re better at understanding some things than others; our intelligence works better on some problems than others. My favorite example of this is the Monty Hall problem.4 But it is hardly the only example. Popular websites list the top 10 fallacies, or the top 16 or 24 or 26. Wikipedia’s list contains over fifty unenumerated fallacies in several different categories.

• • • • • •

The concept of artificial intelligence was originally divided into two categories, weak and strong. The notion of strong artificial intelligence was that an AI would not only imitate a person but actually have conscious experience, as a person does. Of course, eliminative materialists don’t believe that people have conscious experience, either, so this has never been a problem for them. Except for those who believe in the possibility of uploading a mind to a computer—an unlikely prospect—most AI researchers have shifted the distinction from weak/strong to narrow/general.

The concept of narrow intelligence is, well, self-contradictory. As we’ve already seen, the entire idea of intelligence is that it’s a general (even if not unlimited) ability to deal flexibly and creatively with novel, challenging situations. The closest natural equivalent to narrow artificial intelligence is perhaps people who used to be called idiot savants—now more politely just savants—although of course savants have enough broad natural intelligence to navigate their own worlds reasonably well. More recently, the term machine learning is often used to render moot any claim of any kind of intelligence at all.

The distinction now is between good old-fashioned programming—which involves encoding explicit instructions for automating a task—and a computer program that, one way or another, learns to do whatever it’s supposed to. Before chatbots, probably the best-known example of machine learning was computer chess.

Because chess has traditionally been seen as a quintessential example of human intelligence, it’s likely that innumerable early computer programmers tried their hand at writing a program that could play the game. Before the days of machine learning, computer chess programs were written explicitly, and they were quite poor. But, with the incorporation of machine learning, a program eventually defeated the then-reigning world champion, and nowadays chess programs routinely defeat international grandmasters. Despite this, perhaps surprisingly, chess is now more popular than ever. One program, Stockfish, considered the best in the world, is even used to rate the performance of players in real time.

Make no mistake, however: the rules of the game still had to be explicitly programmed into Stockfish and every other chess program. Without further programming and training, not even Stockfish could play any of the hundreds of chess variants that have been devised.

Ironically, although ChatGPT-3 claims to be able to play chess, it can’t, as demonstrated in this simple example (I’ve edited the moves into a consistent notation).

Me: Let’s play chess.

ChatGPT-3: Sure, I’d be happy to play a game of chess with you! Shall we start?

Me, playing White: e2 e4

ChatGPT-3, playing Black: e7 e5

White: N g1 h3

Black: N b8 c6

White: N b1 a3

Black: N g8 f6

White: f2 f3

Black: d7 d6

White: d2 d3

Black: e7 e6

Although, through its first four moves, ChatGPT-3 appears to know how to play, on its fifth move it tries to move its e-pawn from e7, a square that pawn vacated on its very first move, to e6!

With ChatGPT-3.5, the response is equally absurd. After I play e4, it responds:

Now, it’s my move. I’ll respond with 1…c5, the Sicilian Defense:

But on the board it then displays, it shows itself as having made two moves, not just one (both e5 and c5)!

Other published examples of ChatGPT playing chess show it making impossible and illegal moves, such as capturing its own pieces.
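
Such illegalities are trivial to catch with a program that actually encodes the rules of the game. Here is a sketch using the open-source python-chess library (my choice of tool, not anything ChatGPT uses) that replays the first transcript above and flags the impossible fifth move:

    import chess  # pip install chess (the python-chess package)

    # The ChatGPT-3 game above, in from-square/to-square form.
    moves = ["e2e4", "e7e5", "g1h3", "b8c6", "b1a3",
             "g8f6", "f2f3", "d7d6", "d2d3", "e7e6"]

    board = chess.Board()
    for uci in moves:
        move = chess.Move.from_uci(uci)
        if move not in board.legal_moves:
            print(f"Illegal move: {uci} (there is no longer a piece on {uci[:2]})")
            break
        board.push(move)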

Although ChatGPT can apparently converse plausibly on any topic, including the rules of chess, it is as narrow a machine-learning program as any chess program. Its words are no more meaningful than N b8 c6, and it is no more a step toward artificial general intelligence than Stockfish is toward playing Tic-Tac-Toe.

• • • • • •

A major problem with machine learning programs is their opacity—they are completely and fundamentally black boxes. As ChatGPT helpfully explains:

A black box typically refers to a device or system that is opaque or mysterious in its inner workings or processes.

The term “black box” in this sense dates from World War II, referring to secret electronic navigational devices that were literally enclosed in non-reflective black housings.5

As explained by Melanie Mitchell in her book Artificial Intelligence: A Guide for Thinking Humans, machine learning programs aren’t learning what we think they’re learning. The author gives several compelling examples, including a school bus identified by a machine learning program as an ostrich. The same program has also identified completely random television “snow” as a robin, a cheetah, an armadillo, a panda, a centipede, a peacock, a jackfruit, and a bubble, and abstract geometric patterns as a penguin, a starfish, a baseball, a guitar, a railroad car, and a remote control.

Unlike traditional computer programs, which are opaque enough even when heavily documented and commented by their programmers, large language models consist almost entirely of parameters (that is, numbers). For example, GPT-3 (the model on which ChatGPT was originally built) contains 175 billion parameters, was trained on a corpus of about 500 billion tokens (words and parts of words, such as the plural s), and uses a context window of 2,048 tokens when predicting the next word. Imagine trying to manually trace and understand the reason a chatbot produces a particular response to a particular prompt! Given this overwhelming complexity and such a daunting task, chatbots can only be regarded as oracles.
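
To get a feel for that complexity, here is some back-of-the-envelope arithmetic (the two-bytes-per-parameter figure and the printout format are my assumptions; models are commonly stored in 16-bit or 32-bit formats):

    parameters = 175_000_000_000    # published size of GPT-3
    bytes_per_parameter = 2         # assuming 16-bit storage
    print(f"{parameters * bytes_per_parameter / 1e9:.0f} GB just to hold the numbers")

    # If each parameter were printed as a 10-character number, at 60 lines of
    # 80 characters per page, the printout would run to roughly:
    pages = parameters * 10 / (60 * 80)
    print(f"about {pages / 1e6:.0f} million pages")

As for the word “oracle,” ChatGPT again obliges: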

[The term] oracle refers to a system or source of information that is relied upon for its accuracy or expertise, even though the inner workings or processes are not fully comprehensible or transparent. It is essentially a trusted authority or source of knowledge that provides guidance or answers to questions, without revealing its underlying methods or reasoning. This concept is often used metaphorically to describe situations where information or advice is accepted without full understanding or scrutiny.

Thank you, ChatGPT.

As a step toward artificial general intelligence, chatbots are a blind alley leading to a dead end.

About the Author

As a skeptical observer of the pursuit of artificial intelligence and other subjects, Peter Kassan has been a regular contributor to Skeptic since 2006. Over his long career in the software products industry, he has held positions at the technical, managerial, and executive levels. Kassan is the author of several software patents. He can be reached at [email protected].

References
  1. https://bit.ly/3RYnkzh
  2. https://bit.ly/3GZ3z4B
  3. https://bit.ly/3tFxuM3
  4. https://bit.ly/48qNm3W
  5. https://bit.ly/48eCyWO

This article was published on July 12, 2024.

 