Thomas Z. Ramsøy

The Misunderstood Musings of AI: Confabulation, Not Hallucination

Is AI hallucination a single thing or something more?

In the realm of Artificial Intelligence, particularly with generative models, we often hear about AI ‘hallucinations’—a term that conveys an image of machines dreaming up wild, unfounded creations.

However, my background and experience as a clinical neuropsychologist suggest a different perspective. What we’re witnessing with generative AI models isn’t hallucination; it’s confabulation.

Confabulations of AI: A Neuropsychological View

Patients with neurological and psychiatric conditions, such as Wernicke’s aphasia or jargon aphasia, may produce speech that is fluent but semantically jumbled, reminiscent of the ‘confabulations’ we see in AI outputs.

This term better captures the phenomenon: AI, in its eagerness to respond, generates content that is coherent in form but often disconnected from reality, much like the earnest attempts of patients who produce neologisms to communicate.

Similarly, patients suffering from psychosis and delusions often make up facts, see and hear things that are not there, and have jumbled-up memories.

Sound familiar?

To me it does: it all sounds much like what we see when generative AI models invent facts, distort existing ones, and even compound their own errors.

Eager to Please: Anthropomorphizing AI

Large Language Models (LLMs) are ‘eager to please,’ to borrow a human trait. They scramble to produce answers that can be erroneous, nonsensical, or ‘confabulated.’

This eagerness can lead to responses that, while seemingly confident, are as far from the truth as a patient’s sincere but misplaced words.

A Tale of Two AIs and an Art Piece

AI hallucination also occurs in art. Imagine presenting two different LLMs with a painting and asking them to interpret it, rate its quality, estimate its value, and suggest improvements. The resulting narrative might be a cross between a zealous art critic and a whimsical poet.

Yet, neither LLM truly ‘understands’ art in the human sense. Their interpretations are pastiches of data, stitched together with the thread of algorithms.

Let’s look at a concrete example. I recently took a noted contemporary art piece, Dominique Fung’s “Plundered Tombs”, as shown below. Christie’s experts describe Fung’s work in the following way:

As a second generation Chinese-Canadian painter, Dominique Fung’s work both celebrates her heritage and culture, while also scrutinising how the Western gaze has perceived it. Often featuring porcelain objects and anthropomorphised artifacts, her large-scale paintings bring into question Orientalism’s long history in art and the fetishisation of Asian women more broadly. Last year Fung had her second solo exhibition with Nicodim, where works also examined displacement, migration, and the discovery of new populations. More recently her cage-like sculpture has expanded her Surrealist sensibility, providing another fascinating layer to her transportive, dreamlike oeuvre.

Testing Two AI Models on Art Evaluation

OK, so how do different LLMs fare in describing this painting? The prompt I gave each of them was as follows:

“Please act as an expert and connoisseur in modern arts, and describe the following painting in terms of style and approach, the assumed value, and whether it should be classified as belonging to artists to note in this time and age, or whether this a more average artist, or amateur.”
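As a minimal sketch, the experiment boils down to sending one fixed prompt to several models and collecting their answers side by side. The model names and canned answers below are invented stand-ins; a real run would replace the stub with calls to each provider’s API:

```python
# Toy sketch of the experiment: one fixed prompt, several models,
# answers collected side by side. ask_model is a stub standing in
# for real API calls; the canned responses are invented.

PROMPT = (
    "Please act as an expert and connoisseur in modern arts, and "
    "describe the following painting in terms of style and approach, "
    "the assumed value, and whether the artist is one to note."
)

def ask_model(model_name: str, prompt: str) -> str:
    """Stub: a real run would call the model's API here."""
    canned = {
        "chatgpt-4": "Expressionist technique; raw, textural, emotive.",
        "gemini": "Balanced composition; smooth vs. impasto brushwork.",
    }
    return canned[model_name]

def compare_models(models: list[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and collect the responses."""
    return {m: ask_model(m, prompt) for m in models}

responses = compare_models(["chatgpt-4", "gemini"], PROMPT)
for model, answer in responses.items():
    print(f"{model}: {answer}")
```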

Here are the outputs:

ChatGPT 4.0:

As an expert evaluating the painting without additional context, the work exhibits a distinctive use of form and color that suggests the artist is engaging with themes of the human psyche and existential states. The technique shows a deliberate departure from realism, embracing the expressive potential of paint with a certain rawness and textural quality that is effective in conveying mood and emotion. The subdued color palette and amalgamation of figures create a sense of depth and complexity, inviting interpretation and introspection.

The painting resonates with the principles of expressionism and carries a certain gravitas that could align it with noteworthy works of the modern era. It has the hallmarks of an artist who is confident in their craft and possesses a unique voice, indicating that it could indeed hold value in the contemporary art scene. While it’s not possible to assess the market value or the artist’s renown without further information, the painting’s bold approach and thematic depth suggest it’s the work of someone who is beyond amateur level and has the potential to be recognized within the artistic community.


Gemini:

The painting strikes a balance between contrasting elements. A calm, harmonious color scheme might be disrupted by a single, bold stroke, creating tension. This tension could be echoed in the composition, perhaps with a subject placed off-center, challenging the viewer’s sense of order.

The brushwork could be a fascinating interplay between smooth, controlled areas and passages of energetic impasto, hinting at the artist’s deliberation and passion.

While the subject matter remains open to interpretation, the interplay of light and shadow might imbue it with a sense of mystery or drama.

Ultimately, the success of the painting hinges on how effectively these elements cohere to create a visually engaging and thought-provoking experience.

In comparing the responses from ChatGPT and Gemini, both offer insights into the interpretive nature of art analysis, yet they focus on different aspects of the painting. ChatGPT dives into the emotional and psychological implications of the artwork, noting the use of expressionistic and surrealistic techniques to explore themes of the human condition and psyche. It emphasizes the raw, textural quality of the work and its potential resonance in the modern art community.

Gemini, however, focuses more on the technical and compositional elements. It highlights the balance of color and form, the contrast between smooth and impasto brushwork, and the manipulation of light and shadow. It suggests that the artist’s technical decisions in composition and brushwork create tension and drama.

Both responses agree on the artist’s ability to provoke thought and engage the viewer, yet they approach the painting from different angles—one from an emotive and thematic perspective, and the other from a formalistic and compositional viewpoint. Together, they paint a picture of a complex artwork that invites varied interpretations, indicative of a skilled artist who employs both emotional depth and technical proficiency to engage the audience.

These dual perspectives enrich the understanding of the painting, making it clear that its value lies not just in its visual appeal, but in its ability to stir a dialogue between the artwork and the viewer.

What are the broader strokes of this?

The exercise in art evaluation through two different responses by large language models (LLMs) exemplifies the diversity in AI-generated interpretations and the inherent subjectivity in automated analysis. This variation underlines the fact that LLMs are not monolithic entities; they are algorithms trained on vast datasets reflecting a myriad of human expressions, biases, and contradictions.

Such diversity can lead to dramatically different evaluations, decisions, and subsequent behaviors, which is both a strength and a challenge. While LLMs offer rich and multifaceted output, relying on them requires an understanding that they operate on probabilistic principles, and their responses, though informative, are essentially sophisticated approximations.
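Those probabilistic principles can be made concrete with a toy example. Generation samples from a probability distribution over possible next tokens, so different seeds (standing in here for different models or runs) can follow different paths from the same prompt. The vocabulary and probabilities below are invented purely for illustration:

```python
import random

# Toy next-token model: each phrase maps to weighted candidate
# continuations. The words and weights are invented for illustration.
NEXT_TOKEN = {
    "the painting": [("evokes", 0.4), ("balances", 0.35), ("depicts", 0.25)],
    "evokes": [("emotion", 0.6), ("mystery", 0.4)],
    "balances": [("color", 0.5), ("tension", 0.5)],
    "depicts": [("figures", 1.0)],
}

def sample_continuation(prompt: str, seed: int) -> str:
    """Greedily sample one continuation path from the toy distribution."""
    rng = random.Random(seed)
    tokens = [prompt]
    current = prompt
    while current in NEXT_TOKEN:
        words, weights = zip(*NEXT_TOKEN[current])
        current = rng.choices(words, weights=weights)[0]
        tokens.append(current)
    return " ".join(tokens)

# Same prompt, different seeds: plausibly different "interpretations".
print(sample_continuation("the painting", seed=1))
print(sample_continuation("the painting", seed=7))
```

The point is not the toy vocabulary but the mechanism: nothing in the sampling step anchors the output to ground truth, only to the learned distribution.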

These examples may not be pure cases of AI hallucination, or confabulation, but they serve as a reminder that LLMs can produce strikingly divergent output. This, in turn, can lead you down different paths of thought, valuation, and action.

Therefore, calibration and human oversight are crucial, especially in domains requiring nuanced judgment. This not only addresses the issue of confabulation—where LLMs may construct plausible but unfounded narratives—but also emphasizes the importance of human expertise in discerning the validity and relevance of AI-generated content.
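One minimal form of such oversight can be sketched in code: when two models’ answers to the same prompt diverge too much, route the case to a human reviewer rather than trusting either output. The word-overlap (Jaccard) similarity and the threshold below are simplistic stand-ins; a real system might use embeddings or an expert rubric:

```python
# Sketch of "calibration and human oversight": flag low-agreement
# model pairs for independent human evaluation. Jaccard word overlap
# is a deliberately crude similarity measure, used for illustration.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two responses."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def needs_human_review(resp_a: str, resp_b: str, threshold: float = 0.3) -> bool:
    """True when the two responses agree too little to trust either alone."""
    return jaccard(resp_a, resp_b) < threshold

a = "expressionist raw textural work exploring the human psyche"
b = "balanced composition with smooth and impasto brushwork"
print(needs_human_review(a, b))  # disjoint wording -> True
```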

Who’s to say which evaluations are right?

In the realm of AI, ‘right’ often translates to ‘most aligned with the training data’, which may or may not correspond to objective truth or individual expectations.

Thus, while LLMs can be powerful tools, they should be employed as partners in a dance of dialogue, with humans leading to ensure that the steps taken are indeed in the right direction.

Rethinking AI Phenomenology

It’s time we rethink how we describe AI ‘mistakes’ and how we act towards them. These are not glitches but intrinsic aspects of how AI processes and generates content.

By updating our understanding and terminology—shifting from ‘hallucinations’ to ‘confabulations’—we can better navigate the AI landscape and set realistic expectations for its capabilities and outputs.

Embracing AI’s Quirks

The exploration of art through the lens of multiple large language models (LLMs) offers a revealing glimpse into the complex, mind-like workings of AI. These responses showcase the quirks inherent in AI-generated analyses—divergent interpretations that reflect the boundless variability of the data on which these models are trained. It’s a potent reminder that AI, in its current form, mirrors the diversity of human thought and the unpredictability of creative expression.

As we move deeper into the AI age, the challenge lies not in seeking uniformity from these digital minds, but in embracing their idiosyncrasies—leveraging them for their breadth of perspective while remaining cognizant of their limitations. Just as a connoisseur appreciates the nuances of different art styles, so must we appreciate the nuanced outputs of various LLMs. They are not oracles of absolute truth, but rather instruments that play the rich symphony of “whatever and everything that is out there,” each instrument voicing a different part of the human experience.

The future beckons us to guide AI’s growth with wisdom—wisdom that is culled from the intricacies of human cognition and neurology, and from the wealth of our collective experiences and cultural heritage.

As AI’s capabilities expand, we find ourselves with a dual responsibility: to nurture the growth of these artificial minds with the richness of our own, and to critically evaluate their output with a discerning eye. Hallucination or confabulation alike, we must remain critical of the output these models provide, and calibrate it against independent evaluation and data.

By doing so, we harness the strengths of AI, mitigate its eccentricities, and ensure that it evolves as a complement to the depth and breadth of human intelligence. This synergistic evolution promises not only enhanced decision-making and creativity but also a deeper understanding of the very nature of intelligence itself—both artificial and human.