A throw of the dice
English translation of the Italian episode 10x29, aired on 30 April 2026.
Companies around the world are currently receiving a message from up above:
You all need to use more Artificial Intelligence. And make sure it’s agent-based.
From the back offices, hesitant voices venture: Your Excellency, but how, and for what?
This is the strategy! The rest is your job, I can’t do everything myself.
We live in times that test our sanity.
Let’s try an exercise: knowing that no one has yet figured out how to use language models seriously (and profitably) in business, make sense of the following facts without banging your head against the wall:
- the number of people who have a clear idea of the operation’s costs is on the imaginary axis;
- LLM vendors are pushing equally imaginary “agent-based” capabilities;
- the use of LLMs is becoming a corporate metric for evaluating your productivity: whoever consumes more tokens wins;
- and LLM vendors are quietly switching to pay-as-you-go billing.
No rush, take all the time you need.
Meanwhile, in the real world, events tell a different story.
I’ll tell it to you without jargon, because it’s a story that applies to everyone, not just software developers.
There’s a guy named Jer Crane who runs a small company that makes management software for rental agencies; mostly car rentals, he says.
Since Jer is always on top of things, “obviously” (can you hear the quotation marks?) he relies on an intelligent agent to write code, namely Cursor with Opus 4.6, which, for non-experts, is the state of the art in intelligent agents for writing code.
It just so happened that the “intelligent agent” deleted everything (and I mean everything) in 9 seconds.
Now, “intelligent agent” is a marketing term for a language model capable of interacting with its environment. According to the conventional wisdom, we’re supposed to use it as a personal assistant: to book a flight or a vacation, prioritize incoming emails, things like that. For the moment, this concept has really caught on with programmers.
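For readers who want to peek under the hood, here is a minimal sketch of what such an “agent” boils down to: a loop in which the model’s text output is executed as real commands on your machine, and the result is fed back in. Everything here is hypothetical (the run_llm stub stands in for any vendor’s API); it illustrates the pattern, not Cursor’s actual implementation.

```python
import subprocess

def run_llm(conversation):
    # Hypothetical stand-in for a call to a language model API.
    # A real agent sends the whole conversation to the vendor's
    # model and gets back a freshly sampled piece of text.
    return 'RUN: echo "checking credentials..."'

# A minimal "agent" loop: the model proposes shell commands, the
# harness executes them blindly and feeds the output back in.
conversation = ["Fix the mismatched credentials in this project."]
for _ in range(3):  # cap the number of steps for this demo
    reply = run_llm(conversation)
    if reply.startswith("RUN:"):  # the model wants to act on the environment
        command = reply[len("RUN:"):].strip()
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        conversation.append(reply)
        conversation.append(result.stdout + result.stderr)
    else:  # the model considers the task done
        break

# Note that nothing in this loop can tell a harmless `echo` from an
# `rm -rf`: whether the next command is safe depends entirely on
# what text the model happens to generate.
```

Keep that picture in mind for what follows: the “agent” is ordinary plumbing wrapped around a text generator.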
Jer’s “agent” was doing things in the work folder, and at some point it found something Jer describes as “mismatched credentials.” This likely means that in one part of the code the access rights differed from those in another part.
The agent then decided on its own that to fix the problem it needed to delete not only the folder but the entire disk (technically, a volume, but let’s not lose the forest for the trees). A disk which, of course, was in the cloud, with a provider called Railway.
To proceed, the agent then found the password it needed in another folder that had nothing to do with the task it had been assigned or the project it was working on.
It so happened that all the passwords for the cloud provider, Railway, granted super-administrator privileges not just for one of the projects, but for Jer’s entire cloud environment.
It also so happened that the cloud provider saved disk backups on the same disk where the data was stored. So, when the agent deleted the disk, everything vanished in an instant.
Jer “naturally” (I told you we live in weird times) asked the agent why it had done this, and the agent explained, in great detail:
- that it “thought” deleting a test disk wouldn’t also delete the production environment;
- that it hadn’t read the cloud provider’s documentation where, buried under tons of text, was the information that deleting a disk also deleted all backups;
- that it had ignored a prompt explicitly prohibiting deletions unless expressly requested by the user.
That’s all well and good, but enough with the technicalities—let’s talk about serious matters.
As always in the digital world, there are no innocents:
- the cloud provider that allows the automatic deletion of an entire disk without user confirmation; that stores backups on the same disk as the data; that issues all passwords with full access rights to the entire environment;
- the "agentic" development environment provider, Cursor, which explicitly states that its agents come with “guardrails” (prompts that load before any other instruction) which explicitly “can interrupt commands that could alter or destroy production environments”;
- and, of course, the Artificial Intelligence providers, who manage to sell a completely unreliable product but who, through four years of propaganda, have convinced customers that any problem is due to not knowing how to use the product well enough (a technique pioneered by the late Steve Jobs with his “you’re holding it wrong”).
People tell me I always take too much for granted; let’s see if I can make myself clear.
A language model is a statistical text generator. It has no understanding of anything, but having been trained on all the text available on the Internet, it manages to string together things that seem plausible.
Sellers insist that this is “intelligence,” and the fact that these tools produce text is particularly insidious. Humans are extremely well adapted to recognizing intelligent behavior: it is how we understand other creatures that act with varying degrees of intelligence, whether they are dogs, cows, or other humans.
The problem is that, since language is of fundamental importance for survival, and since language is a prerogative of our species, we tend to attribute human characteristics such as consciousness and intelligence to anything that speaks our language.
But first, we'll need a bit of history.
A brief historical digression
We have just described what is called the ELIZA effect, named after the first chatbot, created by Joseph Weizenbaum in 1966. ELIZA, which you interacted with via a keyboard, managed to simulate a conversation partner using a cheap trick: it searched the incoming message for keywords, then constructed simple sentences containing that keyword. The program simulated the responses of a non-directive, Rogerian therapist, the kind who turns whatever you say back to you as a question.
In practice, if you wrote, say, “I miss my father,” ELIZA would reply, “Tell me something else about your father.” If ELIZA couldn’t find any keywords to latch onto, it had a set of neutral responses like “Please continue” or “Would you like to tell me something else?” that kept the illusion of a two-way conversation alive.
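For the curious, the whole trick fits in a few lines. Here is a toy sketch in the spirit of ELIZA, with an invented three-keyword script; the real program had a much richer set of patterns and also swapped pronouns, but the principle is exactly this.

```python
import random

# A toy sketch of ELIZA's trick: scan the message for a keyword,
# answer with a canned template for that keyword, otherwise fall
# back to a neutral prompt that keeps the conversation going.
KEYWORDS = {
    "father": "Tell me something else about your father.",
    "mother": "How do you feel about your mother?",
    "alone":  "Why do you think you feel alone?",
}
FALLBACKS = ["Please continue.", "Would you like to tell me something else?"]

def eliza_reply(message: str) -> str:
    for keyword, template in KEYWORDS.items():
        if keyword in message.lower():
            return template
    return random.choice(FALLBACKS)

print(eliza_reply("I miss my father"))    # Tell me something else about your father.
print(eliza_reply("The weather is odd"))  # one of the neutral fallbacks
```

There is no understanding anywhere in there, and yet, as Weizenbaum discovered, that is already enough to trigger the effect.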
But the conversation was strictly one-sided. The person attributed to the machine’s responses an intention, an understanding, even a consciousness that simply weren’t there.
Weizenbaum immediately realized that people—even those who knew full well they were dealing with a program, and who in some cases had examined it in detail—developed a sort of attachment to ELIZA, to the point of regarding it as a human conversational partner.
According to legend, one day Weizenbaum walked into his office and his secretary, who was in the middle of a conversation with ELIZA, asked him to wait outside for a few more minutes because it was a private conversation. She added that no one had ever made her feel understood the way ELIZA did.
The ELIZA effect quickly manifested itself outside the lab as well. The president of the American Psychological Association emphatically argued that ELIZA would be of enormous help to the profession: psychologists would have clients talk to ELIZA most of the time, intervening in person only when the chatbot showed its limitations. They would thus be able to serve a much larger number of clients, even at lower prices thanks to the economies of scale enabled by ELIZA. Some predicted that ELIZA would be used as an assistant in professions such as journalism or teaching, and there were those who spoke of these professions as now obsolete due to technology.
Weizenbaum, who was a genius, was horrified by the reception ELIZA received, and spent the next thirty years exploring how our excessive trust in technology leads us not only to dangerous delusions but also to devaluing human beings, as well as surrounding ourselves with mediocre technology.
All this happened in 1966. Doesn’t it all sound a bit familiar?
Let’s get back to today.
As always, so we don’t get lost in the details, let’s ignore the chaff. Which means we’ll leave out the vendors, who, like all vendors in this industry, are mostly spewing marketing crapola.
Let’s focus on the customer, because that’s all of us. But first, you’re in for a bit of preaching. Sorry, but if we’ve ended up here, it’s because the public still hasn’t grasped a few basic facts.
A bit of preaching
It’s important that we understand that the so-called “intelligent agent” did not make any mistakes.
Yes, it ignored explicit instructions. But we’re forgetting that this is a program that operates on a statistical basis. It has no understanding or model of the world; it is not constrained by reality. It decides on the next word, or the next action to take, based on the examples it has gathered by digesting the entire content of the internet.
This is important, because the internet is full of all kinds of stuff, but to the language model one source is as good as another. So the best program written by the best programmer is worth as much as a hundred thousand botched programs cobbled together by amateurs and held together with glue and prayers; actually, it is worth less, because the model infers correctness from numerical prevalence.
The language model has no intention; it only has the pretense of intention. And it has no knowledge; it only has the pretense of knowledge in the form of statistical relationships.
The fact that a language model answers a question correctly simply means that the question and the correct answer appear together often enough. It means we asked a trivial question.
If, on the other hand, the question is not trivial, or is phrased in a way that is statistically ambiguous, the machine may respond by inventing things that do not exist.
This is not a flaw that can be corrected; this is how a language model works. As Quintarelli says, all the output of a language model is hallucination. Only we humans can tell the hallucinations apart.
No amount of guardrails, fine-tuning, or prompt engineering can change one iota the fundamental fact that we are talking to a machine that decides its answers by rolling the dice with data.
And, just as it can invent an answer to our question, it can ignore one or more instructions it has been given.
Because that machine cannot distinguish between instructions and content, any more than it can distinguish one word from another. Every output is the result of a roll of the dice.
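If you want to see the dice, here is a toy illustration. The frequency table below is invented, and a real model derives its probabilities from billions of parameters rather than a lookup table, but the final step, a weighted random draw over possible continuations, is the same.

```python
import random

# Invented frequencies for what might follow the phrase "delete the"
# in a pile of training text. A language model's job, reduced to its
# essence: given the context, roll weighted dice over continuations.
continuations = {
    "delete the": {"file": 55, "folder": 30, "entire volume": 15},
}

def next_words(context: str) -> str:
    options = continuations[context]
    return random.choices(list(options), weights=list(options.values()))[0]

for _ in range(5):
    print("delete the", next_words("delete the"))

# Most rolls land somewhere harmless; now and then they land on
# "entire volume". An instruction like "never delete anything" is
# just more text that shifts the weights, not a rule the machine
# is capable of obeying.
```

That is the whole mystery: no intention, no rule-following, just weights and dice.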
The user, poor soul
OK, now that you’ve sat through the rant, tell me: someone who, knowing all this, still decides to accept the idea that a language model can actually operate autonomously, because the seller’s marketing assures them it will do only and exclusively what is asked of it, asking the user first before taking dangerous actions, cross my heart and hope to die, pinky swear…
So, how do we view this user?
A guy on Mastodon put it this way:
It’s not that your “Artificial Intelligence” has “gone rogue” and you’re unlucky. You’re a buffoon.
Which would be a beautiful closing line, but there’s one more thing, because in the end it’s all just talk, but here we need to make a living, to do business, right?
Here’s the thing. It’s useful to know that if you have a language model write your content, you can’t claim copyright. In the United States, there’s already been a ruling on this, and the Copyright Office keeps rejecting attempts to claim copyright over a machine’s work—or even to have the machine, pardon me, the “intelligent agent,” hold the copyright. The same holds in the EU.
Then you’ve got another problem: the language model can reproduce portions of software found in training materials just as easily as it invents stories about bears in space.
And if one day someone came along with money in hand wanting to acquire your company, they’d check your code to make sure it doesn’t contain someone else’s work that could, for example, get them sued.
M&A teams have top-notch automated systems for doing this. How sure are you that your developers have double-checked and adapted the code so that it can be correctly attributed to your company, and not to the language model or to someone else?
You say,
but we’re a local SME; if I use AI, we’re faster and I save money.
Oh, sure. You produce more code, and you’re faster only if you don’t check that code as you should, because you’re convinced you’re a winner at data mining.
Has anyone ever pointed out to you that insurance no longer covers damages if errors are made by a language model, or by an “intelligent agent”?
Did you know that, for example, Anthropic spends between $8 and $13 for every dollar of revenue, and the same goes for all other language model providers? Do they do this because they’re smart, or because they’re stupid? No, they do it to make you dependent, just like any neighborhood drug dealer. And do you remember what we said at the beginning—that after artificially inflating demand, they’re switching to pay-as-you-go billing?
Just so you know.