LLMs, Humor & Play
Essay • 950 Words • Humor, Artificial Intelligence, 2026 • 06/17/2026
Why LLMs aren’t funny and the possible future of play
There are 980 words in this article, and it will probably take you less than 5 minutes to read it.
This article was published 2026-06-17 00:00:00 -0400, which makes this post and me old when I published it.
Can an LLM write a joke? Well, yes and no. Many people argue over whether an LLM can really perform speech acts like assertion, because there is no intentionality and/or Theory of Mind behind its output. So if a joke is something created in order to make someone laugh, the LLM would have to understand and intend that, which many critics would rightfully argue it is unable to do, at the very least right now. However, in a purely formal way, yes: it can output something that looks like a joke (though still lacking something like taste) and that can even make you laugh (although I cannot find any randomized controlled trial in which human judges compare human-written and LLM-generated jokes).
The next question that we then have to ask is “What makes something funny?” There are a lot of competing theories, but ultimately (to me) what is funny is whatever makes you laugh.¹ So really, evaluating something (subconsciously perhaps) and finding it funny is what makes it funny. This means that humor may be fundamentally goal-oriented: the goal is to make you laugh. But what makes you laugh? Can we measure that? This is where (formal) evals come in.
What are evals and why are they important?
“Evals are, at their core, systematic methods for measuring whether your AI system is doing what you want it to do. They answer questions like: Is the model’s output accurate? Is it consistent? Does it stay within the guardrails you set? Does it degrade when the input changes slightly? Is it getting better or worse as you iterate on your system?”
The Death of Prompt Engineering, And How Evals Are Rising in Its Place by Yahav M
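To make the quote a little more concrete, here is a minimal sketch of what an eval can look like in code. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for a real model call, and the test cases are made up. It measures two of the things the quote asks about: whether the output is accurate, and whether it degrades when the input changes slightly.

```python
from typing import Callable, List, Tuple


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: answers simple sums.

    It is deliberately brittle (it only recognizes a capitalized
    "What is") so that the robustness eval has something to catch.
    """
    cleaned = prompt.replace("What is", "").replace("?", "").strip()
    try:
        left, right = cleaned.split("+")
        return str(int(left) + int(right))
    except ValueError:
        return "unknown"


def accuracy(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of prompts whose output matches the expected answer."""
    hits = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return hits / len(cases)


def robustness(model: Callable[[str], str], pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (prompt, slight rewording) pairs that get the same answer."""
    same = sum(1 for original, reworded in pairs if model(original) == model(reworded))
    return same / len(pairs)


# Made-up eval data: (prompt, expected answer) and (prompt, reworded prompt).
CASES = [("What is 2 + 2?", "4"), ("What is 10 + 5?", "15")]
PAIRS = [("What is 2 + 2?", "what is 2+2"), ("What is 3 + 4?", "3 + 4")]

acc = accuracy(toy_model, CASES)
rob = robustness(toy_model, PAIRS)
print(f"accuracy={acc:.2f} robustness={rob:.2f}")
```

The toy model gets every canonical prompt right but falls over on a lowercase rewording, which is exactly the kind of thing you only notice once you measure it. Now try to imagine what `CASES` would even look like for "is this funny?"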
You may already be able to see how they could be helpful, but they become incredibly important in autonomous agentic systems. Evals are necessary for loops: for a system to get to a goal, it must be able to experiment and measure the outcomes of the various things that it tries in order to move closer to achieving its set goal. The creator of OpenClaw, a popular open-source autonomous agent, talking about loops, which started discourse™:

The lack of humor evals is not for lack of trying. There is a branch of (computational) linguistics called computational humor that seeks to understand as well as generate humorous content. Maybe what makes something funny is timing, or semantic distance. Of course in reality, it’s probably a mixture of a ton of different things, and while I am fascinated by this line of questioning in this sub-field, I do think that it quickly starts to miss the point. To what end do we generate jokes? To say that we could? Humor is a puzzle, and jokes are little conceptual puzzles unto themselves, so I guess there will always be people who are fascinated by trying to crack them.
Let’s say for the sake of argument that a formal eval for humor is found and can be employed in an agentic loop. What would happen? There would be individual differences between jokes, because temperature causes non-determinism in the outputs, but otherwise I feel like you would start to see certain themes arise. If we didn’t prompt it at the beginning with the skeleton or general premise of a joke, then maybe through frequency of appearance in training data, or through RLHF, we would land on some kind of statistically average joke (maybe a knock-knock joke). Now wouldn’t that be funny?
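Here is a toy version of that thought experiment, and it is very much a sketch built on assumptions rather than a prediction. The joke formats and their frequencies are invented, `generate` is a hypothetical stand-in for sampling from a model at some temperature, and `placeholder_humor_eval` is not a real humor metric: it simply assumes, as speculated above, that the score tracks how familiar a format is, plus some noise.

```python
import random
from collections import Counter

# Assumption: invented joke formats with made-up relative frequencies,
# standing in for "how often this shows up in training data".
FORMAT_WEIGHTS = {
    "knock-knock": 0.50,
    "pun": 0.30,
    "one-liner": 0.15,
    "anti-joke": 0.05,
}


def generate(rng: random.Random) -> str:
    """Hypothetical stand-in for sampling a joke from a model.

    The randomness plays the role of temperature: common formats come
    up more often, rare ones less often.
    """
    formats = list(FORMAT_WEIGHTS)
    weights = list(FORMAT_WEIGHTS.values())
    return rng.choices(formats, weights=weights, k=1)[0]


def placeholder_humor_eval(joke_format: str, rng: random.Random) -> float:
    """NOT a real humor metric. Assumes the score is familiarity plus noise."""
    return FORMAT_WEIGHTS[joke_format] + rng.gauss(0, 0.05)


def run_loop(steps: int, seed: int) -> str:
    """Generate, score, keep the best: the simplest possible eval loop."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(steps):
        candidate = generate(rng)
        score = placeholder_humor_eval(candidate, rng)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Run the loop from 20 different seeds and count which format wins.
winners = Counter(run_loop(steps=50, seed=s) for s in range(20))
print(winners.most_common())
```

Individual runs see different candidates along the way, but under these assumptions the loop keeps landing on the most familiar format. The interesting part is that the conclusion is baked into the eval: if the score rewards familiarity, the loop will dutifully optimize its way to the average joke.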
At the center of all this is the question “What makes humor fun to consume and participate in?” In Ted Cohen’s book, Jokes: Philosophical Thoughts on Joking Matters, he talks about how jokes by nature require a shared understanding of things in order for them to work, so there is a certain kind of intimacy shared by the joke teller and the joke listener. In David Shoemaker’s book, Wisecracks: Humor and Morality in Everyday Life, he talks about the incredibly important role of interpersonal humor and how we relate to and play with each other. Think about riffing with your buddies or comedically sparring with them.
To participate in humor is perhaps to play with a quirk of human evolution: using language to exploit a quirk of predictive processing. But we do play with non-human agents, like playing fetch with your dog, so who’s to say that it always has to be human-to-human play? What if beyond working with LLMs, we might play with them in the future?² While LLMs right now are so RLHF’d into being assistants that their humor is constrained by their own professionalism, there is no guarantee that it will be like that forever. What might it look like if you riffed or ran bits with/on Claude? Is this good for humanity? Bad? More advanced AI and chatbots open up novel and unexpected ways of interacting with LLMs, and it is our responsibility as people to think through the potential consequences.
Footnotes
I need to think more about this and if it is good for humanity or not. I think that if we open up another entire paradigm of interacting with LLMs beyond work (I think that romantic chatbots are still in this work mode of emotional labor and sycophancy) that AI will only become more entrenched in (certain people’s) lives. Whether this is a good thing or a bad thing is up to other thinkers, but also only time will be able to tell. We are not soothsayers.