52 Comments

Unfortunately LLMs are trying to confound what people mean by creativity. These authors trained the model on the entire bodies of work created by those poets in the first place. And as you can see here, even then the outputs are just mundane mimicry.

Poets are seldom so derivative, let alone regurgitating "accessible" language in someone else's style!

I do not use the term creativity much anymore because it has been bastardized by Silicon Valley. We must look at the verbiage now also. Artisanship is above and beyond creativity. Silicon Valley wishes to reduce creativity to a data point, and pretty much has. We must be very careful going forward.

Love the focus on artisanship! Easier to see the value in it too.

Thank you for responding. I am taking a stand. Humanity is the truth, not an artificial thing which doesn't have a heartbeat. Humans function with both. There are heartfelt decisions and brain-made decisions, so I am over this A.I. comparison with humans. I pose: if this thing is so superior, why must it be tied to comparing it with human output? I am personally over this bad game.

Glad this resonated! Please check out my publication, Ratchets Everywhere, where I write about how innovation propagates as a one-way ratchet through evolution and collaboration, creating abundance through complexity: https://gairiksachdeva.substack.com/

Why are the inventors so obsessed with replacing human creativity? It is our souls.

Inventor is giving them far too much credit. Neural networks have existed for several decades. These people are just plugging very old research into a massive AI research database (they skirted IP law by claiming to be a non-profit research group) and computers that have far more capacity than existed before.

These people are often sociopath grifters who derive pleasure only from 'winning', which they often conflate with making other people lose. I know a real AI researcher, and he's as appalled by what Silly Valley private equity is doing with these algorithms as you are.

Humanity is number, that's why also. However, they have folks thinking tech A.I. is number one. Humanity better wake up.

Meant to say humanity is number one. Apologies.

You know the Prometheus myth? Wanting to steal creative fire for mankind? And his punishment. These people might get the courage to use their “fire” creatively? Not to penalize mankind in favor of machines!

Science fiction classics from the 50s and 60s saw this coming. Vernor Vinge’s True Names and Philip K. Dick’s Minority Report show AI structures working as decision makers. It is our history and future. Man is supposedly the only animal that can foresee his death. Yet we ignore history.

Many of the people making AI love this literature, and yet they WANT to make everything AI-driven. Something something torture nexus.

Seems unlikely this tech will replace human creativity. But understanding how close the technology can get to human level on many different topics is an important thing to study.

Exactly! When will humans aka humanity realize this is a silent war taking place right in front of them.

This is interestingly similar to much of the use of LLMs. Unless you have expertise in a specific field, it will be difficult to find the issues and errors in the LLM output.

Using "naive" subjects for distinguishing poetry is similar to asking the general public to question an LLM about some aspect of law, and then getting them to rate the accuracy and completeness of the reply. They are very unlikely to be able to detect the good from the bad.

It is so important to critically evaluate the methodology to detect the good research from the bad and meaningless.

The deeper questions that never surface amid our unbridled acceptance of AI centre on WHY we WANT AI poetry, music, writing, et al.

I decided to let 4o take a shot at poetry, imitating the styles of Walt Whitman, T.S. Eliot, and Lord Byron, and giving it "autumn" as a theme.

https://chatgpt.com/share/673ccd81-d98c-8003-ae82-e4d9d11cd575

The rhyme pattern 3.5 stubbornly clung to seems to be absent. Only its imitation of Byron concerns itself with rhyme at all. It seems to have, as a pastiche, improved. Beyond this, I cannot say: when it comes to poetry, I am frankly a philistine.

So I had a silly idea, and presented them to a different ChatGPT 4o instance, as the work of a college student who was turning these in to a hard-ass professor. The results were... probably too nice, but with some expected critiques. "There’s a tendency to focus more on imitating the surface style rather than capturing the deeper thematic essence. Encouraging more complexity, especially in the T.S. Eliot poem, and more drama or irony in the Byron poem, could enhance the pieces."

https://chatgpt.com/share/673cd13c-c54c-8003-a21c-e416c5cc83c0

As someone who writes and appreciates poetry, I do think these are better than the ones from the study. Someone who is familiar with the poets in detail would be able to tell these apart from real ones but there is not as simple a rule as there was for the ones in the study.

I still don't think they are any substitute for good, original poetry. However, if I were teaching a creative writing class they are now good enough that it would be harder to determine that they are AI-generated, since they are now above the level of poetry writing ability that even most college students have.

It is interesting to me that 4o apparently "knows" how to write free verse now instead of assuming all poetry is basically ABAB iambic tetrameter. I don't know whether that emerged from having more training data, or if it was the Reinforcement Learning from Human Feedback.

I will have to test 4o on imitating e.e. Cummings and see what results.

Here were 3 poems ostensibly based on the styles of slightly more obscure poets that 4o did.

I actually think there were some fascinating lines in these, although some that didn't work, as well.

I wouldn't blame people honestly if they liked one of these poems, just as I wouldn't blame people for liking some of the art DALL-E, etc creates.

The real issues, then, become, most immediately, the unlicensed use of copyrighted materials for training, and also energy consumption and the fact that people can pass off fakes as real.

LLMs may be hitting a wall, and they are not the path to AGI (if human-level generalizable learning is even possible for AI at all). But they do have their uses.

https://chatgpt.com/share/67415566-005c-8011-bd53-4edd857783e7

4o also has some interesting capabilities to introspect on and interrogate the themes of the poems it generates. By contrast, the sample poems from 3.5 were so thematically banal that I don't think there would have been much to work with.

Since poetic concepts are intrinsically more subjective than many fields, the tendency of ChatGPT to confabulate facts ("hallucinate") doesn't cause as much problem here. Its analysis is probably shallower than a professional literary critic, philosopher or theologian would be, but it's better than most college students whose papers I review, and certainly better than people with no subject matter experience.

I think it is silly that people claim ChatGPT is at PhD level right now, but I could believe undergrad level, and I don't know what its ceiling is.

https://chatgpt.com/share/6741592c-6f18-8011-b2ee-967fa565b2ff

I think maybe the folks at OpenAI are confusing PhD with LSD.

A simple mistake that anyone could make

I ran out of free 4o for the night, but interestingly, I was also able to get Claude Sonnet 3.5 to generate a sestina about a flood. It is not terribly interesting in content, and has a few inactive word choices. But a sestina is a hard form to write, and at least I do think it is entirely a sestina.

So it definitely seems LLM poetry has advanced substantially beyond what this study tested.

https://claude.site/artifacts/c6b3efe0-69c4-4db8-b413-eae7e9ee7332

Davis's paper was a fun read. I did not read the original research, but the impression I got was that the whole study setup is extremely poorly thought out.

In a technical analogy it would be asking the general public to judge computer code, or legal argument, or medical research. Does that tell us anything about what is judged? Or more about the judges?

ChatGPT can fool amateurs. Now, whoopie. Researchers writing about that can fool those who don't generally consume research. Whoopie again. Spot a pattern, anyone?

ChatGPT's output may be preferred over difficult poetry. Well, fast food is generally preferred over healthier or more complicated alternatives too. Does that make fast food any better? At pressing base evolutionary buttons, sure. Spot a pattern, anyone?

I really like your take on this! As someone who is not really a foodie or a poetry devotee, I do find resorting to preferences to assess ChatGPT output does seem a bit like a fast food vs pate taste test.

I can't help but also think that using some of the "objective" methods of Davis's paper could also be misleading and even a bit circular in deciding if the AI poetry is "good" poetry. However, the original paper was unfortunately making the seemingly much more difficult (objectively testable) claim that “AI-generated poetry is indistinguishable from human-written poetry.”

> I also think it is a safe bet that the idea that, one hundred years later, scientists would write that drivel generated by an automaton is “indistinguishable” from Shakespeare and Whitman would not have occurred to I.A. Richards in his darkest dreams, and would have occurred to Orwell only in his darkest dreams.

Maudlin, machine-produced "prolefeed" is one of the components of the dystopian state of Oceania in 1984, so suffice it to say it did occur to him in his darkest dreams.

"The AI-generated poems were generated using ChatGPT-3.5. "... So they use a shitty outdated model and are surprised when it produces subpar poetry? Would be very different result using something like Claude 3.5 Sonnet (or even 3 Opus) or chatgpt-4o-latest

Yes, it was actually (see comments thread above). I agree with commenters who are critiquing excessive AI hype in the popular press though, as well as poor study design on who would be discriminating the samples.

I haven't read the study, but I suspect that if someone thought the ChatGPT output was as good as Shakespeare, they haven't read Shakespeare! "Did my heart love till now? Forswear it, sight! For I ne'er saw true beauty till this night."

The mere fact that GenAI’s default poetry is AABB should tell you that it’s simply taking the mean of all its training data, which is vast tracts of poetry from all kinds of online (often self-proclaimed) poets, not just the good or great ones — most of which is in AABB rhyme. And since most of all poetry is average and cliched, it’s also programmed and bound to take the mean of that, and produce dross. It’s quite literally destined for average. It can’t escape it.

(Throw in some hallucinations for head-scratching fun to add to the banality.)

For fun, ask ChatGPT to not rhyme.

Again and again, it can’t not.

I didn't believe the headline, period. Poetry is a human thing. Not a data point. We must now try not to make everything a data point in this A.I. era. The hype must be shut down. Human is human and A.I. is A.I. till further notice.

In another venue, where the subjects of the experiments had been called "participants", I wrote:

"The word "participants" is doing a lot of work here. Said "participants" were folks who don't read poetry much. Such as myself. (And most of the poetry mentioned is in a language (period English) that the "participants" don't speak and/or haven't studied.)

But I do (try to) play and study jazz (a bit), and can assure you lots of "participants" wouldn't like jazz, and even some who like it don't know what to listen for. Heck, there are tales about (and live recordings demonstrating) players getting pissed off at audiences who clap on 1 and 3, so drop a beat so that the audience clapping is shifted to 2 and 4, where it should be, thank you. So next we can do a study of how AI produced jazz is preferred by "participants".

I call BS. YMMV, of course."

I have been working with LLMs to produce full novels and novel series, film scripts, papers and other writing media for quite some time, and have seen virtually nobody do anything remotely similar. I have a Python system I wrote years back which takes a prompt - “I need a Romance Novel set in New Mexico, in the style of Barbara Cartland, give me 10 options”, and it will go through all the processes of writing and, with guidance, create a novel that is sent to on-demand printing, with the book in hand a week later. It elevates editing to a primary process. I think very few people understand the writing process, and how much of it is editing.

The question rightly is not whether it can simulate Shakespeare, it can do so incredibly well. The more complex question is can it be guided to create large scale complex works of literature - the chance that it creates something more sophisticated than the person steering it is quite small.

I have about 35-40,000 novel and fiction series books generated which I use to understand what prompt matrices and controls work. It was harder to get stable illustrations automatically created from text than text.

What LLMs can do that humans can’t is both duplicate a writing style and create a hundred versions (rewrites) at every step in the process. I could take that Barbara Cartland structure and have it rewritten by Nabokov or Chaucer or Austen or Rushdie, in parallel, in 30 minutes, edit and refine for a few hours and you have amazing works.

I’ve tried many, many single-pass (unsupervised) works, but the only things which had adequate results were genre books like hardcore gay pulp fiction, or the aforementioned Romance Novel, though I have learned more about fiction prompts and deep structure. One group which worked very well were “Autobiographies” - my favorites were Abraham Lincoln, Oscar Wilde, Aleister Crowley, Bluto, William Burroughs, Walt Whitman…

For everyone I work with who contemplates LLMs, my advice has been the same for some time - it will give you what you ask for. If you say “a poem by Shakespeare”, there are almost uncountable ways to respond to that, and of those a fraction are of interest, and of those some will be remarkable. You could search forever. The more precise you get, the more amazing the result.

The sweet spot is to grasp how to use LLMs to iterate prompt matrices to constrain the results automatically to a target which interests you.

Then the problem is whether the originator of the works has sufficient taste or depth to create something of interest examining the human condition.

Still, GenAI is an awesome tool, I use it every day. I am still amazed that this can even work, it's like magic.

But I don't think we really have to worry about creativity being replaced by LLMs. If we look at the mass market and mass media, 99% is already complete garbage. Just open Youtube, TikTok, Netflix or... Substack if you need some convincing. GenAI is just going to skew the ratio to 99.9%. No soul, no narrative, just a pile of garbage.

And the reality of today's society is that it's what people actually want. They want a distraction, to fill their already empty soul. They watch Netflix while working out, or doing their laundry, or even browsing their phone. The show is empty, the music is empty, the story is empty, and it renders all their tasks and activities empty as well.

In the end, the side-effect of GenAI might be, at least for some, the realization that this is not the way, and that we should disconnect the WiFi and fall back to reality and to substance.

Great thoughts. Ted Gioia's Substack has some great essays on the antidote to this, whether hollow culture is being produced by people or AI.

But sci-fi writers have predicted this for a long time. Orwell's 1984 was mentioned above but I think Fahrenheit 451 by Ray Bradbury is a closer parallel to our culture. People can "amuse themselves to death" with trivia (Neil Postman) without even the kind of heavy-handed censorship that occurred in Bradbury's book. (Though both the new populist Right and the woke progressive Left are calling for more censorship these days as well.)

Byung-Chul Han and other thinkers of our time also talk about this. They talk about the crisis of the narrative perpetuated by the additive nature of social media. I am starting to think that this is really where the meaning crisis stems from.

LLMs already include the seeds of their own demise, having been trained on social media and mountains of other garbage (increasingly including their own output).

Why some folks believe that scaling the garbage pile up (on stuff like TikTok) will improve things is hard to fathom.

But who am I to question them.

Scale away!

What came to mind is the blind taste tests of wine. That wonderful year when "2 buck chuck" won as the best wine. However, I will also note that when looked into more deeply, "2 buck chuck" is made from surplus grapes, and some years top-grade vineyards over-produce. The chemists of wine that produce "2 buck chuck" are as good or better than any other vintners. Given an excellent basis for their wine, they can make an excellent wine from it. This has little relation to ChatGPT though.

I think the point of connection here is the Reinforcement Learning from Human Feedback - people still in the loop for fine-tuning. Unfortunately, though, they are largely paid inadequately and underrecognized for that work.

Those who can, do. Those who can't, create Artificial Imitation.
