Happy Birthday, may there be 'absolutely no' cake today!
from your lips to Bing’s ears: https://x.com/garymarcus/status/1755622012280881514?s=61
I don't know why, but the bottom right elephant example (the empty room with "no elephant here" written on the wall) cracks me up.
Just for fun, I tried the prompt "Create a picture of an elephant, with no living room in sight. Absolutely no living rooms." ChatGPT generated an image it described thus: "Here's the picture of an elephant standing in the vast open savannah. There's absolutely no living room in sight, just the natural beauty of the wild."
Only, the "vast open savannah" has 8-10 houses! All, presumably, with living rooms.
Generative AI is channeling Magritte, as in "ceci n'est pas une pipe" : https://www.renemagritte.org/the-treachery-of-images.jsp
nothing new under the sun!
Case in point: today I was trying to get ChatGPT to help me write a Python script to extract email addresses from an old database a client sent me. I wanted it to write a script to exclude email addresses occurring after "Return-path:". However, it kept insisting on interpreting "not after" as meaning "only before". No matter how many times I clarified the issue, it would unceasingly gravitate to that misinterpretation. It apparently has no ability to understand context – context that any programmer, even a beginner, would.
nice example
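For the curious, here is a minimal sketch of what that script might look like, under one reading of the request: collect every address in the file except the one attached to a "Return-path:" header on the same line, while still scanning all later lines ("not after" rather than "only before"). The regexes and the extract_emails helper below are illustrative placeholders, not the commenter's actual script.

```python
import re

# Simple (not RFC-complete) pattern for email-like strings.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Matches a 'Return-path:' header and everything after it on the same line.
RETURN_PATH_RE = re.compile(r"return-path:.*", re.IGNORECASE)

def extract_emails(text):
    """Collect every email address in `text`, except the one attached to a
    'Return-path:' header. Addresses on *later* lines are still collected:
    'not after Return-path:' is not the same as 'only before Return-path:'.
    """
    emails = []
    for line in text.splitlines():
        # Blank out the part of the line from 'Return-path:' onwards ...
        cleaned = RETURN_PATH_RE.sub("", line)
        # ... but keep scanning every subsequent line as usual.
        emails.extend(EMAIL_RE.findall(cleaned))
    return emails

if __name__ == "__main__":
    sample = (
        "From: alice@example.com\n"
        "Return-path: <bounce@mailer.example.net>\n"
        "Reply-To: bob@example.org\n"
    )
    print(extract_emails(sample))  # ['alice@example.com', 'bob@example.org']
```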
We should evaluate ANNs by the stupid errors they make. Too much hype is devoted to "superhuman" results, e.g. categorizing 1,000 bird species. Too little attention is paid to the completely embarrassing incompetence they display.
and too much shit is given to me for trying to reshape that balance 🤷♂️
The establishment will always resist wrongthink. Keep up the good fight, you are not alone.
But don't you think this is an acutely bad analysis? DALLE is a diffusion model, and is prompted by GPT which is a language model. The problem isn't in the way the transformer works, it is in the system that DALLE uses to generate an image.
Happy birthday!!!
P.S. ChatGPT reminds me of some politicians :)
Is this a new AI meme now - "Absolutely no X"...?
Here's one: "Absolutely no understanding of AGI".
OpenAI et al should get t-shirts made.
Has anyone tried to see if they can significantly boost reliability by using a system of multiple instances of the AI? For example, the primary instance generates five different responses, a team of (let’s say) 9 other instances vote on the best (or vote that a new set gets generated), whichever response wins is what’s sent to the user, and the primary system is made to forget/discard the others. I think they do something like this in their training process… I’m just wondering what it would actually be like to interact with such a system - would it be significantly more rational-seeming?
GPT-4 does some version of this, an idea called Mixture-of-Experts.
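Roughly, the generate-then-vote scheme proposed above could look like the sketch below: one instance drafts several candidates, a committee of other instances votes, and only the winner is returned. The llm() call is a placeholder for whatever model API you use, and the prompt format is invented for illustration. Note that this is closer to best-of-n sampling with voting judges than to Mixture-of-Experts, which routes tokens to expert sub-networks inside a single forward pass rather than voting over whole responses.

```python
import random
from collections import Counter

def llm(prompt, temperature=1.0):
    """Placeholder for a call to a language model -- hypothetical here;
    swap in a real API client before running."""
    raise NotImplementedError

def answer_by_committee(question, n_candidates=5, n_voters=9):
    """Generate-then-vote: one instance drafts candidate answers, a committee
    of other instances votes, and only the winning answer is returned."""
    # 1. Primary instance drafts several independent candidate answers.
    candidates = [llm(question, temperature=1.0) for _ in range(n_candidates)]

    # 2. Each voter sees the candidates in a shuffled order (to reduce
    #    position bias) and replies with the index of the best one.
    votes = []
    for _ in range(n_voters):
        order = list(range(n_candidates))
        random.shuffle(order)
        listing = "\n".join(f"[{i}] {candidates[j]}" for i, j in enumerate(order))
        ballot = llm(
            f"Question: {question}\n\nCandidate answers:\n{listing}\n\n"
            "Reply with only the number of the best answer.",
            temperature=0.0,
        )
        try:
            votes.append(order[int(ballot.strip().strip('[]'))])
        except (ValueError, IndexError):
            continue  # discard malformed ballots

    # 3. Majority vote picks the answer shown to the user; the rest are discarded.
    if not votes:
        return candidates[0]
    winner, _ = Counter(votes).most_common(1)[0]
    return candidates[winner]
```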
Happy Birthday. Hiding in plain sight, the problems are.
Brilliantly written (the missing elephant article). At the heart of GPT is the transformer. It is based on pattern matching (zero-lag cross-correlation). The match depends on the word embedding - a mapping of words (tokens) to a 512-dimensional vector space.
Pattern matching of this type is very good at creating procedures. It starts to fail when the requests/prompts implicitly require inference - even a small amount.
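As a toy illustration of that description: the embedding is just a lookup table from token ids to vectors (512 dimensions in the original Transformer paper; GPT-scale models use more), and the attention "match" is a scaled dot product - effectively a zero-lag cross-correlation - between query and key vectors. The dimensions and random weights below are placeholders, not GPT's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 1000, 512, 4

# Word embedding: a lookup table mapping each token id to a 512-dim vector.
embedding = rng.normal(size=(vocab_size, d_model))
token_ids = np.array([3, 17, 256, 999])        # a tiny 4-token "sentence"
x = embedding[token_ids]                       # shape (seq_len, d_model)

# Query/key projections (scaled so the toy numbers stay well behaved).
W_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
Q, K = x @ W_q, x @ W_k

# The "match": scores[i, j] is the dot product of query i with key j,
# i.e. a zero-lag cross-correlation between the two vectors.
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

print(weights.shape)   # (4, 4): how strongly each token attends to each other token
```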
This is wrong though? DALLE isn't GPT. Why are you misrepresenting a language model as a diffusion model?
Happy birthday Gary; love your stuff and use it in the way I teach AI with a humanistic approach, as a true skeptic... best, Craig
This is a bad argument Craig so don't use this in your teachings. DALLE is a diffusion model, and is prompted by GPT which is a language model. The problem isn't in the way the transformer works, it is in the system that DALLE uses to generate an image. So this argument is dead in the water.
Happy Birthday Gary!
I love these! Here's Gemini - an office with no giraffes in it (they're breaking in!) https://twitter.com/khulick/status/1755619256534696071
"Draw me abtouulyy no polar bear" is my *favorite*. Thank you! Adorable!
Well, Gary, it’s about time someone addressed the elephant in the room! Which, as we agree, sucks.
Happy Birthday! With age comes wisdom, insight, and a body in collapse :). Two out of three ain’t bad…