112 Comments

GenAI promise: help us find more needles.

GenAI reality: makes bigger haystacks, spray paints them silver.

author

🤣🤣😢

EXTREMELY good analogy. And ROTFL.

FANTASTIC!!!!!

Obviously not a bot

Absolutely, this is frightening. What I also worry about is these scientifically inaccurate, AI-generated journal articles being hoovered up into LLM training data. It will be a race to the bottom for reliable scientific knowledge.

author

Really should have made this point!

Ugh. Yeah, there's a horrific feedback loop there. I hadn't thought of that.

The question is how that situation might be undone. If that's even possible. (Considered as a goal, this is a case where "partially ameliorated" is insufficient as a solution.)

"In my opinion, every article with ChatGPT remnants should be considered suspect and perhaps retracted, because hallucinations may have filtered in, and both authors and reviewers) were asleep at the switch."

And those authors should be stripped of their research positions, the reviewers fired, and the journals shuttered. Anything that has "Certainly, here is a list of..." or "I'm a language model" is corrupt and demonstrates that such articles were not reviewed in any meaningful sense. Everybody involved should be turfed with extreme prejudice.

We also have to consider that those papers which contain those phrases are the low-hanging fruit and that anyone with ten minutes could mask over problematic phrases, allowing AI-gen papers to remain in the literature even after a cleanup job.

Yes, absolutely. These phrases just identify the stupid criminals. The smart ones are better.

I've read some research suggesting that LLM language might be rather recognisable (or at least easy to suspect), as the language is rather middle-of-the-road. If that is true, and it sounds plausible given how LLMs work, then circumventing detection will be difficult. It is easy enough to generate-without-understanding good 'average' language, but more distinctive language might be undoable. A bit like un-stirring the milk from the coffee.
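
That "middle of the road" quality is measurable, which is roughly what zero-shot detectors exploit: machine-generated text tends to sit at consistently low perplexity under a language model, while human prose is burstier. A minimal sketch, assuming the Hugging Face transformers and torch packages, with GPT-2 purely as a stand-in scoring model:

```python
# Minimal perplexity scorer. LLM-generated text tends to score
# consistently low; human prose is usually "burstier".
# Assumes: pip install torch transformers (GPT-2 is just a stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return mean cross-entropy over tokens
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("Certainly, here is an overview of the requested topic."))
print(perplexity("The coffee, once stirred, never gives the milk back."))
```

A single perplexity threshold is a weak detector on its own, but it shows that "averageness" is at least quantifiable.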

How soon before we train AIs to detect other AIs, similar to using AI to detect deep fake videos, I wonder?
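
People are already trying. In its simplest supervised form it is just text classification; a toy sketch, assuming scikit-learn, with a few hard-coded strings standing in for a real labelled corpus of human vs. machine text:

```python
# Toy human-vs-machine text classifier (TF-IDF + logistic regression).
# The four example strings stand in for a real labelled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "Certainly, here is an overview of the requested topic.",
    "As an AI language model, I cannot browse the internet.",
    "We sampled 40 individuals across three wild populations.",
    "The western blot failed twice before we switched antibodies.",
] * 10  # repeated only so the toy train/test split has enough rows
labels = [1, 1, 0, 0] * 10  # 1 = LLM-generated, 0 = human-written

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

The known catch: real detectors built this way are brittle, easy to beat with light paraphrasing, and prone to false positives, which matters a great deal when an accusation can end a career.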

Clearly this is one of the few undeniable use cases that's emerged from the froth so far

This is highlighting problems that have long existed in the peer reviewed literature. Prior to LLMs, it just took a little more effort to steal a chart, make an image edit, stick a few extra numbers into the spreadsheet, or exaggerate a claim. There are many cases of high profile researchers being revealed as plagiarists and fabricators. It's just easier now. And for that reason, it will escalate. This could be a good thing if it forces a new kind of rigor for journals that seek a good reputation...if such a thing exists anymore.

Yes, spot on. We've had a reproducibility crisis in academia for quite a while, particularly with the softer "sciences" where reproduction is always going to be more difficult, but also in hard sciences. I think this is starting to show that academia is no better than a bunch of 11th graders cheating on their history term papers (which was my initial thought the first time I saw ChatGPT). Generative AI just gives dishonest people a more powerful tool and lowers the barrier to entry, thus increasing the temptation.

Mar 15 · edited Mar 16 · Liked by Gary Marcus

I am puzzled by the vindictiveness here, by the black-and-white thinking. There is no evidence that academia as a whole is broken. I struggle to think of how my own field, plant systematics and molecular phylogenetics, has a reproducibility crisis, for example. I guarantee that if you repeated any recent study published in any serious journal with a different set of markers, you would get pretty much the same results again; not in every minute detail because of well-understood biological realities such as deep coalescence and gene tree incongruence, but certainly at the level of what the studies are about, e.g., "is this genus a natural group or not?" or "in what geographic area did this group of organisms begin to evolve?". So, excuse me if I consider it insulting to be told I am an 11th grader cheating on my term paper when I produce real data, produce reproducible results, and write my own papers, and there is no evidence whatsoever that my field has any such problems.

What is more, although I would never use ChatGPT myself, even if somebody does, IMO the consequences depend on what they used it for. If they did a legitimate study and then asked it to suggest how to write a paragraph while struggling with the write-up, I don't really see how they have misbehaved to the degree that they deserve to have their economic existence destroyed. In the end, data, analysis, and interpretation are the key, so only manipulating or inventing data is what I would consider to be a career-ending move.

The reproducibility crisis in academia has been well covered over the past several years. I personally believe it's based on two things: (a) honest researchers who construct poor experiments and end up publishing noise, and (b) dishonest researchers who either fake their results entirely or structure their experiments and report their results so as to favor a particular outcome (torturing the data, as it is known). Either way, the reproducibility problem is real. If you're one of the honest researchers publishing honest, reproducible work, that's great! You're part of the solution. But if that's the case, I'm not sure why you would want to shield those who might be doing shoddy work or even straying into full fraud. We should have zero tolerance for those things. Any peer reviewer who lets "I'm an LLM..." through a reviewed paper makes a mockery of the peer review process and should be turfed. Similarly, any journal which claims to have an "editor" that publishes such a paper should be shut down. Each of them had "one job" to do, and they failed. That's inexcusable.

Mar 17 · edited Mar 17

Couple of disparate points here. First, I am not saying there isn't a reproducibility crisis somewhere, nor am I saying that I am exceptional; I am saying that there is no such crisis in my field. My problem is with blanket statements such as "science has a reproducibility crisis", "academia is broken", or "peer review doesn't work". None of this is the case, although it makes for good rage posts and click-bait. It is simply a lie to claim that the outputs of inorganic chemistry, geology, astronomy, entomology, or population genetics are irreproducible nonsense indistinguishable from randomly made-up claims. In any of these fields, you can go out there and, say, redo the genetics of this here rare species with SNPs, and you will again find the same general result as other population geneticists got a few years ago with microsats, or point your radio telescope at the same patch of sky and replicate the astronomers' observations.

Second, peer review and editorial processes are heuristic. The purpose of peer review in particular is not to go over every word in a manuscript with a fine-tooth comb but to check if the science makes sense: are the data plausible, were the methods appropriate, do the conclusions follow? I am puzzled how "certainly, here is" could possibly slip past the editorial processes that I am familiar with, where I get obsessive queries about how exactly to cite the software I used or get my language edited from active to passive voice to match journal style, and therefore my suspicion, as expressed in other comments here, is that the affected journals are generally low quality. Ultimately, however, everybody can overlook something, accidents happen, QC is heuristic: it catches most fraud but not all (there are false negatives) and sometimes rejects a good paper (there are false positives). Saying that a credible journal that has been leading its field for decades needs to be "shut down" and every reviewer who sleepily read over a 'regenerate response' in her eightieth review after doing seventy-nine without flaw should be "turfed" is kind of unhinged, misunderstands the purpose of peer review, and would mean that in short order there are barely any journals and peer reviewers left. It is like saying: oh no, this dentist overlooked a decaying tooth *once* in his ten years' career, he now deserves to be unemployed for the rest of his life; or no, wait, dentistry has a deep-rooted problem and needs to be shut down entirely!

Third, and also as I wrote before, the point of science is the data and their interpretation. I would never use ChatGPT to write something for me, but I still struggle to see what the problem is if a scientist has produced valid data using valid methods and arrived at a logical interpretation of those data and then uses ChatGPT to draft three paragraphs in the introduction. I am genuinely puzzled how the value of the resulting work would differ in any way that matters from him not having used ChatGPT at that step. Yes, leaving 'as an LLM, I cannot' in the text and seeing it through to publication is an extremely worrying sign of laziness and sloppiness on the part of authors and editors, absolutely no doubt about that. If it turns out that the entire contribution is worthless spam generated by PlagiarismBOT to buff up the author's CV, that should absolutely be a black mark against that person and lead to major soul-searching by the editors. What I take issue with is only the absolutism on display here, the readiness to throw the baby out with the bathwater, the eagerness to declare everything and everybody rotten and wipe the slate clean, as if there would ever be flawless humans and the possibility of exhaustive quality control even if you re-started the entire enterprise from scratch.

Also, I should say that you're basically coming back with "The hard sciences don't have a reproducibility problem" (everything you cited was hard science). I agree that the hard sciences are certainly better (and said so previously in this thread). But there are issues in even "hard" areas (e.g., medicine). And to be clear, not all reproducibility issues are caused by nefarious actors. And having a "reproduction problem" doesn't mean that 100% of the papers in that area are false. But the fact that some results can't be recreated begins to taint the whole tower of dependent results. The more issues you have, particularly with widely-cited papers, the less you can rely on. Again, it sounds like you work in some areas that you feel are clean. That's great. Keep them that way.

Well, sure, I could have been less absolute. But let's take an older issue in academic circles that predates LLMs: plagiarism. Many schools have a no-tolerance view of plagiarism. One strike and you're done. I see this issue of LLMs similarly. In fact, I would argue that copying and pasting verbatim from an LLM into an academic paper is plagiarizing the work of the LLM if it isn't cited. Is there some wiggle room? Absolutely. If you want to argue that somebody should get three strikes, okay, I'm probably with you for that. But the wiggle room is not infinite, or even large (not 10 strikes). At some point, if we have any standards at all, heads must roll.

The professors who falsified data are what is terrifying people. I have no doubt most professors are great people, but it is terrifying that elite institutions got fooled by smooth-talking frauds. Furthermore, high tuition and bizarre campus behavior (antisemitism, cancel culture,...) have many people ready to believe the worst when it comes to American universities. I have discouraged many young men from wasting their time and money on useless tertiary education. I feel obligated to.

I would be grateful to be pointed towards a profession that contains zero frauds, or where all frauds are immediately discovered upon even just attempting to begin a fraud. Given that professions are run by humans, I doubt that such a system exists. Even medicine, where lives are directly at stake, has its share of fraud and corrupt behaviour because, well, humans.

I have no connection to American universities myself and was educated in a country that had no tuition fees, but I understand cancel culture to mean "students making use of their freedom of speech to protest against things they disagree with" and a lot (although certainly not all) of what currently gets called antisemitism to mean "students making use of their freedom of speech to protest against the indiscriminate targeting of civilians in Gaza".

But either way, it would not follow that any and all tertiary education is useless. I wanted to become a biologist. The only way to realise my dream was through tertiary education and a doctorate. That was therefore demonstrably useful to me.

And that is wonderful. I am happy you are a biologist. I support young people who are ready for university and prepared to go there. But here in the US, where the cost is insane, students are admitted who need a year or more of no-credit study to prepare to take university-level classes. In addition, your description of cancel culture is not accurate. In the US we have many problems, but freedom of speech is one of our core characteristics. I believe in every individual's right to peaceful protest. I do not believe it is okay to use violence to stop those whose ideas you disagree with. Similarly, it is fine to have any opinion regarding international issues (for example, I feel strongly about many issues regarding Tibet and Bhutan, as well as Arunachal Pradesh in India), but this must be expressed in peaceful protest. It is unacceptable to assault, or to threaten to kill or rape, fellow students (that happened at my alma mater, Cornell). I am not Jewish or Palestinian, but it is unacceptable for individuals to be afraid to speak their minds. That is the norm in the US (hopefully not where you are).

I have friends who are tenured professors. I know most are good. The problem with bad apples is that if you get enough of them, they destroy the image of everyone (see racist police in American cities). I wish you the best, and I respect your field of research and teaching. I hope you do much to advance human knowledge in that area. What I am against is the enshittification of our glorious universities. So much is broken in the US. Our roads suck. Our hospitals suck. Universities, national parks, and music are pretty much all we have left. We cannot afford to lose any more. I think this might be a culture issue, as the US has some really insane structures that make little sense, especially with regard to health care and tertiary education. I am all for profit-making businesses online, in gaming, in almost anything, but not in hospitals and education. The US really dropped the ball on that. The universities are going to need to reform (too many people are simply not going any more), but health care, that is terrifying here.

Soft? Even that is an exaggeration for most so-called qualitative research... absolutely fluid, if not vaporous, is more like it.

LOL, indeed. I come from the hard science side, so I was trying to be diplomatic.

Qualitative research is a sensemaking guide, not an answer. The hard sciences may be great at building an AI, but they're not worth a damn in terms of living with what happens afterward--or defining what that'll be.

But to be more diplomatic--the hard scientists in this thread have cited at least two fallacious definitions of plagiarism, a subject with a rich (qualitative!!!) literature that is, um, actually kind of important. Subfocus of one of my more influential grad seminars on the history of American publishing :)

"the total number of articles may radically spike, many of them dubious and a waste of reviewers’ time. Lots of bad stuff is going to sneak in."

Many of them dubious? I would suggest that all of the LLM-created ones will need retraction.

With this sort of spike, model collapse will be inevitable.

BTW, I have just been looking at the questions of Accuracy, Consistency and Completeness of LLM responses using very slight changes to the input prompt. The results are amazing and worrying.
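
For anyone who wants to run the same probe, the experiment is just "ask near-identical questions and diff the answers". A minimal sketch, assuming the openai Python SDK with an OPENAI_API_KEY in the environment (the model name is illustrative; any chat-completions API would do):

```python
# Probe LLM answer stability under trivial prompt changes.
# Assumes: pip install openai, OPENAI_API_KEY set in the environment.
# The model name is illustrative; substitute whatever you are testing.
from openai import OpenAI

client = OpenAI()

variants = [
    "How many moons does Saturn have?",
    "How many moons does Saturn have??",
    "how many moons does saturn have",
    "Saturn: how many moons?",
]

for prompt in variants:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # remove sampling noise; differences come from the prompt
    )
    answer = resp.choices[0].message.content.strip()
    print(f"{prompt!r} -> {answer[:80]}")
```

Even at temperature 0, answers can differ across trivially reworded prompts, and printing them side by side makes the inconsistency easy to see and count.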

author

Send me results; I have been meaning for a long time to write about this. Maybe in April.

I believe Lorenz called that "the Butterfly Effect." His observation of sensitive dependence on initial conditions was born and bred in computers.
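
The effect is easy to reproduce: integrate Lorenz's 1963 system twice with starting points a billionth apart and watch the trajectories part ways. A minimal sketch, assuming numpy and scipy:

```python
# Sensitive dependence on initial conditions in the Lorenz system:
# two trajectories starting 1e-9 apart diverge to attractor-sized
# distances within a few tens of time units. Assumes numpy + scipy.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t_eval = np.linspace(0, 40, 4001)
a = solve_ivp(lorenz, (0, 40), [1.0, 1.0, 1.0],
              t_eval=t_eval, rtol=1e-10, atol=1e-12)
b = solve_ivp(lorenz, (0, 40), [1.0 + 1e-9, 1.0, 1.0],
              t_eval=t_eval, rtol=1e-10, atol=1e-12)

sep = np.linalg.norm(a.y - b.y, axis=0)
for t in (0, 10, 20, 30, 40):
    print(f"t={t:2d}  separation={sep[t * 100]:.3e}")
```

The separation grows from 1e-9 to the size of the whole attractor well before t = 40: the butterfly effect in a few lines of arithmetic.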

Mar 15 · Liked by Gary Marcus

Reminds me of the supercomputers sent by the Trisolarans to mess with our science and scientific minds in the book The Three-Body Problem.

Surprise, we didn't need a hostile alien species to destroy us, we just needed a hostile techbro subspecies...

Blaming Tech Bros for this is like blaming inner city violence on guns. It is pointless. It is bad enough we had a war on drugs. I do not think we can survive similar wars on guns and technology. I recently heard a young man telling his friends how they should "ban burning anything" to fulfill their environmentalist dreams. America needs more intelligent folks, tech bro or not. We cannot take much more ignorance.

So, instead, we should let the techbros create the solutions to problems they created. Just like Altman had Worldcoin ready to go for human verification when the AI he helped produce ruined the internet, and he just happened to personally benefit from both of those things at the expense of almost everyone else. Sorry not sorry, Technofeudalism doesn't particularly appeal to me.

So roll your own. No one is forcing you to mess with crypto or use their social media platforms. I still get print magazines to avoid dealing with the crappy web of today. I wish I could still get text-only sites and use tcp/ip email connections, but the bad guys won (many of them state-paid thugs abroad who are hard at work to get Trump elected).

I miss the glory days of the internet, but no publicly funded alternative (see all the ones France has tried) has ever actually attracted users. Profit motivates many young men and women to go to extraordinary lengths (like giving up their 20s to write code and fluff venture capital execs).

The whole "if you are motivated by money then you are bad" idea is just a tool manipulative adults use to trick silly students into working for meager wages, or even for free, in order to support organizations that allow ladies who lunch to travel the world in style and pretend like they "made a difference." I was married to one of these monsters from when I was 22 until I was 32 (she only married me because I could support her while she worked for NGOs). She and her peers are exactly the kind of people who destroy anything they touch (she works at Harvard now).

Profit motives can lead to some bad outcomes, but take that away and you just get raw power, and that is even worse. I get sick seeing young people lied to about the benefits of a crappy degree, or told that they should pick a job based on passion instead of earning an honest living and having private time for passion.

We know how to control and regulate the profit-seeking. We have systems for that. We do not have systems to regulate the monsters at our universities who are lying to young people and warping their minds. Just look at the majors available at low-end universities. They offer degrees in Professional Sports Management, Music Industry Management, Sports Medicine Management... all these cool-sounding fields to trick 18-year-olds into borrowing too much money for a worthless degree. And then there is the very idea of telling a young person to work for free "if they really care." That entire fake-virtue thing is immoral.

I would rather have well-regulated profit-seekers than ideological creeps. They are easier to manage and direct. If everybody were more profit-seeking, the world would be a better place. The anti-union creeps, the freaks on the left: the problems all come from radicals. We need some kind of normative behavior again, whatever that may be. Just standards, and less insane, ideology-based radicalism.

You mean, TikTok? ;P same result, far more effective.

And if you prompt GPT-4 the wrong way, it will just make shit up - like quantitative data - without providing a disclaimer. Then, if you use these bullshit numbers in an article and later ask for similar numbers, it will quote the numbers that it made up previously. And so the "knowledge base" builds...

In the long-ago era of 2015, when I first really started worrying about AI in general (no pun intended, sorry), I was consumed by the idea that if AI could simulate reality "well" enough, science would grind to a halt, because nothing would be reproducible or verifiable without spending exponentially more than it cost to generate the shit in the first place. Science is based on providing proof, but anyone who must doubt every single thing about their reality gets nothing done and lives a tiring, miserable existence.

Personally, the only "upside" I have seen from this entire hype cycle is the ability to generate specific images on command for blogs like this one (which immediately reduces their value to zero, no offense intended), and to summarize text that nobody was going to read anyway, especially if that text was AI-generated in the first place. These trinkets are not worth the price of our entire civilization. However dysfunctional it is currently, it will cease to function at all once this dreck clogs up every gear keeping society running.

The Internet of the future will be free, public, good--pick two

Mar 15 · Liked by Gary Marcus

"Trust but Verify" should apply to LLMs, not just nuclear proliferation, with a slight mod: Trust [gingerly] but Verify [profusely].

author

gingerly->as little as possible

Don’t trust before you verify

I have checked the first twenty results that come up with the linked "certainly, here is" search. Eighteen of them are in low-quality, for-profit open access journals, generally run by 'publishers' I have never even heard of but that were clearly created in the last few years to jump on the open-access bandwagon, or are PDFs posted on the social media network ResearchGate. The two others are published by IEEE and Springer, but they are contributions to conference proceedings, which (for better or for worse) attract less scrutiny than a journal paper, because they are just accompanying a talk given by the authors, which would have been the main event. More confusingly, some of the publications did not seem to contain the phrase at all, so there may also have been a few false positives?
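
For anyone who wants to replicate this kind of audit over a local corpus of extracted paper text, a minimal sketch (the folder name and phrase list are illustrative, and a hit is only a lead, since a paper *about* chatbots may legitimately quote these phrases):

```python
# Scan a folder of extracted paper text for telltale LLM boilerplate.
# "papers_txt" and the phrase list are illustrative; a hit is only a
# lead, since a paper *about* chatbots may legitimately quote these.
import re
from pathlib import Path

PHRASES = [
    r"certainly,? here is",
    r"as an ai language model",
    r"regenerate response",
]
pattern = re.compile("|".join(PHRASES), flags=re.IGNORECASE)

for path in sorted(Path("papers_txt").glob("*.txt")):
    text = path.read_text(errors="ignore")
    for m in pattern.finditer(text):
        start = max(m.start() - 40, 0)
        snippet = " ".join(text[start : m.end() + 40].split())
        print(f"{path.name}: ...{snippet}...")
```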

To be clear, I am concerned by the use of ChatGPT et al in science, especially when I see "Certainly, here is a literature survey with citations", given that this isn't just somebody having the bot create a draft of the introduction to overcome writer's block but instead circumventing one of the key steps of writing a paper, understanding the literature in your own field. Also, of course, the bots are known to make up references, so that's a terrible idea anyway.

Still, as I wrote in the previous thread, this doesn't show to me that it is a quantitative problem with serious journals. Hallucinated references, for example, would be discovered by every publisher I have recently published with, because their production editors cross-check all references while adding DOIs to the list. At that moment at the latest, the jig would be up, even if the reviewers missed a made-up reference.

Thus, currently, it still seems to me as if this is a match made in heaven between lazy and incompetent authors who use PlagiarismBot 4.0 and 'we will publish anything you want on our website and format it to look like a research paper, in exchange for a fee' style predatory operations run out of a garage somewhere, but not a significant problem in any scientific journal whose articles I, as a scientist, would actually read myself. The main problem will be that most people do not have the background knowledge to differentiate between a well-edited, high quality journal and the bottom-feeding paper mills, so they will mistakenly conclude that all of science must be broken now.

But in reality, this is like buying fake Rolexes off shady dudes in a back alley three times in a row, having them come apart in your hands each time, and then concluding that Rolex itself is a scam. Sorry, but it's not that company's fault if you can't figure out that this shifty-looking fellow who runs off the moment he has the money isn't selling you the real thing for fifty bucks. Likewise, I do not see what the scientific community can possibly do even in theory if some random tweeter who comes pre-convinced that science deserves to be shut down as a whole points selectively towards the poor standards of something with a name to the effect of "International Scholarly Journal of Futuristic Advancement in Innovation Science" that solicits contributions via emails that start with "Greetings of the Day, Professor!!!!". Hard-working, diligent scientists have no control over how many people set up a website and format it to ape the look of a research journal. It is easy! Any reader here could open their own journal within the week, if they put their mind to it.

A flood of submissions may become a problem for genuine journals, perhaps. We will have to see how that works out. At the moment I am happy (not really) to report that submitting a manuscript to most serious journals is such a soul-destroying battle with editorial management software that it may discourage spamming to a sufficient degree. There was a short window in my career where you could just email your manuscript to the editor but, alas, that definitely isn't the case these days.

Maybe, because of attention like Gary’s blogs, the peer review system will get the overhaul it badly needed even before this happened. This just makes the need much more obvious. Great work!

Mar 15 · edited Mar 15

Bad "journals" and paper-writing factories have existed for many many years. It has had various effects, a key one of which has been the brand enhancement of, and focus on, a small number of key journals and conferences within each sub-field and specialist community. Most global academic communities are surprisingly small; one of the reasons I didn't want to spend the rest of my post-PhD life studying the intricacies of human colour vision was that I didn't fancy having the same arguments with the same 50 people for four decades. Pull one of these GPT howlers in a key publication for your sub-field and you can kiss your reputation goodbye in no time.

“Shut it down if they can’t fix this problem.” 🎯

The violations are so blatant that the authors did not even bother to delete the AI-specific language. I suspect lots of other authors also used AI; they just remembered to delete it. The author-reviewer system has always been based on good faith, and now it's under more threat than ever. PS: I guess some reviewers are also using AI.

What’s more, it will start digesting its own output, as some have noted.

“Certainly, here is ‘Certainly, here is “Certainly, here is ‘Certainly, here is’ “ ‘ “ ...

Reminds me of standing in a bathroom that has a mirror on the back wall also, and you get this wild tunnel effect.

Welcome To The Hall Of Mirrors: empty and meaningless, reflecting whatever you put in the middle.

I wonder if the solution to this problem ends up being a return to ye olden days, when luminaries in a field acted as gatekeepers, and it was essentially impossible to get any recognition without their blessing. On the one hand, gatekeeping is deeply unfair, and it massively slows progress (there is a reason why science was once said to progress "one funeral at a time"). On the other hand, gatekeepers do at least have an incentive to protect their reputation as arbiters of the quality of the work they endorse.

If you think about it, the cost balance between type I and type II errors shifts as the average quality of the submitted work declines. If the median paper is AI-generated garbage, then it's worth missing a few good papers to be certain of rejecting the chaff. Science as an enterprise suffers, both because valuable work gets discarded and because some worthy researchers can't get traction and get bounced from the field. However, it doesn't suffer nearly as much as it will if nobody can find the good work because it is lost in a sea of garbage.
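
That shifting balance is easy to make concrete with a toy calculation: a lenient screen wins while garbage is rare, and the ranking flips once garbage dominates the queue. All the rates and costs below are invented for illustration:

```python
# Toy model: as the share of garbage submissions rises, a stricter
# screen (rejects more garbage, but burns more good work) starts to
# win on total damage. All rates and costs are invented assumptions.
SCREENS = {
    "lenient": {"sens": 0.70, "spec": 0.99},  # misses junk, spares good work
    "strict":  {"sens": 0.99, "spec": 0.80},  # kills junk, burns good work
}
COST_LOST_GOOD = 1.0       # type I damage: a good paper rejected
COST_PUBLISHED_JUNK = 1.0  # type II damage: garbage published

for garbage_rate in (0.05, 0.25, 0.50, 0.90):
    row = f"garbage {garbage_rate:4.0%} ->"
    for name, s in SCREENS.items():
        lost_good = (1 - garbage_rate) * (1 - s["spec"]) * COST_LOST_GOOD
        junk_out = garbage_rate * (1 - s["sens"]) * COST_PUBLISHED_JUNK
        row += f"  {name}: {lost_good + junk_out:.3f}"
    print(row)  # expected damage per submission under each screen
```

With these invented numbers the crossover sits around a 40% garbage rate; the exact point depends entirely on the assumed costs, but the flip itself is the argument in miniature.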

Great insight! Type I and type II errors will be hard for AI to distinguish, if it can recognize them consistently at all.

"Move fast and break things" might not be the strategy it's been cracked up to be

author

indeed, that’s the opening to my new book

Granted, the slogan does work well as an informal definition of momentum.
