Major publishers and aggregators are also under pressure to add chatbots to their products. Elsevier is offering Scopus AI, for example. It uses RAG and a knowledge graph, and the answers seem to come from the summaries of the papers it selects; however, it may still be very misleading. The answers are slightly different every time, depending on the papers it includes, which is not obvious to most users. Users are asking that the summaries be consistent, and Elsevier's reps were not making it clear that this is not possible. Users also said it was great for topics they did not understand but underwhelming when they were experts in the topic. This is a red flag to me - nearly any source looks good when you know very little, and it is hard to know what you don't know. Scopus AI may be fine if users understand that it is a place to generate ideas to explore further (and verify), but it may be a problem if users assume that it is trustworthy.
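To make the run-to-run variation concrete, here is a minimal Python sketch of the retrieval-augmented pattern. It is emphatically not Elsevier's implementation - the corpus, function names, and "summary" step are all invented for illustration - but it shows why the generated answer shifts whenever the retrieved set of abstracts shifts.

```python
import random

# Toy corpus standing in for paper abstracts (purely invented).
CORPUS = {
    "paper_A": "reports a positive effect of the intervention",
    "paper_B": "finds no measurable effect",
    "paper_C": "reports a positive effect, but only in a small subgroup",
    "paper_D": "questions the methodology of the earlier positive studies",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Stand-in for vector search / knowledge-graph expansion.

    Real systems rank by similarity, but ties, index updates, and sampling
    still make the top-k unstable across runs; random.sample just
    exaggerates that instability to make it visible.
    """
    return random.sample(sorted(CORPUS), k)

def answer(query: str) -> str:
    """Stand-in for the LLM summarisation step.

    The summary can only reflect whatever ended up in the retrieved
    context, so a different retrieval means a different answer.
    """
    docs = retrieve(query)
    context = "; ".join(f"{d} {CORPUS[d]}" for d in docs)
    return f"Based on {', '.join(docs)}: {context}."

if __name__ == "__main__":
    for run in range(3):
        print(f"run {run}: {answer('Does the intervention work?')}")
```

Run it a few times and the "answer" flips between positive and negative depending on which two abstracts were sampled - exactly the behaviour users mistake for a bug in the tool rather than a property of the approach.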
thanks and fully agree with concerns you raise
This made me smile, because I remember the same criterion of "he sounded smart when he talked about stuff I didn't know but like a fool when he talked about something I knew" applied to a certain human entrepreneur. Yes, that is a humongous red flag. Wish more people would recognise it as such!
We even have a name for it (https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amnesia_effect), and I love everything about your comment - it is the essence of the self-actualized ability to stop and THINK.
Great to put a name on the phenomenon. Real life example of this recently with an OT who loved Huberman's podcast, and she was shocked shocked shocked that his take on autism did not match her professional opinion. I pointed out that perhaps his other episodes were not reliable (with examples) and her mind was blown.
In my opinion, this sort of thing is expected, given that science has degenerated significantly from just a seeking of knowledge. A lot of science is more of a game now that is just about securing more funding for research that really won't help anyone. It's also become a religion, not in the way it operates, but regarding its societal role: people need something to believe in because religion is being displaced.
So, I wonder: does this AI fakery show the danger of AI (which is indeed a danger) or is it actually better at showing the farce of modern science?
It's increasingly obvious to me that both things you mention are really two sides of the same coin.
Correct. I should have said it that way!
Might this become the unexpected end of publish or perish?
Hopefully. One of the main reasons I am moving away from chemistry is publish or perish. What's the point of making compounds/materials and characterizing them mindlessly if no real insights are gained? And even if you work on a few projects for several years and they ultimately lead to powerful insights, bureaucratic questions of "what's hot" (even when the hype is scientifically unjustified after years of zero hard progress in a certain "hot area") and "who's your boss" can still leave your projects with low visibility. Also, the environmental impact of making compounds and working in a lab is huge. Similar arguments apply to computational methods in chemistry (with a few exceptions): what's the point of characterizing thousands of materials with 100 different functionals? There will always be tradeoffs, and the differences are negligible for most chemical purposes. In this respect, AI research seems more relevant, even if a bit hyped: at least it's good for your mental health as a researcher to know that your work is visible and provable via products, people actually care about AI (unlike chemistry), I can have a high (positive) impact on society (even if it's at the product level and things get abstracted away as intellectual property, at least I get paid a lot more), and AI is extremely relevant and fascinating. I really hope that the advent of LLMs correlates with the death of publish or perish, so that chemistry and other "hard sciences" can become hot and relevant again.
Remember the 1990s? How the internet was going to be a fountain of good and truth? Boy, history does rhyme.
😢
Regulating chatbots *is* regulating people
Assuming LLMs actually cut it
I'd agree with that, but it's beside the point. Maximizing the potential of the internet is what we should be aiming for.
Here's the thing... Hallucinations and mistakes from chatbots can degrade science. But given the evidence coming out about how much data is faked and images manipulated in papers submitted to even the best journals, my bigger fear is genAI being used to manipulate data, generate fake images, and just up your H-index. These are tools for the complete enshittification of science, until we come up with rigorous tools to detect genAI output.
"could quickly overwhelm their reviewing processes"
Could? It's already happening, and I'm refusing article reviews by the dozen. Last year I'd not have refused a single one, for fear of being dropped from the reviewer panel(s), but this is easily overwhelming every productive and tenured scientist with some reviewing track record RIGHT NOW[1].
[1] Especially if you work on anything related to AI, duh.
The Ur-problem is "Publish or Perish," where quantity is the sole metric of merit. Now add chatbots spewing word salad at the speed of electrons, and you have the necessary and sufficient conditions for the destruction of scientific publishing within this decade. Since we know the chatbot manufacturers' attitude is "give me my money, who cares who gets my pox," we shouldn't expect any help there. My solution: anyone shown to have submitted a chatbotted paper should have their name placed in a publicly accessible file at, say, the National Science Foundation and be permanently banned from publication.
As LLMs start to ingest the output of other LLMs and their own work, possibly through RAG, the output will become ever more unreliable, bizarre, and useless.
It is called Model Collapse.
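A toy illustration of the idea (not the actual experiments from the model-collapse literature): repeatedly fit a simple "model" to samples drawn from the previous model instead of from real data, and the learned distribution tends to drift and lose its spread. The Gaussian stand-in and all parameters below are chosen purely to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Generation 0": real data from a broad distribution.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for gen in range(1, 201):
    # Fit the simplest possible "model" to the current data...
    mu, sigma = data.mean(), data.std(ddof=1)
    # ...then train the next generation ONLY on samples from that model,
    # never on the original data (the recursive-training scenario).
    data = rng.normal(loc=mu, scale=sigma, size=20)
    if gen % 50 == 0:
        print(f"generation {gen:3d}: mean={mu:+.2f}  std={sigma:.3f}")
```

Over successive generations the fitted spread almost always collapses toward zero and the mean settles on some arbitrary value; an LLM retrained on LLM output loses the tails of its distribution in a loosely analogous way, which is roughly what the term describes.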
I gave a lecture on AI and Social Work last night. Highly recommended your subreddit and X. Thanks for this. Always good. Time for philosophy of science to weigh in.
Derek Lowe's "In the Pipeline" blog at Science wrote about this a year or so ago. He noted a slew of questionable papers of Chinese origin being published on organometallics (if I doth remember accurately.)
Honestly, I see this as a net good for the science community, because gen AI is just being used by authors who were previously spewing out human-generated garbage anyway. What comes to mind is Helen Pluckrose et al., who published absolute garbage in journals to show how bad those journals really were.
Bad writing has plagued the scientific community for years, and if LLMs force journals to raise their standards, I think that will clean up bad human writing too.
https://www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/
Oof, yeah some fields were long gone way before LLMs. The Grievance Studies Hoax made that clear as day. Now anyone can author random postmodernist critical theory word salad with ChatGPT!
And they're the type most likely to use AI too. I really hope that's the first field to break under the weight of psychobabble spewed out by AI. Kind of telling that, as bad as AI is at writing, it can nail Grievance Studies jargon!
Fair point, but the issue is wider than that: while there is a (very!) finite supply of editors and referees, the floodgates are open on generated submissions. The breakdown is not in the submission part but in the review part. And you're not really suggesting we automate that part, right?
No, not automate that part but I think the selection process needs to be more focused. One issue is that reviewers are often not paid and so they just scan.
We are paid; it is called our salary. Please do not turn everything into a gig economy, because if you think reviewers "just scan" when doing it out of a sense of duty to their field of research, wait until there is a monetary incentive for the worst-paid scientists to accept as many papers as possible to review and maximise throughput.
(Also, where does the money for paying reviewers come from? Higher article publication fees, so that colleagues at poorer institutions have it even harder to publish? Or higher subscription fees, to make science even less accessible? Elsevier's profits? If it is the third, you may be interested to learn that I have a bridge to sell at a very reasonable price!)
(Also, many of us would not even receive the payment. Again, salaried employee here. Employer would negotiate a consultation contract with the publisher and collect the reviewing fee themselves, because I am expected to work full-time for my employer and not have any side-hustles.)
If you are a reviewer, based on the amalgamation of odd phrases, I can see why the process is broken. If you'd care, please rephrase into a cogent statement.
Ye gods, what a way to avoid engaging on any of the issues I raised with the idea of paid review. This is a reply to a reply to a reply ... under a newsletter post, and I am not even a native speaker of English. The world will not end if I use an incomplete sentence as a stylistic device for effect in this particular medium. What is more, avoiding odd phrases in reviewer reports is also hardly the purpose of peer review - it is to discover serious flaws in analysis or interpretation, or attempted fraud.
Right, so I hope you realize that what I did was expose serious flaws in your analysis and interpretation. My original point was that the review process is more often than not broken when scientific papers like this get through.
Are you suggesting the example in the original essay was reviewed with more than a scan? Because if it was, then it confirms my assertion that the review process is broken.
Metrics, targets, you know the drill.
I wish I lived in the world where the techbros were held to account....
I thought that was a tumour at first, not (apparently) a rat with giant testicles looking up at its own penis, which towers above its own body. It looks so wrong that I'm still somehow not confident in what I'm seeing, as if I'm seeing something supernatural and want a rational explanation for it. Not sure of a polite way to put it, but just wtf? Imagine coming across that while reading the paper on your own, with no knowledge that it came from AI, and trying to process what you're looking at!
I gave a link to the article. Check it out!
Interestingly, the retraction statement refers to its "AI-generated figures," so I assume that means all the images came from gen AI, which makes one wonder about the text as well. The other images are far more technical and less obviously inaccurate to me, as someone unfamiliar with this area of medical science, though I suspect they're equally fictitious if they too were AI-generated. It's worrying how this might have gone unnoticed for much longer had the authors been wise enough to exclude that particular image.
I have a PhD in molecular and cell biology and the other figures are clearly AI-generated. In fact, when a friend sent the article it was one of the other figures that raised my eyebrows (a complex diagram of the JAK/STAT signaling pathway in which every single molecule is either JAK or STAT).
I sort of assumed the mouse one was some kind of stylized zoom-in, sort of how you might do a cutaway of a blood vessel coming out of a forearm.
Science is so yesterday. Obedience is the new thing.
I concur that reputation and integrity are critical.
"FWIW both articles came from China, locked in a race with the US to dominate the science journal productivity statistics."
Then "science journal productivity statistics" must be pretty simple to game. Is it the lack of consequences that have teeth when a paper is shown to be "smoke & mirrors"? Maybe an economist could use game theory to sort this out.
Stop and consider the current state of affairs in the media. Rathergate in 2004 cost Dan Rather his job and the producer he was working with never worked in the industry again. Today it would be ignored or get a Pulitzer Prize.
In fairness, that rat paper was in Frontiers, so much reputation wasn't at risk in the first place.
That leads to my two cents here. The perception by many on social media whenever something like this happens is "academic publishing must be completely broken for this to be possible". No, that is simply not how this works.
First cent: The quality of journals varies widely. Whether it is a higher-tier journal in its field like the Journal of Biogeography or a small taxonomic journal run by a natural history collection, no journal that I publish in would fail to catch something like the rat figures. They have actual quality control processes, from peer review right through to typesetting editors, and are run by editors who are motivated by professional pride and love of their field. Other journals, however, are paper mills where you can publish pretty much whatever in exchange for a fee - journals with no history and no purpose beyond making a buck, founded when open access publishing arose. The problem is that only an expert may know those two types apart for their given field. Although being published by Frontiers or MDPI may serve as an indicator in one direction, being published by a more established company like Elsevier is unfortunately no guarantee of quality in and of itself.
Second cent: Even under the best of circumstances, I cannot fault editors and reviewers for overlooking the odd instance of parts of a text being generated, even if it has "regenerate response" in it somewhere. People are overworked and overloaded and may miss something. Peer review is a heuristic - even at its best, it sometimes overlooks a fraud and rejects a good paper. And while I would prefer a world where people have to write their own papers without help from generative AI or be rejected, ultimately what matters is whether they have novel data and analyse and interpret them correctly. It won't be the end of the world if a team that legitimately discovered a new group of bacteria has ChatGPT suggest the first part of the introduction when they start writing up their discovery.
The problem won't be with the quality of science itself, as competent scientists will be able to tell the difference and know who does good work, just like in the past. It will be what a stream of garbage 'review articles' published in high-throughput open access journals will do to the public perception of science.
Ouch re Frontiers…
To be clear, I know that serious colleagues publish good papers in Frontiers journals. But I think it is fair to say that there are journals that were founded by researchers who thought it was time for the field to have a journal specialising in topic XYZ, because there was a need for it, and journals that were founded because a publisher wanted to earn money, and that this difference may come with certain implications for their seriousness and level of quality control, at least on average. Not a binary by any means, but the effect of the underlying mission, pride, and incentive structures.
Feel bad for future generations that will have to sift out synthetic bullshit like this. Feels analogous to all the microplastics in the ocean - can we ever clean it all out? Probably not. Then again... maybe there's an AI for that. 🙄
that’s a damn fine (though unbearably sad) metaphor