
Gary, this is huge. For background, I used to work for the CIA's Counter-proliferation Division, which existed to stop the creation and spread of Weapons of Mass Destruction like nukes, chemical weapons, and biological weapons. The development of chemical and nuclear weapons requires chemicals, elements, and machinery that are distinctive, and thus easier to discover and shut down via treaty, sanctions, or covert avenues.

Biological weapons always were and always will be the toughest nut to crack, in terms of stopping their development, because so much of biological weapons development is identical to legitimate biological research. This means LLMs will make that already-hard task even harder. It also points out something every AI company will be loath to admit: if you are improving a lethal technology like bioweapons, what you are developing is inherently dual-use, i.e., it can be used for civilian AND military ends. The most serious dual-use technology always faces export restrictions for exactly that reason. I suspect one reason OpenAI's evaluation was, "Oh, this isn't statistically significant" is because if it WERE statistically significant, they'd have put LLMs in an entirely different regulatory category, and despite what they claim, IMO they do NOT want any meaningful regulation. Their valuation would PLUMMET if Uncle Sam said "Oh, hey, this is export restricted."

(of course, trying to enforce that would be a nightmare)

The fact that this study used GPT-4 with no safety guardrails in place (a model version the public can't access) is not a reason to disregard the threat here. Meta's open-source LLAMA is only 6 months to a year behind OpenAI, but because they've made their weights public, they've made the safety guardrails trivially easy to shut down. We cannot pretend safety guardrails on ChatGPT will save us when LLAMA WILL catch up and LLAMA's guardrails can be disabled in an hour. That's one reason open-source models are potentially very dangerous. Meta will never admit that, any more than OpenAI will admit LLMs can be dual-use. Their whole business model depends on them never being classified that way. I posted something related to this a couple of weeks back. https://technoskeptic.substack.com/p/ai-safety-meme-of-the-week-d9e


1. The result is still derivative; it's just that the LLM groups had easier access to the *contents* of the sources on the Internet.

2. Nonetheless, this is the informational equivalent of easy access to firearms. Do we really want a more efficient predictive analytics model that enables *more* antisocial behavior? I hope not…

3. Never screw with an experimental psychologist. The study of behavior has so much more noise and so much less signal relative to the older sciences that we've developed some of the most formidable experimental designs and statistical analyses in all of scientific inquiry. Gary is right. The chances of this study being published in a peer-reviewed journal - especially a tier 4 or 5 one - are zero.


What I found notable is that experts profited more from GPT-4 than amateurs. This is in line with my expectations from, for instance, coding: beginners do not profit as much from these systems as expert coders. You cannot create expert coders out of amateurs by using LLMs, but you can improve the productivity of experts. It fits with the fact that to profit from these systems, *you* have to bring the understanding to the table (as the LLMs do not have any).


It is understandable that people are worried about the possibility of a future pandemic after the last one killed more than 7 million people worldwide. You would think that humanity would be wiser and try really hard to prevent the next one. Then why are wet markets still operating in parts of the world, and why does nobody seem to care? https://www.telegraph.co.uk/global-health/science-and-disease/why-wet-markets-will-never-close-despite-global-threat-human/


What irks me, by the way, is that more and more we're taking these non-peer-reviewed documents seriously, and that their authors are getting air time for them. Somehow I get the feeling this undermines the scientific method.


I'm getting real sick of AI companies weaponizing Schrödinger's Apocalypse against the rest of the world. When the GPTs first became powerful, Altman was wringing his hands over the end of the world but still preaching the promise of utopia. Now ClosedAI is publishing papers that downplay the risks of bioweapon development. Instead of society being forced to sit on the sidelines and watch these power-tripping mouth-breathers play with fire, we should have a mechanism to drag them before a court with actual power to ask "what the ever-loving fuck is wrong with you?"


I'm not seeing the big deal here. Any tool that makes knowledge more available can be used for ill purposes. I'm sure building a bioweapon would be harder without other information dissemination tools: scihub, Wikipedia, Google search, bookstores, etc.

It made experts somewhat more effective in a time-limited setting, but a bioterrorist would have years of obsessive time to put into their demented project. I think it's fairly unclear that GPT would be a big lift IRL, and even if it is, there are a ton of legitimate use cases for researching disease-causing microbes.


Isn’t a more relevant question: is access to ChatGPT more dangerous than access to the internet in general? My guess is no, it is not.


I wouldn't worry about it too much.

Based on my experience (which I may not disclose in detail, sorry), much of what is on the interweb regarding how to make a "bioweapon" (a toxin such as botulinum, or a bacterium such as anthrax, for example) is - according to Bayesian modeling - more likely to kill the operator than to result in a usable weapon.

Large Bayesian networks like GPT are more likely to come up with innovative chemical weapons that can be made by amateurs (e.g., substituting a different chlorinating agent or alcohol).


Late to the post, because I was away. I always find these discussions of AI doom highly interesting but also frustrating, because there is rarely a clear mechanism of action. At the extreme, the likes of Yudkowsky argue that AI will turn us all into paperclips or design a virus that kills us all with 100% mortality within the week, and if asked why we wouldn't just turn off the AI's power button or how that virus would work biologically, one only gets the response that the AI will be so smart that it can do things we now consider impossible. Like zap us from the sky when we reach for the off-button, because that worked in that scifi horror story they read once. The problem is two-sided: for these people, intelligence is magic, so a superior intelligence is god-like; and they are perfectly ignorant of physics and biology and are therefore unencumbered by any understanding of what a virus can or cannot do, for example.

That does not apply to you, a cognitive scientist, but even with this piece I am puzzled as to how the danger is meant to manifest. As far as I understand, the 'tasks' here are intellectual, in fact comparable to googling and reading the scientific literature. There is an efficiency gain, yes, but what I don't understand is why the people who are expert enough to understand and make practical use of what ChatGPT summarises for them couldn't just as well do a literature search and arrive at exactly the same outcomes with a two-week delay. Conversely, those users who cannot do the literature search right now are likely not competent enough to understand and make use of what they get from ChatGPT.

And then comes the actual bottleneck: having an extremely expensive, well-equipped laboratory with all the right supplies from suppliers who have to follow strict regulations regarding who they sell certain items to. In that sense at least, there is an equivalent to the belief that a sufficiently smart AI can simply will us to Alpha Centauri by ignoring physical distances, radiation, micrometeorites, and most importantly, funding and resource limitations on the building of spaceships, because that lab and those supplies and competent lab technicians do not materialize out of thin air. Perhaps the problem with the study is that it looked at a small intellectual exercise in isolation instead of asking, "and now what?"

There are real dangers to the widespread use of AI, like drowning in spam, driving human creators out of business and thus impoverishing our culture, or making poor automated decisions, but it really doesn't click for me how this particular use case introduces a risk that hasn't existed since scientific journals were invented and made available in university libraries.


The problem is that the test is not real-world. Lone-wolf scientists are quite rare and are already dangerous without LLM assistance. For example, the anthrax attacks shortly after 9/11:

The attacks involved the mailing of letters containing powdered anthrax to various media outlets and two U.S. senators. Five people died and 17 others were infected as a result of the attacks.

The correct test would use non-scientists unversed in bioweapon technology. The danger is an LLM assisting a terrorist.

That opens a Pandora's box: any intelligent human could build weapons of mass destruction (WMD).

My experience was in the counterintelligence response to the 9/11 events, as a member of the technology group directorate that brought together all 15 agencies. We used a WMD knowledge-graph taxonomy of concepts to index threats.
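
For readers who have not seen one, such an index can be as simple as concepts arranged in an is-a hierarchy, with each report indexed under a concept and all of its ancestors. Here is a purely illustrative toy sketch in Python (the concepts, report IDs, and structure are invented and have nothing to do with the actual system):

```python
# Toy sketch of a taxonomy-backed threat index (illustrative only).
from collections import defaultdict

# Hypothetical concept taxonomy: child concept -> parent concept
taxonomy = {
    "anthrax": "biological",
    "botulinum": "biological",
    "sarin": "chemical",
    "biological": "wmd",
    "chemical": "wmd",
}

def ancestors(concept):
    """Yield a concept and every ancestor up to the taxonomy root."""
    while concept in taxonomy:
        yield concept
        concept = taxonomy[concept]
    yield concept

# Inverted index: concept -> set of report IDs tagged at or below it
index = defaultdict(set)

def index_report(report_id, concepts):
    for c in concepts:
        for node in ancestors(c):
            index[node].add(report_id)

index_report("report-001", ["anthrax"])
index_report("report-002", ["sarin"])

print(sorted(index["biological"]))  # ['report-001']
print(sorted(index["wmd"]))         # ['report-001', 'report-002']
```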


I’d love help building a mental model here.

In some cases, you ridicule models for faults, hallucinations, inconsistencies, and their inability to be trusted, and argue that relying on LLMs is dumb because they're not using abstract symbolic reasoning under the hood. Or that diffusion models constantly make obvious mistakes and will continue to do so with each generation.

In others, you describe the worry that they are effective in helping people build bioweapons (and likely to keep getting better with larger models). Or how audio/visual/text models are becoming very persuasive and believable, convincing people with deepfakes, and pose grave threats.

While I have my own (hopefully!) consistent views here, I’m not sure I understand enough to describe your views on AI despite reading your substack since it started. I’d love to see an article that addresses these together holistically! Otherwise it feels like the common theme is “AI is bad!” but in contradictory dimensions each time in isolation (“it’s too capable! it’s too incapable!”). I need help putting those pieces together into a consistent framework of what you believe, and would love an essay fleshing that out at some point...thanks!


Setting aside the appropriateness of the statistical evaluation, what do we know about the ecological validity of the measurements? What does an “increase in accuracy of 0.8” actually mean for real-world outcomes of people trying to produce bioweapons? My best guess is “very little”.


The small sample size, and thus insufficient statistical power, is a valid concern. However, you cannot infer statistical significance by extrapolating from a tiny sample (n = 25 per cell). And the paragraph about the Bonferroni correction is completely off the mark: they report dozens of dependent variables, so obviously they need to correct for multiple comparisons.

Tiny sample sizes and no correction for multiple comparisons led to the replication crisis in psychology. Quite disappointing to see you advocating for such shady practices.

These methodological problems should be solved by power analysis and a proper preregistration.
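
To make the power and multiple-comparisons point concrete, here is a minimal sketch in Python (assuming statsmodels is installed; the p-values below are invented for illustration and are not taken from the study):

```python
# Minimal sketch: what does n = 25 per cell buy you, and what does
# Bonferroni correction do to a family of outcome measures?
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.multitest import multipletests

# Smallest standardized effect (Cohen's d) a two-sample t-test can detect
# at 80% power, two-sided alpha = 0.05, with 25 participants per group.
d_min = TTestIndPower().solve_power(
    effect_size=None, nobs1=25, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Minimum detectable effect: d ~ {d_min:.2f}")  # roughly 0.8, i.e. a large effect

# Invented p-values for several dependent variables, Bonferroni-corrected.
raw_pvals = [0.03, 0.04, 0.20, 0.01, 0.15, 0.08]
reject, corrected, _, _ = multipletests(raw_pvals, alpha=0.05, method="bonferroni")
print(corrected)  # each raw p-value multiplied by the number of tests, capped at 1
print(reject)     # which comparisons survive the correction
```

With 25 per group only large effects are reliably detectable, and correcting dozens of outcomes shrinks whatever signal is left, which is exactly why a power analysis and preregistration are needed before drawing conclusions either way.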


Apart from the unpersuasive study itself, it would take more than a handful of experiments to demonstrate that GenAI (or any AI technique) is not a force multiplier for weapons development. Clearly, it is a potentially useful timesaver. The question is more about the effect size.


I believe the 10^31 number in footnote 2 is an estimate of the number of virions, not the number of virus species, which is surely far smaller; apparently only in the millions.
