20 Comments
founding

Amazing! Still derivative, but impressive.

Imagine if “Gen AI” were called “Internet/Digital reproduction engines”. Copyright law would have shut it down in less than a month. But since it’s “AI”, anything goes.

Except, of course, intelligence :)

I wonder to what extent this kind of thing—and fundamentally not being sure who/what you're dealing with when you're online—will lead to a relative increase in the value of in-person relationships going forward. The only way to be sure it's a real person is if he's in front of you, in the flesh. (Until we have lifelike androids, I guess? Then all bets are off.)

Feb 4 · edited Feb 4 · Liked by Gary Marcus

"I am reminded of the economist’s term negative externalities. In the old days, factories generated pollution, and expected everyone else to deal with the consequences. Now it’s AI developers who are expecting to do what they do scot-free, while society picks up the tab."

Mainstream economists still tell me that "negative externalities are the exception". There is a big reluctance to even account for them. But I agree that this needs to change. Not only in AI.
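
A one-line refresher on the term, in standard (assumed) notation: MSC = MPC + MEC, i.e. marginal social cost equals marginal private cost plus marginal external cost. When MEC > 0 but a producer optimizes against MPC alone, output overshoots the social optimum, and that gap is precisely the tab society picks up.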

Feb 5 · edited Feb 5 · Liked by Gary Marcus

Maybe negative externalities are even more important to AI than meets the eye.

AI is one of the core technologies promoted by techno-optimists. But what justifies the optimism of the techno-optimists in the first place?

I believe it is their blindness to negative externalities.

Why do scam artists get real-looking CGI, but the CGI in multi-million dollar Hollywood movies looks so fake?

In a sense the future seems to be morphing into the past. Soon physical presence will be the only way to legitimize identity.

Feb 7 · edited Feb 7 · Liked by Gary Marcus

Yes. And physical media and hard-copy books with Library of Congress registration, copyright, etc. will loom much larger as reliable sources.

This new voice and image deepfake capability opens up all sorts of opportunities. This new scam might be viewed as an example of AI sheep-dipping, in the "espionage/intelligence" sense of the metaphor: the construction of a false biography, resume, or CV, which can now be extended to voice and image mockups.

This is entirely different from the newer "cybersecurity/counterintelligence" usage, where a sheep-dipped, confabulated identity is referred to as a "sheep" and "sheep-dipping" denotes the means of effectively vetting such bogus representations.

https://www.opswat.com/blog/what-does-sheep-dip-mean-cyber-security

I get that tightly networked cybersecurity circles have a lot of safeguards against hackability. But exactly how much protection are individuals, businesses, information content providers, news media, artists, and musicians (or even singers) expected to maintain in the deepfake era? (I'm not sure about the limitations of deepfake voice imitation in the realm of singing: speech imitation is relatively unambitious compared to singing, and the phone is such a low-fidelity device that it helps facilitate the process of deepfaking. But how much verisimilitude can AI attain compared to a conversation or singer heard in a real-world environment, or via high-resolution media recorded for high-fidelity audio transmission and reproduction?)

I've always thought that the Achilles' heel of the Internet and digital media was its inherent mutability and vulnerability to tampering and unlicensed copying, which can be reliably counteracted only by a bulletproof level of Provenance. In that respect, I'd expect blockchain ID technology to become much more widely used. But I anticipate that there will likely be workarounds and deepfakes for that, eventually.
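
For what it's worth, the cryptographic core of a provenance scheme is simple and doesn't strictly need a blockchain. A minimal sketch in Python, assuming the third-party "cryptography" package; the filename and keys here are purely illustrative:

import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The creator signs a hash of the media file once, at publication time.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

digest = hashlib.sha256(open("clip.mp4", "rb").read()).digest()
signature = private_key.sign(digest)

# Anyone holding the creator's public key can later detect tampering:
# verify() raises InvalidSignature if the bytes no longer match.
check = hashlib.sha256(open("clip.mp4", "rb").read()).digest()
public_key.verify(signature, check)

The hard part isn't the math; it's distributing and trusting the public keys at scale, which is exactly where the workarounds will show up.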

Yet another example of how an ethos of voluntary good conduct can be undermined by a minuscule fraction of one percent of users who are Bad Actors. Possibly to the extent of completely disabling some features of digital communication that were formerly taken for granted.

I think it is equally plausible that the victim made up the story about the deep-fake to save face regarding their lack of due diligence. The original article doesn't present any hard evidence that a real-time deep-fake was involved, so it seems we only have the victim's word for it. So there are two possible scenarios: a) a savvy professional was duped by a super-impressive real-time deep-fake, or b) the victim invented the story about the deep-fake to offset accusations of negligence. For me, scenario b) has a higher prior likelihood than a), and currently there is no hard evidence that would lead me to revise my priors in this particular case.
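
To put toy numbers on that reasoning (every figure below is made up, purely to illustrate the update, not evidence about this case):

prior_odds = 1 / 20    # assume a real-time deepfake is 20x less likely than a face-saving story
likelihood_ratio = 2   # assume the testimony is 2x likelier if the deepfake were real
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)  # 0.1, i.e. still 10:1 against the deepfake

Even granting the testimony some evidential weight, the posterior barely moves without independent forensics.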

author

We have the cops' word. And surely they interrogated him. So no.

This just pushes the credibility issue onto the cop. If the cop is not an expert, then the cop's prior belief in the possibility of convincing real-time deep-fakes will be quite high, possibly much higher than that of the savvy "victim". So the cop may figure, even without tangible evidence, that it is more likely that there was a RT deep-fake than that the victim was lying. There is no tangible evidence here. From a non-layman perspective, the existence of a convincing interactive real-time deep-fake is highly unlikely, and this extraordinary claim requires extraordinary evidence, which is totally lacking here. What this story demonstrates is not that we are easily taken in by deep-fakes, but that we are easily taken in by stories about deep-fakes.

Well, I've got about 50 years of experience looking into incidents like these, going back to the Security Pacific Bank one that gets touted as the first computer crime (it wasn't). So there are things that make the story plausible and things I have concerns about.

Plausible: The amount is about right for a big multinational target, big enough to be worth the trouble to the attacker but within the discretionary amount of (say) a trading desk. Unilateral authority to transfer is also plausible because most such businesses place speed and convenience over security. A request for a secret transfer by the CFO is also plausible for a Hong Kong operation, because lots of sketchy sh*t goes on there. You can't cheat an honest corporation :-)

Concern: The ability to generate an animated deepfake in real time from prompts. It seems to me this is necessary in case the mark asks a question -- I don't think simple playback would work. I can accept audio only but I'd need to see a demo of audio plus video. Would be a cool thing for Gary and friends to do. Have W.C. Fields (distinctive appearance and voice, lots of training material) come in over Zoom and interact with you.

author

the real time stuff that was demo’d last year at TED was very impressive/very scary. can definitely be done in real time now.

founding

Essentially, yes. Whether they (the “Big 5” Techs) generated it internally or provided the APIs to others, they were fully aware of the “fair use” violations these bots were based on. They remain liable.

I have some doubts!

Is this an updated version of the old "tie up and beat up the insider" bank heist trick?

I'm sure the Hong Kong police will supply the necessary details to assuage my doubts.

AI might be getting a bum rap in this case. Possibly.

Not an accomplice but an "innocent bystander"?

Clever people getting ahead of the pack in using AI.

author

I do think it’s fair to ask your question, and see also e.g. Elon Musk claiming that there were deep fakes made of him. Sometimes it will be true, sometimes not. A lot of chaos.

We're going to need some scalar dimensional measure of the verisimilitude of ongoing (re)presented Experience. With emojis, perhaps. As punctuation.

First-order: Firsthand personal Experience, as experienced by the Experiencer, etc.

Second-order: the "Trial Witness/Reporter" level, i.e. the Observation of the behaviors of other humans sharing physical space and time.

Third-order: all mediated realms of personal communication intended as exchanges between two people who already know each other: phone conversations; email; private, by-invitation-only social media networks. Digital transmission verified by shared personal memories of the past. Vulnerable to voice and image faking, but also subject to informal checks that become more reliable if a more detailed inquiry is pursued. That capability simply isn't required under the vast majority of circumstances: personal phone conversations with one's relatives, friends, or personal acquaintances very seldom stray into realms like "can you wire me $2000?" or "I've been kidnapped!" Unlikely. But as long as that vulnerability exists, it might be possible to be taken by surprise and be fooled.

https://duckduckgo.com/?t=ffab&q=deepfake+voice+kidnapping&atb=v336-1&ia=web

Fourth-order: everything on a screen, basically: the rest of the digital media realm. Possibly its entirety: every message from a smartphone; every digital image; every video; every YouTube clip; all digitally reproduced audio; every textual representation, including comments and chat exchanges, where anonymity is easily available, users are not reliably identified even when their user name "looks real", and comments might conceivably be generated by non-human bots (a possibility for how many years now?). That "reality" is all up for grabs, potentially. The observation of (pre)recorded media resides at least as much in a realm of Suggestion (from memory) as it does with Mind In The Present, and possibly more so. For example, so far the nature documentaries I watch appear to depict their subjects with authentically detailed accuracy (albeit often aided by fakery intended to fool the fauna).

But there's nothing to stop sophisticated AI from morphing those video media depictions into fantasias bearing no relationship to actual content in the natural world. Which would mean what, in the event that it were to occur? Maybe you can figure it out for me.

A big part of the answer is to not mistake the 4th order for the other orders. It doesn't deserve to be taken seriously. Certainly not at first glance.

So here we all are, each with our firsthand experiential perspective, focusing on observing a transmission of a Reality that's been mediated to its Fourth Order remove.

I'm going outside to breathe some fresh air, and check into the awesome 2nd Order world...

Well, there goes the neighborhood. I would estimate that a very large percentage of funds transfers between financial entities today are verified only by voice and facial recognition. Historically, the systems relied on controls at the point of conversion for protection; at any given time there were billions of dollars "sitting" in the wrong place but the hitch was you couldn't cash it out. Cryptocurrency changed all that, so losing voice/facial integrity removes the last barrier.
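
To make the "last barrier" point concrete, here is a minimal control-flow sketch in Python; every name and stub is hypothetical, standing in for real recognition systems rather than any actual bank's code:

def verify_voice(audio) -> bool:
    return True  # stub: real-time voice cloning can now satisfy this

def verify_face(video) -> bool:
    return True  # stub: likewise spoofable with a live deepfake

def execute_transfer(amount, destination):
    print(f"wired {amount} to {destination}")

def authorize_transfer(amount, destination, audio, video):
    # With conversion-point controls gone (e.g. crypto off-ramps),
    # the identity check below is effectively the only gate left.
    if verify_voice(audio) and verify_face(video):
        execute_transfer(amount, destination)

Once both checks can be satisfied by a live deepfake, nothing downstream stops the cash-out; that is the barrier being lost.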

"Now it’s AI developers who are expecting to do what they do scot-free, while society picks up the tab" - you could say the same about every technology that is sometimes used for the malicious purposes, starting with the internet and mobile phones. Would you blame the inventors?

founding

The question is one of malicious intent. The Internet was a DARPA project, and *smartphones* (not mobile phones, which were around 10+ years earlier) went mainstream with the advent of the iPhone and Android. No intent to harm, no liability.

But OpenAI/Microsoft and Google were entirely aware that “guard rails” were needed for their GPTs, and that deepfakes were equivalent to identity theft. So yes, they will be found legally liable. See my Substack for evidence-supported details.

What do you mean? That OpenAI and Google products were used to generate these deepfakes used for the identity thefts?
