Scientific American 60-Second Science: Are You Better Than a Machine at Spotting Fakes?

Are You Better Than a Machine at Spotting a Deepfake?

New research shows that detecting digital fakes generated by machine learning might be a job best done with humans still in the loop.

Sarah Vitak: This is Scientific American’s 60 Second Science. I’m Sarah Vitak.

Early last year a TikTok of Tom Cruise doing a magic trick went viral.

[CLIP: Deepfake of Tom Cruise says, “I’m going to show you some magic. It’s the real thing. I mean, it’s all the real thing.”]

Vitak: Only, it wasn’t the real thing. It wasn’t really Tom Cruise at all. It was a deepfake.

Matt Groh: A deepfake is a video where an individual's face has been altered by a neural¹ network to make an individual do or say something that the individual has not done or said.

Vitak: That is Matt Groh, a Ph.D. student and researcher at the M.I.T. Media Lab. (Just a bit of full disclosure here: I worked at the Media Lab for a few years, and I know Matt and one of the other authors on this research.)

Groh: It seems like there’s a lot of anxiety and a lot of worry about deepfakes and our inability to, you know, know the difference between real or fake.

Vitak: But he points out that the videos posted on the Deep Tom Cruise account aren’t your standard deepfakes.

The creator, Chris Umé, went back and edited individual frames by hand to remove any mistakes or flaws left behind by the algorithm. It takes him about 24 hours of work for each 30-second clip. It makes the videos look eerily² realistic. But without that human touch, a lot of flaws show up in algorithmically generated deepfake videos.

Being able to discern between deepfakes and real videos is something that social media platforms in particular are really concerned about as they need to figure out how to moderate and filter this content.

You might think, “Okay, well, if the videos are generated by an AI, can’t we just have an AI that detects them as well?”

Groh: The answer is kind of yes but kind of no. And so I can go—you want me to go into, like, why that? Okay, cool. So the reason why it’s kind of difficult to predict whether video has been manipulated or not is because it’s actually a fairly complex task. And so AI is getting really good at a lot of specific tasks that have lots of constraints³ to them. And so AI is fantastic at chess. AI is fantastic at Go. AI is really good at a lot of different medical diagnoses, not all, but some specific medical diagnoses, AI is really good at. But video has a lot of different dimensions to it.

Vitak: But a human face isn’t as simple as a game board or a clump⁴ of abnormally growing cells. It’s three-dimensional, varied⁵. Its features create morphing patterns of shadow and brightness. And it’s rarely at rest.

Groh: And sometimes you can have a more static situation, where one person is looking directly at the camera, and much stuff is not changing. But a lot of times people are walking. Maybe there’s multiple people. People’s heads are turning.

Vitak: In 2020 Meta (formerly Facebook) held a competition where they asked people to submit deepfake detection algorithms. The algorithms were tested on a “holdout set,” which was a mixture of real videos and deepfake videos that fit some important criteria⁶.

Groh: So all these videos are 10 seconds. And all these videos show actors, unknown actors, people who are not famous in nondescript settings, saying something that’s not so important. And the reason I bring that up is because it means that we’re focusing on just the visual manipulations. So we’re not focusing on “Do”—like, “Do you know something about this politician or this actor?” and, like, “That’s not what they would have said. That's not like their belief” or something. “Is this, like, kind of crazy?” We’re not focusing on those kinds of questions.

Vitak: The competition had a cash prize of $1 million that was split between top teams. The winning algorithm was only able to get 65 percent accuracy.

Groh: That means that 65 out of 100 videos, it predicted correctly. But it’s a binary⁷ prediction. It’s either deepfake or not. And that means it’s not that far off from 50–50. And so the question then we had was, “Well, how well would humans do, relative to this best AI on this holdout set?”

Vitak: Groh and his team had a hunch⁸ that humans might be uniquely suited to detect deepfakes, in large part because all deepfakes are videos of faces.

Groh: People are really good at recognizing faces. Just think about how many faces you see every day. Maybe not that much in the pandemic, but generally speaking, you see a lot of faces. And it turns out that we actually have a special part in our brains for facial recognition. It’s called the fusiform face area. And not only do we have this special part in our brain, but babies are even—like, have proclivities⁹ to faces versus¹⁰ nonface objects.

Vitak: Because deepfakes themselves are so new (the term was coined in late 2017), most of the research so far around spotting deepfakes in the wild has really been about developing detection algorithms: programs that can, for instance, detect visual or audio artifacts left by the machine-learning methods that generate deepfakes. There is far less research on humans’ ability to detect deepfakes. There are several reasons for this, but chief among them is that designing this kind of experiment for humans is challenging and expensive. Most studies that ask humans to do computer-based tasks use crowdsourcing platforms that pay people for their time. It gets expensive very quickly.

The group did do a pilot with paid participants but ultimately came up with a creative, out-of-the-box solution to gather data.

Groh: The way that we actually got a lot of observations was hosting this online and making this publicly available to anyone. And so there’s a Web site, detectfakes.media.mit.edu, where we hosted it, and it was just totally available and there were some articles about this experiment when we launched it. And so we got a little bit of buzz from people talking about it; we tweeted about this. And then we made this. It’s kind of high on the Google search results when you’re looking for deepfake detection and just curious about this thing. And so we actually had about 1,000 people a month come visit the site.

Vitak: They started with putting two videos side by side and asking people to say which was a deepfake.

Groh: And it turns out that people are pretty good at that, about 80 percent on average. And then the question was “Okay, so they’re significantly better than the algorithm on this side-by-side task. But what about a harder task, where you just show a single video?”

Vitak: Compared on an individual basis with the videos they used for the test, the algorithm was slightly better. People were correctly identifying deepfakes around 66 to 72 percent of the time, whereas the top algorithm was getting 80 percent.

Groh: Now, that’s one way. But another way to evaluate the comparison—and a way that makes more sense for how you would design systems for flagging misinformation and deepfakes—is crowdsourcing. And so there’s a long history that shows when people are not amazing at a particular task or when people have different experiences and different expertise¹¹, when you aggregate¹² their decisions along a certain question, you actually do better than the individuals by themselves.

Vitak: And they found that the crowdsourced results actually had very similar accuracy rates to the best algorithm.
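As a rough illustration of why aggregating many judgments can beat a single viewer, the sketch below computes how often a simple majority vote is right. It is only a toy model under stated assumptions: every viewer is correct with the same probability, viewers decide independently, and the crowd answer is a plain majority. The function name `majority_accuracy` and the 0.7 accuracy figure are hypothetical, and the transcript does not say how the researchers actually aggregated responses.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a simple majority of n independent viewers is correct.

    Assumptions (illustrative, not taken from the study): every viewer is
    right with the same probability p, viewers decide independently, and n is
    odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid tied votes")
    needed = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of getting `needed` or more correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    # 0.7 is a hypothetical individual accuracy, not a figure from the study.
    for crowd_size in (1, 11, 101):
        print(crowd_size, round(majority_accuracy(0.7, crowd_size), 3))
```

In this idealized model the crowd's accuracy climbs quickly as the crowd grows. Real viewers tend to make correlated mistakes (a dark, grainy clip can fool many people at once), so the gain in practice is likely smaller than the model suggests.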

Groh: And now there are differences again, because it depends what videos we’re talking about. And it turns out that, on some of the videos that were a bit more blurry¹³ and dark and grainy, that’s where the AI did a little bit better than people. And, you know, it kind of makes sense that people just didn’t have enough information, whereas there’s the visual information that was encoded in the AI algorithm. And, like, graininess isn’t something that necessarily matters so much, they just—the AI algorithm sees the manipulation, whereas the people are looking for something that deviates¹⁴ from your normal experience when looking at someone—and when it’s blurry and grainy and dark—your experience already deviates. So it’s really hard to tell. But the thing is, actually, the AI was not so good on some things that people were good on.

Vitak: One of those things that people were better at was videos with multiple people. And that is probably because the AI was “trained” on videos that only had one person.

And another thing that people were much better at was identifying deepfakes when the videos contained famous people doing outlandish things (another thing that the model was not trained on). They used some videos of Vladimir Putin and Kim Jong-un making provocative¹⁵ statements.

Groh: And it turns out that when you run the AI model on either the Vladimir Putin video or the Kim Jong-un video, the AI model says it’s essentially¹⁶ very, very low likelihood that’s a deepfake. But these were deepfakes. And they are obvious to people that they were deepfakes or at least obvious to a lot of people. Over 50 percent of people were saying, “This is, you know, this is a deepfake.”

Vitak: Lastly, they also wanted to experiment with trying to see if the AI predictions could be used to help people make better guesses about whether something was a deepfake or not.

So the way they did this was they had people make a prediction about a video. Then they told people what the algorithm predicted, along with a percentage of how confident the algorithm was. Then they gave people the option to change their answers. And amazingly, this system was more accurate than either humans alone or the algorithm alone. But, on the downside, sometimes the algorithm would sway people’s responses incorrectly.

Groh: And so not everyone adjusts their answer. But it's quite frequent that people do adjust their answer. And in fact, we see that when the AI is right, which is the majority of the time, people do better also. But the problem is that when the AI is wrong, people are doing worse.

Vitak: Groh sees this as a problem in part with the way the AI’s prediction is presented.

Groh: So when you present it as simply a prediction, the AI predicts 2 percent likelihood, then, you know, people don’t have any way to introspect what’s going on, and they’re just like, “Oh, okay, like, the AI thinks it’s real, but, like, I thought it was fake. But I guess, like, I’m not really sure. So I guess I’ll just go with it.” But the problem is that that’s not how, like, we have conversations as people. Like, if you and I were trying to assess, you know, whether this is a deepfake or not, I might say, “Oh, like, did you notice the eyes? Those don’t really look right to me,” and you’re like, “Oh, no, no, like, that person has, like, just, like, brighter green eyes than normal. But that’s totally cool.” But in the deepfake, like, you know, AI collaboration¹⁷ space, you just don’t have this interaction with the AI. And so one of the things that we would suggest for future development of these systems is trying to figure out ways to explain why the AI is making a decision.

Vitak: Groh has several ideas in mind for how you might design a system for collaboration that also allows the human participants to better utilize¹⁸ the information they get from the AI.

Ultimately, Groh is relatively¹⁹ optimistic about finding ways to sort and flag deepfakes—and also about how influential²⁰ deepfakes of false events will be.

Groh: And so a lot of people know “Seeing is believing.” What a lot of people don’t know is that that’s only half the aphorism²¹. The second half of the aphorism goes like this: “Seeing is believing. But feeling is the truth.” And feeling does not refer to emotions there. It’s experience. When you’re experiencing something, you have all the different dimensions, you know, of what’s going on. When you’re just seeing something, you have one of the many dimensions. And so this is just to get at this idea that, you know, seeing is believing to some degree. But we also have to caveat²² it with: there’s other things beyond just our visual senses that help us identify what’s real and what’s fake.

Vitak: Thanks for listening. For Scientific American’s 60 Second Science, I’m Sarah Vitak.

1 neural
adj. of the nerves or the nervous system
Example sentences:
The neural network can solve the nonlinear problem well.
Information transmission in the neural system depends on neurotransmitters.

2 eerily
adv. in a way that seems mysterious or frightening
Example sentences:
It was nearly midnight and eerily dark all around her. (from Chinese-English Literature: Prose in English Translation)
The vast volcanic slope was eerily reminiscent of a lunar landscape. (from dictionary example sentences)

3 constraints
n. (plural of constraint) compulsion; restrictions; limitations
Example sentences:
Data and constraints can easily be changed to test theories. (from English-Chinese Nonfiction: History of Science)
What are the constraints that each of these imply for any design? (from About Face 3: The Essentials of Interaction Design)

4 clump
n. a cluster of trees or grass; vi. to walk with heavy steps
Example sentences:
A stream meandered gently through a clump of trees.
It was as if he had hacked with his thick boots at a clump of bluebells.

5 varied
adj. diverse, showing much variation
Example sentences:
The forms of art are many and varied.
The hotel has a varied programme of nightly entertainment.

6 criteria
n. standards (plural of criterion)
Example sentences:
The main criterion is value for money.
There are strict criteria for inclusion in the competition.

7 binary
adj. two, dual; base-two; n. a pair; a binary star
Example sentences:
Computers operate using binary numbers.
Let us try converting the number itself to binary.

8 hunch
n. premonition, intuition
Example sentences:
I have a hunch that he didn't really want to go.
I had a hunch that Susan and I would work well together.

9 proclivities
n. (plural of proclivity) tendencies, inclinations
Example sentences:
Raised by adoptive parents, Hill received early encouragement in her musical proclivities. (from the Concise English-Chinese Dictionary)
Whatever his political connections and proclivities, he did not care to neglect so powerful a man. (from dictionary example sentences)

10 versus
prep. against; as compared with
Example sentences:
The big match tonight is England versus Spain.
The most exciting game was Harvard versus Yale.

11 expertise
n. specialist knowledge or skill
Example sentences:
We were amazed at his expertise on the ski slopes.
You really have the technical expertise in a new breakthrough.

12 aggregate
adj. total, combined; n. a total; v. to add up to; to gather together
Example sentences:
The football team had a low goal aggregate last season.
The money collected will aggregate a thousand dollars.

13 blurry
adj. indistinct; smeared, smudged
Example sentences:
My blurry vision makes it hard to drive. (from the Concise English-Chinese Dictionary)
The lines are pretty blurry at this point. (from the Concise English-Chinese Dictionary)

14 deviates
v. (third-person singular of deviate) departs from, strays
Example sentences:
The boy's behavior deviates from the usual pattern. (from the Concise English-Chinese Dictionary)
The limit occurs when the ordinate deviates appreciably from unity. (from dictionary example sentences)

15 provocative
adj. provoking, inflammatory, stimulating, seductive
Example sentences:
She wore a very provocative dress.
His provocative words only fueled the argument further.

16 essentially
adv. in essence, fundamentally, basically
Example sentences:
Really great men are essentially modest.
She is an essentially selfish person.

17 collaboration
n. cooperation, joint work; collusion
Example sentences:
The two companies are working in close collaboration with each other.
He was shot for collaboration with the enemy.

18 utilize
vt. to use, to make use of
Example sentences:
The cook will utilize the leftover ham bone to make soup.
You must utilize all available resources.

19 relatively
adv. comparatively, to a relative degree
Example sentences:
The rabbit is a relatively recent introduction in Australia.
The operation was relatively painless.

20 influential
adj. having influence, powerful
Example sentences:
He always tries to get in with the most influential people.
He is a very influential man in the government.

21 aphorism
n. a maxim, a pithy saying
Example sentences:
It is the aphorism of the Asian Games.
Probably the aphorism that there is no easy answer to what is very complex is true.

22 caveat
n. a warning; a statement made to prevent misunderstanding
Example sentences:
I would offer a caveat for those who want to join me in the dual calling.
As I have written before, that's quite a caveat.
