UCSD scientists developed a technique that fools deepfake detection systems
As deepfake technology evolves, it is becoming increasingly difficult to tell when a video has been manipulated. Fortunately, various groups have developed sophisticated neural networks to detect faked faces. However, computer scientists revealed last week that they have a way to fool even the most robust detection models into thinking a deepfake is real.
Researchers at the University of California, San Diego, have developed a technique that can trick algorithms trained to detect deepfake videos. Using a two-step method, the computer scientists take a detectable deepfake, then craft an “adversarial example” and insert it into each frame to create a new fake that is virtually undetectable.
Adversarial examples are manipulated images that foul up a machine learning system, causing it to classify an image incorrectly. Past examples include adversarial stickers, or even electrical tape, used to trick autonomous vehicles into misreading traffic signs. However, unlike defaced traffic signs, UCSD’s method does not visibly change the resulting video. This is significant because the goal is to trick both the detection software and the viewer.
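To make the idea concrete, here is a minimal sketch of an adversarial perturbation in Python. It is not the researchers' code: the "detector" is a hypothetical eight-pixel logistic model, and the weights, bias, frame values, and `epsilon` budget are all made up for illustration. It only shows the general principle that many small, coordinated pixel changes can flip a classifier's verdict.

```python
import math

def detector_score(weights, bias, frame):
    """Toy stand-in for a deepfake detector: returns P(frame is fake)."""
    z = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def perturb_frame(weights, frame, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the fake score.

    For this logistic model, the gradient of the score with respect to a pixel
    has the same sign as that pixel's weight, so we step against the weight sign.
    """
    adversarial = []
    for w, x in zip(weights, frame):
        direction = (w > 0) - (w < 0)  # sign of the weight: -1, 0, or 1
        adversarial.append(min(1.0, max(0.0, x - epsilon * direction)))
    return adversarial

# Hypothetical detector parameters and a hypothetical 8-"pixel" frame in [0, 1].
weights = [2.0, -1.5, 3.0, 0.5, -2.5, 1.0, -0.5, 2.0]
bias = -3.4
frame = [0.6, 0.3, 0.7, 0.5, 0.2, 0.6, 0.4, 0.7]
epsilon = 0.1  # maximum change allowed per pixel

original_score = detector_score(weights, bias, frame)
adversarial_frame = perturb_frame(weights, frame, epsilon)
adversarial_score = detector_score(weights, bias, adversarial_frame)

print(f"original frame:    P(fake) = {original_score:.3f}")     # above 0.5
print(f"adversarial frame: P(fake) = {adversarial_score:.3f}")  # below 0.5
```

In this toy, a per-pixel budget of 0.1 is large because there are only eight pixels. A real video frame has vastly more pixels, so much smaller per-pixel changes can add up to the same effect.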
The researchers demonstrated two types of attacks: “White Box” and “Black Box.” In a White Box attack, the bad actor knows everything about the targeted detection model. In a Black Box attack, the attacker does not know the classification architecture used. Both methods “are robust to video and image compression codecs” and can fool even “state-of-the-art” detection systems.
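The difference between the two settings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's method: the detector is a hypothetical toy model hidden inside a closure, and the attacker is assumed to be able to submit frames and read back a score. Without access to the weights, the attacker probes each pixel with a tiny nudge (a finite-difference estimate) to learn which direction lowers the score.

```python
import math

def make_black_box_detector():
    """Return a query function. The parameters stay hidden from the attacker."""
    hidden_weights = [2.0, -1.5, 3.0, 0.5, -2.5, 1.0, -0.5, 2.0]  # hypothetical
    hidden_bias = -3.4                                             # hypothetical

    def query(frame):
        z = sum(w * x for w, x in zip(hidden_weights, frame)) + hidden_bias
        return 1.0 / (1.0 + math.exp(-z))  # P(frame is fake)

    return query

def estimate_gradient_signs(query, frame, delta=1e-3):
    """Nudge one pixel at a time and record whether the fake score rises or falls."""
    base = query(frame)
    signs = []
    for i in range(len(frame)):
        probe = list(frame)
        probe[i] += delta
        diff = query(probe) - base
        signs.append((diff > 0) - (diff < 0))
    return signs

def black_box_attack(query, frame, epsilon):
    """Move each pixel by at most epsilon against the estimated gradient sign."""
    signs = estimate_gradient_signs(query, frame)
    return [min(1.0, max(0.0, x - epsilon * s)) for x, s in zip(frame, signs)]

query = make_black_box_detector()
frame = [0.6, 0.3, 0.7, 0.5, 0.2, 0.6, 0.4, 0.7]  # hypothetical deepfake frame
adversarial_frame = black_box_attack(query, frame, epsilon=0.1)

print(f"before: P(fake) = {query(frame):.3f}")              # above 0.5
print(f"after:  P(fake) = {query(adversarial_frame):.3f}")  # below 0.5
```

A White Box attacker could skip the probing loop entirely and read the gradient straight from the model. The Black Box attacker pays for the missing knowledge with extra queries.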
Deepfakes have stirred up quite a bit of controversy since emerging on the internet a few years ago. At first, it was primarily celebrities outraged about their likenesses showing up in porn videos. However, as the tech improved, it became clear that bad actors or rogue nations could use it for propaganda or more nefarious purposes.
Universities were the first to develop algorithms for detecting deepfakes, followed quickly by the US Department of Defense. Several tech giants, including Twitter, Facebook, and Microsoft, have also been developing ways to detect deepfakes on their platforms. The researchers say the best way to combat this technique is to perform adversarial training on detection systems.
“We recommend approaches similar to Adversarial Training to train robust Deepfake detectors,” explained co-author Paarth Neekhara in the team’s research paper. “That is, during training, an adaptive adversary continues to generate novel Deepfakes that can bypass the current state of the detector and the detector continues improving in order to detect the new Deepfakes.”
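The loop Neekhara describes can be sketched roughly as follows. This is a toy under loud assumptions, not the paper's training procedure: frames are reduced to two hypothetical features, the detector is a simple logistic model, and the data, learning rate, and perturbation budget are invented. In each epoch the adversary crafts new fakes against the detector's current weights, and the detector then trains on them.

```python
import math

# Hypothetical two-feature summaries of real and fake frames.
REALS = [[0.1, 0.2], [0.2, 0.1], [0.25, 0.3], [0.3, 0.2]]
FAKES = [[0.8, 0.7], [0.7, 0.9], [0.9, 0.8], [0.75, 0.75]]

def score(weights, bias, frame):
    """Toy detector: returns P(frame is fake)."""
    z = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def craft_adversarial(weights, frame, epsilon):
    """Adversary: shift each feature by up to epsilon to lower the current fake score."""
    adversarial = []
    for w, x in zip(weights, frame):
        direction = (w > 0) - (w < 0)  # sign of the weight
        adversarial.append(min(1.0, max(0.0, x - epsilon * direction)))
    return adversarial

def sgd_step(weights, bias, frame, label, lr):
    """One gradient step on the logistic loss for a single labeled frame."""
    error = label - score(weights, bias, frame)
    new_weights = [w + lr * error * x for w, x in zip(weights, frame)]
    return new_weights, bias + lr * error

def adversarial_training(reals, fakes, epsilon=0.1, lr=0.2, epochs=300):
    weights, bias = [0.0] * len(reals[0]), 0.0
    for _ in range(epochs):
        for fake in fakes:
            # The adversary adapts to the detector as it stands right now.
            adversarial = craft_adversarial(weights, fake, epsilon)
            weights, bias = sgd_step(weights, bias, fake, 1.0, lr)
            weights, bias = sgd_step(weights, bias, adversarial, 1.0, lr)
        for real in reals:
            weights, bias = sgd_step(weights, bias, real, 0.0, lr)
    return weights, bias

weights, bias = adversarial_training(REALS, FAKES)
for fake in FAKES:
    fresh_attack = craft_adversarial(weights, fake, 0.1)
    print(f"attacked fake scored {score(weights, bias, fresh_attack):.3f}")
```

The toy data is easy to separate, so the detector holds up against fresh attacks within the same budget. Real detectors and real deepfakes are far messier, which is why the researchers frame this as an ongoing contest rather than a one-time fix.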
The group posted several examples of its work on GitHub. Those interested in the minutiae of the tech can check out the paper, published on Cornell’s arXiv preprint server.