Joseph Dagher, Ph.D., Optical Sciences

Detecting Deep Fake Videos

The Problem

Have you seen Mark Zuckerberg brag about how he controls the stolen data of billions of people? Or a video of President Barack Obama insulting Donald Trump? While these scenarios are unlikely, videos of these alleged events do exist on the internet. To the untrained eye (and ear), they appear real. Look deeper, however, and you will find manipulated pixels and audio signals carefully synced and camouflaged to give the viewer the impression of genuine footage.


The art of generating such experiences is called “Deep Fake,” where Artificial Intelligence and other Deep Learning-based algorithms are used to either modify or create video content that is different from what was originally captured by a camera.


Who Benefits

Reputational, personal, and political attacks were the obvious initial motivations behind Deep Fake videos, and the West, particularly the United States, remains the main target of attacks on public figures. Nowadays, however, the main application of Deep Fakes is entertainment, which accounts for more than 95% of the Deep Fake videos on the internet.


How Are They Made?

The power of Deep Fake videos is in the simplicity of generating them. This process can be divided into a few steps. First, a program called the "AI encoder" is trained to find similarities between two different faces, corresponding to person A and person B. These similarity features describe each person's face at a coarse level. To recover a detailed face, another program, called the "AI decoder," is trained to reconstruct the face of person A (or B) from the coarse features. This step generally requires thousands of training images.

To perform the face swap, the Deep Fake algorithm simply feeds the coarse features extracted from a frame of person A into the AI decoder trained on person B. This reconstructs the face of person B with the expressions and orientation of person A. The process must be repeated on every frame of the video.
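The encoder/decoder swap described above can be illustrated with a toy linear model. This is a sketch only: in real Deep Fake systems, the shared encoder and the per-person decoders are deep neural networks trained on thousands of images, not random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT = 8    # size of the coarse feature vector ("similarity features")
PIXELS = 64   # flattened face image size (toy scale)

# One shared "AI encoder" compresses any face into coarse features;
# one "AI decoder" per person reconstructs that person's detailed face.
encoder = rng.normal(size=(LATENT, PIXELS))    # shared across identities
decoder_A = rng.normal(size=(PIXELS, LATENT))  # trained on person A's faces
decoder_B = rng.normal(size=(PIXELS, LATENT))  # trained on person B's faces

def encode(face):
    """Coarse features capturing expression, pose, and lighting."""
    return encoder @ face

def face_swap(frame_of_A):
    """Feed A's coarse features into B's decoder: B's face, A's expression."""
    features = encode(frame_of_A)
    return decoder_B @ features

frame = rng.normal(size=PIXELS)  # one flattened video frame of person A
swapped = face_swap(frame)       # the same frame rendered as person B
print(swapped.shape)             # → (64,)
```

In a full pipeline, `face_swap` would be applied frame by frame, with the swapped face blended back into the original video.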


Other methods for generating fake videos exist, but they follow a related adversarial idea: a "face generator" is pitted against a rival network that tries to distinguish its output from real faces, thereby guiding the generator toward producing convincing new faces, whether of existing or non-existent people. Additional touch-up may be needed to correct processing errors and eliminate any obvious image or video artifacts.
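The adversarial game can be sketched in one dimension, with simple numbers standing in for face images. This is a toy illustration of the generator-versus-discriminator training loop, not a real face model: "real faces" are samples near 4.0, and the generator learns to produce samples the discriminator cannot tell apart from them.

```python
import numpy as np

rng = np.random.default_rng(1)

a, b = 1.0, 0.0  # generator params: g(z) = a*z + b ("generated faces")
w, c = 0.1, 0.0  # discriminator params: D(x) = sigmoid(w*x + c)
lr = 0.05

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(2000):
    real = rng.normal(4.0, 1.0, size=32)  # "real faces" cluster near 4.0
    fake = a * rng.normal(size=32) + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    dr, df = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - dr) * real - df * fake)
    c += lr * np.mean((1 - dr) - df)

    # Generator step: ascend log D(fake) so its samples fool D
    z = rng.normal(size=32)
    fake = a * z + b
    grad_fake = (1 - sigmoid(w * fake + c)) * w
    a += lr * np.mean(grad_fake * z)
    b += lr * np.mean(grad_fake)

print(round(b, 1))  # generator offset has drifted toward the real mean of 4
```

The same alternating objective, scaled up to convolutional networks over images, is what drives GAN-style face synthesis.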


Who Can Make Them?

Anyone! Generating Deep Fake videos was once the trade of researchers with access to supercomputers. Nowadays, almost anyone with access to GitHub, YouTube, and cloud computing (for faster processing) can generate videos in a matter of hours to days.


Can Deep Fakes be Detected?

The ability to detect manipulated video depends on the complexity of the method used to generate it. Some videos appear obviously fake based on the lack of realism in the stream: poor synchronization between the audio and video tracks, patchy skin tone, jittery motion, flickering edges, eyes that do not blink, and so on.


Fine details, such as hair, the spaces between teeth, eyelids, and wrinkles, are particularly hard for Deep Fakes to render. This is especially true at the boundary between the face and the background. Poorly rendered jewelry can also be a giveaway, as can strange lighting effects, such as inconsistent illumination and reflections (or the lack thereof) on the iris.


But, as is often the case with technology, solutions are proposed as soon as problems are identified. In 2019, partnering with other industry leaders and academic experts, Facebook launched the Deepfake Detection Challenge (DFDC). The results varied: while some methods could accurately detect Deep Fake videos 82% of the time, performance dropped significantly (to 65%) when the test videos differed from those used in training.


The irony is that most of the proposed detection methods in the DFDC focus on using AI techniques to spot AI-manipulated videos. At Presage Technologies, we adopted a different approach and asked the following question: do the faces in the video contain any physiological measures of life?


Using unique proprietary methods, we have shown that it is possible to accurately and precisely extract Heart Rate (HR) and Respiratory Rate (RR) from the subtle color changes in human skin captured by a video camera. This is the general approach used in contact-based photoplethysmography (PPG) devices, which monitor color changes in the skin caused by blood volume fluctuations beneath the tissue.
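The underlying signal-processing idea can be sketched as follows. This is a simplified illustration, not Presage's proprietary method: the per-frame green-channel means over a facial skin region are simulated here as a 72-beats-per-minute pulse buried in noise, and the heart rate is recovered as the dominant frequency of that trace.

```python
import numpy as np

fps = 30.0                     # video frame rate
t = np.arange(0, 20, 1 / fps)  # 20 seconds of video

# Simulated per-frame green-channel means: a 1.2 Hz pulse (72 bpm) in noise.
# In a real pipeline each value would come from averaging the green channel
# over a detected skin region in one frame.
green_mean = (0.5 * np.sin(2 * np.pi * 1.2 * t)
              + np.random.default_rng(0).normal(0, 1.0, t.size))

trace = green_mean - green_mean.mean()  # remove the DC (skin tone) component
spectrum = np.abs(np.fft.rfft(trace))
freqs = np.fft.rfftfreq(trace.size, d=1 / fps)

# Restrict to the plausible human HR band, 0.7-4 Hz (42-240 bpm)
band = (freqs >= 0.7) & (freqs <= 4.0)
hr_hz = freqs[band][np.argmax(spectrum[band])]
print(round(hr_hz * 60))  # → 72 (beats per minute)
```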


Deep Fake videos lack PPG signals, making it readily possible to distinguish a face generated by an AI algorithm from a real person's face. A similar technique proposed by Morales et al. reported near-perfect accuracies (98-100%) when tested on the DFDC data set.
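A toy version of such a liveness check (again a sketch under simulated data, not the method of Morales et al.) compares the strongest heart-rate-band spectral line of the skin-color trace against the band's background level: a real face shows a sharp pulse peak, while a Deep Fake's trace looks like broadband noise.

```python
import numpy as np

rng = np.random.default_rng(0)
fps = 30.0
t = np.arange(0, 20, 1 / fps)  # 20 seconds of video

def ppg_snr(trace):
    """Ratio of the strongest HR-band spectral line to the band's median level."""
    trace = trace - trace.mean()
    spec = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(trace.size, d=1 / fps)
    band = spec[(freqs >= 0.7) & (freqs <= 4.0)]  # 42-240 bpm
    return band.max() / np.median(band)

def is_real(trace):
    return ppg_snr(trace) > 5.0  # threshold chosen for this toy example

# Simulated skin-color traces: a real face carries a 66 bpm pulse,
# a Deep Fake carries no pulse at all.
real_face = 0.5 * np.sin(2 * np.pi * 1.1 * t) + rng.normal(0, 1.0, t.size)
fake_face = rng.normal(0, 1.0, t.size)

print(is_real(real_face), is_real(fake_face))  # → True False
```

A production detector must also handle motion, lighting changes, and compression noise, which is where the hard engineering lies.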


Conclusion

As AI methods grow even more sophisticated, PPG signals will inevitably be added to manipulated videos. It will then become increasingly critical to distinguish a fabricated PPG signal from a realistic one that is consistent with physiological processes across space and time.


We are continually developing and testing novel ideas that allow our government and institutions to avoid the insidious impact of Deep Fake videos and enable a society where people can easily distinguish truth from falsehood.



For more information, contact Presage Technologies. We would love to hear about your intended use case.

