Synthesis Report: Expert meeting and knowledge exchange between i) leading researchers in media forensics and detection of deepfakes and other new forms of AI-based media manipulation and ii) leading experts in social newsgathering, UGC and OSINT verification and fact-checking.
In February 2019 WITNESS, in association with George Washington University, brought together a group of leading researchers in media forensics and detection of deepfakes and other media manipulation with leading experts in social newsgathering, UGC and open-source intelligence (OSINT) verification and fact-checking.
New artificial intelligence-based tools are increasingly able to create convincing simulations of authentic media, including sophisticated audio and video manipulations called “deepfakes,” and more subtle manipulations. These media forms have the potential to amplify, expand, and alter existing problems around trust in information, verification of media, and weaponization of online spaces.
As the threat of more sophisticated, more personalized audio and video manipulation emerges, we see a critical need to bring together key actors before we are in the eye of the storm, to ensure we prepare in a more coordinated way and to challenge “technopocalyptic” narratives that in and of themselves damage public trust in video and audio. We have an opportunity to ‘prepare, not panic’, and to handle this next wave of disinformation better than previous incidents.
Participants in this meeting included researchers working on media forensics and counter-forensics, on understanding provenance and detecting deepfakes, as well as experts on media manipulation ecosystems. The expert meeting brought them together with leading experts working on different timescales and types of identification, verification, debunking and fact-checking around photo, video and text content including people working within the news cycle, on longer investigations using open-source intelligence (OSINT) and on exposing and identifying networks of bad actors.
The overall goal of the meeting was to establish better common understanding and connective tissue on how to better prepare, communicate and collaborate around new threats of AI-generated audiovisual misinformation and disinformation.
The specific goals for the meeting were to:
1) Identify how emerging approaches to detecting deepfakes and synthetic media manipulation can be made accessible and understandable to the community that will use them in the real world for journalism, human rights and transparency
2) Build connectivity between key researchers and frontline debunkers to ensure preparedness for potential real-world usage of deepfakes
3) Identify how emerging tools and approaches to detection could be incorporated into existing workflows used by journalists and open-source investigators
This meeting built on previous meetings where WITNESS has convened stakeholders in this area (see other reports at Prepare, Don’t Panic: Synthetic Media and Deepfakes) including the report from our first expert convening in June 2018.
Key questions we focused on:
- What are key vulnerabilities and high-perception risks around established processes for finding and verifying videos, audio and images? Where do new forms of manipulation expand existing challenges, introduce new challenges, alter existing challenges, reinforce other threats?
- What would be most useful to OSINT/verification/truth-finding practitioners on the image/video manipulation detection side?
- What are the technologies being developed that are most relevant to some of these dilemmas?
- What could we do to learn from each other going forward?
Background on WITNESS’ work on deepfakes and synthetic media within our Emerging Threats and Opportunities initiative
For over 25 years, WITNESS has enabled human rights defenders and civic journalists, and now increasingly anyone, anywhere to use video and technology to protect and defend human rights and share trustworthy information.
The explosion of access to video, online social networks, and mobile technology over the last decade has been accompanied by a set of opportunities and challenges for individuals and communities who work to advance justice and accountability around the world. In today’s information ecosystem, these digital tools have the potential to increase civic engagement and participation – particularly for marginalized and vulnerable groups – enabling civic witnesses, journalists, and ordinary people to document abuse, speak truth to power, make their voices heard, and protect and defend their rights. Unfortunately, bad actors are utilizing the same tools to spread mal-information, identify and silence dissenting voices, disrupt civil society and democracy, perpetuate hate speech, and put individual rights defenders and journalists at risk.
This workshop took place as part of WITNESS’ Emerging Threats and Opportunities initiatives – focused on proactive action to protect and uphold marginalized voices, civic journalism, and human rights as emerging technologies such as artificial intelligence (AI) intersect with the pressures of disinformation, media manipulation, and rising authoritarianism. WITNESS believes that through early engagement civil society can inform and shape usages and ethics via doing research, modeling best practices for companies and early users, prototyping technology, bringing key actors together, and advocating for strong rights-respecting solutions. We bring a pragmatic perspective deeply grounded in grassroots experiences of human rights and use of technologies for activism, as well as expertise in articulating threats to human rights and journalism and in engaging directly with companies on their products and policies.
WITNESS is strongly committed to challenging existing power structures in the technology world that lead to harms to marginalized communities. Like other technologies, the world of AI is missing many voices, in particular those from outside of the US and Western Europe. This is a core principle of our Emerging Threats and Opportunities work.
Key reports that relate to this convening include:
- Summary of Discussions and Next Step Recommendations from June 2018 “Mal-uses of AI-generated Synthetic Media and Deepfakes: Pragmatic Solutions Discovery Convening”: For an extensive background on tech, solutions and key areas to focus on, emerging from an expert convening in June 2018 co-organized with First Draft
- Solutions survey: A survey of solutions across disciplines to address mal-use of AI-generated synthetic media and deepfakes
- UPCOMING: A report on authenticity and provenance-focused technical solutions and pros and cons of this type of technical infrastructure approach
- UPCOMING: A survey of threat scenarios identified in threat modeling workshops by journalists and other actors
This report, prepared under the Chatham House Rule, includes video and audio excerpts from interviews with leading OSINT, verification and social newsgathering practitioners, and the following sections:
The ability to change images and video is not new – for example, adding or removing layers such as text, or making more significant alterations. Historically, propaganda institutions have always looked at how to use these types of tools to manipulate. What is changing is that manipulation now happens at a time of so much visual content that when good (or even semi-good) quality manipulated media is created, it is hard – and, critically, time-consuming – to differentiate what is real. It is also getting easier to manipulate content in more subtle and more transformative ways.
Alongside actual media manipulation problems, the widespread availability of easy media manipulation allows people to dismiss claims of the real. This has implications for high-stakes settings where verifying the integrity of an image matters – e.g. legal investigation, prosecution, intelligence – but it also affects our day-to-day interactions with visual media of any sort.
For the moment, completely realistic simulation of actual faces in video still requires a significant team of people with specialized training and technology. However, we are moving from manual synthesis of imagery (more consistent but time-consuming and specialist) to automatic synthesis (still rough but much faster and not requiring specialist skills). This is facilitated by the quantity of media created through social media, which provides more training data to improve deepfake tools. Deepfakes require training data and computing power, but otherwise the capacity to create them is becoming more accessible.
Detection approaches need a sufficient quantity and quality of training data or additional information, and this is not always available for a particular technique. Researchers sometimes lack the data to analyze certain misinformation cases, and need both more information to train detection tools and more good-quality information.
It is harder to detect when a video is partially manipulated (as opposed to entirely manipulated or not at all). However, fully synthetic video leaves a lot of digital indicators.
Audio presents a different set of detection challenges, and it is more of a gap area in terms of research.
At the moment, there are more resources (people, investment) going into manipulation than into detection (this is one of the reasons the DARPA MediFor program was launched). However, more researchers are acknowledging that this is a problem. We should not expect to reach perfect detection performance: deepfakes keep adjusting to forensics methods (for example, by incorporating micro-blushing or blinking), and the nature of the learning process behind forgeries is to keep improving. Instead, success is better defined as making detection tools that make it as hard as possible, for as many people as possible, to produce a compelling manipulation that can evade detection.
We recognize this could be addressed, but it would pose a censorship issue that needs more attention. There also seems to be a disconnect between tech platforms and what this specific research community is producing. Platforms do have models for handling data on problematic content with trusted partners – e.g. around terrorist content or child exploitation imagery – and we need to explore which parts of these are relevant to deepfakes and misinformation.
Countering deepfakes needs to be a multidisciplinary effort: it must be seen not only as a technical problem but also as a psychological, human, and journalistic one. To do this, we need to ensure that social psychology research feeds into the use and development of tools, and that we build research focused on video, not just on text.
This includes expertise from existing open source intelligence (OSINT) communities as well as existing media forensics communities. We also need to incorporate systems thinkers, cognitive scientists and more frontline groups’ and affected communities’ perspectives into the mix as well.
As an example, the DARPA MediFor program is trying to combine multiple measures of media integrity, including digital integrity (for example, is there consistency in pixels or compression), physical integrity (for example, are the laws of physics violated by inconsistent shadows, lighting or movement), and semantic integrity (for example, is the weather consistent with the known weather at a location on a particular date and time). The program aims to offer a technique to reason across these layers, combining them into a single integrity score. The idea is to put an increased burden on the person generating manipulations, because they need to be consistent across all the layers.
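As a hedged illustration only – MediFor's actual fusion method is far more sophisticated and is not being described here – combining per-layer integrity scores into one number could be sketched as a weighted log-odds average, so that a strong anomaly in any single layer pulls the combined score down sharply:

```python
import math

def fuse_integrity_scores(scores, weights=None):
    """Toy fusion of per-layer integrity scores, each in (0, 1) with
    1 meaning 'looks intact'. Illustrative sketch, not MediFor's
    algorithm: averaging in log-odds space lets a strong anomaly in
    one layer dominate the combined result."""
    if weights is None:
        weights = [1.0] * len(scores)
    eps = 1e-6  # guard against log(0) at the extremes
    logit = lambda p: math.log((p + eps) / (1 - p + eps))
    fused = sum(w * logit(s) for w, s in zip(weights, scores)) / sum(weights)
    return 1 / (1 + math.exp(-fused))  # map back into (0, 1)

# Hypothetical layer scores: pixels look clean, but shadows are inconsistent.
layers = {"digital": 0.9, "physical": 0.2, "semantic": 0.8}
combined = fuse_integrity_scores(list(layers.values()))
```

Because the physical-integrity layer is anomalous, the combined score falls well below the digital and semantic scores alone, mirroring the "consistent across all layers" burden described above.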
Better data sharing will accelerate the refinement of tools to the type of data the OSINT community is grappling with.
Researchers need more case studies to help them understand where to look for real instances of synthetic media, and the obstacles that practitioners currently face.
Both groups agree that there are data privacy issues in this work (including with data found via open-source methods on social media sites) and have concerns about how to protect sources providing material for analysis. This is related to questions of disclosure of methods: we need to better understand the limits on research and the distribution of techniques, and yet still identify ways to disclose methods between researchers and key frontline communities.
Participants from investigatory groups also highlighted that academic researchers may not yet appreciate the risks to individuals who reveal fakes and misinformation. These include being smeared for their work, having misinformation campaigns directed against them, and being targeted, harassed or doxxed. There is a need for support systems, and for doing work collaboratively, when someone is in crisis or under attack for public interest work.
In the current misinformation environment, the problem is typically not the creation of fakes; more often it is the re-contextualization and recycling of existing visual content, presenting genuine content in a misleading way. As with deepfakes, detection is much harder when some of the content is authentic rather than 100% fabricated. And when the media itself is authentic but the surrounding messaging or framing is disinformation, a different problem arises: how can we better detect a genuine image framed in a misleading way by a problematic account?
While staged videos are encountered, they are not nearly as common as recycled content or lightly edited audio, video or images. Lightly edited content often involves changing text in an image (for example, a Miami Herald story related to the Parkland shooting was shared on Snapchat with photoshopped text, and then someone took screenshots and reshared them), or using fake screenshots of articles on WhatsApp.
The conditions of circulation on social media mean images do not need to be high quality to be used for manipulation, and in any case many current disinformation tactics work through text and storytelling and do not require much technical capacity.
Visuals and memes are the most dangerous vehicles of information disorder. Memes are easy to reproduce and simplify concepts, and thus shape conversation. For purposes of detection, checking and challenging, they are a combination of text and images, so you have to analyze both.
A bigger question is how to tackle biases. People want to believe what they want to believe and bring their own cognitive frame to a story. What is the place of forensics and media verification in situations where all that is needed is to insert doubt, not to create flawless forgeries?
In this context, it only takes inserting doubt to delegitimize something, not a complex fake.
Investigators look to establish the date, location, and source of creation and sharing, and then cross-corroborate with other reporting and open-source data (e.g. satellite imagery). Metadata is the first level at which to detect inconsistencies, but investigators need to go beyond it because it can also be falsified.
When there is less content, verification is more challenging. Social newsgathering approaches also get harder the further you are from the original source and the time of creation, as it becomes more difficult to complement research by talking with eyewitnesses and the individuals involved in filming and sharing. Forensics are also used in in-depth investigations to understand the inter-relationship of videos or images and to map them in physical space, within forensic architecture and spatial analysis practices.
An example of the BBC’s work in this space is its investigation identifying the perpetrators of an extrajudicial killing in Cameroon.
This relates to questions of how to debunk information and use plain language to present information derived from detection tools to people. People don’t necessarily trust fact-checking and news institutions, and fact-checks don’t reach people who consume misinformation and disinformation, so newsrooms and fact-checkers need to also focus on audience development and communication strategies that reflect networked information ecosystems (and learn from the effectiveness of disinformation networks in strategic amplification of messages), as well as how to customize and adapt fact-checks for different demographics.
It is important to understand that the level of virality of a piece of content may determine the tipping point at which you move from giving a rumor oxygen by discussing it to needing to provide information to audiences to debunk it. You have to move quickly, but not too early: debunking will give a rumor oxygen it did not already have, because increased exposure gives it a familiarity that makes people remember it, and some may start believing it. For example, during the collaborative project on the Brazilian elections, the Comprova coalition debunked around 300 media items but published only 150 or so, as the others were too niche and had not crossed the tipping point.
A series of collaborations between news organizations around elections (such as Verificado in Mexico, Comprova in Brazil and Africa Check in Nigeria) has demonstrated the value of collaboration between organizations in identifying, verifying, debunking and communicating around misinformation and disinformation, as well as the increased public trust created by multiple news participants.
A networked propaganda model looks at who is involved in creating and amplifying content, rather than waiting for fact-checks on the content. This approach looks at how content originates in and moves between the dark web, secure networks, anonymized networks and open networks. Disinformation actors will look to originate content in secure, dark or anonymized networks, and plan out strategic communication approaches (what meme to use, what hashtag to push, etc.) in the same spaces. Then they will do tactical dissemination in anonymous and open networks, and rely on amplification in open networks not only from the community that pushed the content but also from regular people and fact-checkers. This is often complemented by amplification by mainstream media.
Much disinformation is sourced in closed messaging spaces and the anonymous web, and discussed in chat-boards like 4Chan, 8Chan and Discord.
Closed messaging apps are on the rise, forming networked small groups of trust. Within these apps, groups are a particular problem as the only heuristic you have for whether something is true is that someone you trust has sent it to the group. They are also hard for journalists to access, and there are no internal mechanisms in the tools to flag audiovisual or other content as inauthentic.
It is ethically challenging to make decisions about accessing those groups (even if they are technically open) to scrape data or identify misinformation examples, rather than waiting for people within those groups to submit or identify content themselves. What should we do with communities that are known to be orchestrating disinformation campaigns? To navigate the line between newsgathering, investigation and surveillance, investigators need ways to engage with content while still protecting privacy – this could be supported by tools for users to report out a suspect content item, or an integrity tool that flags known misinformation as it enters a channel.
Journalists handle real-time feeds that require fast decisions, so they need tools for real-time analysis; manual verification takes too much time. Meanwhile, forensic tools for non-technical people are not readily available, and most current verification is done manually. Investigators also deal with a high quantity and low quality of content, and with the reality that the compression applied by platforms and messaging apps reduces opportunities for forensic analysis. It is also often the case that a forgery does not need to be perfect – it just needs to insert doubt or confirm beliefs.
Currently most tools are not accessible to non-technical people. The gap is not closing fast enough in terms of availability. There needs to be more coordination between journalists, media companies and computer scientists to vet a set of good tools, not just rely on open-source image forensics tools, and to make available tools the platforms develop to other users.
The OSINT community would benefit from better tools being usable and available within the platforms (social media, search) themselves, both for OSINT practitioners and for users to notify or check on content they question – for example, a query tool within WhatsApp that does not compromise end-to-end encryption but allows users to check an image or video against existing, known misinformation sources.
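One way such a check could work – sketched here with a simple average hash, and an assumption about design rather than a description of any existing platform feature; real systems use far more robust perceptual hashes – is to compare a hash of the questioned image against a list of hashes of known misinformation:

```python
def average_hash(pixels):
    """Simple average hash ('aHash') over a small grayscale image given
    as a 2D list of intensities (0-255). A real pipeline would first
    decode and downscale the image, e.g. to 8x8 pixels."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Each pixel contributes one bit: above or below the mean.
        bits = (bits << 1) | (1 if p >= avg else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def matches_known_misinfo(pixels, known_hashes, threshold=5):
    """Flag content whose hash is within `threshold` bits of any
    known-misinformation hash; the tolerance absorbs recompression
    and light edits that exact hashing would miss."""
    h = average_hash(pixels)
    return any(hamming(h, k) <= threshold for k in known_hashes)
```

Because only hashes of the user's queried item would need to leave the device, a design along these lines is at least compatible in principle with keeping message content encrypted.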
Social media feeds, and the information in them, happen in real time and are mostly unfiltered, so from a journalist’s point of view there is a sense of urgency in tackling this.
This is particularly important given the closed nature of WhatsApp and Facebook Groups (see “WhatsApp as breeding ground for mis/disinformation” below for more). An approach to this could also include options to tag audiovisual content as suspect and ways to track it across platforms (noting the challenge of handling this in closed, encrypted groups).
Practitioners need this both to use tools effectively (and know their limits) and to communicate back to the public how and why fakery was detected. For journalists, the challenge is not just reaching the conclusions themselves, but digesting them into a storytelling package for a general audience. There is always the question of how you become accountable and actionable with what you have learned about a piece of manipulated media.
for how to review manipulated images/video.
(preferably ones that are already built, without coding skills requirements) that are publicly available today and that have been peer-reviewed and tested
Journalists and OSINT investigators use tools to quickly gather evidence, classify it, and debunk if necessary. It would be useful to have a tool that quickly analyzes content to show, for example, whether content was reposted on different social media, its provenance, or the type of camera that was used.
if an image/video is fake or to flag potential anomalies to a human reviewer.
and to figure out if a part of an image has been processed differently than other areas.
or identify what social networks they have passed through.
This is because the ability to prove that a person is who they are supposed to be could emerge as critical.
There is a critical existing need for the ability to search for previous usages of a video (including lightly edited versions of the same video); this need predates deepfakes and relates to the vast majority of repurposed and otherwise “zombie” media.
Investigators need ways to tag images as suspect on open platforms and in closed groups, and to have this carry across platforms. This relates to tools for provenance, use and trajectory tracking across platforms, so that deepfakes and misinformation can be identified at the original source. Images are laundered across platforms, so it is key to be able to determine where images came from and trace them back to the source. In misinformation work, going back to the original source (who uploaded, when, other metadata) is key.
on the planning and implementation of disinformation campaigns across the dark web, secure spaces, the anonymous web and the open web, rather than focusing on the content.
With different communities as consensus nodes and the ability to add confirmed metadata to audiovisual content. Journalists would not rely on these exclusively and would still run their own verification processes, but they could provide signals of trust and enable better source verification.
to show who is creating, collecting and distributing propaganda and manipulated media (key actors).
A web-based tool to evaluate the integrity of images/graphs and flag items to a researcher for further analysis. Comparable tools exist – for example iThenticate, which detects plagiarism in text – and similar tools are being developed for images published in scientific journals.
For a range of reasons, researchers and practitioners are not talking to each other: there is a lack of resources to support this type of conversation, and the question remains of how to match the incentive structures of academia, civil society, government, funders and journalism while understanding the distinct ethical concerns of each sector. Structures of collaboration are not well incentivized or aligned. For journalists, this also sits in the context of shrinking resources and newsrooms. However, both groups see a need to be better connected: a closer connection and community would allow the two groups to understand each other’s tools and disseminate research in a practical way.
The meeting showed that many resources and capabilities are available but untapped, because journalists do not know about them. The question is how to make them useful as products within journalists’ workflows; vice versa, technical researchers want to understand real-world scenarios and grapple with real-world data.
There are clear knowledge gaps between what the forensics community takes as given and what the OSINT community knows exists. Examples include existing tools for copy-paste detection, as well as tools under development to track the digital fingerprints of particular cameras, which tool modified an image, and which social networks it has passed through. That said, many of these technologies are not yet commercialized or made user-friendly outside research contexts. Better data sharing would accelerate refining these tools to the type of problems the OSINT community is tackling.
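As a rough illustration of what copy-paste (copy-move) detection does – real forensic detectors compare robust block features such as DCT coefficients rather than raw pixels, so treat this as a toy sketch only:

```python
def copy_move_blocks(pixels, block=2):
    """Naive copy-move forgery detection: slide a small block over a
    grayscale image (2D list of intensities) and report pairs of
    non-overlapping positions whose content is identical. Production
    detectors use robust block features and tolerate noise; this
    exact-match version only illustrates the idea."""
    h, w = len(pixels), len(pixels[0])
    seen = {}      # block content -> first position where it appeared
    matches = []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            key = tuple(tuple(pixels[y + dy][x + dx] for dx in range(block))
                        for dy in range(block))
            if key in seen:
                py, px = seen[key]
                # Ignore overlapping positions (trivially identical).
                if abs(py - y) >= block or abs(px - x) >= block:
                    matches.append(((py, px), (y, x)))
            else:
                seen[key] = (y, x)
    return matches
```

A cluster of matched block pairs sharing one displacement vector is the classic signature that a region was cloned to cover something up.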
We need to find ways to support having local image forensics and OSINT expertise in the diverse range of settings where misinformation and disinformation are encountered globally, since understanding local context matters.
We need to connect and meet more often. Events this could be integrated into include WITNESS’ continuing series of events bringing together researchers and OSINT practitioners, global south-based meetings about different misinformation patterns, and work on journalistic preparedness. There could also be smaller, more local meetups to discuss how to concretely move forward on collaborations.
Some ideas of how to do this include:
- Using a human-centered design/design thinking/systems thinking format to explore each other’s needs and work in depth, with structured time to come up with a prototype or the beginnings of possible solutions that can then be worked on
- Share practical examples of the research and tech in action – identifying a manipulation, why it is a manipulation and how it operated
- Hands-on training on particular media forensics tools
- Workshops around specific scenarios, including testing tools
- Workshops on how to translate research in a practical way for journalists and publics
- Ongoing knowledge exchange to help understand each others’ timescales and workflows
- Knowledge exchange placements where someone spends a certain amount of time working with another person or within a specific organization, helping them develop tools
- Share understanding of threat models for doing this work that affect researchers and investigators
- A notification system/email list/Twitter bot to let key OSINT investigators and deepfakes researchers know that a manipulated video or deepfake has been identified, or to share feedback on the successes or failings of tools
- An alert list for when a ‘misinformation event’ happens and we need each other’s help
- A rolodex of contacts/sources to connect with: participants can build up a contact list of subject experts through meetings like this
- A Slack team/email mailing list/WhatsApp group or other forum to share knowledge, tough cases and examples, to develop projects, and to ask questions of other experts (note: as long as we have agreed guidelines and ethics)
There are issues that are already a challenge, and others that will grow in importance, where structured conversations between sectors could be valuable: for example, how we prevent knowledge being used by nefarious actors, or how we consider and give feedback on pending legislation.
Because not everyone agrees on common terminology around mis/disinformation, it is hard to create clean datasets for analysis.
We need to learn from each other in structured settings. One example of a training opportunity is the new deepfakes forensics course offered by the National Center for Media Forensics at the University of Colorado Denver. There are also ongoing projects and training opportunities in real-world contexts and scenarios in the OSINT/journalistic/human rights world.
Within ethical requirements, sharing real-world data and content examples to inform research could be critically useful. Some data and examples cannot be released publicly, so we need to identify what can be shared ethically in order to learn and understand how to do better. This may require a third party to facilitate. There also needs to be an equal relationship between the parties involved, so as not to build power imbalances around who benefits from the data.
One option to explore collectively could be to look at models of a data escrow service that provided a way to ethically use each others’ data but with clear protections and limits.
For example, building a country-wide or global collection of electrical network frequency (ENF) signals, which vary by location and are detectable in recordings, to better enable identification of where and when a genuine audio or video was recorded.
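The core estimation step can be sketched minimally as follows – assuming the audio has already been bandpass-filtered near the mains frequency; a real ENF pipeline would track the frequency over many successive windows and correlate the resulting time series against a reference database:

```python
import math

def enf_estimate(signal, sample_rate, nominal=50.0, search=0.1, step=0.01):
    """Estimate the electrical network frequency in one window of audio
    by scanning candidate frequencies around the nominal mains frequency
    (50 Hz in Europe, 60 Hz in the Americas) and returning the candidate
    with the largest DFT magnitude. Illustrative sketch: production
    systems use filtering, windowing and long-term tracking."""
    n = len(signal)
    best_f, best_mag = nominal, 0.0
    f = nominal - search
    while f <= nominal + search + 1e-9:
        # Magnitude of the signal's correlation with a sinusoid at f.
        re = sum(s * math.cos(2 * math.pi * f * i / sample_rate)
                 for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * f * i / sample_rate)
                 for i, s in enumerate(signal))
        mag = re * re + im * im
        if mag > best_mag:
            best_f, best_mag = f, mag
        f += step
    return best_f
```

Because the mains frequency drifts slightly and differently on each grid over time, a sequence of such estimates forms a fingerprint that can be matched against logged grid data to corroborate where and when a recording was made.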