The Guardian - US

World

Sean Craig

Why data sleuths are archiving the Jeffrey Epstein files: ‘We want to provide some clarity’

Jeffrey Epstein in a photo released by the Department of Justice. Photograph: Department of Justice

Before the US Department of Justice (DoJ) missed a legally mandated December 2025 deadline to release unclassified files related to the prosecution of Jeffrey Epstein, the Denmark-based data scientist and bioinformatician Tommy Carstensen was not especially concerned with the case of the accused sex trafficker.

“I hadn’t even watched the Netflix documentary,” he said.

“It did not interest me because I thought he was ‘just’ another monetarily wealthy pedophile,” he added, noting the only Epstein associates he was aware of were Ghislaine Maxwell and Britain’s then Prince Andrew.

Now Carstensen oversees one of the internet’s most sophisticated archives of material on Epstein, who died by suicide in 2019 while awaiting trial. He has published interactive graphics of the financier’s properties and financial transactions, an analysis of over 1m documents released by the DoJ that groups them into subject areas, court records, transcripts of audio and video files from the releases, and a facial recognition tool that lets anyone upload an image of a face to see if it appears in any images in the files.

He spends as much as 50 hours per week maintaining the archive, on top of a full-time job, he said. Journalists and researchers have praised his efforts.

Carstensen was motivated to build the archive after lawmakers accused the justice department in December of failing to comply with a law that mandated the declassification and release of files related to Epstein to the maximum extent possible by 19 December 2025.

Volunteer sleuths like Carstensen, who also joined online efforts to identify participants in the January 6 insurrection earlier this decade, are not alone.

An increasing number of journalists, researchers and activists have applied technical analyses to the Epstein files that draw out information not readily available in the DoJ’s raw dumps of material.

The latest is a searchable database of faces of individuals who appear in original images in the Epstein files, published earlier this month by the non-profit Decoherence Media.

The founder, Tristan Lee, said the new database, which also includes a visualization that shows people who appear in the Epstein files, reveals images of more than 100 individuals who are not mentioned in Epstein’s email files and nearly 200 who have not been reported on, among them a Hollywood agent and the head of a large fitness chain.

Appearing in the Epstein records does not indicate any wrongdoing.

Faces in the database were identified in part using facial recognition technology, notably Amazon Web Services’ Rekognition. Lee said he adhered to Rekognition’s recommended 99% similarity threshold for law enforcement matters, essentially ruling out all but exact matches in order to screen out false positives. Rekognition also had useful features, including the ability to recognize public figures and detect nudity.

Lee also weighed the technology’s known shortcomings.

“Facial recognition models are notoriously less reliable for non-white faces, so there were multiple cases where we discarded a match because we weren’t confident in it,” he said. Identifications were also double-checked against multiple recognition models and by hand.
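
The screening Lee describes, a high similarity cutoff combined with agreement between more than one model, might look roughly like the sketch below. It is an illustration only and not Decoherence Media’s code: the `Candidate` record, the `confirmed_matches` function and the model labels are hypothetical, and the only figure taken from the reporting is the 99% threshold. Scores are assumed to be percentages.

```python
from dataclasses import dataclass

# The 99% cutoff is the figure quoted in the article; everything else here
# (names, record layout, the two-model rule) is an assumption for illustration.
SIMILARITY_THRESHOLD = 99.0


@dataclass(frozen=True)
class Candidate:
    image_id: str      # hypothetical identifier for a photograph
    name: str          # identity proposed by a model
    model: str         # label of the model that proposed it
    similarity: float  # similarity score, 0 to 100


def confirmed_matches(candidates, threshold=SIMILARITY_THRESHOLD, min_models=2):
    """Return (image_id, name) pairs that at least `min_models` distinct
    models scored at or above `threshold`."""
    support = {}
    for c in candidates:
        if c.similarity >= threshold:
            # Record which models back this identification.
            support.setdefault((c.image_id, c.name), set()).add(c.model)
    return sorted(key for key, models in support.items() if len(models) >= min_models)


candidates = [
    Candidate("img1", "Person A", "model_1", 99.4),
    Candidate("img1", "Person A", "model_2", 99.1),
    Candidate("img2", "Person B", "model_1", 99.8),  # only one model agrees
    Candidate("img3", "Person C", "model_1", 97.0),  # below the cutoff
    Candidate("img3", "Person C", "model_2", 99.5),
]
print(confirmed_matches(candidates))
```

In this sketch only the first photograph survives the filter. Anything that passes would still go to a person for manual review, as Lee describes.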

While the technology has drawn criticism and concern for its potential use in surveillance, Lee said “Epstein’s network [is] particularly well-suited for using facial recognition.”

“Many of the people in Epstein’s network are notable in some way, prominent enough to have news articles written about them or be featured on company websites,” he noted, alluding to the vast amount of material that can be used to identify individuals in these cases. “The second factor is that Epstein’s network contains multiple well-defined social circles. The third is that we have nearly 20 years of Epstein’s emails and communications, so matches can be cross-checked.”

For example, the new database uncovered additional material on one previously identified person, Google co-founder Sergey Brin. Three undated photographs in Epstein’s files that Decoherence identified show Brin, who remains a board member at Google parent Alphabet, at a resort on Guana Island.

On Thursday, 28 December 2006, Epstein’s associate Maxwell emailed him stating that Brin would be on Guana. Epstein inquired as to the timing of his visit, saying he would be there on “Saturday.”

Brin, who court documents suggest used Epstein for tax advice, did not reply to a request for comment made through Google.

Lee was motivated to build the database, he said, because “there’s still so much confusion about who Jeffrey Epstein was, who was in his network, and what his crimes were. I’ve seen viral TikTok videos about how Epstein was actually a cannibal, or elaborate conspiracy theories linking unrelated people to him. We wanted to provide some clarity, to help regular people, as well as journalists and policymakers, better understand who is actually part of Epstein’s social circle, and how these elite networks of power and influence actually operate.”

In the course of documenting those networks, researchers have also had to contend with the DoJ previously removing or retroactively redacting documents, in some cases because they had erroneously failed to redact identifying information on victims.

Carstensen wrote code to monitor the DoJ website for changes, and maintains a list of the names of victims and victims’ family members, which are automatically redacted from his archive. Images of known survivors’ faces, as well as those of minors, are also redacted, and he has been responsive to takedown requests from survivors.
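
The two safeguards in that description, spotting changed or withdrawn documents and scrubbing a list of protected names, can be sketched in a few lines. This is not Carstensen’s code, which the article does not show. The function names, file names and the placeholder name below are all hypothetical, and the sketch assumes each document’s bytes have already been downloaded.

```python
import hashlib
import re


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of a document's bytes, used to spot silent edits."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots mapping document URL -> fingerprint."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "modified": sorted(u for u in shared if previous[u] != current[u]),
    }


def redact(text: str, protected_names, marker: str = "[REDACTED]") -> str:
    """Replace every protected name in `text`, ignoring letter case."""
    if not protected_names:
        return text
    # Longest names first, so a full name wins over a shorter one inside it.
    ordered = sorted(protected_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(marker, text)


# Hypothetical snapshots taken on two different days.
old = {"a.pdf": fingerprint(b"v1"), "b.pdf": fingerprint(b"x")}
new = {"a.pdf": fingerprint(b"v2"), "c.pdf": fingerprint(b"y")}
print(detect_changes(old, new))

# "Jane Placeholder" stands in for an entry on the protected list.
print(redact("Letter from Jane Placeholder to jane placeholder's lawyer",
             ["Jane Placeholder"]))
```

Comparing fingerprints rather than file names catches a document that was quietly re-redacted under the same URL. A simple name list will miss misspellings and nicknames, so a sketch like this supplements human review and takedown requests rather than replacing them.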

“The DoJ did a disgraceful job with redactions,” said Lee. “The worst example I saw was that they released nearly 100 naked photos of one outspoken victim.” While the DoJ conceded it made some errors during the release of the files, officials maintain they have ultimately complied with the Epstein Files Transparency Act. A departmental watchdog is investigating the issue.

Emma Best, a co-founder of Distributed Denial of Secrets (DDoS), which publishes leaked and hacked datasets, has been particularly careful in their approach to an archive of more than 20,000 unredacted emails from Epstein’s Yahoo account that the group obtained.

DDoS has only made the complete archive available to vetted researchers and journalists, which has resulted in a wide range of reports on and revelations about Epstein’s personal and business dealings.

“We’ve kept the standard to access the Epstein emails higher than usual for many other datasets because it’s especially sensitive and salacious,” Best said, noting that those who are cleared to access the files are “only to republish emails which are relevant to their reporting, and that unless it can be determined otherwise the assumption should be that people in the emails are potential victims.”

The Yahoo emails, with redactions to protect victims and minors in accordance with DDoS’s wishes, are being released by Jmail, a browser-based archive of Epstein’s emails and other files from the DoJ and other sources developed by a group of volunteer tech workers and engineers.

Added Best: “We’ve been fortunate that most journalists understand the duty we have to Epstein’s survivors.”
