Let’s Talk About National Secrets
Here’s how it works right now: American history is being kept secret, then shredded, without ever seeing the light of day. Potentially important documents are destroyed before anyone even knows they exist, simply because it’s too much trouble to keep them, much less read each one to decide whether releasing it might endanger national security. Only a fraction of the massive archive of classified documents gets any attention at all before it’s lost forever.
Introducing The Declassification Engine
A group of historians, computer scientists, and statisticians at Columbia University is trying to fix this by building The Declassification Engine [http://www.declassification-engine.org/index.py...]. This groundbreaking partnership aims to create an intelligent virtual archive of our nation’s former secrets.
The Declassification Engine uses natural language processing and machine learning to discover patterns in government documents and cables. By archiving and processing millions of these documents, it can uncover stories the public has never seen.
How Can We Use the Declassification Engine?
The Engine has several real-world applications that will come in handy for journalists, historians, Hollywood screenwriters, and anyone else interested in mining our nation’s history for hidden nuggets.
For instance, we can track the number of secret diplomatic cables sent day by day and hour by hour, looking especially for spikes that show when history starts to accelerate. More than a month before the helicopters began lifting off the roof of the American embassy in Saigon, a burst of cables signaled the oncoming crisis.

We can also discover events that have been largely unknown until now. Using The Declassification Engine, we can find which terms predict that a document will be withdrawn from the files and hidden from researchers. A single word, “Boulder,” keeps showing up in the subject lines of still-secret documents.
What is Boulder? It turns out that “Operation Boulder” was a covert operation under Nixon to investigate Arabs applying for U.S. visas. It usually takes historians years of combing through archives to make this kind of discovery. The Declassification Engine pulled this needle out of the haystack instantly.
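Both analyses described above come down to careful counting. The sketch below is illustrative only, not the Engine’s actual code. `find_spikes` flags days whose cable count sits far above the trailing average, using a rolling z-score. `term_log_odds` ranks subject-line words by the smoothed log-odds of appearing in withdrawn rather than released documents. The function names, the window, threshold, and smoothing values, and all of the sample data are hypothetical and were invented for illustration.

```python
import math
from collections import Counter
from statistics import mean, stdev

def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing `window`-day mean
    by more than `threshold` sample standard deviations (a rolling z-score)."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: count any increase over the baseline as a spike.
            if daily_counts[i] > mu:
                spikes.append(i)
        elif (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

def term_log_odds(withdrawn, released, alpha=1.0):
    """Rank terms by the smoothed log-odds that a subject line containing them
    is withdrawn rather than released. Positive means 'typical of withdrawn'."""
    # Document frequency: how many subject lines in each set contain each term.
    w_counts = Counter(t for s in withdrawn for t in set(s.lower().split()))
    r_counts = Counter(t for s in released for t in set(s.lower().split()))
    n_w, n_r = len(withdrawn), len(released)
    scores = {}
    for term in set(w_counts) | set(r_counts):
        # Add-alpha smoothing keeps probabilities strictly between 0 and 1.
        p_w = (w_counts[term] + alpha) / (n_w + 2 * alpha)
        p_r = (r_counts[term] + alpha) / (n_r + 2 * alpha)
        scores[term] = math.log(p_w / (1 - p_w)) - math.log(p_r / (1 - p_r))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data, invented for illustration (not real archive counts).
daily_cables = [40, 42, 38, 41, 39, 43, 40, 41, 120, 44, 39]
print(find_spikes(daily_cables))  # [8]

withdrawn_subjects = ["boulder visa review", "boulder name check", "visa cable boulder"]
released_subjects = ["visa policy update", "trade talks summary",
                     "embassy staffing", "visa statistics"]
print(term_log_odds(withdrawn_subjects, released_subjects)[0][0])  # boulder
```

A real pipeline would need much more care. It would have to work at the hourly scale, account for weekends and holidays, and tokenize cable subject lines properly. Still, the sketch shows the basic idea of comparing what you observe against a baseline and looking for outliers.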
There are plenty of other useful things we can do with a huge archive of declassified text to work from. One of the coolest: we believe we’ll be able to predict the content of blacked-out, redacted passages in declassified documents. That would literally put the secrets back in, and we may even be able to identify anonymous authors.
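How might a machine guess a blacked-out word? One very simple approach is a bigram language model. It counts which words follow which in released text, then ranks candidate words for a one-word gap by how well each fits both the word before and the word after. This is a toy sketch, not the Engine’s actual method. The function names, the smoothing constant, and the tiny corpus are hypothetical. Real redactions often cover many words of unknown length, so this only illustrates the idea.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count word bigrams (previous word -> next word) across a corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def guess_redaction(counts, left, right, alpha=0.1):
    """Rank vocabulary words as fillers for a one-word gap between `left` and
    `right`, scoring each by P(word | left) * P(right | word) with add-alpha smoothing."""
    vocab = set(counts) | {w for followers in counts.values() for w in followers}
    v = len(vocab)

    def prob(prev, nxt):
        # .get avoids inserting new keys into the defaultdict while scoring.
        followers = counts.get(prev, Counter())
        return (followers[nxt] + alpha) / (sum(followers.values()) + alpha * v)

    return sorted(vocab, key=lambda w: prob(left, w) * prob(w, right), reverse=True)

# Tiny hypothetical corpus of "released" sentences, invented for illustration.
corpus = [
    "the ambassador met the foreign minister today",
    "the ambassador met the prime minister today",
    "the foreign minister met the ambassador",
]
model = train_bigrams(corpus)
# Which word best fills "the [REDACTED] minister"?
print(guess_redaction(model, "the", "minister")[:2])  # ['foreign', 'prime']
```

With only three sentences of training text, this model is easy to fool. The point is simply that the surrounding context narrows down what the hidden word can be.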
The best part about The Declassification Engine: It will be free, online, and open to all. You’ll be able to analyze documents yourself, or help the archive by submitting your own documents to be analyzed by our algorithms.
Why Do We Need It, and Why Should I Contribute?
If your house caught fire, you’d rush to grab photo albums and keepsakes, to save the invaluable and irreplaceable first. We’re fighting for what’s invaluable to us and to future generations. There are thousands of stories waiting to be told, and The Declassification Engine will work to bring these stories to the people.
Our aim is to help the U.S. government become more accessible, accountable, and transparent. We don’t want to expose every official secret (we’re not WikiLeaks). Instead, we want to automate the process of identifying what really does need to be safeguarded. In essence, The Declassification Engine will work to preserve our nation’s heritage.
How Will My Contribution Be Used?
The funding will support the student researchers who have been developing The Declassification Engine and getting it ready for public release. This includes funding for research, site design/UX, web development, and hosting fees. This will be a free service, open to all, funded by your generous donations along with additional grants from Columbia University.
Because every dollar counts on this project, we’re keeping perks to a minimum: a T-shirt for every donor who gives $50 or more. When you donate, please specify your T-shirt size in the Notes field. And thank you so much!
How Else Can I Get Involved?
The team behind The Declassification Engine will host a conference at the Columbia School of Journalism on May 10, bringing together some of the country’s leading historians, data scientists, and transparency advocates. You can register for the conference at www.declassification-engine.org and join the conversation.

The website will also host some of the tools we are starting to develop, such as the (De)Classifier, the (De)Sanitizer, and the Sphere of Influence, an interactive visualization of more than a million State Department cables. Help us test these tools and tell us how we can improve them. If you have skills in natural language processing (NLP) or web app design and would like to build new tools, write to us about your background and how you can pitch in.

Once we have a prototype ready to launch, we will also invite anyone who has declassified documents, or the time to analyze and annotate them, to contribute to our virtual archive. This is the fuel that will feed The Declassification Engine and make it more powerful.