I’m a curious person, as I bet you are, too. I wonder about a lot of things. Not dark matter, but simple things, like: where are my documents? What happened to them? Where have they gone? A cursory scan of your hard drive may reveal the sobering truth: you have many old documents you barely remember. By some online analyses, you have thousands of documents, most unopened for years, relics of your past work. My hard drive holds documents well over two decades old, some whose contents I barely remember, but fodder for future data archeologists perhaps. But the real question is: where else might they be?

The birth of a document life cycle

You sit ready to generate another brilliant work and press New. Ever wonder what you have just started? From version to version, to the ultimate commitment of a complete final draft (like the document you are now reading), you have launched a document on a long-lived journey with no end in sight, and no telling where it goes or who reads it. When you press Save, the journey to your hard drive has only just begun. But what about your browser cache, your word processor’s temporary files, and your Time Machine backups? That one document is copied and stored in locations you don’t generally observe. But the copies are there.

By some analyses, each laptop or desktop has thousands of dusty documents sitting at rest. Long forgotten, and never really gone. When you press Delete, do they really disappear? Remember, caches and backups store data, too. Delete doesn’t generally flush that document from all its hiding places. Worse, if that document was emailed or uploaded, all bets are off.
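To make the point about hiding places concrete, here is a minimal sketch of how you might hunt for byte-identical copies of one document across a few folders. The function names (file_digest, find_copies) and the folders you would pass in are illustrative assumptions, not part of any particular product, and the sketch only catches exact copies; an edited or re-saved version would have a different hash and slip through.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_copies(original: Path, search_roots: list[Path]) -> list[Path]:
    """List files under search_roots whose bytes match the original exactly.

    Hypothetical helper for illustration: it compares sizes first (cheap),
    then hashes only the candidates whose size matches.
    """
    target_size = original.stat().st_size
    target_hash = file_digest(original)
    original_resolved = original.resolve()
    copies = []
    for root in search_roots:
        for candidate in root.rglob("*"):
            try:
                if not candidate.is_file():
                    continue
                if candidate.resolve() == original_resolved:
                    continue  # skip the original itself
                if candidate.stat().st_size != target_size:
                    continue
                if file_digest(candidate) == target_hash:
                    copies.append(candidate)
            except OSError:
                # Unreadable files (permissions, broken links) are skipped.
                continue
    return copies
```

You might call it as find_copies(Path("TAX/2016/1040_TY2016.pdf"), [Path.home()]), with the search roots being whichever cache and backup folders you suspect. It can only see what is on disks you control, which is exactly the limitation once a document has been emailed or uploaded.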

Consider a very real example. On your laptop, your TAX folder likely has subfolders named /2010, /2011, …, with last year’s /2016 folder soon to be joined by a new folder you are about to create, /2017. (It is near tax time, so my apologies for reminding you of the upcoming unpleasantries.) A peek inside /2016 reveals 1040_TY2016.pdf, alongside various spreadsheets and receipts. That 1040 document is the key to your financial privacy, yet it sits on your laptop at rest and unprotected. Who last opened that document? Who read your secrets? Clearly, the US IRS and your accountant both have copies, but does anyone else? Would you like to know the answer to that mystery? Is it possible to know who last read that document?

Send is the end of your control

As the Moody Blues would say, what became of that letter you never meant to send? Off it went in your email to a trusted recipient. Your company may have invested heavily in DLP technology to watch whether sensitive documents you created were inadvertently sent to your home email or, worse, to a competitor. But some of those sensitive documents you legitimately sent to your company’s trusted law firm, accounting firm, or intellectual property counsel, all covered under a corporate NDA. No problem. All legitimate. But do you really know where they went from there? In the trusted third party’s zeal for efficiency, they hired temporary workers who, in their own zeal to do a good job, read the documents at home on the laptops they later brought on vacation. And there are those pesky local caches again: even if the workers deleted the documents, the cache went on vacation, too.

So where are they now?

Most of the document files I’ve kept over the past twenty years hardly interest me, much less anyone else. But we all have documents that matter, and I’m sure that, like me, you’d feel a sense of panic or embarrassment if your half-finished memoir or legal files ended up elsewhere. Beyond tracking down your personal files, companies are on the hook for new compliance regulations that control the flow of data. The issue is more than just satisfying curiosity. It’s a significant liability under existing compliance requirements and, worse, under the upcoming GDPR. Your company, of course, must comply or risk the stiff penalties of GDPR if it does any business in the EU.

Tracking a document with beacons

Documents can now be tracked with beacons. A beacon is an object embedded in a document that survives editing and signals home when the document is opened. Think of it as GPS for your data! There are many ways of implementing a beacon. Some test the ambiguous edges of various laws and regulations; others are entirely legal.
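One way to picture the "signals home" half of a beacon is the sketch below. It is not how any specific product works. It assumes a hypothetical callback address (beacons.example.com), a hypothetical BeaconRegistry class, and a list of trusted network addresses that you would supply yourself. It also leaves out the hard part, which is embedding the link in a document so that it survives editing and fires when the file is opened.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical "home" address the beacon would call when a document opens.
BEACON_HOST = "https://beacons.example.com/open"


@dataclass
class BeaconRegistry:
    """Illustrative bookkeeping for beacons: who was issued what, who called home."""

    trusted_sources: set[str] = field(default_factory=set)
    tokens: dict[str, str] = field(default_factory=dict)  # token -> document name
    events: list[dict] = field(default_factory=list)

    def issue(self, document_name: str) -> str:
        """Create an unguessable token for one document and return its callback URL."""
        token = secrets.token_urlsafe(16)
        self.tokens[token] = document_name
        return f"{BEACON_HOST}?{urlencode({'t': token})}"

    def record_open(self, token: str, source: str) -> dict | None:
        """Log a call home; returns None for tokens this registry never issued."""
        document = self.tokens.get(token)
        if document is None:
            return None
        event = {
            "document": document,
            "source": source,
            "trusted": source in self.trusted_sources,
            "opened_at": datetime.now(timezone.utc),
        }
        self.events.append(event)
        return event

    def unexpected_opens(self) -> list[dict]:
        """Opens that came from somewhere not on the trusted list."""
        return [e for e in self.events if not e["trusted"]]
```

In this sketch, each document gets its own random token, so a call home identifies which document was opened, and the source of the call hints at where. An open from an address outside the trusted list is the kind of event that would reveal an unexpected reader.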

Legitimate readers can freely review the document, but illegitimate readers can be revealed. You can know where your documents went, and how many copies were made. Imagine that. Your documents may flow across the internet for days, weeks, or more, and the curious can know where they are. Each document becomes its own Voyager, roaming endlessly on the net. This eye-opening new technology can finally answer the question: where did my document really go? Who actually read my 1040_TY2016.pdf? I really want to know the answer to that question.

This article is published as part of the IDG Contributor Network.

Salvatore Stolfo is a tenured Columbia University professor, teaching computer science since 1979. He is the co-founder and CTO of Allure Security, a DARPA-funded cybersecurity startup specializing in data protection and the prevention of data breaches.