New Microsoft privacy framework lets lawyers, developers and their code speak the same language

Microsoft Research has developed a new framework for automatically figuring out which lines of code inside massive systems might conflict with corporate privacy policies. That is an important goal at a time when the constant threat of data breaches and lawsuits, along with the prospect of government regulation, has companies preparing for whatever might come their way.

The really novel thing about Microsoft’s framework is that it was designed to bring together teams of personnel that might never interact directly otherwise, so that the compliance process is faster and less prone to errors. The system involves a high-level language called Legalease, which lets lawyers and policy employees encode corporate privacy policies into a machine-readable format, and a tool called Grok that inventories big data systems and checks them against those policies.

“Ultimately, the truth about what’s happening with this data is in the code,” researcher Saikat Guha explained.

A comparison of compliance workflows. Source: Microsoft Research

But with millions of lines of code (a fair amount of which changes daily) in a product such as Bing — on which the Microsoft Research project was prototyped — it can be difficult to figure out what data is being stored where, how it’s being used as part of any given job and whether that usage complies with privacy rules. The current compliance process is time-consuming: lawyers write policies, privacy personnel interpret them to developers, developers write code and then auditors periodically check in to make sure the code complies.

The sample below shows how various lines of Legalease code — which is currently limited to attributes around the type of data, where it’s stored, who can access it and for what purpose — map to certain clauses in the Bing privacy policy as of October 2013.

Legalease code mapped to clauses in the Bing privacy policy. Source: Microsoft Research

Guha and his team hope the new framework will speed the process and make it more accurate by letting all of these steps occur in parallel. Lawyers and privacy personnel can encode their policies using Legalease and run them against Grok to identify code that might be affected. Privacy managers can then, for example, approach developers with the few thousand lines of code that might be affected. This way, developers don’t waste time trying to figure out which code needs changing and how to change it, and auditors can continuously check whether code is in compliance.
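To make the idea concrete, here is a minimal Python sketch of checking an inventory of data uses against an encoded policy. It is not Legalease syntax and not how Grok works internally; the class names, the record fields and the sample rule are all invented for illustration. The only thing borrowed from the article is the set of four attributes: the type of data, where it is stored, who can access it and for what purpose.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataUse:
    """One inventoried use of data by a job (hypothetical record shape)."""
    job: str
    data_type: str
    store: str
    role: str
    purpose: str


@dataclass(frozen=True)
class DenyClause:
    """A rule forbidding a combination of attributes; None matches anything."""
    description: str
    data_type: Optional[str] = None
    store: Optional[str] = None
    role: Optional[str] = None
    purpose: Optional[str] = None

    def matches(self, use: DataUse) -> bool:
        pairs = [
            (self.data_type, use.data_type),
            (self.store, use.store),
            (self.role, use.role),
            (self.purpose, use.purpose),
        ]
        return all(want is None or want == got for want, got in pairs)


def find_violations(
    policy: List[DenyClause], inventory: List[DataUse]
) -> List[Tuple[str, str]]:
    """Return (job, rule description) for every use that a clause forbids."""
    return [
        (use.job, clause.description)
        for use in inventory
        for clause in policy
        if clause.matches(use)
    ]


# Invented example rule and inventory, not taken from any real policy.
policy = [
    DenyClause(
        "IP addresses may not be used for advertising",
        data_type="IPAddress",
        purpose="Advertising",
    )
]
inventory = [
    DataUse("ads_ranker", "IPAddress", "SearchLogs", "AdsTeam", "Advertising"),
    DataUse("abuse_filter", "IPAddress", "SearchLogs", "SecurityTeam", "AbuseDetection"),
]

flagged = find_violations(policy, inventory)
for job, rule in flagged:
    print(f"{job}: {rule}")
```

In this toy version only the `ads_ranker` job is flagged, which mirrors the workflow described above: the policy side produces a short list of affected jobs, and developers start from that list rather than from the whole codebase.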

Guha compared the project to creating a map of a city, where Grok is the map itself and Legalease is the legend. Lawyers and privacy managers might be like city planners who, upon identifying a problem or crafting a new policy, could approach developers and say, “‘We understand which part of the city you’re working in, where your neighborhood is and exactly what you need to do,’” he explained.

If they get lost while working, the policies encoded using Legalease should help them figure out what they need to do.