To address that problem, the Department of Homeland Security created the DHS Data Framework, which consists of two Hadoop data lakes (or data management platforms) that can handle large volumes of information. The framework also uses attribute-based access controls so that designated users can see the data they need while privacy, civil rights and civil liberties remain protected.
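As a rough illustration of attribute-based access control in general, the Python sketch below grants access only when a user holds every attribute a record requires. It is not DHS's implementation: the class names, the `can_view` and `visible_records` functions, and the attribute labels are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # hypothetical labels, e.g. mission area or clearance


@dataclass(frozen=True)
class Record:
    record_id: str
    required_attributes: frozenset  # attributes a user must hold to see this record


def can_view(user: User, record: Record) -> bool:
    """Allow access only if the user holds every attribute the record requires."""
    return record.required_attributes <= user.attributes


def visible_records(user: User, records: list) -> list:
    """Return the subset of records this user is permitted to see."""
    return [r for r in records if can_view(user, r)]
```

In a sketch like this, the decision depends on attributes attached to the user and to the data rather than on a fixed list of named individuals, which is the general idea behind letting "designated users" see only what their role permits.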

“There are a number of different problems that we’re looking to solve with the data framework,” said Paul Reynolds, director of the DHS Data Framework. “Many of them can’t be solved unless you bring the data into one location.”

Law enforcement officials who are investigating a terrorism suspect, for instance, need to look at both classified and unclassified data. Before the data framework, there wasn’t an efficient way to do that, especially not in real time, Reynolds said.

The system takes the unclassified data and moves it up to the classified networks, “so the data itself is still unclassified, but it’s sitting in a classified spot,” he said.
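The distinction Reynolds draws, between a record's own classification and the network it sits on, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `LabeledRecord` and `Lake` classes, the `copy_to_high_side` function, and the label strings are hypothetical, and nothing here represents how the actual transfer is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledRecord:
    record_id: str
    classification: str  # the record's own label, e.g. "UNCLASSIFIED"


class Lake:
    """Toy stand-in for a data lake living on one network domain."""

    def __init__(self, domain: str):
        self.domain = domain  # the network the lake sits on
        self.records = {}

    def store(self, record: LabeledRecord) -> None:
        self.records[record.record_id] = record


def copy_to_high_side(record: LabeledRecord, low: Lake, high: Lake) -> bool:
    """Copy an unclassified record from the low-side lake to the high-side lake.

    The record is stored unchanged, so its label stays "UNCLASSIFIED"
    even though it now also lives on the classified network.
    """
    if low.domain != "UNCLASSIFIED" or high.domain != "CLASSIFIED":
        return False
    if record.classification != "UNCLASSIFIED":
        return False
    high.store(record)
    return True
```

After a successful copy, the high-side lake holds a record whose `classification` field still reads `"UNCLASSIFIED"`, mirroring the point that the data does not become classified simply because of where it is stored.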

The classified and unclassified data sit in two separate Hadoop data lakes that use a cross-domain guard to share data in near-real time. When the framework is fully operational, DHS officials expect to have 20 to 25 databases in the lakes. Right now, four are fully operational and nine are being populated.

And they aren’t small databases. Reynolds said one of them has about 70 billion records in it.

The framework is currently used only for counterterrorism purposes, but Reynolds said he expects that it will ultimately be used for additional mission areas.

About the Author

Matt Leonard is a reporter/producer at GCN.
