Updated as of August 2014, this practical book demonstrates proven methods for anonymizing health data to help your organization share meaningful datasets without exposing patient identity. Leading experts Khaled El Emam and Luk Arbuckle walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of datasets.

Clinical data is valuable for research and other types of analytics, but making it anonymous without compromising data quality is tricky. This book demonstrates techniques for handling different data types, based on the authors’ experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others.

Understand different methods for working with cross-sectional and longitudinal datasets

Assess the risk of adversaries who attempt to re-identify patients in anonymized datasets

Reduce the size and complexity of massive datasets without losing key information or jeopardizing privacy

Khaled El Emam

Dr. Khaled El Emam is an Associate Professor at the University of Ottawa, Faculty of Medicine, a senior investigator at the Children's Hospital of Eastern Ontario Research Institute, and a Canada Research Chair in Electronic Health Information at the University of Ottawa. He is also the Founder and CEO of Privacy Analytics, Inc. His main area of research is developing techniques for health data de-identification/anonymization and secure computation protocols for health research and public health purposes. He has made many contributions to the health privacy area.

Luk Arbuckle

Luk Arbuckle has been crunching numbers for a decade. He originally plied his trade in the area of image processing and analysis, and then in the area of applied statistics. Since joining the Electronic Health Information Laboratory (EHIL) at the CHEO Research Institute, he has worked on methods to de-identify health data, participated in the development and evaluation of secure computation protocols, and provided all manner of statistical support. As a consultant with Privacy Analytics, he has also been heavily involved in conducting risk analyses on the re-identification of patients in health data.

The animals on the cover of Anonymizing Health Data are Atlantic herring (Clupea harengus), one of the most abundant fish species in the entire world. They can be found on both sides of the Atlantic Ocean and congregate in schools that can include hundreds of thousands of individuals.

These silver fish grow quickly and can reach 14 inches in length. They can live up to 15 years and females lay as many as 200,000 eggs over their lives. Herring play a key role in the food web of the northwest Atlantic Ocean: bottom-dwelling fish like flounder, cod, and haddock feed on herring eggs, and juvenile herring are preyed upon by dolphins, sharks, skates, sea lions, squid, orca whales, and sea birds.

Despite being so important to the ecology of the ocean, the herring population has suffered from overfishing in the past. The lowest point for the Atlantic herring came during the 1960s when foreign fleets began harvesting herring and decimated the population within ten years. In 1976, Congress passed the Magnuson-Stevens Act to regulate domestic fisheries, and the Atlantic herring population has made a great resurgence since then.

Herring fisheries are especially important in the American northeast, where the fish are sold frozen, salted, canned as sardines, or in bulk as bait for lobster and tuna fishermen. In 2011, the total herring harvest was worth over $24 million. Fisheries in New England and Canada do especially well because herring tend to congregate near the coast in the cold waters of the Gulf of Maine and Gulf of St. Lawrence. As long as the current regulations on fisheries stand, the Atlantic herring will continue to be a very important member of both the Atlantic Ocean's ecosystem and our worldwide economy.

I recommend this book to anyone who works with patient health records. The authors describe their risk-based methodology for protecting patients' privacy while still providing quality data for secondary use.

The book is well written. It's easy to follow, and the authors do a good job explaining their points.

The first two chapters are the most important, as they describe the methodology in detail. The rest of the book applies the same approach to real examples from the authors' experience.