I don't need a list of software (I know shopping questions are off topic), but my Google-fu is failing me on this. Basically, I'm looking for software that will scan a filesystem for files that may contain Protected Health Information (PHI). The target OS doesn't really matter right now; if someone has a search term I could use, that would also help.

2 Answers

You probably want to look at two classes of products: Data Loss Prevention (DLP, also called data leakage protection) and intrusion detection systems (IDS). To detect ePHI on a system, you are more or less going to be checking for common things based on patterns. For example, if your EMR identifies patients using a 10-character code that starts with 4 letters followed by 6 digits, you would configure the tool to look for that pattern. However, ePHI may not always be so neatly structured; you may need to look for descriptive patterns as well, such as "patient test results".
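As a minimal sketch of that kind of pattern rule, the Python below matches the hypothetical identifier format from the example (4 letters followed by 6 digits) along with a few descriptive phrases. The phrase list and the function name `find_ephi_hints` are assumptions for illustration; a real rule set would come from your own EMR and records.

```python
import re

# Hypothetical identifier format from the example: 4 letters, then 6 digits.
PATIENT_ID = re.compile(r"\b[A-Za-z]{4}\d{6}\b")

# Descriptive phrases are illustrative assumptions; substitute your own terms.
PHRASES = re.compile(
    r"patient test results|diagnosis|date of birth", re.IGNORECASE
)


def find_ephi_hints(text):
    """Return (patient_ids, phrases) found in a piece of text."""
    ids = PATIENT_ID.findall(text)
    phrases = [m.group(0).lower() for m in PHRASES.finditer(text)]
    return ids, phrases
```

The word boundaries (`\b`) keep the identifier rule from firing inside longer alphanumeric strings, which cuts down on false positives at the cost of missing identifiers glued to other text.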

One concern would be looking inside proprietary apps or file formats, as well as binary-only files that need a special tool to be read. Furthermore, you need to be able to read images, possibly using OCR.

You may also want to look into web filtering, email rule detection, etc. Health practitioners may email files or notes outside of the EMR which could lead to exposure.

Detecting what is already out there is going to be a pain, because likely no one cared before. If you can get these modules built into the EMR or other tools that can be more easily controlled, it will save you time in the long run. You can also look at locked-down environments, limiting tools, etc.

There are lots of vendors in this space selling things for HIPAA (as HIPAA actually starts being enforced, there will be more vendors willing to sell you products). You could try just running basic regex search tools against each host (e.g., Agent Ransack), but that could be a pain.
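If you do go the basic regex route, a per-host scan could look roughly like the sketch below. It is only an illustration: the function name `scan_tree`, the `max_bytes` cap, and the choice to decode everything as UTF-8 are assumptions, and it will not see inside binary or proprietary formats, archives, or images.

```python
import os
import re


def scan_tree(root, patterns, max_bytes=1_000_000):
    """Yield (path, label, match_count) for files under root that match.

    patterns maps a label to a compiled regex. Only the first max_bytes
    of each file are read, and the bytes are decoded loosely as UTF-8.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read(max_bytes)
            except OSError:
                continue  # unreadable file (permissions, broken link): skip
            text = data.decode("utf-8", errors="ignore")
            for label, pattern in patterns.items():
                count = len(pattern.findall(text))
                if count:
                    yield path, label, count
```

A run might look like `for hit in scan_tree("/srv/share", {"patient_id": re.compile(r"\b[A-Za-z]{4}\d{6}\b")}): print(hit)`, where both the path and the pattern are placeholders.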

Exert control by enforcing encryption of ePHI in common business
processes such as email and storage in databases. Prevent the breach
in the first place by blocking its transmission via email and Web, or
block copy from end-user applications to peripheral devices. Simplify
enforcement using policy templates and reporting, which is both built
in and customizable.

We evaluated a bunch of tools like this in a classified setting and found that at best they were about 40%-60% effective. We only reached the top of that range after tuning them with a lot of information. Hence the answer I gave.
– Tek Tengu, Apr 27 '13 at 9:53

I am sure there is. The question is whether, even if it is important enough to invest the big $$ I am sure such software would cost (and I know the cost of HIPAA violations here in the US), you would really trust it to be reliable. Seriously, think about it: PHI is not just terms, it is the association of terms with other terms. It is also the association of specific types of terms (people, designators/IDs, medical conditions) with events, etc. It could even be the mere presence of certain record types in a record set that contains a term. For instance, there are test types that are exclusively associated with a single disease, and a document of that type associated with any other document containing my information would be revealing, regardless of the information it contained.

So the bottom line is this: if you really have to scour your file systems and ensure they are clean, I would suggest you take stock tools and write the scanner yourself for your own environment. Regex, grep, and the like are your friends. To be honest with you, having spent the past two years in the intelligence community dealing with a lot of exotic intelligence sets and software systems solving a similar problem, I found the tools got in the way.
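To make the "association of terms" point concrete, here is a rough sketch of a home-grown check that only reports an identifier when a medical term appears near it. Everything specific is an assumption for illustration: the SSN-like pattern, the short term list, the 200-character window, and the function name `cooccurrences`. It is nowhere near a complete PHI detector.

```python
import re

# Illustrative identifier: a US SSN-like number such as 123-45-6789.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Illustrative medical vocabulary; a real list would be far larger.
MEDICAL_TERMS = re.compile(
    r"\b(diagnosis|biopsy|prescription|lab result)s?\b", re.IGNORECASE
)


def cooccurrences(text, window=200):
    """Return (identifier, term) pairs where a medical term appears
    within `window` characters of an SSN-like identifier."""
    hits = []
    for id_match in SSN_LIKE.finditer(text):
        start = max(0, id_match.start() - window)
        end = id_match.end() + window
        # Search only the slice of text around the identifier.
        for term in MEDICAL_TERMS.finditer(text, start, end):
            hits.append((id_match.group(0), term.group(1).lower()))
    return hits
```

Requiring both signals trades recall for precision: a bare number or a bare medical term is ignored, which is closer to how PHI is actually defined but will still miss associations spread across separate files.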

You make a good point: the software would only be reliable in a certain range, and it would be far cheaper to write the software in house. Even if such software only found names and/or possible SSNs, that would help.
– BigHomie, Apr 26 '13 at 16:59