I am trying to find datasets related to data breaches, but searching turns up only reports.
I was thinking of using social media and news search, then indexing, cleaning, and mining all the data, but I do not have the hardware resources to run these jobs.

I think you'll find it difficult to find this data for two reasons: 1) there is no commonly accepted data model for recording breaches, so even if someone has made a dataset for research, they may or may not have chosen the right data fields to collect for each breach, and 2) there are a lot of forces with an interest in keeping specifics of data breaches private, either because of shame, or because of a belief that too much easily accessible information would cause more breaches to happen.
– Joe Germuska May 18 '15 at 18:20

What kind of data are you interested in? A) The leaked data itself? B) Metadata about the breaches, for instance leak date, public announcement date, name of company, number of leaked records, degree of data sensitivity?
– Nicolas Raoul May 19 '15 at 3:15

@Nicolas, I am looking for option B, the metadata. I am trying to do advanced analytics. I was thinking of indexing, but I would need HPC power to index all the raw data related to data breaches. "Information is Beautiful" is a good start. Thank you.
– n1tk May 19 '15 at 11:54
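
Since the comments note that there is no commonly accepted data model for breaches, it may help to pin down the option B metadata fields as a concrete record before collecting anything. The following is only a minimal sketch: the class name `BreachRecord`, the field names, the 1-to-5 sensitivity scale, and the `disclosure_delay_days` helper are all assumptions made for illustration, not an established schema, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BreachRecord:
    """Hypothetical record for breach metadata (not a standard schema)."""

    company: str                        # name of the affected company
    leak_date: Optional[date]           # when the data leaked, if known
    announcement_date: Optional[date]   # when the breach was made public
    records_leaked: Optional[int]       # number of leaked records, if reported
    sensitivity: Optional[int]          # assumed scale: 1 (low) to 5 (high)

    def disclosure_delay_days(self) -> Optional[int]:
        """Days between the leak and the public announcement, or None
        when either date is missing."""
        if self.leak_date is None or self.announcement_date is None:
            return None
        return (self.announcement_date - self.leak_date).days


# Invented sample values, for illustration only.
example = BreachRecord(
    company="Example Corp",
    leak_date=date(2015, 1, 10),
    announcement_date=date(2015, 2, 9),
    records_leaked=120_000,
    sensitivity=3,
)
print(example.disclosure_delay_days())
```

Making every field except the company name optional reflects the point raised above that many specifics of a breach are never published, so a record often has to be stored incomplete.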