Posted by timothy on Friday October 15, 2010 @07:27PM
from the very-very-carefully dept.

and so forth writes "Cornell lost a laptop last year with SSNs. Now, they've mandated scanning every computer at the University for the following items: social security numbers; credit card numbers; driver's license numbers; bank account numbers; and protected health information, as defined by HIPAA. The main tools are Identityfinder (commercial software for Windows and Mac), spider (Cornell software for Windows from 2008) and Find_SSN (python script from Virginia Tech). The effort raises both technical questions (false positives, anyone?) and practical issues (should I trust closed source software to do this?). Have other Universities succeeded at removing confidential data? Success, here, should probably be gauged in terms of diminished legal liability after the attempted clean up has been completed." Note: this program affects the computers of university employees and offices, rather than students' personal machines.
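For the curious, the core of a tool like Find_SSN is just pattern matching plus a few structural sanity checks. Here is a minimal illustrative sketch in Python; the validity rules below are the pre-2011 SSN structure, not the actual logic of any of the tools named above:

```python
import re

# Rough sketch of what an SSN scanner might do. The regex and the
# validity rules are illustrative, not any named tool's actual logic.
SSN_RE = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")

def plausible_ssn(area, group, serial):
    """Structural checks on the three SSN fields (pre-2011 issuance rules)."""
    a, g, s = int(area), int(group), int(serial)
    if a == 0 or a == 666 or a >= 900:
        return False
    return g != 0 and s != 0

def find_ssns(text):
    """Return every substring that looks like a structurally valid SSN."""
    return [m.group(0) for m in SSN_RE.finditer(text)
            if plausible_ssn(*m.groups())]

print(find_ssns("ID 123-45-6789, serial 000-00-0000"))  # only the first matches
```

Anything this simple will of course flag phone-number-free runs of nine digits too, which is exactly the false-positive worry the submitter raises.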

It only seems like a good idea. It's likely to miss things and have false positives.

A better idea is... mandate full disk encryption. I have done it on my Linux-based laptop for years, three years before my company mandated it. Now, it's mandatory. They rolled out a canned solution for departments that want it and don't know any better, and to the rest of us they just say "it's your ass if it's not encrypted," and they make everyone certify, every six months, that if they use a laptop for work, its disk is encrypted.

However, if this scanning were implemented, then your own personal data would cause false positives.

Also, while that's a great long-term goal, it means that any work that currently involves storing this data on the local machine needs to be changed. That could be a problem in one place, for one reason, or it could be in many places for many reasons. In short, that's going to take time to sort out and migrate over to the new model.

In the meantime, you can mandate disk encryption and solve the most common case.

1. The process takes entirely too long, and if the person doesn't wait and walks away or just turns it off, the thief could still get the data. They used rdist when I was in college for campus kiosk computers. It was fucking miserable to wait for one of these bastards to boot or shut down in the case of a problem requiring a reboot (at the time a frequent necessity).

2. The computer isn't permitted to store any data and thus becomes pretty useless.

That is, until you, as a professor, go to the slopes of Mt. Kilimanjaro for a month to do research. At that point, the assumption of 'always connected' is incorrect, and you must carry data with you. Frequently, you must also carry some forms of student information, too, in order to respond to emails that you get from students when you are in town at the internet cafe once per week.

Something tells me that the configuration used for desktops (which pretty much anyone can use in most cases and can be taken as always connected) and laptops (which typically are used by a single person and not always connected) will differ.

"1. The process takes entirely too long, and if the person doesn't wait and walks away or just turns it off, the thief could still get the data. They used rdist when I was in college for campus kiosk computers. It was fucking miserable to wait for one of these bastards to boot or shut down in the case of a problem requiring a reboot (at the time a frequent necessity)."

Eww, yeah, that's not the best way to do it at all (having to wait on anything, that is).

It's actually something I quite liked as an idea - taking a look at Stelios' Easy Internet Cafes in the UK - when the user logs off, the PC becomes unavailable for about 6-7 minutes while it completely restores the hard drive from a multicast broadcast on the local network. The storage on the machine is strongly limited, but you can be sure that unless someone were to infect the master (difficult), no one could leave anything on the machine that you don't like.

Yes, everyone is moving this direction now. They don't use local storage for their lesson plans, they use a NAS. Now instead of the data being stuck on your desktop when you forget it, it's on an everywhere accessible network location.

That's pretty dumb. Removable storage just means some bad guy can walk away with the data on the external drive.

What Identityfinder does (our university mandated putting it on all faculty and staff computers at the beginning of the year) is force you to decide to either remove the data from the machine, or *encrypt it*.

Your solution is entirely inadequate. I am a professor at a Big10 university. I have 41GB of files and data in my Docs directory. The point is, how do I know which of these files are protected? Your quip of "Anything you need saved goes on removable storage" is ridiculous. How do you protect the removable medium? What happens if you lose it? It was the loss of a removable disk 4 years ago that caused our state (and ergo our university) to attempt this same exercise. All of the tools are pretty useless. Its

I know a lot of scientists who would be quite annoyed if the people from the IT department (who are clueless policy-obsessed wankers at my institution) came in and wanted to search through a bunch of simulation results and LaTeX files looking for SSN's.

a) Too fucking bad. b) Sign this waiver that says you are legally responsible if your repository of data were to contain information such as SSN/credit card numbers, etc.

I don't get the premise of the article. Scanning for credit card data and SSNs is quite easy and simple. It's no more intrusive than a virus scan. Being open or closed source doesn't make any bloody difference either.

Intrusion detection systems should also be running, scanning for data that conforms to SSN or credit card formats.
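As a sanity check on the "easy and simple" claim: the standard trick for cutting credit-card false positives is the Luhn checksum, which every real card number satisfies. A quick illustrative version (not any particular scanner's implementation):

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: the standard structural test for card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:   # real card numbers fall in this range
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:                # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```

Only about one in ten random digit strings passes the checksum, so it filters most of the noise a raw digit pattern would flag.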

b) Sign this waiver that says you are legally responsible if your repository of data were to contain information such as SSN/credit card numbers, etc.

Unless he then shoves the waiver up the IT department manager's nose, that waiver won't do anything; the IT department will refer him to a secretary, who will refer him to some policy and the committee for something or other, which meets once a year and won't discuss it with him. Universities are usually more bureaucratic and inflexible than your local DMV.

Which is why Cornell will try to scan every computer on campus, not just the ones likely to have student or employee information on them.

The only difficulty with this attitude is that it's only going to work for the Russian and Dance Departments. If you try it in Physics or Chemistry or Engineering, where a generic professor can be responsible for $1 to $2 million a year in no strings attached research overhead that goes straight into the university's hungry coffers, you will be quickly educated in the different levels of deference applied to cost centers (like IT) and profit centers (like research departments).

When you are the legal entity responsible for the data, you get to draw up the rules. And while these types of people like to think they are above and beyond rules and regulations, the equipment they use is not their personal property. Property purchased, supplied, or on the premises of such an organization is subject to its rules and regulations.

This stuff isn't a game. It's not about building pyramids, pleasing egos, or otherwise.

By the way, being responsible for a couple million dollars a year is jack-squat.

Because, point of fact, they do not know how to be responsible with it.

A professor's priority is his/her research.

If the professor were held solely responsible for the data in the eyes of the law, then there probably would not be an issue. Instead, the organization that actually owns the facilities is responsible for it. Professors are employees. I know they don't like to consider themselves as such, but they keep accepting paycheques and using the facilities.

Sure, if you don't mind a false positive rate of 99%, which is what a colleague of mine got when he ran an automated tool on his machine containing hundreds of GBs of particle physics data. Shocking, but it's not hard to find 9-digit numbers (SSNs) or 15-18 digit numbers (CCs) when you look at large repositories of quantitative data.

So the problem is, if you are mandated to run such software and you get a million possible numbers, what do you do?
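That 99% figure is believable: the structure of SSNs barely constrains a 9-digit match at all. Back-of-the-envelope, using the pre-2011 issuance rules (area not 000, 666, or 900-999; group not 00; serial not 0000):

```python
# Fraction of arbitrary 9-digit strings that pass the structural SSN rules,
# i.e. the best a scanner can do with no surrounding context to lean on.
valid_areas   = sum(1 for a in range(1000) if a != 0 and a != 666 and a < 900)  # 898
valid_groups  = 99     # 01-99
valid_serials = 9999   # 0001-9999

fraction = valid_areas * valid_groups * valid_serials / 10**9
print(f"{fraction:.1%}")  # 88.9%
```

In other words, nearly nine out of ten random 9-digit strings are structurally valid SSNs, so a scanner has to use context (dashes, nearby keywords, file type) to be of any use against bulk numeric data.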

And a) is the reason my department does not trust IT cowboys with any of our data. This is data that cost actual money to generate, not some shit we downloaded off BitTorrent for fun. I hope you get fired.

Well, aren't you an arrogant and self-important little bugger. The fact is that improperly retaining and losing privacy-act data costs money and reputation too (just ask the Veterans Administration). Potentially a lot more than some professor's grading data where he stupidly tracks students by their full soc number. Or the sociology researcher keeping a huge database of personal info on their test subjects. The mandate for this action did not originate with the IT folks, but they were tasked to implement the

As someone who works rather intimately with the department-level IT guys at a university, this would be a disaster. They don't have time to install automatic encryption software on everyone's disk, and *we* don't have time to wait on the computer to run crypto on it. Yes, disk encryption software is pretty fast -- but it's slower than the RAID we're pulling the data from, and the CPU is busy doing other things.

If you can't trust your professors to follow *reasonable* instructions about the protection of per

Well, I can tell you that at Ohio State University, this is exactly what has happened. Effectively, every single machine that _may_ have ever had 'sensitive' (FERPA or HIPAA or grant-defined) data on it must be encrypted. If it is lost & not encrypted, then it is the owner's burden to prove that no sensitive data was on the machine, which is only possible if you have a complete & recent backup.

So, it can be done, but it is very expensive (though much cheaper now--BitLocker is really nice on

Potentially a lot more than some professor's grading data where he stupidly tracks students by their full soc number.

How would a professor get the students' SSNs in the first place? The university should have no need for SSNs assigned to anyone except employees.

Obviously you are younger. It was very common practice for schools, universities, public libraries, health professionals, and even some small businesses to request and use your soc number as your ID. A good deal of this cleanup is to find that old data that probably isn't even being actively used anymore.

Also keep in mind that in some cases the University does need your soc number for doing tax forms and dealing with some govt grants. Obviously the profs don't need it and it shouldn't be your uni library

First of all, there is no reason that there would be any private information on a research lab computer. The only thing on my research lab computers is the software and data for the research. The grants that paid for the equipment prevent anything else from being stored on it anyway. Second of all, if professors know the social insurance numbers of their students, then there is something even more wrong with the administration of the university. Students at our university are identified by a student number.

It depends on the field of research. Medical researchers will often have 'sensitive' (HIPAA in the US) data on their test subjects. My university, like many others, until recently, indexed all students with SSNs, and if I downloaded a roster for use in Excel, I got the SSNs with no option to delete them. That's what really angered me; I didn't want or need the data, but they (the Uni) shoved it down my throat & then threw a fit years later and pushed the cost of fixing the problem down to the department

And that "I know better" attitude is precisely why the university is going to be putting this program in place. To say nothing of the reputation damage, HIPAA violations ain't cheap. So your "this data cost money" argument falls completely flat when doing nothing can cost money as well.

I have 20 years of IT experience, including bringing companies into PCI compliance after a breach.

Scanning data, identifying what it contains and locking it down are not difficult tasks. 99% of the data scanned is unlikely to trip false positives and is a complete non-issue. The remaining data can be quickly categorized as likely, or unlikely to be relevant with an appropriate perusal. The remaining data, actual non-compliant data, will consume the most time in dealing with properly.

Because not so long ago, it was common practice to use a student's SSN as their student ID number. In ~2001 and ~2004, I attended schools which changed their policies on this matter in those years, respectively. For each school, I started with a student ID that was the same digits as my SSN, and when I graduated, I had a new student ID that was an unrelated string of digits.

Using the SSN as an ID is very convenient. For every incoming person, you have a unique number that they probably already have.

Seems like you need to stop using SSN's as ID numbers, then, or tell students that they have the option of choosing another student ID number (and that if they use their SSN and Bad Stuff happens then it's their issue).

I know a lot of scientists who would be quite annoyed if the people from the IT department (who are clueless policy-obsessed wankers at my institution) came in and wanted to search through a bunch of simulation results and LaTeX files looking for SSN's.

As someone who has worked in an academic research group, I can attest to that. If such a program were instituted at my university, myself and others in our group would probably be less than forthcoming about the number and location of computers in our group. We certainly wouldn't relish the idea of giving folks from the IT department root access to all our Unix/Linux boxes which they would probably need to perform the kind of scan they're trying to perform.

I work at a university, I generally agree with your assessment. The vast majority of academic types get uncomfortable with any kind of monitoring. They do seem to accept that IT has admin rights on most things. What's great is that they refuse to accept any kind of content filtering on the campus network connection. I've also heard of professors having their connections shutdown for excessive bandwidth use who raised hell because it interfered with their academic freedom. I remember one story about a profes

Yes it does include professors. As a Ph.D. doing medically related research at a university, I've got some PHI data I need to include in some studies. It's encrypted and stored on secured servers. That's the way it's supposed to be. All the scanning software does is make sure you have it encrypted and not just lying around. THAT'S A GOOD THING.

There are other reasons professors who aren't working with medical research need to do it. Some of our departments used to use student SSNs for a lot of things. Data te

Then the correct policy is "Don't haphazardly store personal data on machines without considering what you are doing". There is no reason to barge into Dr. Smith's office, who's madly creating his slides for the conference next week while trying to babysit a supercomputer at Berkeley while fending off emails from his students, and insist in a very bureaucratic tone that you have to scan his workstation, the RAID, his other computer, his student's computer, and the two computers used to monitor various instruments (which the other students are taking data on) for SSN's.

Unfortunately, Dr. Smith is taking his laptop to the conference. He's much too busy to travel without taking all of his data with him on the laptop, such as his students' grading info (SSNs) or info on the other proprietary projects he's working on. He's too important to worry about trivialities such as data protection policies issued by those idiots on the Board of Directors. After all, drive encryption slows things down too much, he hears, but in truth he doesn't know how to set it up. Of course his laptop gets stolen, and now the university has to report that data was compromised. Suddenly Dr. Smith is no longer an asset to the university but rather a liability.

Sorry, but anyone who has worked in IT or even law enforcement knows damn well that users will ignore written policies unless there is some level of monitoring and enforcement. Just scroll up a bit and you'll see examples of those guys posting stuff like "just store the SSN as an integer so the scripts don't find it".

No one barged into anyone's office at a moment's notice. They notified us months before that this would be coming down, and gave a few more months to install the scanning software. Why is everyone on the other side of the conversation so out of touch with reality?

All employees must acknowledge their custodial responsibility for the university information on the computer(s) and associated storage they use in the conduct of university business, whether university property or personally owned. This includes:

* The internal drives of their workstations, both laptops and desktops;

* External drives;

* Mobile devices such as smart phones and PDAs;

* Portable media, such as USB flash drives, used to store or transport university data;

* Email messages and associated attachments, including copies stored on an email server;

I have, in the past, used my desktop system to order things from suppliers. That is clearly "doing university business". Sometimes I save mail messages on my file servers. I sometimes plug a USB stick into one of them. I have about two dozen USB sticks.

My guess is that while they can have sensitive data, they need to protect it. That does often mean working with the "mean 'ole IT department." I highly doubt Cornell will say "Nobody can have any private data, at all, ever." Not only would that be dumb (it would preclude them from getting some grants), it wouldn't work, since their student records, payroll, etc. will have said info. What they are probably doing is making sure that if you do have it, you secure it.

No not so much. I'm not an IT administrator, just an IT support guy, and I work for a very nice, very accommodating boss. I also work with a lot of idiotic researchers. Well ok not a lot, but a few. Most of the researchers are pretty reasonable people, but some aren't. They just want to do their own thing and not be bothered, and then wail and howl when something gets infected and we have the gall not to make it the #1 highest priority to fix (by departmental policy, research systems are 5th priority).

It's the people who believe they are above these rules that usually end up spilling personal data.

I've taught at a university. I can tell you right now I would definitely audit profs' machines.

And to be honest, too bad if they are annoyed. Suck it up, as they say.

Or another alternative is simply to lock the laptop in a desk drawer and when the IT guy asks if you have a computer in your office, say no.

Seriously though, why would you expect a professor to have credit card numbers or SSNs on a research computer? Would you mind if your home computer were searched to make sure that it isn't storing expunged criminal records from juveniles in Botswana?

"Seriously though, why would you expect a professor to have credit card numbers or SSNs on a research computer? "

This is exactly the reason for the scan.

My home computer is not an institutional computer. It does not carry a burden of trust, and thus is not under the dual responsibility of institutional governance and private ownership. It's the institutional scan that is being discussed. The institution has a responsibility to scan its equipment.

My home computer is not an institutional computer. It does not carry a burden of trust.

If you mean by "institutional" that the computers are the property of the university, then I suppose they are but do they carry a "burden of trust"? I hardly think so.

I think it's relevant to ask what the probability is that these computers will be storing credit card numbers and personal data. I don't see why a physics professor's research computer would. Should the university also examine the contents of the professor's pencil sharpener and coffee pot for state secrets? They are, after all, "institutional."

A physics professor is not exclusively involved in physics research. At some point almost every professor will also be an educator. As an educator, they will come into contact with personal information that is put into their trust.

There are no exceptions. As coming into contact with personal information is likely for most institutional workers.

-- Note this article was about personal information, not state secrets. That is a completely different matter when it comes to information handling. Thus the to

We're going to be doing this where I work. I believe it does, if the computer is University owned/administered. IIRC the software is not going to autoclean anything, just generate a list, which can then be followed up on by humans. The whole software installation and execution is centrally automated for desktops. Servers will probably need special attention, and as such, if a server cannot be reasonably expected to interact with personally identifiable information, it likely won't be scanned. However l

I'm at University of Arizona. We switched to PeopleSoft this summer, at the cost of $60 million or so (which we can't afford at all -- maybe you have read about our crazy governor?).

Semester starts, grad students don't get paid (sometimes) for a month, grad students get bills for tuition we're not supposed to and get charged late fees, secretaries can't do their jobs helping students... TOTAL NIGHTMARE, and everyone blames it on this PeopleSoft thing.

I'm 100% for this. Personal computers account for very little in data losses. It's these "work" machines that account for the majority of the major information losses around the world.

As long as people are dumb / lazy enough to keep documents in the clear on their machines there will be losses.

I would also go as far as to make certain quantities of certain types of information on a machine illegal as well. For example: 1,000 SSNs stored unencrypted on a portable data device is a fine of $10,000. 100,000 SSNs

It sounds like they are looking to catch accidental leaks. I would like to know if they have examined their policies to reduce over-collection of unnecessary data. If they never collect it in the first place, then they never have to worry about losing control of it later on.

Most leaks aren't accidental. It's laptops getting stolen that never should have had the data saved locally, or systems getting hacked into. The first step is understanding where the data is, the second is removing it from where it shouldn't be, and the third is providing adequate protection for the areas where it must remain (i.e., encryption).

You have an excellent point about it being overcollected in the first place. Technically, the way most institutions use SSNs isn't legal anyway. In the US it's only legal to use

And the war continued, with progressively more redundant copies using progressively more of the disk farm, and the encryption methods evolving under the selection pressure of the system administrators' decryption efforts.

Disk encryption has some serious usability and productivity issues. Specifically: to have even 128 bits of encryption, you must have a twenty-character completely random password. And performance will be hurt with it--ALL data must travel through the CPU. There's no DMA with encryption.
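The twenty-character figure is roughly right for a password drawn uniformly at random from the 95 printable ASCII characters; smaller alphabets need proportionally more. A quick check (assuming a truly uniform random choice per character, which is the generous case):

```python
import math

# Entropy per character is log2(alphabet size); the number of characters
# needed for 128 bits is 128 divided by that, rounded up.
for name, size in [("printable ASCII", 95), ("alphanumeric", 62), ("lowercase", 26)]:
    per_char = math.log2(size)
    needed = math.ceil(128 / per_char)
    print(f"{name:15s}: {per_char:.2f} bits/char, {needed} chars for 128 bits")
```

So printable ASCII needs 20 characters, and a lowercase-only passphrase would need 28; in practice people pick far-from-uniform passwords, so real passphrases need to be longer still.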

We should also keep reminding ourselves that encrypted data is inherently full of false-positives, for any sort of data test you feed it to. Yes, the sane thing to do would be to note that a file is encrypted and thus not amenable to string-comparison testing. But experience shows that this is not the way it'll be done.

I've seen any number of false positives that resulted from scans of binary executables by programs that didn't verify that their data was even vaguely formatted like the expected data. The

Seriously. There are a couple of full-volume encryption options available right now; Windows (Vista and up) even has a built-in one. Lacking that, encrypt individual files (all versions of Windows since 2000), or use an encrypted folder or volume image created using any number of third-party options.

At some businesses, especially ones handling sensitive data, having unsecured sensitive data (present in clear text on a removable device, including a laptop) is grounds for termination. I don't think a universit

Ohio State relies on their institutional data policy [osu.edu] and Disclosure or Exposure of Personal Information policy [osu.edu]. Essentially, any protected information has to be kept on encrypted devices. That worked fairly well, except once they had all their computers encrypted they quit paying the license fees to PGP. They didn't know the software, which they thought was only pre-boot authentication, phoned home and had a DRM time-bomb in it that would automatically drop everything Windows was doing and spend a couple of hours decrypting the whole drive after a certain date if the subscription wasn't renewed. I'd be pretty wary of trusting that kind of task to proprietary software, especially if it requires a subscription like ours did. Posted AC for obvious reasons. If it's closed source, you never know what kind of trick the vendor might be able to pull on you.

We did this where I work recently, small-ish private university, lots of science, a hospital, etc. All the faculty and staff had to run IDF. The tech guys came in and installed it and showed everyone how to run it but weren't allowed to see it being run. The person was required to run it and sort through the results themselves. All of my department ran it fine, no problems, no complaints, other than spending time sorting results. It really wasn't that big a deal.

All I have to do now is infect the (probably Windows-based) servers that host the scanning software and scan the memory for patterns resembling SSNs, etc., and make off with potentially an entire university's personal information? I say memory, 'cause I know no one would be dumb enough to search for that sort of sensitive information and then actually just log it into a centralized location for no reason. Right? Right?

Although it is good to make sure that any computer does not have unnecessary personal/private data, and also good to have searching software that might help locate some or most of it, it is unrealistic to expect to be able to ensure that such data will be kept off all computers, especially when there might be situations where there is a legitimate need to have access to such data offline.

The best solution is to use whole disk encryption with the free, open-source TrueCrypt software.

Although it is a shame that TrueCrypt does not support whole disk encryption on the Mac yet. At least there are some less trust-worthy closed options like PGP Whole Disk Encryption, which would be better than nothing.

One place where PGP has a genuine advantage is manageability. With Truecrypt, the best you can do to recover from someone forgetting their passphrase is to keep a copy of a rescue CD (which is generated at encryption time) and hope they never get curious enough to decrypt and re-encrypt their drive - a function which can't be locked down. If they do, new keys will be generated and the CD will be useless.

Oh yes, and you need to store that CD somewhere safely. One CD for every computer. In a university wh

The OP says that a practical issue is whether one should trust closed source software to do this.
Because, of course, being closed source should implicitly invoke gloomy music and dark clouds and cause people to break out in a cold sweat? Seriously, enough with this bullc*** already... There's nothing inherently wrong with running closed source software, nor is a given piece of software magically better by virtue of being open source, nor are open-source developers somehow better than those who develop closed-source software.
There are legitimate arguments to be made that open source has advantages. That open source is somehow more trustworthy isn't one such argument. And it's high time we stopped peddling it as one, or accepting it as one.

Should you trust closed source software to do this scan?
Should you trust the bank managing your transactions?
Should you trust closed source software in medical equipment?
Should you trust SAP to manage your financial transactions?
Should you trust a Windows computer for anything more important than your gmail password?
Should you trust Google Chrome when logging into your netbanking?

You know what? I think on the grand scheme of things trusting a piece of closed source software specifically designed to search for information made by a company which would literally be sued into oblivion if they did what the article was hinting at, ranks pretty damn low on the list of things I worry about.

Too easy for someone to get the source and "tweak" it before compiling. Most problems happen on the "inside" - a rogue sysadmin could do anything with open source, and no one is going to be able to prove the binary doesn't match the source, or who changed it.

Point b is only true up to a point; from memory, most of the 'OS native' encryption only provides up to baseline encryption standard, which isn't sufficient to, for instance, downgrade 'restricted' to 'not protectively marked' (at least in the security classification scheme HMG uses). Enhanced-grade products are needed, which is likely to mean they've been through a certification scheme and only maintain the certification when deployed in accordance with the criteria used to certify the product in the first place.

So they plan on unzipping every zip file, decoding every PDF and docx, SWF and every other crap-laden file format's 'export to' option? Not to mention parsing email attachments in God-knows-what format (can't trust the MIME type header)? And where is the CPU time coming from? Do you expect users to sit there while their laptop turns into a George Foreman grill? I just don't see how any of this is possible given most infrastructures.
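To be fair, cracking open the containers is the easy bit; the CPU-time objection is the real one. A stdlib-only sketch of scanning inside zip containers, which also covers docx/xlsx since those are just zips (PDF, SWF, and the rest would each need their own parser):

```python
import io
import re
import zipfile

# Naive SSN-shaped pattern, in bytes mode since zip members are bytes.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

def scan_zip(data: bytes):
    """Yield (member_name, match) for SSN-shaped strings inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            for m in SSN_RE.finditer(zf.read(name)):
                yield name, m.group(0).decode()

# Build a toy archive in memory and scan it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("roster.txt", "Jane Doe 123-45-6789")
    zf.writestr("notes.txt", "nothing here")
print(list(scan_zip(buf.getvalue())))  # [('roster.txt', '123-45-6789')]
```

Extracting every member of every archive on a laptop is where the George Foreman grill comes in, which is why real scanners tend to run throttled in the background.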

I worked in IT for Cornell both as a student and for a short time after graduating (8 years ago). This honestly isn't news.

Hardly any employee computers have this kind of stuff on them to begin with. Most of it is stored on servers and not in a format you can say 'dump to Excel on my laptop.' I did some work with the admissions database as a student and I had to promise in writing in triplicate that I would be very careful with that data Or Else and even so, I never needed to download any of it. If I'd sudd

As a member of IT at a health-care-based company, I can tell you that the machine sitting in the cube really isn't the problem. The problem is the laptops that get stolen off site, the CDs/DVDs of data that don't get disposed of correctly, and the e-mails that flow with data that should never be seen outside of the company. That is to say nothing of those who try to take the data out on purpose.