Verizon detected porn in his backups.

A deacon at St. Joseph's Church in Fullerton, Maryland, a suburb of Baltimore, was arrested last week for possession of child pornography after Verizon detected images and videos on its cloud backup servers of children performing sexual acts.

William Steven Albaugh, a 67-year-old retiree ordained as a Catholic deacon in 1996, was taken into custody by Baltimore County Police officers on March 1. He admitted to collecting child pornography since the 1970s, but claimed not to have been involved in creating the porn himself. Police believe none of the images are of children from the church, its school, or the Fullerton community.

Verizon detected the pornographic images stored in Albaugh's Verizon Online Backup and Sharing account. The company reported his account to the National Center for Missing and Exploited Children, which in turn passed the information to Baltimore County law enforcement. Police investigating the case found files both on his Verizon account and on a flash drive, and authorities seized two PCs and an iPad. Albaugh said he used the iPad to view "nudist websites that include pictures of children," The Baltimore Sun reports.

I for one am glad this guy was an idiot because, regardless of anything else, this proves once and for all that providers are running automated scans on all the files we upload to the cloud. The ramifications are huge, because while it's pretty easy to agree that child porn is bad, we have no way of knowing what else they're looking for.

Quote:

I for one am glad this guy was an idiot because, regardless of anything else, this proves once and for all that providers are running automated scans on all the files we upload to the cloud. The ramifications are huge, because while it's pretty easy to agree that child porn is bad, we have no way of knowing what else they're looking for.

Came here to say something similar and am glad I'm not the only one who thought this.

I imagine the software is similar to Google's "Search by image" feature. Anyone know how it works--at a high level?

They probably just calculate a checksum for each uploaded file and match it against a list of known illegal files provided by a governmental agency (and maybe some entertainment megacorps as well?)

The checksum is needed for the backup operation anyway so it's not expensive to do an extra check against a "bad" list. I'd be surprised if they ran anything more complicated than that unless whoever is providing the list also funded some additional computing power...

Also when the list is updated they can do another quick check on all the stored files since the checksum is almost certainly indexed in a global database.
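The scheme described above can be sketched in a few lines of Python. Everything here is hypothetical: the digest choice, the bad-list format, and the example hash are all assumptions for illustration, not Verizon's actual system.

```python
import hashlib

# Hypothetical list of known-bad hashes (hex SHA-256 digests) that would
# be supplied by an outside agency. This entry is just the digest of the
# bytes b"foo", used as a stand-in for a real flagged file.
BAD_HASHES = {
    "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
}

def sha256_of(data: bytes) -> str:
    """Hex digest of a blob, as the backup system likely computes anyway
    for integrity checks and deduplication."""
    return hashlib.sha256(data).hexdigest()

def flag_upload(data: bytes) -> bool:
    """Check an uploaded blob against the bad list. This is a cheap set
    lookup, since the digest already exists for the backup operation."""
    return sha256_of(data) in BAD_HASHES
```

Because the digest is indexed anyway, re-checking every stored file when the list is updated is just as cheap: iterate the stored digests and test membership against the new set.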

Quote:

They probably just calculate a checksum for each uploaded file and match it against a list of known illegal files provided by a governmental agency (and maybe some entertainment megacorps as well?)

The checksum is needed for the backup operation anyway so it's not expensive to do an extra check against a "bad" list. I'd be surprised if they ran anything more complicated than that unless whoever is providing the list also funded some additional computing power...

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

I'm glad they're finding people like this and bringing them to justice, but in hindsight, as has been said before, most corporations are betting that the future is cloud storage. If they don't own said cloud, they're pretty much giving away their information to whatever company owns and manages the cloud servers.

"Oh noes, a paedophile". The real story is "Verizon is scanning everything you store through them". The follow-up questions are "How are they scanning? What are they scanning for?" and "Where does the privacy and security of my files fit into this scanning?".

Quote:

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

Even MD5 (the fastest checksum usable for that purpose) isn't remotely likely to give you an accidental collision (~2^-128)

In any case I'm sure they don't send the cops until someone actually looks at the file
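A quick birthday-bound calculation backs up that figure. Even under a deliberately generous assumption of a billion stored files, the expected number of accidental collisions in a well-distributed 128-bit hash is vanishingly small:

```python
# Birthday-bound sanity check on the ~2^-128 claim above.
# "files" is an assumed, deliberately generous store size.
files = 10 ** 9                        # a billion stored files
pairs = files * (files - 1) // 2       # ~5e17 pairs that could collide
expected_collisions = pairs / 2 ** 128
print(expected_collisions)             # on the order of 1e-21: effectively never
```

So for random, non-adversarial data, accidental matches are not a practical concern; the real risk with MD5 is someone deliberately crafting a collision.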

Quote:

I imagine the software is similar to Google's "Search by image" feature. Anyone know how it works--at a high level?

In digital forensics, it is common to use pre-compiled lists of hash values of confirmed child pornography to detect child pornography. That's much easier on the investigator than thumbing through thousands of pictures. These lists of hash values (absent the images, of course) are available with many forensics software packages.

I doubt very much they have a human reviewing pictures, but I would not be surprised if they're calculating hash values on all the files you upload and comparing them with known child porn lists.

Quote:

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

Even lowly MD5 is a 128-bit hash, meaning there are about 3.4×10^38 possible values (2^128). True, you can create an MD5 collision in about 2^24 attempts, but if you're not trying to create a collision, even lowly little MD5 is a pretty effective hash.

Ensuring a file hasn't been altered is actually where MD5 is most broken and SHA256 should be used. It's too easy to create a malicious file with the same MD5 hash value as a legitimate file.

But, yes, it is possible for two files to have the same hash value. So, you wait for the second hit, then the third, then you call the police. Or, you have some poor sap actually look at the pictures to confirm.

From what I understand, clouds are optimized to reduce redundancy. Verizon probably already has an existing database of child porn, and whatever this bloke uploaded was simply linked to their file instead of being saved as his own copy. I doubt they 'monitor' any other kinds of content, but I can understand the paranoia that people have about cloud storage. Personally, I just don't 'get' cloud storage, period.
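The deduplication idea above is easy to sketch: a content-addressed store keys each blob by its digest, so a second upload of identical bytes just references the existing copy. This is a minimal toy, not any particular vendor's design.

```python
import hashlib

class DedupStore:
    """Minimal content-addressed store: blobs are keyed by their SHA-256
    digest, so identical uploads share a single stored copy."""
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # no-op if this content already exists
        return key

store = DedupStore()
k1 = store.put(b"same bytes")
k2 = store.put(b"same bytes")
assert k1 == k2 and len(store.blobs) == 1  # second upload is just a link
```

A side effect of this design is that the provider can trivially tell when two customers hold the same file, which is exactly what makes hash-based flagging cheap.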

What is there to not get? It can be convenient and a form of protection, maybe even a redundancy. My problems with it are expense and privacy, but it is easy to see why it is desired.

Addendum: Have to chime in that the problem is Verizon not encrypting the backup before uploading, if they encrypt at all. It should be S.O.P. for all backup systems (including your own hacked-together one).

While that is easy enough to say... it is quite unacceptable, actually.

No, it is both easy and acceptable to say. Look around you at the constant hacking. You are naive if you think placing data in the cloud is a safe thing to do. Worse yet if you somehow expect a contract to protect you from hacking.

The only limited security you get would be self encrypted objects placed in the cloud as mentioned above. Even then you are looking at a probably finite length of time that it remains secure.

They'll fight for the privacy of those suspected of MPAA copyright infringement but turn over someone transmitting child pornography. Sounds fair so far.

I'm not sure I understand what the problem is. People suspected of copyright infringement, and people who have uploaded child porn: big difference. They won't be subject to innumerable irritating requests to identify dudes if they sign on to being anti-CP, but they will if they get into being copyright police.

While in this instance the outcome is good - the behavior he was engaging in was deplorable.

I ask this question - why and how do they even have any idea what's in there?

This is exactly why you shouldn't trust cloud services that don't explicitly outline why they have no access to your data.

Personally, there isn't a single "cloud" storage vendor that I have one ounce of trust in, right now.

Note that I would consider "cloud storage" and "cloud backup" two different things.

For example, the cloud backup service that I resell takes measures to ensure that the customer data is absolutely inaccessible by anybody without the pass phrase; and they absolutely don't have the pass phrase unless you explicitly opt-in to the pass phrase recovery option - and even then, it's encrypted in their system and they can't access it (only unlockable by correctly answering the 3+ security questions and entering the password).

Quote:

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

Even MD5 (the fastest checksum usable for that purpose) isn't remotely likely to give you an accidental collision (~2^-128)

In any case I'm sure they don't send the cops until someone actually looks at the file

Bulldawg is on the right track...there's a database of file signatures used to forensically identify child porn, it's just like virus signatures. There's almost no way people from Verizon put their eyes on it because child porn is contraband. It's illegal to have it on your PC in any way or form...in the temp Internet files cache or anywhere else. Think of these images as heroin...even defense attorneys can't have copies in their possession. My guess is that when Verizon gets a hit against that database they call the FBI, and the FBI handles it from there. Could be that Verizon has some kind of law enforcement on loan, or whatever, and that's the only "someone" from Verizon that would ever look at the positive results of a kiddie porn hit. In this case it sounds like there were multiple hits so the accidental-false-positive issue isn't relevant.

That contraband issue may also be why cloud storage is being scanned, and it could be the only thing being scanned because Verizon can be held liable for those images (digital heroin) sitting on their servers. They have a good reason to keep that crap out.

I hate pedophiles as much as the next guy, but are they really allowed to look at your stuff when you back it up with them? I don't trust clouds to begin with, and this really doesn't help their case.

You should certainly read the TOS if you are uploading private data to such a service. Here is the relevant clause from the Verizon Backup Assistant+ TOS:

Quote:

Verizon Wireless reserves the right to review information or materials uploaded to the Service or used in any materials and information, in any form, and to remove any such information or materials in its sole discretion, regardless of whether such material does or does not violate this Agreement or any Verizon Wireless policies, guidelines, or other codes of conduct which are applicable to the Service.

This gives them the right to review everything you upload, for literally any reason; they do not even need to be suspicious of illegal activity.

It is entirely possible this was not even an automated review. I used to work for a very large corporation that had access to numerous clients' computers and all of their data. While there were rules in place to prevent snooping, lots of employees did not follow them. Worse still, management did not seem to mind, presumably because the corporation got good press on occasion when an employee "incidentally happened across child porn" and reported it to the cops.

Quote:

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

Even MD5 (the fastest checksum usable for that purpose) isn't remotely likely to give you an accidental collision (~2^-128)

In any case I'm sure they don't send the cops until someone actually looks at the file

I'd guess an extra level to prevent random collisions would be to compare file sizes as well. It's one thing to have an MD5 collision, but an MD5 collision on a file with the same ~3.5 million byte size would seem especially unlikely (unless you're trying to get an MD5 collision).

But I bet it was something more mundane that got him caught - probably by using very obvious file/folder names.
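Matching on a (digest, size) pair as suggested above is a one-line change to the lookup. This is purely illustrative: the match list, its format, and the example entry (the MD5 of the bytes b"bar") are all made up.

```python
import hashlib

# Hypothetical match list keyed on (md5_hexdigest, file_size_in_bytes).
# Requiring both to match makes an accidental hit even less likely.
KNOWN_BAD = {
    ("37b51d194a7513e45b56f6524f2d51f2", 3),  # md5 of b"bar", 3 bytes
}

def is_match(data: bytes) -> bool:
    """Flag a blob only if both its MD5 digest and its exact size match
    a known-bad entry."""
    key = (hashlib.md5(data).hexdigest(), len(data))
    return key in KNOWN_BAD
```

Known MD5 collision constructions produce same-length files, so size alone doesn't defeat a deliberate attacker, but it does make an accidental false positive essentially impossible.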

Quote:

Even MD5 (the fastest checksum usable for that purpose) isn't remotely likely to give you an accidental collision (~2^-128)

That's what I thought while using MD5 to check my family pictures for duplicates, until I found a collision between two distinctly different image files. "[Not] remotely likely" is not the same as "impossible"... I'm using SHA-1 now, keeping my fingers crossed.

theblop wrote:

In any case I'm sure they don't send the cops until someone actually looks at the file

I really hope that's true, but due to the whole legal mess around child pornography it might be dangerous for anybody inside the company to even look at the images (because they would have to create a copy in memory to do that, making themselves liable), so I'm not betting the farm on it.

The only thing that I can see that avoids this and prevents the provider from messing around in my data is client-side encryption. Prevents deduplication, but the only safe thing to do. I just hope my cloud backup service actually does that the way it promises...
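The effect of client-side encryption described above can be demonstrated with a toy keystream cipher. To be clear, this is NOT a real cipher and must never be used for actual security (a real client would use something like AES-GCM via a vetted library); it only shows why the server-side view becomes useless for matching and deduplication.

```python
import hashlib

def toy_keystream_encrypt(passphrase: str, data: bytes) -> bytes:
    """Toy XOR keystream derived from SHA-256 -- illustration only, NOT
    secure. XOR makes the same function decrypt as well as encrypt."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(f"{passphrase}:{counter}".encode()).digest()
        out.extend(block)
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

plain = b"the same family photo, byte for byte"
blob_a = toy_keystream_encrypt("alice's passphrase", plain)
blob_b = toy_keystream_encrypt("bob's passphrase", plain)

# The server sees two unrelated blobs: different bytes, different hashes.
# Neither bad-list matching nor deduplication can work on them.
assert blob_a != blob_b
assert hashlib.sha256(blob_a).digest() != hashlib.sha256(blob_b).digest()
```

This is exactly the trade-off mentioned above: the provider can no longer deduplicate (or scan), which is the point.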

Quote:

I hate pedophiles as much as the next guy, but are they really allowed to look at your stuff when you back it up with them? I don't trust clouds to begin with, and this really doesn't help their case.

When you do a virus scan, are you worried about the company that created it having access to your data?

Think of it as a virus scan. Files are scanned and their hashes checked. If a file's hash comes up as a match for a file containing child porn, copyrighted content, etc., it gets forwarded to a specialized team/police for manual intervention.

I highly doubt Verizon has a team of agents sitting at a computer looking through people's files. As for them "having access" to your files... of course they do. You are backing up your files to a server owned and operated by someone other than yourself. Someone else is going to have access to them; there's no getting away from it.

Note that the service he used is labelled as an "online backup AND sharing" service.

That leaves the question as to how VZ "detected" it wide open. The dolt might have been uploading and sharing the stuff. I assume "free kiddie porn" on an unprotected share could generate an unexpected amount of traffic which probably does set off alarms at VZ (similar to if you uploaded some copyrighted content and shared it, generating a ton of traffic).

Not saying that's necessarily what happened, but it's in the realm of possibility.

As for the crime itself, fuck the catholic church right to hell. There's no other institution in this country that could get away with not just the amount of abuse they have but the mob-like maneuvering to conceal the abuse (shuffling "bad priests" from town to town, state to state; moving money around so that victims cannot be properly compensated, and on and on). Sorry for the OT on that.

Quote:

Multiple files can generate the same checksum. If this weren't the case, a checksum would be the world's most efficient data compression algorithm. A checksum can generally guarantee that a file hasn't been corrupted, but not that files are identical.

Even lowly MD5 is a 128-bit hash, meaning there are about 3.4×10^38 possible values (2^128). True, you can create an MD5 collision in about 2^24 attempts, but if you're not trying to create a collision, even lowly little MD5 is a pretty effective hash.

Ensuring a file hasn't been altered is actually where MD5 is most broken and SHA256 should be used. It's too easy to create a malicious file with the same MD5 hash value as a legitimate file.

But, yes, it is possible for two files to have the same hash value. So, you wait for the second hit, then the third, then you call the police. Or, you have some poor sap actually look at the pictures to confirm.

edit: theblop beat me to it.

How many users uploading how many files each do you need to achieve 2^24 "attempts"?

100,000 users with 100,000 files each gets you up to 10^10. 2^24 = 1.7×10^7. That means 100,000 users with 100,000 files each is 2^24 roughly 600 times over. But that's based on trying to match one target; if your database of target files contains, say, 10,000 files, then it happens 6 million times. (I have no idea how many users Verizon has for this system, how many files they upload, or how many match targets they are tested against; these are all wild guesses.)

Assuming these are system backups, each user is also modifying many of those files so the number of files they expose to testing for a match increases over time.

This also assumes the hash output is truly random; I don't know how safe that assumption is. If it relies on the contents of the files also being random, that's not a safe assumption.

Bottom line, unlikely events can become likely when you are dealing with a large sample size, and it's not always intuitive how large your sample size is.
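The back-of-the-envelope numbers above can be checked directly, using the same wild-guess inputs as the comment:

```python
# Reproduce the arithmetic from the post above. All inputs are the
# commenter's guesses, not real Verizon figures.
users = 100_000
files_per_user = 100_000
uploads = users * files_per_user     # 10^10 candidate files
trials = 2 ** 24                     # ~1.7e7, the quoted collision figure

multiples = uploads / trials
print(round(multiples))              # ~596, i.e. "roughly 600 times over"

targets = 10_000
comparisons = uploads * targets      # each file checked against each target
print(comparisons / trials)          # ~6 million
```

Note that 2^24 is the quoted cost of deliberately crafting an MD5 collision; the chance of a random pair of files colliding is far smaller (~2^-128 per pair), so for accidental matches the sample size would have to be astronomically larger still.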