
Insider Threats and Small Storage Devices

Danny Angus writes about the potential threat posed by small storage devices with large capacity [1]. His post was prompted by a BBC article about Hitachi’s plans for new hard drives [2]; they are aiming for 4TB of data on a single drive by 2011 and a 1TB laptop drive. One thing I noticed about the article is that it made the false claim that current drives are limited to 1TB. The storage capacity is determined by the total surface area, which is proportional to the square of the radius and to the height of the drive (AFAIK there are no practical limits to the number of platters apart from the height of the drive). So if a 5.25 inch hard drive was to be manufactured with today’s technology it should get a capacity equivalent to at least three times the capacity of the largest 3.5 inch drive.
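As a rough check of the surface-area argument, the sketch below compares platter area for the two form factors. It assumes that the recordable area scales with the square of the nominal drive width and ignores the hub and any unusable inner tracks; the function name is only illustrative.

```python
def area_ratio(large_inches: float, small_inches: float) -> float:
    """Ratio of platter areas, assuming area scales with the square of the width."""
    return (large_inches / small_inches) ** 2

# Nominal form-factor widths from the text (an approximation of platter size).
ratio = area_ratio(5.25, 3.5)
print(f"Per-platter area ratio: {ratio:.2f}x")
```

On these assumptions the larger radius alone gives about 2.25 times the area per platter, so a figure of three times or more also depends on a taller drive holding more platters.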

The reason that 5.25 inch drives are not manufactured is that for best performance you want multiple spindles so that multiple operations can be performed concurrently. Using 3.5 inch drives in servers allows the use of more disks for the same amount of space in the rack and the same amount of power. The latest trend is towards 2.5 inch (Small Form Factor AKA SFF) disks for servers to allow more drives for better performance. With 3.5 inch disks a 1U system was limited to 3 disks and a 2U system was often limited to 4 or 5 disks. But with 2.5 inch drives a 2U server can have 10 drives or more. I know of one hardware vendor that plans to entirely cease using 3.5 inch drives and claims that 2.5 inch disks will give better performance, capacity, and power use!

Danny’s claim about the threat posed by insiders is entirely correct, but I don’t believe that a laptop with 1TB of capacity is the threat. In a server room people notice where laptops get connected and there are often strictly enforced policies about connecting machines that don’t belong to the company. I believe that the greatest threat is posed by USB flash devices. For example let’s consider a database with customer name (~20B), birth-date (10B), address (~80B), phone number (~12B), card type (1B), card number (16B), card expiry (5B), and card CVV code (3B). That’s ~155 bytes per record in CSV or TSV format, including the field separators. If you have data for a million customers that’s 155MB uncompressed and probably about 50MB when compressed with gzip or WinZip (depending on which platform is being ripped). No-one even sells a USB flash device that is smaller than 50MB; I recently bought a 2GB flash device that was physically very small and cheap (it was in the bargain bin).
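The per-record arithmetic can be sketched as follows. The field widths are the estimates given above; the one-byte separators and single newline per record are my assumptions about the CSV/TSV layout, and the names are illustrative.

```python
# Field widths in bytes, taken from the estimates in the text.
FIELD_BYTES = {
    "name": 20,
    "birth_date": 10,
    "address": 80,
    "phone": 12,
    "card_type": 1,
    "card_number": 16,
    "card_expiry": 5,
    "card_cvv": 3,
}

def record_bytes(fields: dict[str, int]) -> int:
    """Size of one CSV/TSV line: field data, one-byte separators, and a newline."""
    separators = len(fields) - 1  # assumed: a single comma or tab between fields
    newline = 1                   # assumed: a single "\n" per record
    return sum(fields.values()) + separators + newline

per_record = record_bytes(FIELD_BYTES)
total = per_record * 1_000_000
print(f"{per_record} bytes per record, {total / 1e6:.0f} MB for a million customers")
```

The compressed size is not modelled here because it depends entirely on the data; the ~50MB figure is an estimate.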

The next issue is, what data might be worth stealing that is large enough to not fit on a USB device? I guess that if you want to copy entire network file shares from a corporation then you would need more than the 16GB that seems to be the maximum capacity of a USB device at the moment. Another theoretical possibility would be to copy the entire mail spool of a medium to large ISP. For the case of a corporate file server you could probably get the data at reasonable speed. 1TB of data would take about 10,000 seconds or 2.8 hours to transfer at gigabit Ethernet speeds (that assumes about 100MB/s, which means maxing out a GigE link; it could be as much as five times that if the network is congested or if the server is slow). It’s doable, but it would be a rather tense three or more hours waiting by an illegally connected laptop. For the mail server of a large ISP there is often no chance of getting anywhere near line speed. It’s lots of small reads and seek performance is the bottleneck, and such servers are usually running close to capacity (so trying to copy data fast would hurt performance and draw unwanted attention).
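The transfer-time estimate works out as in this small sketch. The 100MB/s "practical" rate is my assumption, chosen because it reproduces the 10,000 second figure; the raw line rate of a gigabit link is 125MB/s.

```python
TERABYTE = 10**12  # decimal terabyte, as drive vendors count it

def transfer_hours(size_bytes: int, rate_bytes_per_sec: float) -> float:
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_sec / 3600

line_rate = 1_000_000_000 / 8  # 125 MB/s, raw gigabit line rate
practical = 100_000_000        # 100 MB/s, assumed to match the 10,000 s figure

for label, rate in [
    ("raw line rate", line_rate),
    ("practical GigE", practical),
    ("congested (one fifth)", practical / 5),
]:
    print(f"{label}: {transfer_hours(TERABYTE, rate):.1f} hours per TB")
```

The congested case is simply the "five times" worst case from the paragraph above, which comes to most of a working day.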

Another possibility might be to copy the storage of an Intranet search device. If a company has a Google appliance or similar device indexing much of their secret data then copying the indexes would be very useful. It would allow offline searches of the corporate data to prepare a list of files to retrieve later.

It would probably be more useful to get online access to the data from a remote site. I expect that an unethical person could sell remote access to someone who is out of range of extradition. All that would be required would be to intentionally leave a flaw in the security of the system. In most large corporations this could be done in a way that is impossible to prove. For example if management decrees that the Internet servers run some software that is known to be of low quality then a hostile insider could make configuration changes to increase the risk – it would look like an innocent mistake if the problem was ever discovered (the blame would entirely go to the buggy software and the person who recommended it).

A large part of the solution to this problem is to hire good employees. The common checks performed grudgingly by financial companies are grossly inadequate for this. Checking whether a potential employee has a criminal record does not prevent hiring criminals, it merely prevents hiring unsuccessful criminals and people who have not yet been tempted enough! The best way to assess whether HR people are being smart about this is to ask them for an estimate of how many criminals are employed by the company. If you have a company that’s not incredibly small then it’s inevitable that some criminals will be employed. Anyone who thinks that it is possible to avoid hiring criminals simply isn’t thinking about the issues. I may write more about this issue in a future post.

Another significant part of the solution to the problem is to grant minimum privileges to access data. Everyone should only be granted access to data that they need for their work so that the only people who can really compromise the company are senior managers and sys-admins, and for best security different departments or groups should have different sys-admin teams and separate server rooms. Of course this does increase the cost of doing business, and probably most managers would rather have it be cheap than secure.

12 comments to Insider Threats and Small Storage Devices

Hmmm….with regards to the multiple small drives thing, I wonder if it would be possible to create (or if anyone has already created) a RAID system consisting of a bunch of 2.5″ disks and a RAID controller inside a 5.25″ enclosure.

Replacing individual disks in the array wouldn’t be possible, but you could get redundancy and a warning when one (or more) of the internal disks goes bad so you have time to make a replacement. In fact, you probably won’t want to expose the RAID controller to the outside. Just fix the RAID settings at the factory (have different RAID settings correspond to different product codes?) and make it look like one big disk with great read performance.

> consisting of a bunch of 2.5″ disks and a RAID controller inside a 5.25″ enclosure.

Of course it would be possible, but what would be the purpose? RAID for consumers?
Remember: RAID is *not* backup

> So if a 5.25 inch hard drive was to be manufactured with today’s technology it should get a capacity equivalent to at least three times the capacity of the larger 3.5 inch drive.

I’m not sure about that. Larger platters are less stable.

> The reason that 5.25 inch drives are not manufactured is that for best performance

Consumers don’t really use multiple spindles, so that’s only part of the reason. Larger platters can’t spin as fast (more rotational latency, less transfer rate) and require longer seeks.

> The latest trend is towards 2.5 inch

Platters in 3.5″ 10000 rpm drives are already smaller than platters in 3.5″ 7200 rpm drives and the ones in 15000 rpm drives are again smaller. So using 2.5″ form factor doesn’t even mean the platter size is reduced that much.

> It’s doable, but it would be a rather tense three or more hours waiting by an illegally connected laptop.

What laptop drives can sustain gigabit ethernet writes?

> For the mail server of a large ISP there is often no chance of getting anywhere near line speed, it’s lots of small reads and seek performance is the bottleneck, such servers are usually running close to capacity (and trying to copy data fast would hurt performance and draw unwanted attention).

Olaf: Good point about laptop drives not sustaining GigE speeds. My >3yo Thinkpad can sustain about 30MB/s, I expect that newer laptops have faster disks and that the performance of laptop drives will approach the current server drive speed of ~80MB/s. However most Thinkpads support removing the DVD drive and replacing it with a second drive. If you stripe data across two disks you should be able to double the performance so my 3yo Thinkpad (which incidentally has Gig-E on the motherboard) could sustain something approaching 60MB/s and a newer Thinkpad with faster disks could conceivably sustain close to GigE speeds if it has a second disk.
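The striping argument is a bottleneck calculation, sketched below. It assumes that striping scales perfectly with the number of disks, which is optimistic, and uses the raw 125MB/s gigabit line rate as the network ceiling; the function name and labels are illustrative.

```python
def copy_rate_mb_s(network_mb_s: float, disk_mb_s: float, disks: int = 1) -> float:
    """Sustained copy rate: the slower of the network and the striped disks."""
    return min(network_mb_s, disk_mb_s * disks)

GIGE_MB_S = 125  # raw gigabit line rate; real transfers will be somewhat lower

cases = [
    ("3yo Thinkpad, one disk", 30, 1),
    ("3yo Thinkpad, two disks striped", 30, 2),
    ("laptop disks at server speed, two striped", 80, 2),
]
for label, disk_mb_s, disks in cases:
    rate = copy_rate_mb_s(GIGE_MB_S, disk_mb_s, disks)
    hours = 1_000_000 / rate / 3600  # 1 TB = 1,000,000 MB
    print(f"{label}: {rate:.0f} MB/s, {hours:.1f} hours per TB")
```

Only the last case is limited by the network rather than by the disks.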

Also let’s assume for the sake of discussion that every technology that increases the capacity of disk also increases the speed. It’s a reasonable assumption while Hitachi is not releasing details.

No, RAID is not backup. But how many people have decent backup plans, keep to them, and test that data can be restored from their backups properly on a regular basis?

Being told “Hey, part of your hard drive is failing. You need to get a new one before it gets any worse” could give people a lot more chance to not lose data than they currently get with their non-existent backups. And I’m sure that some HD mfr could come up with some catchy title for this to sell it. “Now with SafeStore(tm) technology – the disk will tell you in plenty of time before it dies, giving you the opportunity to replace it *before* you lose all the photos of your kids.”

But it’s not just that – what about capacity? How much data can you get on a 3.5″, or 2.5″ drive? 500GB? 750GB? OK, now put 4 (2×2) or 8 (2×2×2) of them in a 5.25″ enclosure and RAID 5/6 them. You could end up with between 2TB and 5TB of storage. If your computer is doubling as a PVR, that could be handy.
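A quick sketch of the capacity arithmetic, assuming the usual rule that RAID 5 gives up one disk’s worth of space to parity and RAID 6 gives up two. The disk sizes are the 500GB and 750GB guesses above, and the function name is illustrative.

```python
def usable_gb(disks: int, disk_gb: int, parity_disks: int) -> int:
    """Usable capacity of a parity RAID set (RAID 5: one parity disk, RAID 6: two)."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_gb

for disks in (4, 8):
    for disk_gb in (500, 750):
        raid5 = usable_gb(disks, disk_gb, parity_disks=1)
        raid6 = usable_gb(disks, disk_gb, parity_disks=2)
        print(f"{disks} x {disk_gb}GB: RAID 5 = {raid5}GB, RAID 6 = {raid6}GB")
```

On these assumptions the range runs from 1TB (four 500GB disks in RAID 6) to 5.25TB (eight 750GB disks in RAID 5).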

And as etbe points out, you get more speed. Striped RAID configurations give much better read speeds than single disks as you pull data off multiple disks at once. Most data is read more than it is written, so although writing doesn’t get any faster, the “typical” use-case will improve. No, it won’t improve volume-volume copies, but it will allow the source box to remain more responsive while you’re doing one.

> Being told “Hey, part of your hard drive is failing. You need to get a new one before it gets any worse” could give people a lot more chance to not lose data than they currently get with their non-existent backups.

I’m not sure what the rate of HDD failures is currently. They seem to be pretty reliable.

> But it’s not just that – what about capacity? How much data can you get on a 3.5″, or 2.5″ drive?

1 tbyte on 3.5″

> OK, now put 4 (2×2) or 8 (2×2×2) of them in a 5.25″ enclosure and RAID 5/6 them. You could end up with between 2TB and 5TB of storage. If your computer is doubling as a PVR, that could be handy.

http://etbe.coker.com.au/2007/08/25/designing-computers-for-small-business/
Karellen: Your point is good, logically people should do such things. My wife’s computer (which is also a server for some of my data) has a RAID-1 and my Thinkpad (my main machine) is regularly backed up. Most people don’t do such things however so there is little economic demand. Consider my post about designing computers (see the above URL), it makes sense but I expect Dell to continue selling non-RAID systems to small businesses which will be used as servers…

As for the number of disks, you could have two hot-swap 2.5 inch disks in a half-height 5.25 inch bay or probably six in a full-height bay.

Olaf: If the capacity increase is due in equal measure to both thinner tracks and getting more data on a single track then the contiguous IO speed would go up as a factor of the square-root of the capacity. If one of those factors is being improved more than the other the contiguous IO speed could increase by anything between 0% and 100% of the capacity increase. Also let’s not assume that the rotational speeds will remain constant.
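The range described here can be written down as a small model. It assumes constant rotational speed and an unchanged platter size and count, so that capacity is track density times data per track and contiguous IO speed follows data per track only; the function name is illustrative.

```python
import math

def str_gain_range(capacity_gain: float) -> tuple[float, float, float]:
    """Return (low, balanced, high) multipliers for contiguous IO speed.

    low:      all of the gain comes from thinner tracks (no speed-up)
    balanced: the gain is split equally between thinner tracks and more data per track
    high:     all of the gain comes from more data per track
    """
    return 1.0, math.sqrt(capacity_gain), capacity_gain

# Example: a fourfold capacity increase, such as 1TB to 4TB.
low, balanced, high = str_gain_range(4.0)
print(f"speed-up between {low:.1f}x and {high:.1f}x, {balanced:.1f}x if balanced")
```

A change in rotational speed would scale all three numbers by the same factor.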

But the example you cite of the Travelstar does seem to solve the problem. Two of them striped with appropriate caching should allow close to GigE speeds on a modern laptop.

Depends on what part of desktop performance you’re measuring. Common measures I’ve seen are time to boot to login screen, time from login to usable desktop, and application startup time. These are often dominated by transfer rate. There’s also the “restoring an application that’s been inactive for an hour while you’ve been working on something else” that requires paging a load of swap into memory. That can often be fairly dependent on transfer rate, although seek time is likely to be more of a factor than in the other cases I mentioned.

(OTOH, if the raid is intelligent enough, it might be able to notice a storm of separated reads and keep the individual disks seeked(conj? suck?) to different locations. Lower transfer rate but smaller seek times might improve overall throughput. Wonder if anyone’s tried that…)

The disk is, by far, the slowest thing in your computer. And it typically reads more than it writes. If you can speed up disk read performance by some margin, your computer will feel snappier in a lot of cases.

No, it won’t help for *every* desktop performance measurement. I wouldn’t dream of claiming that. But I’d be wary of suggesting that it will make *no* difference, which is what you appear to be claiming.

> Also let’s not assume that the rotational speeds will remain constant.

Why not?
Even if we don’t, the performance jump would not be that big and it’d be a one-time jump.
The STR jump would be even less (or non-existent) as the platters would have to be made smaller at the same time.

Take for example a 160 gbyte drive and a 1 tbyte drive. They do 60.2 and 86.9 mbyte/s. Capacity increase is 6.25x, max STR increase is 1.44x.
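For reference, the ratios in this example can be checked against the square-root estimate from earlier in the thread. The drive figures are the ones quoted above; the variable names are illustrative and nothing else is assumed.

```python
import math

capacity_gain = 1000 / 160   # 160 gbyte -> 1 tbyte
str_gain = 86.9 / 60.2       # sustained transfer rates in mbyte/s
balanced = math.sqrt(capacity_gain)  # speed-up if the gain were split equally

print(f"capacity: {capacity_gain:.2f}x")
print(f"measured STR: {str_gain:.2f}x")
print(f"square-root estimate: {balanced:.2f}x")
```

The measured increase falls below the square-root estimate, which on that model would mean most of the capacity gain came from something other than more data per track.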

> Common measures I’ve seen are time to boot to login screen, time from login to usable desktop,

How often do you boot per day? I do it once, so that’s no good measure for me and I think for many others.

> and application startup time.

That’s not an issue either, it’s a few seconds.

> These are often dominated by transfer rate. There’s also the “restoring an application that’s been inactive for an hour while you’ve been working on something else” that requires paging a load of swap into memory. That can often be fairly dependent on transfer rate, although seek time is likely to be more of a factor than in the other cases I mentioned.

I think you don’t have enough memory if that’s often an issue. Even then, how many minutes/hours per day does it save you?

> The disk is, by far, the slowest thing in your computer. And it typically reads more than it writes. If you can speed up disk read performance by some margin, your computer will feel snappier in a lot of cases.

I don’t agree. Keyboard is slower… ;)
If your disk was twice as fast, could you do the same amount of work in half the time?

> No, it won’t help for *every* desktop performance measurement. I wouldn’t dream of claiming that. But I’d be wary of suggesting that it will make *no* difference, which is what you appear to be claiming.

Basically, I am. I think it’s not a cost-effective way of improving performance. And I think it rarely even improves performance.

Karellen: I think that for sustained contiguous IO to really matter you need to be doing bulk backup/restore operations or copying a few TB of someone else’s data. ;)

In desktop machines reads are usually more common than writes. In many server operations writes are more common. For example consider an ISP mail server. When mail is received it is written to the queue and then in most cases delivered to the mail store while still in cache. The users who receive the most mail are on broadband connections with their machine doing POP connections every 5 minutes – again the mail never leaves the read cache. Last time I looked at the performance of a big mail server writes outnumbered reads by a factor of 6:1.
