I'm about to build a home ZFS server because I want the utmost protection I can get for my data. It's going to serve iSCSI to my Mac Mini, which will share files to my home network. In addition to serving iSCSI it will also be compressing the blocks and scrubbing my data weekly.

It will house 6 drives in a striped triple-mirror configuration. Two of the drives will be hot swappable so that I can split the mirrors monthly for rotating offline backups.

I want the case to be as small as practical and as quiet as practical. I don't care about cost.

It's been a while since I've built a computer from scratch so I'd like you to please sanity check my choices.

Re the PSU -- I don't see the point of going fanless, with so many HDDs and in this particular case. You are better off with a quiet fanned PSU. Go for a fanned Seasonic X if you like super high efficiency -- its fan may never spin up but at least you will have the security.

Also not sure why you are choosing this case -- surely there are more proven tested designs for quiet/cooling that can also hold 6 HDDs.

As for the CPU cooler, replace the stock one. It's cheap and simple to get cooler, quieter performance -- so why not just do it? The Coolermaster 212 is a cheap favorite.

I would hesitate most about those Hitachi HDDs. Along with "no head parking, TLER, or Advanced Format nonsense", it also doesn't seem promising for noise, as it is a 5-platter HDD. It is rated by Hitachi to be 2.9 BEL typical -- I'm pretty sure that is sound power, not SPL, and I suspect it is optimistic, as it's the same rating Samsung gives for its 3-platter F4 2TB 5400rpm drive. Hitachi has not been in the top ranks of quiet drives for some years, and I doubt very much any 5-platter 7200rpm drive will come even close to either the WD GPs or the 3-platter Samsungs, both 5400rpm drives. The performance of these newer drives may actually be as good despite the slower spindle speed due to higher areal density. Personally, I'd look for a low-platter-count 2TB model from either WD or Samsung if you must have 7200rpm drives (or 5400rpm ones, for that matter).

I yield to Mike when it comes to cases, fans and PSUs. And, at first glance, his drive advice makes sense too. It seems people have had trouble with the cheaper WD drives (or some of them anyway) but what about these new high-density Samsungs? They're going to draw less power than these Hitachis as well (do you know how much?).

The first revision of that board doesn't support that CPU but I suppose you already know about that. If you have definite evidence about ECC support, post it in the ECC support thread please because there's some uncertainty about that with people claiming ECC is (partially?) non-functional with that combination.

I'm sure you know what you're doing but I wouldn't equate "utmost protection" with monthly backups. Perhaps you've planned incremental or differential backups and you don't need hardware support. But if the data is so static as not to require more frequent backups, perhaps this rig is overkill. In particular, triple-mirroring might not be the best way to use these drives.

In addition to more frequent backups, I think it would be wise to keep more than two or three versions of your data. Since you're backing up to hard drives instead of tapes, this can be accomplished relatively easily with hardlinks (or perhaps snapshots). There's some ready-made software that does this but, considering the resources you're putting into this project, rolling your own with standard tools is not a stretch.

In any case, for "utmost protection", more resources would have to be put into backups than is apparent in the original post. I'm writing this mostly as a warning to others who might be inspired by your post but have not given due consideration to backups.

If you truly don't care about costs, perhaps you should consider having two storage servers with four drives each. This would be safer. You might also want to go for lower-power and possibly cheaper servers if you had a pair. I don't suppose you need more performance than a single server can provide, so one of them could be the primary and replicate your data more or less frequently (not in real time!) to the other. You'd have frequent online backups and you'd be able to make offline backups without disturbing the main server. You could keep two versions of your data on the second server without mirroring or keep many versions of your data on an array with less redundancy. The second server could be powered down most of the time to keep power consumption in check. That might be a bit safer as well.

If anything you should reconsider your strategy here. If you're so concerned about the data, have you taken a backup routine into account? ZFS parity is *not* a backup. RAID is about performance and uptime. I know ZFS isn't RAID, and I would agree that some of its advanced features for data preservation are very appealing.

HFat summed it up nicely. Offline backups with switching drives sounds like more work than it really has to be. You might rethink your solution as a simpler implementation, possibly with two simple, full servers containing the data. You likely won't need as much memory or CPU to reach the performance limits of gigabit Ethernet over iSCSI. Replace triple mirroring with either simple parity or no parity at all: 3 data drives, and that's that, or 4 data drives with single parity, backed up to 3/4 other data drives in a separate system.

I suppose the RAM was supposed to be a cache that would improve latency more than throughput (it might still help with throughput in some situations). But it would be a humongous cache indeed. I don't know what kind of data/usage would benefit from that since we're talking about home storage.

I would agree, but unless he's moving to 10G (which seems unlikely since he said he's running a Mac Mini), you can easily saturate gigabit [~110-115MB/s after overhead] with iSCSI traffic with two disks striped.
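For reference, the bracketed figure checks out from frame-level overhead alone. A sketch (counting only Ethernet/IP/TCP framing; iSCSI PDU headers shave off a few more MB/s):

```shell
# Payload throughput of gigabit Ethernet with standard 1500-byte MTU frames.
# Counts only Ethernet/IP/TCP framing; iSCSI headers cost a little more.
awk 'BEGIN {
  line_rate = 125.0                    # 1 Gb/s expressed in MB/s
  payload   = 1500 - 20 - 20           # MTU minus IP and TCP headers
  wire      = 1500 + 14 + 4 + 8 + 12   # frame + Eth header/FCS + preamble + IFG
  printf "max TCP payload: ~%.0f MB/s\n", line_rate * payload / wire
}'
```

That gives ~119 MB/s before iSCSI overhead, which lands right around the quoted 110-115MB/s once iSCSI headers and acks are counted.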

MikeC wrote:

Re the PSU -- I don't see the point of going fanless, with so many HDDs and in this particular case. You are better off with a quiet fanned PSU. Go for a fanned Seasonic X if you like super high efficiency -- its fan may never spin up but at least you will have the security.

Great, thank you. I'll go with the Seasonic X-560 instead.

MikeC wrote:

Also not sure why you are choosing this case -- surely there are more proven tested designs for quiet/cooling that can also hold 6 HDDs.

I'm choosing this case over the Lian Li PC-Q08 from your fantastic Silent Home Server Build Guide because I need 2 external 5.25" bays to hold the mobile racks. This excludes the Lian Li PC-V354 for the same reason. I considered the Antec Mini P180 but don't like that two of its 5.25" bays are near the floor, given that I'll be hot swapping drives from them every month. This isn't a show stopper, I suppose. Do you think it's that much better than the PC-A05N?

MikeC wrote:

As for the CPU cooler, replace the stock one. It's cheap and simple to get cooler, quieter performance -- so why not just do it? The Coolermaster 212 is a cheap favorite.

OK, done.

MikeC wrote:

I would hesitate most about those Hitachi HDDs. Along with "no head parking, TLER, or Advanced Format nonsense", it also doesn't seem promising for noise, as it is a 5-platter HDD. It is rated by Hitachi to be 2.9 BEL typical -- I'm pretty sure that is sound power, not SPL, and I suspect it is optimistic, as it's the same rating Samsung gives for its 3-platter F4 2TB 5400rpm drive. Hitachi has not been in the top ranks of quiet drives for some years, and I doubt very much any 5-platter 7200rpm drive will come even close to either the WD GPs or the 3-platter Samsungs, both 5400rpm drives. The performance of these newer drives may actually be as good despite the slower spindle speed due to higher areal density. Personally, I'd look for a low-platter-count 2TB model from either WD or Samsung if you must have 7200rpm drives (or 5400rpm ones, for that matter).

Thank you for challenging me. I chose the Hitachi because I read erroneous reports that they don't have the TLER/CCTL/ERC nonsense; on deeper inspection I see they're no different from any other "desktop" grade drive. I'm OK with 5400rpm drives. I'm not OK with drives with 4K sectors that pretend to be 512 bytes, and it looks like all the 2TB 3-platter drives are lying in that way (please correct me if I'm wrong).

I'm not sure yet whether I need to spring for "enterprise" grade drives, which are twice the cost, just to get a drive that honestly reports an error when it happens. I'm afraid that if I get a "desktop" grade drive that can spend >2 minutes blocking a read when it hits a bad sector, it will negatively impact my overall ZFS performance. I've posted a question on Storage Review asking for clarity about this.

If you have definite evidence about ECC support, post it in the ECC support thread please because there's some uncertainty about that with people claiming ECC is (partially?) non-functional with that combination.

I'm sure you know what you're doing but I wouldn't equate "utmost protection" with monthly backups. Perhaps you've planned incremental or differential backups and you don't need hardware support. But if the data is so static as not to require more frequent backups, perhaps this rig is overkill. In particular, triple-mirroring might not be the best way to use these drives.

In addition to more frequent backups, I think it would be wise to keep more than two or three versions of your data. Since you're backing up to hard drives instead of tapes, this can be accomplished relatively easily with hardlinks (or perhaps snapshots). There's some ready-made software that does this but, considering the resources you're putting into this project, rolling your own with standard tools is not a stretch.

In any case, for "utmost protection", more resources would have to be put into backups than is apparent in the original post. I'm writing this mostly as a warning to others who might be inspired by your post but have not given due consideration to backups.

I agree. I didn't include information about my backup strategy in my original post because a lot of thinking has gone into this machine and I don't want to overwhelm people with detail. In addition to monthly offline backups, I'll be using CrashPlan for incremental online backup, backing up changes every 15 minutes and storing every file version ever saved. In addition to CrashPlan I'll also be synchronizing all the files online with Wuala. I feel pretty well protected by two online backup systems plus local triple mirrors. The offline backup is to handle the threat case in which someone steals my passwords and is able to wipe out my local files and online backups all at once. In that unlikely case I'll still have my data as of a month ago.

I'm using double parity mirroring in part because I was inspired by the blog article "Home Server: RAID-GREED and Why Mirroring is Still Best" and in part because I'm going to be splitting the mirrors every month. After each split the mirrors will resilver, and during the resilvering window I'll have single parity mirrors to protect against bit errors, which are increasingly likely in today's age of constant BER and increasing storage capacities.

HFat wrote:

If you truly don't care about costs, perhaps you should consider having two storage servers with four drives each. This would be safer. You might also want to go for lower-power and possibly cheaper servers if you had a pair. I don't suppose you need more performance than a single server can provide, so one of them could be the primary and replicate your data more or less frequently (not in real time!) to the other. You'd have frequent online backups and you'd be able to make offline backups without disturbing the main server. You could keep two versions of your data on the second server without mirroring or keep many versions of your data on an array with less redundancy. The second server could be powered down most of the time to keep power consumption in check. That might be a bit safer as well.

Hmm. Would you still suggest this now that you know more about my backup plans?

I don't know exactly how these Crashplan and Wuala backups are supposed to work (it might be educational for your readers if nothing else to post links not to promotional material but to descriptions of how one can make them work with multiple teras which is not the typical use case) and I don't know the characteristics of your data and of your usage so I can't tell you what I would do in your situation. But I can do generalities:

Mirroring is best of course but triple-mirroring is not very useful in your situation. It takes care of the risk of back-to-back drive malfunctions (which is not the highest risk anyway) at the cost of higher power consumption and noise. If you make changes you don't want to lose after splitting your array (it's a manual operation so you get to choose when it happens), you could add your backup drives before splitting. It's generally a good idea to have more drives outside of an array than inside unless you're after availability or you cannot afford to lose a couple of hours' worth of changes. And if you're after availability, you need spares of everything and not only spare drives... hence the second server.

The main advantage of a second server from a data security perspective over stashing bare drives in a closet or something is that no manual operations are involved, which allows more frequent backups. A remote backup server accomplishes the same thing and might bring additional security (depending) at the cost of lower speed. Only you can tell if the remote servers you're using as well as your link to them are reliable and fast enough for your needs. I would have guessed your home link isn't fast enough considering you have multiple teras but it depends on how static your data is. It's harder to justify server gear and ZFS for static data but it depends on your resources of course.

Generally and against the article you referenced, I advocate prioritizing having many copies instead of obsessing about securing the running copy which is vulnerable to many other threats anyway. Generally, you also want to be able to roll back to old copies and not only to the more recent one(s).

If anything you should reconsider your strategy here. If you're so concerned about the data, have you taken a backup routine into account? ZFS parity is *not* a backup. RAID is about performance and uptime. I know ZFS isn't RAID, and I would agree that some of its advanced features for data preservation are very appealing.

I just explained my backup strategy in another post. I agree that ZFS parity is not backup. I'm using two cloud based services for backup (plus my offline mirror splits). I'm using ZFS to protect against bit rot (which RAID doesn't do, to my knowledge) as well as drive failures.

protellect wrote:

HFat summed it up nicely. Offline backups with switching drives sounds like more work than it really has to be. You might rethink your solution as a simpler implementation, possibly with two simple, full servers containing the data. You likely won't need as much memory or CPU to reach the performance limits of gigabit Ethernet over iSCSI. Replace triple mirroring with either simple parity or no parity at all: 3 data drives, and that's that, or 4 data drives with single parity, backed up to 3/4 other data drives in a separate system.

Well, part of what I like about having bare drives offline is that it's easy to make extra copies and dump them at different offsite locations, e.g., rotating yearly when I visit my mother in another state for Christmas, etc. Having a second server (even turned off part time) doesn't satisfy my desire for offline backup, because then at some point in the process all my data will be online at once. By swapping bare drives, there's never a window in which all my data is online (and therefore vulnerable to an online attack).

I have to ask, why the Mac Mini? Why not just serve files over SAMBA/NFS/whatever directly from your ZFS server? If you are so concerned about data corruption then the Mini is just another potential point of failure. Not that the Mini is any better or worse than any other commodity PC, but it doesn't support ECC so it could potentially corrupt data going through it.

Also, since the Mini doesn't support adding another gigabit NIC, you have just halved your potential throughput.

A second server is somewhere between RAID and backups. That is, it doesn't replace backups. The idea was that you would periodically take drives out of the second server. This way you don't need racks and don't have to worry about hotplugging, breaking your mirrors and so on. Performance would not be affected and it would also allow you to take out drives with the consistent overnight state of the data (for instance) while your data is being modified. A second server has many additional benefits such as being convertible into a primary server in an emergency and the ability to store a very large number of versions of your data. It's not economical but it's a better investment than 16G of RAM or triple-mirroring (per dollar that is - it would be more time-consuming to implement).

You could take some time to figure out how much benefit you'd get from 16G of cache if money is not totally irrelevant. Google for the keywords filesystem cache hit miss. 4G should be more than enough and you can upgrade later if you pick high-capacity DIMMs.

I don't know exactly how these Crashplan and Wuala backups are supposed to work (it might be educational for your readers if nothing else to post links not to promotional material but to descriptions of how one can make them work with multiple teras which is not the typical use case) ...

They're pretty simple. CrashPlan offers unlimited storage for $5/month for home users, and based on my years of experience with them as a solid company, I believe they can fulfill this promise on the order of 4 TB. The software automatically monitors any changes to the file system and backs up changes every 15 minutes, preserving every version of every file as well as all deleted files.

Wuala has a backup feature but I'm not using it -- instead I'm using its synchronization feature to automatically synchronize my files to their servers over the Internet. It costs significantly more than CrashPlan (unless you share storage with it), so I will probably use it only for the most important subset of my data to keep costs down.

Both programs encrypt the data before it leaves my computer.

HFat wrote:

Mirroring is best of course but triple-mirroring is not very useful in your situation. It takes care of the risk of back-to-back drive malfunctions (which is not the highest risk anyway) at the cost of higher power consumption and noise. If you make changes you don't want to lose after splitting your array (it's a manual operation so you get to choose when it happens), you could add your backup drives before splitting. It's generally a good idea to have more drives outside of an array than inside unless you're after availability or you cannot afford to lose a couple of hours' worth of changes. And if you're after availability, you need spares of everything and not only spare drives... hence the second server.

I'm concerned not so much about back-to-back drive failures as about bit errors (see further below in this message). I'm also not too concerned about availability. Regarding the cost of higher power consumption and noise: we're talking about an additional 15W when idle, which I'm OK with, and I'm guessing the noise of 6 drives isn't going to be significantly worse than the noise of 4.

Let's get specific here about my plan so we know we're in the same conversation. Here's what I intend to do with the offline backups:

Once per month I'll do the following to split two of the drives (one from each mirror) out of the pool named "online" to a new pool named "offline":

Code:

$ zpool split online offline c1t0d0 c1t1d0
# wait a few seconds for the split to complete
# hot swap out the disks c1t0d0 and c1t1d0 from the mobile racks and replace them with new disks from my shelf
# reattach the new disks to the "online" pool
$ zpool attach online c0t0d0 c1t0d0
$ zpool attach online c0t1d0 c1t1d0
# the new drives resilver automatically and I can ignore it for another month

If I add my backup drives before splitting then that adds an additional step to my monthly routine.

HFat wrote:

The main advantage of a second server from a data security perspective over stashing bare drives in a closet or something is that no manual operations are involved, which allows more frequent backups. A remote backup server accomplishes the same thing and might bring additional security (depending) at the cost of lower speed. Only you can tell if the remote servers you're using as well as your link to them are reliable and fast enough for your needs. I would have guessed your home link isn't fast enough considering you have multiple teras but it depends on how static your data is. It's harder to justify server gear and ZFS for static data but it depends on your resources of course.

I see. My uplink is slow (1 Mb/s) but my data is static enough for it to work. I believe ZFS is justified for everyone who cares about their data today because drive capacities (and the data we're storing on them) are rising fast while the BER is staying constant. See, for example, 56% Chance Your Hard Drive Is Not (Fully) Readable, A Lawsuit in the Making and many other articles like it.
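For concreteness, here's the sort of arithmetic behind that claim (the capacity and error-rate figures are illustrative vendor specs, not numbers from the article):

```shell
# Chance of at least one unrecoverable read error (URE) when reading a full
# 2TB drive end to end, given a spec of 1 URE per 10^14 bits read.
awk 'BEGIN {
  bits     = 2e12 * 8             # 2 TB in bits
  ber      = 1e-14                # unrecoverable errors per bit read
  expected = bits * ber           # expected UREs in one full read
  p        = 1 - exp(-expected)   # Poisson approximation for P(>=1)
  printf "expected UREs: %.2f, P(>=1): %.1f%%\n", expected, 100 * p
}'
```

Roughly a one-in-seven chance per full read of a single drive, which is exactly what scrubbing plus mirror redundancy is there to catch.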

HFat wrote:

Generally and against the article you referenced, I advocate prioritizing having many copies instead of obsessing about securing the running copy which is vulnerable to many other threats anyway. Generally, you also want to be able to roll back to old copies and not only to the more recent one(s).

I agree. I have many copies on CrashPlan (every version of every file) and another copy on Wuala in addition to my local running copy. I'm not obsessing about securing the running copy; I'm doing it because it's cheap in dollars (~$200 for two more drives), time (5 minutes every month), and attention (execute a few command lines -- no software/scripts to write, debug, or maintain).

I have to ask, why the Mac Mini? Why not just serve files over SAMBA/NFS/whatever directly from your ZFS server? If you are so concerned about data corruption then the Mini is just another potential point of failure. Not that the Mini is any better or worse than any other commodity PC, but it doesn't support ECC so it could potentially corrupt data going through it.

Also, since the Mini doesn't support adding another gigabit NIC, you have just halved your potential throughput.

This is a good question. You're right about adding another point of failure.

The first answer is that my client computers are Apple computers and store their data in Apple file formats. I first ran a test of storing the files on my ZFS server and sharing them through Netatalk. What I discovered was a headache of configuration options needed to properly support ACLs, Unicode filename conventions, etc. I decided it is cleaner (and easier) to let Apple format the drive and share the files rather than tuning Nexenta to pretend to be an Apple file system and hoping that the open-source program Netatalk mimics the Apple software well enough.

The second answer is that I'm running two online backup programs for my files, CrashPlan and Wuala, which support Mac OS as a platform but not Nexenta.

So another way of thinking about this is that my Mac Mini is my server and I'm creating a super reliable, expandable external drive for it.

HFat wrote:

A second server is somewhere between RAID and backups. That is, it doesn't replace backups. The idea was that you would periodically take drives out of the second server. This way you don't need racks and don't have to worry about hotplugging, breaking your mirrors and so on. Performance would not be affected and it would also allow you to take out drives with the consistent overnight state of the data (for instance) while your data is being modified. A second server has many additional benefits such as being convertible into a primary server in an emergency and the ability to store a very large number of versions of your data. It's not economical but it's a better investment than 16G of RAM or triple-mirroring (per dollar that is - it would be more time-consuming to implement).

I see. As I understand it, splitting mirrors with ZFS won't affect performance and can be done while data is being modified.

HFat wrote:

You could take some time to figure out how much benefit you'd get from 16G of cache if money is not totally irrelevant. Google for the keywords filesystem cache hit miss. 4G should be more than enough and you can upgrade later if you pick high-capacity DIMMs.

You've sold me. I'll start with two sticks of 4G each (to get the benefit of interleaving) and add more if I'm unhappy with performance. Thank you.

You want to store 4T on their servers for $5 a month? That'd be some hardcore leeching. Did you think about how much that would cost them? Maybe they can make it work if they have enough customers who pay $5 to upload 10G but you'd be operating on their sufferance. The day they decide they're losing too much money because of people like you, they'll crack down.

Do you realize it would take you over a year to upload 4T over your link at max speed (which you'll never reach) 24/7? Even if they agreed to allow you to send them a couple of drives in the mail and to copy that to their system, there's no way you can back up large changes in anything close to 15 minutes. If you changed 0.1% of your capacity, uploading that would require at least 10 hours.

They might silently throttle the throughput on your connection once your data reaches a certain threshold to prevent such abuse (assuming they don't throttle everyone to begin with). That or their software will not do its job. I understand you have to use their software. If it wasn't rigged, why wouldn't they let you upload using your own software, making their service compatible with all platforms? Don't take "unlimited" offers literally. They can only work as long as people don't use much of the resource and there are ways to enforce that.

In other words, your remote backups are not complete or reliable and you need more frequent local backups before you can talk about "utmost protection".
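For the record, the arithmetic behind that last figure (best case at line rate, using the 1Mb/s uplink you mentioned):

```shell
# Time to upload 0.1% of 4TB over a 1Mb/s uplink at theoretical line rate.
awk 'BEGIN {
  changed = 0.001 * 4e12          # bytes changed: 0.1% of 4 TB
  rate    = 1e6 / 8               # 1 Mb/s in bytes per second
  printf "%.1f hours at line rate\n", changed / rate / 3600
}'
```

That's nearly 9 hours with zero protocol overhead and nothing else on the link, so "at least 10 hours" is if anything optimistic.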

Which is why I'm saying you should start by dropping triple-mirroring and keeping drives out of your array where they are somewhat protected (not as well as if they were disconnected). The only thing you need to change in order to achieve that is to split the extra drives from the array after resilvering instead of before splitting. You might also resilver daily or something. And it would be prudent to physically hotswap the drives more often (if you only have one server).

Alternatively, you might do automatic remote incremental backups overnight (for instance). What an incremental would involve is to make a file containing all the changes since the last version you disconnected from your server and to upload that file. This is (a lot) more reliable than the way you plan to use Crashplan and would be quicker to restore as well. If ZFS doesn't have a facility to do that (with the help of a snapshot), it can be done on a second server or by keeping the same version of your data you have offline on your third set of drives (in which case you wouldn't have a use for hotplugging and racks anymore because you'd do offline backups from the third set of drives to an external dock over eSATA or something). Or you could find backup software which can keep an inventory of the files (with hashes) in the last version you took offline and dispense with the need to keep an additional copy of that version online at the cost of larger uploads. Possibly Crashplan can do this (I have no idea).

Triple-mirroring would only offer minimal protection against bit errors. The third copy will only be called upon if two copies have bit errors on the disk in the same place. How likely is that to happen if you scrub somewhat regularly? Not to mention that disks are not the only components which are going to introduce bit errors (see below). Yes, triple-mirroring is easy to implement but it's also almost ineffective for your purpose!

Splitting your array will not affect performance much but resilvering will. And resilvering times will also be affected by regular usage of the array. But that's not much of an issue if your gear is overkill for your needs to begin with, of course. edit: Note that splitting while the array is in use might result in an inconsistent split copy.
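Incidentally, ZFS snapshots do appear to provide the incremental facility mentioned above: an incremental send stream between two snapshots is exactly "a file containing all the changes since the last version". A sketch with hypothetical pool and snapshot names (it only prints the commands rather than touching a real pool):

```shell
#!/bin/sh
# Incremental backup via ZFS send streams -- hypothetical pool/snapshot names.
# This sketch only prints the commands it would run.
POOL=online
PREV=2011-01     # last snapshot already copied to the offline/remote set
CURR=2011-02

cat <<EOF
zfs snapshot -r ${POOL}@${CURR}
zfs send -R -i ${POOL}@${PREV} ${POOL}@${CURR} > /backup/${POOL}-${CURR}.zfsinc
zfs receive -d -F backup < /backup/${POOL}-${CURR}.zfsinc
EOF
```

The stream file can be uploaded overnight or copied to a dock, and replayed with zfs receive on any other ZFS system.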

ZFS will not protect you against all sorts of bit errors. That's a misconception. Your Mac Mini is of course a potential cause of errors but there are other potential sources of errors (see my previous posts). End-to-end verification with hashes created and verified by the client can handle that but no server-side technology can possibly do it. Manual creation of such hashes is best but automatic verification is fine. ZFS takes care of errors introduced by drives, which is valuable, but it's not a panacea.

This is why I'm saying that ZFS is harder to justify for static data. There are simpler and better ways to protect the integrity of static data. Which is not to say that ZFS is useless of course, only that it might not be worth the trouble (depending on the data and the resources available to protect it).

edit: nobody mentioned it so far but ZFS does not like to lose power so it would be prudent to have a battery unless your power is rock-solid.

HFat wrote:

edit: nobody mentioned it so far but ZFS does not like to lose power so it would be prudent to have a battery unless your power is rock-solid.

I would expand on this as another reason your triple mirroring is a bad idea. One power outage could corrupt the whole array, giving you three disks with corrupt file-systems.

Also, while ZFS is generally stable, it's fairly new and relatively untested. There are some pretty scary horror stories out there. You are really entering into a "putting all your eggs in one basket" situation with your local backup strategy. I would not use ZFS for my local backups, at least not the same file-system on the same box. A second server and/or using a different file-system for backups would be very wise. Not that it will happen, but if ZFS itself screws up or your file structure goes south, your recovery options are pretty much restore from backup. Given that the same ZFS file-system is your backup, you could be very screwed. At least to my knowledge, ZFS does not have any good recovery tools like most other popular file-systems.

Do you really want to be restoring 4TB+ from online backups, especially given the concerns HFat has raised?

HFat wrote:

You want to store 4T on their servers for $5 a month? That'd be some hardcore leeching. Did you think about how much that would cost them? Maybe they can make it work if they have enough customers who pay $5 to upload 10G but you'd be operating on their sufferance. The day they decide they're losing too much money because of people like you, they'll crack down.

I have no desire to leech. I just sent them an email asking them this question directly, so we'll see what they think.

HFat wrote:

Do you realize it would take you over a year to upload 4T over your link at max speed (which you'll never reach) 24/7? Even if they agreed to allow you to send them a couple of drives in the mail and to copy that to their system, there's no way you can back up large changes in anything close to 15 minutes. If you changed 0.1% of your capacity, uploading that would require at least 10 hours.

Yes, but that's not my use case. I have about 300 GB now already in their system, and I generally don't generate more than 1 MB/s of data at a time. It would be interesting to measure my degree of data "staticness" in terms of MB/s somehow.

HFat wrote:

If it wasn't rigged, why wouldn't they let you upload using your own software, making their service compatible with all platforms? Don't take "unlimited" offers literally. They can only work as long as people don't use much of the resource, and there are ways to enforce that. In other words, your remote backups are not complete or reliable, and you need more frequent local backups before you can talk about "utmost protection".

The benefit of their software is ease of use, and they'd have broader support costs if they let people use their own software. That's a different market. Although I like using my own software (especially when it comes to encryption), I'm drawn to CrashPlan's fantastic ease of use. I'm waiting to hear their answer to my question, but elsewhere they've written, "We do reserve the right to introduce a limit, however if we do, we'll give you fair warning. Our guess is, we can add disk faster than you can create versions."

HFat wrote:

Which is why I'm saying you should start by dropping triple-mirroring and keeping drives out of your array, where they are somewhat protected (not as well as if they were disconnected). The only thing you need to change in order to achieve that is to split the extra drives from the array as soon as they have resilvered instead of leaving them attached until just before the next split. You might also resilver daily or something. And it would be prudent to physically hotswap the drives more often (if you only have one server).

Let me be clear about the triple-mirroring. I'm not suggesting triple-mirroring because I think it's necessary to protect my data. I believe a normal mirror is fine for that. I'm suggesting triple-mirroring only to save myself a step every month when I split the mirrors:

This is what I believe you're suggesting:

Run single parity mirrors and once a month, do this:

1. Add extra drives to create double parity mirrors.
2. Wait for resilvering.
3. Split mirrors.
4. Remove drives.

This means I'll have to wait for the resilvering process to complete, whereas in my plan I won't.
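For concreteness, those four steps map onto zpool commands roughly as follows; the pool name and device names here are invented:

```shell
zpool attach tank da0 da6          # 1. add a third disk to mirror-0
zpool attach tank da2 da7          # 1. and to mirror-1
zpool status tank                  # 2. check until the resilver completes
zpool split tank monthly da6 da7   # 3. split the fresh disks into a new pool
# 4. the new pool is left exported, so the drives can then be pulled
```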

Regarding resilvering daily, I plan to do it weekly, and in ZFS vernacular it's called "scrubbing".

HFat wrote:

Alternatively, you might do automatic remote incremental backups overnight (for instance). An incremental backup would involve making a file containing all the changes since the last version you disconnected from your server, and uploading that file. This is (a lot) more reliable than the way you plan to use Crashplan and would be quicker to restore as well. If ZFS doesn't have a facility to do that (with the help of a snapshot), it can be done on a second server or by keeping the same version of your data you have offline on your third set of drives (in which case you wouldn't have a use for hotplugging and racks anymore because you'd do offline backups from the third set of drives to an external dock over eSATA or something). Or you could find backup software which can keep an inventory of the files (with hashes) in the last version you took offline and dispense with the need to keep an additional copy of that version online, at the cost of larger uploads. Possibly Crashplan can do this (I have no idea).

Your thinking has influenced me and I've decided to add another CrashPlan backup destination onto the computer in my office (CrashPlan's software allows me to do this for free). I'll still be limited by the speed of my home uplink but it will be another copy with full version histories and if I need to restore a lot of data at once, all I'll have to do is drive to my office.

ZFS does have a snapshot facility but doesn't provide (as far as I can tell) the ease of use that CrashPlan does, with the automatic snapshots every 15 minutes, encryption, and sending offsite.

HFat wrote:

Splitting your array will not affect performance much but resilvering will. And resilvering times will also be affected by regular usage of the array. But that's not much of an issue if your gear is overkill for your needs to begin with, of course. edit: Note that splitting while the array is in use might result in an inconsistent split copy.

I believe the performance impact of resilvering and scrubbing will be mitigated by my overkill gear. Splitting the array while it's in use is a mild concern that's nagging at me. The Solaris ZFS Administration Guide admonishes, "Data and application operations should be quiesced before attempting a zpool split operation," which I don't plan on doing. I notice it also advises, "A good way to keep your data redundant during a split operation is to split a mirrored storage pool that is composed of three disks so that the original pool is comprised of two mirrored disks after the split operation," which validates my choice of a triple mirror.

Another way to achieve a split using ZFS snapshots that might be safer would be like this:
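A sketch of that procedure, with hypothetical pool and snapshot names (the offline pool is imported only for the transfer):

```shell
zfs snapshot -r tank@2010-12            # snapshot the live pool
zpool import backup                     # attach the offline copy
zfs send -i tank@2010-10 tank@2010-12 | zfs receive -F backup/tank
zpool export backup                     # take it offline again
# prune old snapshots once no offline copy still needs them
```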

This would send only the incremental changes made during the last two months to the offline drive. Hmm. This would keep monthly snapshots on the running copy, require fewer writes than a resilver, and wouldn't need a triple mirror. I might do this instead.

HFat wrote:

ZFS will not protect you against all sorts of bit errors. That's a misconception. Your Mac Mini is of course a potential cause of errors but there are other potential sources of errors (see my previous posts). End-to-end verification with hashes created and verified by the client can handle that but no server-side technology can possibly do it. Manual creation of such hashes is best but automatic verification is fine. ZFS takes care of errors introduced by drives, which is valuable, but it's not a panacea.

Yes, of course. Perhaps I shouldn't have used the word "utmost" in my original post. I'm looking for value, not a panacea.

HFat wrote:

This is why I'm saying that ZFS is harder to justify for static data. There are simpler and better ways to protect the integrity of static data. Which is not to say that ZFS is useless of course, only that it might not be worth the trouble (depending on the data and the resources available to protect it).

What are they? Will they work for quasi-static data (my use case)?

HFat wrote:

edit: nobody mentioned it so far but ZFS does not like to lose power so it would be prudent to have a battery unless your power is rock-solid.

I would expand on this as another reason your triple mirroring is a bad idea. One power outage could corrupt the whole array, giving you three disks with corrupt file-systems.

Also, while ZFS is generally stable, it's fairly new and relatively untested. There are some pretty scary horror stories out there.

It's pretty shocking that ZFS isn't designed to handle power failures better. Dammit, just as I was drinking the ZFS Kool-Aid.

washu wrote:

You are really entering into a "putting all your eggs in one basket" situation with your local backup strategy. I would not use ZFS for my local backups, at least not the same file-system on the same box. A second server and/or using a different file-system for backups would be very wise. Not that it will happen, but if ZFS itself screws up or your file structure goes south, your recovery options are pretty much restore from backup. Given that the same ZFS file-system is your backup, you could be very screwed. At least to my knowledge, ZFS does not have any good recovery tools like most other popular file-systems.

During my last reply to HFat I've decided to add a full CrashPlan backup to the computer in my office, which will be a second server and a different file system.

You want to store 4T on their servers for $5 a month? That'd be some hardcore leeching. Did you think about how much that would cost them? Maybe they can make it work if they have enough customers who pay $5 to upload 10G but you'd be operating on their sufferance. The day they decide they're losing too much money because of people like you, they'll crack down.

I heard back from CrashPlan on this subject. Here's their take on "hardcore leeching":

Kevin wrote:

You would not be unfairly leeching our service. Unlimited is unlimited. The only way you would be leeching the service would be if you were backing up business data and not personal data. CrashPlan subscriptions are for personal use only as described in the license agreement.

You would be able to restore the data, but remember that at internet speeds downloading 4TB of data will take quite some time. This is one reason we built CrashPlan as a multi-destination backup application. You can have your own local backup, either on a connected drive or device or on another computer on the same network. Restoring 4TB of data from a locally attached archive will be far faster than trying to do so over the internet.

Unlimited doesn't exist. Check the price of online storage! I guess they're throttling if that's their public stance. Proceed at your own risk. You may be OK as long as you're one of the few people to trust their claims. Get enough people to trust them and they will take advantage of this offer, something Crashplan cannot afford. There is no way of telling encrypted business and personal data apart, by the way.

Quiet Mind wrote:

It would be interesting to measure my degree of data "staticness" in terms of MB/s somehow.

It's not terribly hard. You already know how to do that with ZFS snapshots apparently. ZFS may also expose statistics you can look at without bothering with snapshots. Or you could search for your files modified since an arbitrary date and look at how big the whole pile is (this is not precise and could give you very wrong results in some cases).
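A crude version of that measurement as a sketch (GNU stat syntax; the /tank path is a placeholder):

```shell
# Rough churn estimate, per the suggestion above: total size of files
# modified in the last N days. Imprecise, as noted: it misses deletions
# and counts whole files rather than changed bytes.
churn_bytes() {   # usage: churn_bytes <dir> <days>
    find "$1" -type f -mtime -"$2" -print0 2>/dev/null \
        | xargs -0 stat -c %s 2>/dev/null \
        | awk '{s += $1} END {print s + 0}'
}
total=$(churn_bytes /tank 30)
echo "$total bytes changed in 30 days (~$((total / (30 * 86400))) B/s average)"
```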

Quiet Mind wrote:

This is what I believe you're suggesting:

Run single parity mirrors and once a month, do this:

1. Add extra drives to create double parity mirrors.
2. Wait for resilvering.
3. Split mirrors.
4. Remove drives.

You don't have to wait personally. This can be automated. And I'm suggesting you repeat steps 1-3 more often than once a month.

Quiet Mind wrote:

Regarding resilvering daily, I plan to do it weekly, and in ZFS vernacular it's called "scrubbing".

That's not the same thing. I'm talking about keeping your third pair out of the pool most of the time and occasionally performing steps 1-3 above. The point is that, in some scenarios, you might lose your active pool but not the logically disconnected mirrors which have been idle since the last time you performed steps 1-3. Obviously that's not as good as having the data off-server but it's better than having all drives in sync at all times (in your situation).

Or you could only use ZFS on your active pool and use something like duplicity to mirror your data to your third set of drives. I'm suggesting duplicity because it automatically keeps hashes, which take care of some causes of corruption including bit errors on the drives. But there's other software which creates mirrors and, like duplicity, is able to keep several versions on the same set of drives provided there's some free space and your data is relatively static.

Or you could use Crashplan with your third set of drives as targets if you're confident it works well.

Quiet Mind wrote:

Another way to achieve a split using ZFS snapshots that might be safer would be like this:

This would send only the incremental changes made during the last two months to the offline drive. Hmm.

It's a different operation than a split. You could also send the changes to a file (without bringing the offline drive back online and therefore putting it at risk during the operation) or to another server holding the offline volume. It's pretty neat, but keeping snapshots has a performance impact and will eat drive space if you keep them for long.

Your main worry about the safety of split involves the data being modified while you're splitting. Triple-mirroring is of no use to deal with that problem. And snapshots are useless as well. But splitting overnight would take care of the problem (assuming no one is modifying the data during the night).

Quiet Mind wrote:

my offline backups should be mirrored as well. This is getting more complicated.

It's generally better to have two backups holding different versions than a mirrored backup.

Quiet Mind wrote:

HFat wrote:

simpler and better ways to protect the integrity of static data.

What are they? Will they work for quasi-static data (my use case)?

Keeping hashes and/or parity data. Lots of software does that. I've suggested Bittorrent to keep hashes on another thread and someone else has suggested Parchive to keep parity data. Bittorrent cannot repair, only detect corruption. But you can repair it if you've got other copies. Parchive can detect and correct corruption but it eats extra drive space. Bittorrent is of course not a backup tool. I suggested it to show there are many ways of doing it and that it doesn't have to involve arcane software.

You can work the same way with "quasi-static" data but, every time it changes, you obviously lose the ability to restore from existing backups, so more care and expense is needed for the same amount of protection. Changes to the data might also affect performance and/or require manual operations depending on how you do it.
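As a sketch of the hash-keeping approach using ordinary tools (directory and manifest paths are examples only; the manifest path must be absolute, since the functions cd into the data directory first):

```shell
# Keep a manifest of SHA-256 hashes; verify on demand.
make_manifest()  { ( cd "$1" && find . -type f -print0 | xargs -0 sha256sum ) > "$2"; }
check_manifest() { ( cd "$1" && sha256sum --check --quiet "$2" ); }
# e.g.: make_manifest /tank/photos /backup/photos.sha256
#       check_manifest /tank/photos /backup/photos.sha256   # non-zero exit on corruption
```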

There's something else you can do with very static data: back up to single drives and move on to the next ones when they get filled up. You don't need complex hardware or software for that, and your old backup drives can be securely archived because they will remain valid for a long time. That's about the lowest-cost backup method imaginable, short of playing it like Linus Torvalds.

Last edited by HFat on Sun Oct 31, 2010 4:35 pm, edited 1 time in total.

It's pretty shocking that ZFS isn't designed to handle power failures better. Dammit, just as I was drinking the ZFS Kool-Aid.

I've used and investigated ZFS fairly extensively for my own use, in both home and business situations. A lot of it is Kool-Aid. The integration of RAID and FS is amazing and very useful. Copy on write is very useful in the right situation. Built-in compression is also awesome. The supposed data integrity measures are only useful in very specific failure situations, and not for the reasons proponents claim. The robustness and resistance to failure is questionable at best, and kind of scary given the lack of even a "fsck" equivalent. The memory usage is horrible, and it is sometimes crash-prone.

My concerns about your triple mirror are that if ZFS or something else on your box fails then all three copies are potentially corrupt. While ZFS is a bit different than a RAID1, it's still a mirror and will happily mirror any errors. A separate box and file-system will prevent this.

Quiet Mind wrote:

During my last reply to HFat I've decided to add a full CrashPlan backup to the computer in my office, which will be a second server and a different file system.

Is your office at the same location or off-site? If off-site, then don't underestimate the value of a local backup server. If your main box goes down you can be back up in minutes.

It hasn't escaped me that everyone in this thread is telling me to use a second server.

Here's my new plan:

2 NAS devices with 4 drives each (credit: HFat)

Mac Mini

Clients will now talk to the primary NAS directly, eliminating the Mac Mini as a point of failure (credit: washu). The second NAS will back up the primary NAS nightly via rsync (or maybe duplicity) (credit: HFat).

The two NAS devices will run different file system technologies (credit: washu):

#1 will run FreeBSD with ZFS.
#2 will run Linux with EXT4.
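The nightly NAS-to-NAS backup could then be a cron'd pull on #2; the hostname and paths here are invented:

```shell
rsync -aH --delete --numeric-ids nas1:/tank/ /srv/backup/tank/
# or, for versioned backups with built-in hashes:
# duplicity /mnt/nas1 file:///srv/backup/tank
```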

I'm downgrading from Nexenta to FreeBSD because I no longer need the zpool split feature and in exchange I hope to benefit from greater hardware compatibility that will allow me to buy instead of build.

Rather than build machines, I'm going to attempt to buy them. My main concern about doing so is that they won't be quiet, so I'm going to use the BIG QUIET HAMMER: They're both going in my relatively inaccessible basement. This also means I'm giving up on the mobile racks, and instead will connect an external drive to the Mac Mini (which is quiet enough to remain upstairs) once a month and rsync it to the secondary NAS. So far the only 2-bay quiet external enclosure I've found with FW800 and a large fan (8cm) is the Onnto DataTale.

I haven't found a COTS NAS running FreeBSD so I've asked the FreeBSD people for help. I'm expecting I'll have to buy a pre-built desktop computer, populate it with 4 drives, and install FreeBSD on it myself.

CrashPlan will continue to run on the Mac Mini, backing up the primary NAS remotely to my office computer and to the "no such thing as unlimited" service for $5/month. The Synology claims it has a feature for easy backup to Amazon S3; I'll investigate that too.

For the Linux NAS, my current candidate is the Synology DS411+, chosen over competitors because I want easy expansion ("Synology Hybrid RAID") and disk encryption.

I'll also look into software for keeping hashes of my files (duplicity, BitTorrent, Parchive, etc). I'm nervous that this may turn into a maintenance headache. I believe CrashPlan already does this automatically, and I'll have it back up the primary NAS onto the secondary NAS in order to realize this feature. It's not open source, however, and that always makes me nervous.

I'm not going to adopt Linus's backup strategy, thank you.

I look forward to the day when "keep my data safe at home and give me more storage for dollars when I want it" becomes easier than this.

I haven't found a COTS NAS running FreeBSD so I've asked the FreeBSD people for help. I'm expecting I'll have to buy a pre-built desktop computer, populate it with 4 drives, and install FreeBSD on it myself.

As long as you don't need to grow beyond 4 drives, the HP Proliant Microserver might work for you.

CentOS 5.5 x86_64. Bit of a faff; will expand later.
Also tried a FreeBSD 8.1 live CD (actually sub.mesa's ZFSguru 0.1.7 preview ISO) and, to my surprise, the NIC seems to have been picked up OK.
Tried FreeNAS 0.7.2 (FreeNAS-amd64-LiveCD-0.7.2.5462.iso, i.e. the latest as at 25/10/2010 FreeNAS build, based on FreeBSD 7.3). Disks and NIC picked up OK.
As detailed in a later post, NexentaStor Community Edition 3.0.3.

I'll also look into software for keeping hashes of my files (duplicity, BitTorrent, Parchive, etc). I'm nervous that this may turn into a maintenance headache. I believe CrashPlan already does this automatically, and I'll have it back up the primary NAS onto the secondary NAS in order to realize this feature. It's not open source, however, and that always makes me nervous.

The advantage of file based checksums/PAR files over something like ZFS is that they are file-system independent. You can copy your files along with the PARs to anything and still verify their integrity. Once ZFS does its mostly useless checksum verification and hands off the data to the OS it is no longer protected. Also in the case of PAR, files can actually be repaired if they are damaged, assuming enough redundancy is in the PARs.

I agree that it can become a maintenance headache. For data that changes often it's not very practical, but it can be done. For data that is static it is not too bad. For example, I have lots of photographs from my digital cameras. New files are added all the time, but old ones never change (edited photos are saved separate from the originals). I have a simple script that will go through and look for new files and generate PAR files for them.
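A minimal sketch of such a script (assuming the par2 command-line tool; the function name and the 10% redundancy figure are inventions):

```shell
# Generate parity only for files that don't have it yet, so originals
# and existing PAR2 sets are never rewritten.
protect_new() {   # usage: protect_new <photo-dir>
    find "$1" -type f ! -name '*.par2' | while read -r f; do
        [ -e "$f.par2" ] && continue      # already protected; skip
        par2 create -r10 "$f.par2" "$f"   # 10% redundancy for the new file
    done
}
```

Run from cron after each import, it only ever touches newly added files.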

Good choice with FreeBSD. I use it fairly extensively and it has never let me down. If you were comfortable setting up Nexenta from the command line you shouldn't have too much trouble with FreeBSD. IMHO, FreeBSD has much more sensible device names than Solaris/Nexenta.

Why not simply buy servers instead of NAS boxes or desktops? You want to use them as servers, right? You could buy used servers to save money but low-end servers similar to the build outlined in your first post are not that expensive compared to the cost of 16G of RAM (for example). If you bought an identical pair you could diagnose hardware issues more easily or move your drives from the one that's not working to the one that works without any worries about drivers and so on.

The Microserver is not as powerful as you originally wanted, by a large margin. And it's not hugely cheaper than a ML110 (for instance). It's a neat box but it's more of an office server than a basement server. CentOS works fine on it but it doesn't have the drivers to take advantage of the hardware's features.

By the way, make sure you have a good gigE switch or at least a 100M switch that has one or two gigE ports (cheaper and easier to get fanless). You might also want to set up a dedicated link between your servers.

Amazon S3 is expensive. Duplicity supports it as well as lots of other stuff.

About this enclosure for your Mac: you don't need fans to cool low-power drives.

How is this Synology contraption and their proprietary schemes you might not know how to recover data from any better than bog-standard Linux on a server?

Duplicity doesn't force you to use its hashes or to manage them. It just creates them as part of the backup procedure. But they'll be there the day you need them. It's got front-ends that are supposedly user-friendly but I don't know if they're any good.

It won't back up to Crashplan's "free lunch" servers, but you might use it (or some other piece of free software) to back up to devices you control if you don't trust Crashplan's software. More than the lack of publicly available code, it's the lack of technical documentation and interoperability of their software (so far as I know - it might actually be better than I give it credit for) which would concern me.

You're making it a bit harder than it needs to be, but I doubt we'll ever see a product that does everything you want out of the box. If you don't want to bother with this stuff, you need a reliable professional or a reliable company to do it for you. But it wouldn't cost you $5/month! Maybe someone does something like this for small offices and the like in your area.

Many businesses who rely on data for all of their income have worse hardware and procedures than what you're planning... sometimes much worse, actually. They figure it's not the end of the world if they lose some of their data, and they're probably right. I don't have anything like what you're planning for my personal use for the same reason. It probably wouldn't be the end of the world for you either, but you obviously like overkill, so have fun!

I will be following this build. Have you considered btrfs? It has matured significantly and, although it is not yet enterprise-stable, it is quite good. I've experimented with it a bit and will most likely migrate once it is fully released. I suggest you follow the development, especially if you have already considered running Linux on one of the NAS'. (NAS'es in plural sounds wrong, nazi?)

EDIT: Link aggregation (sometimes called 802.3ad or LACP) is pretty sweet if you can afford it and need the speed. Your speed to the www will not improve, but locally it might make a huge difference. That is, if you have the other machines connected at higher speed as well and/or several of them connecting at the same time. I just thought of this because that Supermicro has two gigabit ports, which could net 2Gbit/s to the switch.
