So I'm at the point where I'm feeling like I should set up an offsite backup of my files, just in case. (Moving to a state where wildfires are a regular summer occurrence has a way of doing that.) And what I'd like is to be able to just upload files using standard tools rather than having to use J. Random Vendor's proprietary one-touch fancy-schmancy-dancy duzitall wonder-app, so really what I'm looking for is inexpensive, no-frills online file storage with SFTP access and a reasonably good track record on security.

I've looked around a bit, but most of the solutions that pop up in a Google search are based around the aforementioned slices-dices-makes-Julienne-fries clients, or are like Amazon S3, which appears to be designed as some sort of database-oriented large-scale data store rather than just "a folder you can freaking upload files to." If nothing else, I could probably use my hosting on nearlyfreespeech.net, but their pricing model really isn't geared towards large-scale storage, and it would be a great deal more expensive than most of the dedicated storage solutions.

Does anybody know of a good solution for this?

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

Much more expensive than Amazon S3 and other "proprietary one-touch fancy-schmancy wonder apps," however, and aimed at people who want a multiply geo-located, highly backed-up solution.

I haven't found anybody else. I bet RSync.net manages to get such high prices because they're pretty much the only people who do simple SFTP / RSync backups, no questions asked... and do it very well. Looks like $0.20 per gigabyte per month or so ($200/month per TB?!)

----------

Cheaper backups (of varying quality) can be made through dedicated servers. It will take a bit of maintenance, but you can easily get a VPS with 50GB of storage for $5/month. Or, you can get a $30/month 2x1TB dedicated server if you need more storage. (Europeans can use Kimsufi.)

They won't have the reliability of a dedicated provider (like Backblaze, S3 or RSync.net), and they'll take manual labor to keep up to date and secure (updates / patches, etc. etc.), but who can argue with the price?

Well, that's still a lot cheaper than using nearlyfreespeech.net for general-purpose storage (which would run me $1/GB/mo. - not bad for my light web hosting requirements, but pretty prohibitive for the couple dozen gigs of stuff that's either irreplaceable or would be a huge pain in the ass to dig back up.) Thanks, I'll look into it.


I don't know how rapidly your "stuff" is changing, but keeping a hard copy in a different building and using the online service merely for incremental diffs might reduce your storage requirements. And whatever happens with the online service, having a hard copy can't hurt either way.

Then again, I shouldn't be talking, since I've postponed setting up my own offsite backups for way too long, and it's been a couple of years since I last burned a DVD...

SFTP runs over the SSH protocol and is a very different beast from FTP. And I still argue that SFTP is inferior to a protocol like RSync, which will automatically check timestamps and whatnot.
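For what it's worth, the timestamp-checking described here is just rsync's default behavior. A minimal local sketch (paths invented for illustration; a real backup run would target something like user@host:backup/ instead of a local directory):

```shell
# rsync skips any file whose size and mtime already match on the
# destination, so re-running a backup of a mostly-static tree is cheap.
mkdir -p /tmp/rsdemo/src /tmp/rsdemo/dst
echo "irreplaceable stuff" > /tmp/rsdemo/src/notes.txt

# First run: copies everything. -a preserves times/permissions,
# which is what makes the timestamp check work on later runs.
rsync -a /tmp/rsdemo/src/ /tmp/rsdemo/dst/

# Second run: -i itemizes changes; with nothing modified, it prints nothing.
rsync -ai /tmp/rsdemo/src/ /tmp/rsdemo/dst/
```

Over SSH the same thing is `rsync -a src/ user@host:backup/`, which is exactly the sort of thing an SFTP/rsync host lets you point at.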

Commodorejohn clearly wants an "open source" solution to file uploads and backups (not necessarily using FTP). I'm sure that if there were a WebDAV host that took files and backed them up, commodorejohn would be fine with it (correct me if I'm wrong, of course). So looking for these sorts of things is still best done by word of mouth.

--------

Personally speaking, I actually use a NAS. But if he's worried about losing files to a wildfire, a NAS sitting inside of your house is not an adequate solution. I think low-end dedicated servers are the next cheapest option, as far as GB/month goes, but reliability is basically on you. Seems like he's pleased with RSync.net's prices however, so that's a wrap I guess. (RSync.net is on the expensive end but has a huge reputation. They're one of the best online, geo-located backups on ZFS with multiple layers of redundancy, fully compatible with all of the common Linux sysadmin tools.) As long as people are fine with their expensive pricing scheme, it's hard to find a better product.

KnightExemplar wrote:But if he's worried about losing files to a wildfire, a NAS sitting inside of your house is not an adequate solution. I think low-end dedicated servers is the next cheapest option, as far as GB/month goes, but reliability is basically on you.

Yeah, that's the thing. I'm much less worried about drive failures than I am about having my house burn down and losing all of my data. (Bad enough to lose my tangible possessions, I don't need to lose all my works-in-progress on top of that!) So a copy in my house doesn't do anything, and I don't have another location available far enough away that it wouldn't be in just as much danger if I did lose my computer to a wildfire. Ergo, proper off-site backup (ideally in someplace like Utah where nothing ever happens) is the solution I want.

KnightExemplar wrote:Seems like he's pleased with RSync.net's prices however, so that's a wrap I guess.

Well, I'm certainly open to hearing about other alternatives, I just want to avoid services that require you to use their own proprietary client software. (Not so much for FOSS purity as because I hate fancy duzitall stuff, and want to be able to back up from any of a variety of computer systems, several of which are not going to be supported by proprietary clients basically ever.)


KnightExemplar wrote:Well, FTP is ancient and really shouldn't be used anymore.

I expect that FTP and SFTP are used as synonyms in that list. (Though age is not much of a problem with FTP. Being unencrypted is.)

KnightExemplar wrote:And I still argue that SFTP is inferior to a protocol like RSync, which will automatically check timestamps and whatnot.

Unless you tar, compress and encrypt your backups before uploading, in which case rsync provides a reliable speedup of zero. There's no host in the world I would trust with unencrypted copies of my personal files, but YMMV.

duplicity seems worth testing; according to the docs it automatically creates incremental diffs (using the rsync algorithm), then tars and encrypts them before uploading them via a dumb protocol. It supports uploading to several services without SFTP access, so if duplicity is for you, then your choice of storage provider isn't as limited as you thought.

I noticed that fileserve.com now offers 500GB with ftp access for their free accounts. Deleting files after 60 days means that you shouldn't let your incremental backup chains get too long, but you shouldn't do that anyway.

Have you considered our lord and saviour cooperative storage cloud? It has many faces, e.g. sia and storj, but nearly all have in common that they employ open-source (CLI) clients that you can build on your desktop, Raspberry or NAS. And they're free if you share enough of your disks with the network (without having to fear for drive failure).

Moreover, you'd be supporting decentralization and privacy and you get to be my guinea pig (because of course I'm advocating something I haven't tried myself yet)!

In any case, like Tub said: encrypt the files yourself (and put them in one archive while you're at it), then there's no need for fancy-schmancy SFTP or rsync or WebDAV-over-TLS. Well, alright, FTP and the like have the downside that if your credentials are MITM'ed, the availability of your backup is no longer guaranteed (and you'd have to reset your password on the website after every transaction to counter that).

Tub wrote:Unless you tar, compress and encrypt your backups before uploading, in which case rsync provides a reliable speedup of zero. There's no host in the world I would trust with unencrypted copies of my personal files, but YMMV.

Why tar things up before encrypting? Encrypt things individually and then upload them. The number of things that I really care about being encrypted is rather small (tax returns, backups of my wallet / IDs / insurance cards), and I can understand encrypting that stuff. But like... for the vast majority of things it just ain't worth the wait.

gpg already compresses data before it encrypts, btw. So that's solved. I guess compressing is more effective if you tar first, but the convenience of being able to access my files individually without decompressing a tarball (as well as the ability to update files as they are changed, thanks to RSync timestamp checks) is a big benefit.

----------

I mean, I don't really want to think about how to properly manage a tarball. Do you upload the entire backup every single time a singular file changes? In a few months, my new tax documents are going to be done, adding maybe 20MB to my backup.

I think "gpg -c 2016_taxes" is a lot easier than untarring the old backup, adding my taxes to it, tarring it back up, then compressing, THEN encrypting before uploading.
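A runnable sketch of that per-file approach (the filename and contents are invented; the --batch/--passphrase flags are only there so it runs unattended, since an interactive gpg -c just prompts for the passphrase):

```shell
# Encrypt one file symmetrically, as in "gpg -c 2016_taxes".
echo "AGI: 12345" > /tmp/2016_taxes
gpg --batch --yes --pinentry-mode loopback --passphrase demo \
    --symmetric --output /tmp/2016_taxes.gpg /tmp/2016_taxes

# Round-trip it, because an untested backup isn't a backup:
gpg --batch --yes --pinentry-mode loopback --passphrase demo \
    --output /tmp/2016_taxes.check --decrypt /tmp/2016_taxes.gpg
```

Adding next year's return is then just another gpg -c of the new file, no untarring involved, and the upload itself can go over plain SFTP.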

No reliability guarantees AND you will have other people using your bandwidth just when you're trying to stream a movie or fight the boss in an online game? I like the premise, of course, but actual server hosting has its advantages over home servers on weak DSL connections.

KnightExemplar wrote:Why tar things up before encrypting?

Because sensitive information is not just in your files' contents, but also in their meta information. File name, file size, date etc. can reveal information you'd rather keep private. And I'm not just talking about your huge collection of spatula porn (remember: if it's worth keeping, it's worth backing up!), but also job applications, holiday photos (mine do contain the date and destination in the filenames), invoices, letters you've written and sent, maybe names of files one possesses without having a license. My chat logs would reveal who I've talked to and when and how much, my emails reveal when I've sent and received messages (and how long they are), my source code contains the names and structure of unreleased projects. And it's not just my personal privacy; I do have work data on my home computer which I'm contractually required to protect.

Now I could think long and hard if any of that meta-information might be problematic, or I could make a risk assessment about the hoster either losing, leaking or downright analysing and selling my data, but I'd rather just tar it up, encrypt the whole thing including metadata, and stop worrying.

KnightExemplar wrote:the vast majority of things it just ain't worth the wait.

For a nightly cron job, a bit of CPU time for encryption is not really a concern of mine. The limiting factor is DSL bandwidth for uploading, anyway. Diffing locally instead of remotely solves both bandwidth and privacy issues.

KnightExemplar wrote:I mean, I don't really want to think about how to properly manage a tarball. Do you upload the entire backup every single time a singular file changes?

No, you upload incremental backups, with each incremental diff being a new tarball. It can be done with a simple bash script, but there's also existing software for it. I mentioned duplicity; it just takes a minute to read the introduction.
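The "simple bash script" version can be sketched with GNU tar's --listed-incremental snapshot file (directory names here are illustrative):

```shell
# Full backup: the snapshot file records what was archived and when.
mkdir -p /tmp/incdemo/data /tmp/incdemo/restore
echo "v1" > /tmp/incdemo/data/a.txt
tar --listed-incremental=/tmp/incdemo/snap \
    -czf /tmp/incdemo/full.tar.gz -C /tmp/incdemo data

# Later run: only files changed since the snapshot go into the new tarball.
echo "v2" > /tmp/incdemo/data/b.txt
tar --listed-incremental=/tmp/incdemo/snap \
    -czf /tmp/incdemo/inc1.tar.gz -C /tmp/incdemo data

# Restore: unpack the full archive, then each incremental in order.
tar --listed-incremental=/dev/null -xzf /tmp/incdemo/full.tar.gz -C /tmp/incdemo/restore
tar --listed-incremental=/dev/null -xzf /tmp/incdemo/inc1.tar.gz -C /tmp/incdemo/restore
```

Each .tar.gz could then be gpg-encrypted and pushed over a dumb protocol, which is roughly what duplicity automates (plus librsync-based diffing within files).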

zpaq updates an archive by appending changes to it. To support remote backups without having to move huge files, zpaq can put the appended changes into a separate, numbered file that you would copy or move to remote storage. You can concatenate the parts to form a complete archive, or simply read them all at once by specifying a pattern in the archive name like "part???.zpaq". zpaq will then search for part001.zpaq, part002.zpaq, etc. and regard the concatenated sequence as a single archive.
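The property that multipart scheme relies on (parts read in numbered order being equivalent to the single archive) is plain byte concatenation, which is easy to demonstrate even without zpaq installed:

```shell
# Fake "archive", split into numbered parts like part00 / part01 / part02.
mkdir -p /tmp/catdemo
head -c 100000 /dev/urandom > /tmp/catdemo/archive.bin
split -b 40000 -d /tmp/catdemo/archive.bin /tmp/catdemo/part

# Re-joining the parts in order reproduces the original byte-for-byte,
# which is why "cat part* > whole" (or a part??? glob) works.
cat /tmp/catdemo/part* > /tmp/catdemo/rejoined.bin
cmp /tmp/catdemo/archive.bin /tmp/catdemo/rejoined.bin
```

So the remote side only ever receives small append-files, and a restore can either concatenate them or read them in place.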

Tub wrote:Because sensitive information is not just in your files' contents, but also in their meta information. File name, file size, date etc. can reveal information you'd rather keep private. And I'm not just talking about your huge collection of spatula porn (remember: if it's worth keeping, it's worth backing up!), but also job applications, holiday photos (mine do contain the date and destination in the filenames), invoices, letters you've written and sent, maybe names of files one possesses without having a license. My chat logs would reveal who I've talked to and when and how much, my emails reveal when I've sent and received messages (and how long they are), my source code contains the names and structure of unreleased projects. And it's not just my personal privacy; I do have work data on my home computer which I'm contractually required to protect.

Now I could think long and hard if any of that meta-information might be problematic, or I could make a risk assessment about the hoster either losing, leaking or downright analysing and selling my data, but I'd rather just tar it up, encrypt the whole thing including metadata, and stop worrying.

I'm going out on a limb here and going to say that most people don't care about my holiday photos. I don't really take dick-pics like Anthony Weiner, so... yeah, it's not really a concern of mine. Letters written / sent are mostly emails, and I can see the benefits of encrypting an email archive if you're concerned about that.

But for the most part, job applications / resumes are public information for my public identity. I explicitly want more people reading my resume for example. If someone wants my Tetris code, they can have it, I no longer believe my ideas are really worth keeping secret.

As far as contractually obligated files, those are locked up on company computers, never to leave. I would hope that none of those files have ever so much as touched my personal computer. What I do for my company is set up a NAS for office use as well as encrypted VPNs for offsite backups (in another office which is physically secured by the other members of my company). I mean, yeah, "fuck the cloud" in that case. But I'm assuming we're talking about personal usage here.

KnightExemplar wrote:I mean, I don't really want to think about how to properly manage a tarball. Do you upload the entire backup every single time a singular file changes?

Tub wrote:No, you upload incremental backups, with each incremental diff being a new tarball. It can be done with a simple bash script, but there's also existing software for it. I mentioned duplicity; it just takes a minute to read the introduction.

I'm not a fan of incremental backups, as it's difficult to verify that they're working 100%. After all, if it's not a tested backup, then it isn't a backup at all.

But that's definitely a usable approach, I guess. I definitely prefer the "backup this folder" approach. If I can read the files in that folder (and keep them up-to-date), then I know it works. It's easily verifiable / testable as well.

It all depends on your needs. If it isn't critical if you lose a week of data, but you would like more, then do a full weekly backup and then a daily incremental backup, or compromise and do a differential.

Thesh wrote:It all depends on your needs. If it isn't critical if you lose a week of data, but you would like more, then do a full weekly backup and then a daily incremental backup, or compromise and do a differential.

Well, consider this:

My primary storage device in my home is my Nas4Free box (ZFS is a very good filesystem). "File versioning" is handled by the snapshot feature, which allows me to roll back to any previously snapshotted time with very little wasted space (in effect, it's like an incremental backup, except it's live, online, and always testable).

Rsync.net similarly keeps 7 days' worth of snapshots on their ZFS-backed servers (as well as supporting some kind of direct ZFS-replication service; looks pretty nice, although I've never used it). So yeah, you basically get the effect of a full weekly backup + differentials if you just use rsync on the folder you're interested in.

----------

The important bit is to be able to test your backups after doing them, so that you know you can rely on them. Incremental backups mean it takes a long time to "unpack" to a particular timeframe to test. If you wanted to test Sunday's backup (and you do the weekly on Monday), then you need to unpack 7 files to see the backup.

I guess that problem can be solved with a log of SHA512 hashes or something similar (like the output of md5sum) to verify the condition of encrypted tarballs. But there's definitely a benefit to being able to access any file you've backed up in a short time frame.
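A minimal version of that hash-log idea (file names invented): write a manifest of digests at upload time, then re-check it later against whatever you pull back down:

```shell
# Record a SHA-512 digest for each encrypted tarball as it's produced.
mkdir -p /tmp/hashdemo
echo "pretend this is an encrypted tarball" > /tmp/hashdemo/backup-2016-06.tar.gpg
( cd /tmp/hashdemo && sha512sum backup-2016-06.tar.gpg > MANIFEST.sha512 )

# Later (or on the copies downloaded from the host), verify nothing rotted:
( cd /tmp/hashdemo && sha512sum --check MANIFEST.sha512 )
```

This only proves the blobs are intact, though; it doesn't give you the browse-any-file-quickly convenience discussed below it.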

KnightExemplar wrote:My primary storage device in my home is my Nas4Free box (ZFS is a very good filesystem). "File versioning" is handled by the snapshot feature, which allows me to roll back to any previously snapshotted time with very little wasted space (in effect, it's like an incremental backup, except it's live, online, and always testable).

Archiving/versioning isn't backup! If your array fails or data is corrupted or overwritten for any reason ("What the hell is going on? That was supposed to be sdc2, not sdd2!"), your house burns down, or some virus fucks up your shit, your data is gone - that isn't a backup. If you only take a weekly full backup of that, sure, you can snapshot back to the beginning of time... up until your last backup, which can be up to a week old.

KnightExemplar wrote:My primary storage device in my home is my Nas4Free box (ZFS is a very good filesystem). "File versioning" is handled by the snapshot feature, which allows me to roll back to any previously snapshotted time with very little wasted space (in effect, it's like an incremental backup, except it's live, online, and always testable).

Archiving/versioning isn't backup! If your array fails or data is corrupted or overwritten for any reason ("What the hell is going on? That was supposed to be sdc2, not sdd2!"), your house burns down, or some virus fucks up your shit, your data is gone - that isn't a backup. If you only take a weekly full backup of that, sure, you can snapshot back to the beginning of time... up until your last backup, which can be up to a week old.

Obviously. But this is a home NAS, not like a critical commercial appliance.

But you're wrong about a lot of that, because of how ZFS works.

Archiving/Versioning isn't backup! If your array fails or data is corrupted or overwritten for any reason ("What the hell is going on? That was supposed to be sdc2 not sdd2!"), your house burns down, some virus fucks up your shit, your data is gone - that isn't a backup.

Corruption is highly unlikely. ZFS is transactional (similar to a journal). Individual files are checksummed and self-healing against sector-level corruption. To protect against whole-drive failures, I've even set up mirrored vdevs across two different hard drives from different manufacturing batches.

So basically, I'm only going to lose data here if both of my NAS hard drives were destroyed in between my monthly "ZFS scrub" checks. Even a total single-drive failure is protected against, due to the nature of mirrored-vdev ZFS systems. So a fire, flood, or other natural disaster... maybe if I drop my NAS box down the stairs (but I'll do what I can to keep it safe). Bitrot is simply not going to happen, because I personally make sure to regularly ZFS scrub.

Due to ZFS snapshotting, it'd be insufficient for a Windows-level virus to delete everything on the CIFS mount. The first thing I did when I set up my NAS was make a snapshot, then delete everything on my Windows box over CIFS to simulate a virus attacking my system (like CryptoLocker). Guess what? The snapshot restored it all.

So basically, I'm only going to lose data to a FreeBSD / Nas4Free virus. And since my primary machine is Windows, I find it unlikely that a virus is going to be written to cross the OS boundary.

---------

Furthermore, all of the above is my second layer of defense. A lot of my important day-to-day data is on my Windows machine. (Obviously, anything important is backed up to my NAS: gigabit Ethernet means 90MB/s transfers between my primary machine and my NAS box. USING my backup box is as easy as opening "Drive E", thanks to CIFS.) Now, my "Drive D" on my Windows 10 box is mirrored ReFS (Microsoft's competitor to ZFS), meaning my "Drive D" is similarly immune to bitrot (due to regular scans by Windows 10), immune to sector-level corruption, and so on and so forth. This isn't protected from Windows viruses (which is why I made my NAS box).

Bonus points: ReFS automagically works on any Windows 8+ system. Even if my system drive were totally wiped out, it is easy to "harvest" a Windows ReFS drive and automatically set it up to be read on a different system. (Subject to BitLocker of course, if you're the encrypting type. I'm not the encrypting type though.)

So anything short of my house getting destroyed (or robbed / looted) means my data is 100% safe: either on my "Drive D" mirrored Storage Spaces ReFS (~150MB/s), or on my "Drive E" CIFS-connected Nas4Free ZFS box (~90MB/s). "Drive C", my 500MB/s SSD, has no redundancy. My most important files are of course on all of my drives: C, D, and E.

I can understand that some people are even more paranoid than I am... or are like commodorejohn here and may have a higher-chance of "house-gets destroyed" (wildfire country does seem like a risk...). So I can understand the need for geo-redundant backups.

But... I'm not convinced that geo-redundancy is going to be very beneficial to me. I just don't think that protecting my data against fire / flood / thieves is really worth it.

----------

I wouldn't call the above setup "professional", but it's definitely far above-and-beyond the typical computer user's. So it's basically where I'm comfortable: a good balance of low-maintenance and redundancy. For anyone with a similar setup, RSync.net is basically the only thing that'd offer additional security without negatively affecting my routine.

KnightExemplar wrote:I'm going out on a limb here and going to say that most people don't care about my holiday photos.

Maybe those who wish to target you with travel ads. Maybe those who wish to compile a list of countries you've been to, in order to assign a score for the likelihood of you being a terrorist. Maybe those who wish to know about your lifestyle for credit scores, risk assessment for the purpose of health insurance, etc. These databases do exist. It's unlikely they'd scrape backups on a file host for information, but you never know.

Giving away personal information means giving other people power over you. If you choose to accept that cost for the convenience of a simple backup process, that's your decision. Just please don't pretend that the cost doesn't exist.

KnightExemplar wrote:I'm not a fan of incremental backups, as it's difficult to verify that they're working 100%. After all, if it's not a tested backup, then it isn't a backup at all.

I think Thesh means the case where you pop in a new drive, do an extensive write/read test, then discover that you've done the test on the wrong drive. I know I have destroyed at least one partition in my life by being careless with the command line; these things happen. You'll upgrade your RAID eventually (newer drives, bigger drives, ...), and that's the perfect opportunity to screw everything up. Your mirroring and versioning improves data reliability, but it's not a backup.

Tub wrote:I know I have destroyed at least one partition in my life by being careless with the command line, these things happen.

I know I have. I added the second drive to take a copy of the first (prior to a routine upgrade, 'just in case') and did "format d:" [1].

Except you know how "d" and "c" are a hair's-breadth [2] away from each other on a QWERTY? Yeah, that.

UNFORMAT helped a bit...

[1] There were previous compressed images (for full reversions, had the upgrade gone totally wrong) and straight xcopied directories (for easy re-adding if the upgrade went right but removed some personal data along the way). I'd done a few machines already, and all the old stuff now provably not needed any more would have been a pain to manually del and rd, prior to the appearance in DOS of the /s(ubdirectory) switch to do it in one go... Formatting was thus handier.

[2] And dandruff, and finger grunge. And either salt or sugar, often, can be found rattling around in the cavity beneath and between the keys of a typical office keyboard, presumably due to messy al desko eating/drinking habits; some white crystalline powder, anyway, but I've never yet plucked up the courage to taste-test which it actually is.

Tub wrote:I think Thesh means the case where you pop in a new drive, do an extensive write/read test, then discover that you've done the test on the wrong drive. I know I have destroyed at least one partition in my life by being careless with the command line; these things happen. You'll upgrade your RAID eventually (newer drives, bigger drives, ...), and that's the perfect opportunity to screw everything up. Your mirroring and versioning improves data reliability, but it's not a backup.

I haven't done that, but I did actually use dd on the wrong output partition when copying an image from an old computer, and I don't think zfs snapshots save you from overwriting partitions.

Last time I used shred to retire an old drive, I think I spent ten minutes verifying that the external drive was indeed /dev/sdc.

Tub wrote:I think Thesh means the case where you pop in a new drive, do an extensive write/read-test, then discover that you've done the test on the wrong drive.

Why would I do that on my NAS box? I've got a laptop and a desktop where I do extensive write / read tests on my hard drives before putting them in my NAS. The chances of this mistake are simply nil.

Old laptops are extremely good for this purpose. They have a single SATA port, and a DVD or USB-stick boot into Kubuntu or GParted or whatever live-CD flavor of the month gets me there.

Tub wrote:I think Thesh means the case where you pop in a new drive, do an extensive write/read test, then discover that you've done the test on the wrong drive. I know I have destroyed at least one partition in my life by being careless with the command line; these things happen. You'll upgrade your RAID eventually (newer drives, bigger drives, ...), and that's the perfect opportunity to screw everything up. Your mirroring and versioning improves data reliability, but it's not a backup.

Step 1: Disconnect the old hard drives.
Step 2: Connect the new ones.
Step 3: Connect the old ones.
Step 4: Copy the data over.

The chances of messing up are straight up zero with proper procedures. Also, NAS4Free notes the model number of each connected drive.

Even if I didn't disconnect the drives on an upgrade, the GUI of NAS4Free is simple enough to make those kinds of changes without mistakes. (And I don't plan to upgrade any time soon: I have 10TB of hard drives in a mirror configuration for 5TB of usable storage. It's inconceivable how I'll fill this storage right now... and I'm a video editor who saves a lot of raws for hobby projects.)

Thesh wrote: I haven't done that, but I did actually use dd on the wrong output partition when copying an image from an old computer, and I don't think zfs snapshots save you from overwriting partitions.

Last time I used shred to retire an old drive, I think I spent ten minutes verifying that the external drive was indeed /dev/sdc.

If I need new "partitions", I'd create ZFS volumes or ZFS datasets, which are served by the snapshot mechanism. Trust me when I say this: ZFS is quite awesome.

Tub wrote: These databases do exist. It's unlikely they'd scrape backups on a file hoster for information, but you never know.

Literally not a single thing you mentioned makes sense. I'll point-by-point you if you don't believe me.


Maybe those who wish to target you with travel ads. Maybe those who wish to compile a list of countries you've been to, in order to assign a score for the likelihood of you being a terrorist.

Because each time my passport is stamped and my visas are approved, they've already got that tracking information. Both the government of my home country (passport) and the government of the country I'm visiting (visa request). And the airline company that brought me into that country (for advertising purposes).

Maybe those who wish to know about your lifestyle for credit scores

The credit score is external tracking by the three major credit bureaus. It took a freaking law for that information to be shared with the individual. Every credit-card swipe, every mortgage payment, every car payment is being tracked by my bank.

My banks aren't going to hack into my home NAS or into the cloud to figure out that information. They've already got it.

risk assessment for the purpose of health insurance etc.

You can figure out diabetes from holiday photos? You know who has my health records? My doctor. Electronic Health Records, ya know?

-----------

The fact of the matter is, the vast majority of data is not actually worth much to other people. The bits that are (Social Security number, tax information) can be kept under wraps rather easily. Or... they already have it (credit scores are not owned by us; they're owned by like... Equifax, and that is shared with every bank you'll ever interact with for the rest of your life).

The people who do care about health records (i.e. my life insurance company) either explicitly request health records from me, or make me swear under oath to list all of my health issues as they set my rates. I don't plan to lie, so yeah, I tell the truth and hand over my health information. It's the nature of doing business.

Tub wrote:Just please don't pretend that the cost doesn't exist.

And you don't pretend that the costs of worrying about this are nil either. You're unwilling to use the best features of the most convenient geo-redundant provider of remote backups because you're worried about giving data away. You're simply not going to be getting 2x-geolocated with ZFS-sync'd snapshots through Duplicity. Period. You're not getting the feature set that is being offered.

I'm not saying that encryption isn't worth it. I've got encrypted data stored on even my home NAS, and I definitely encrypt certain individual files. But there's actually very little information where "hard security" is worthwhile. And the cost of being unable to connect your system to the current state-of-the-art technology... and the cost of being tied down to a specific program (even an open source one like duplicity)... is far greater than being tied down to an industry-standard protocol like SFTP or RSync. (Duplicity has no good Windows client. And don't try to convince me to move my data through Cygwin+Duplicity; that crap messes up files and filenames all the time.)

I'm not a fan of incremental backups, as it's difficult to verify that they're working 100%. After all, if it's not a tested backup, then it isn't a backup at all.

I assume this is a hash-based mechanism, however. I'm talking about things like getting a specific file from snapshot #5, or browsing through the data as it was on June 10th (or whatever snapshot was closest to it). With an incremental-backup solution, the only way you get this is by downloading the most recent "full" backup before that date, then downloading each incremental, and then applying each incremental to a directory.

THEN you can browse through the filesystem from June 10th. Now, I agree, Duplicity can probably automate this process, but with a decent amount of data (say... 150GB), that will take a long time to pipe through just to grab a specific snapshot.

---------

In contrast, I can mount snapshot #5 in ZFS and start browsing through the data over SFTP. The amount of turnkey convenience that ZFS sync offers me is quite high.

KnightExemplar wrote:You're unwilling to use the best features of the most convenient geo-redundant provider of remote backups because you're worried about giving data away. You're simply not going to be getting 2x-geolocated with ZFS-sync'd snapshots through Duplicity. Period. You're not getting the feature set that is being offered.

I'm confused on this point; if I was following the Duplicity thing correctly, doesn't it use rsync on the storage-server end anyway? I'm not clear on how it would be missing something.

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

commodorejohn wrote:I'm confused on this point; if I was following the Duplicity thing correctly, doesn't it use rsync on the storage-server end anyway? I'm not clear on how it would be missing something.

Fair enough.

I guess I was talking about my specific use-case with ZFS-sync. If you aren't using ZFS, then that feature doesn't apply to you.

----------

My argument here is specifically that I don't like incremental backup solutions. In my experience, it's too much of a hassle to restore specific snapshots to grab files from certain dates... especially since my current ZFS-based workflow basically negates the need for incremental backups. RSync.net's ZFS-sync is an absolute godsend, and a premium feature (even if it's more expensive than I'd like it to be).

I was thinking, though, that maybe a 50GB slab from RSync.net ($10/month, the minimum purchase you can get from them) would be enough for tax documents and other material... and would be small enough to be handled automatically from my NAS. Unfortunately, geo-redundant backup at this size is easy anyway. It's called "burn a Blu-Ray with 5x copies and leave them at your friend's house".

Yeah, I suppose if easy access to arbitrary snapshot dates is a requirement that would be a problem, but that's not really an issue for me; I just want to know that a copy of my data exists somewhere else if the worst should happen.

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

I'm currently playing around with duplicity (thanks for the "reminder" to set up my own backups, commodorejohn!). It seems to do exactly what I envisioned my backups doing, just better than a self-made bash script would have done it. I'm almost glad I've been too lazy to implement that thing.

It can mirror to an arbitrary number of storage providers (say, fileserve, s3 and google drive), if you're concerned about geo-redundant storage. Provider-redundant is even better. Though it seems that both commodorejohn and I keep plenty of local backups, so the offsite storage is only for the worst-case scenario when all other lines of defence have fallen. It doesn't need 100% reliability, just enough that P(house on fire) * P(storage provider on fire) < epsilon.
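The "P(house on fire) * P(storage provider on fire) < epsilon" point is just independent-failure arithmetic; here's a back-of-envelope sketch with invented probabilities (the numbers are illustrative, not actuarial):

```python
# All probabilities below are made up for illustration.
p_house = 0.01       # chance per year local copies are destroyed
p_provider = 0.02    # chance per year the offsite provider loses data

# Assuming independent failures, data is lost only if BOTH happen
# in the same window:
p_total_loss = p_house * p_provider              # ~2e-4

# A second, independent provider shrinks the risk multiplicatively:
p_provider2 = 0.02
p_with_two_providers = p_total_loss * p_provider2  # ~4e-6
```

This is why even a mediocre offsite copy pays off: the joint probability drops by orders of magnitude as long as the failure modes really are independent (same-datacenter "redundancy" wouldn't count).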

Duplicity can restore to any time when you made a backup, assuming you didn't delete the old archives. You get snapshots without being forced into a specific filesystem. It's not even a hassle (as KnightExemplar fears):

duplicity restore --time [time] url://storage.provider/ /home/me

where [time] is a timestamp or something like 3W for 3 weeks ago. Arguments exist for restoring only a single file or directory. There's also a list-current-files command so you can look at the snapshot before downloading/restoring it.

Considering that I've been doing versioned backups for years, and have never needed anything but the latest copy, and I've only ever needed that copy on drive failures, that's good enough.

and the cost of being tied down to a specific program (even an open source one like duplicity) is far greater than being tied down to an industry standard format like sFTP or RSync.

No worries there, rsync.net officially supports duplicity. And if things really hit the fan, duplicity is just a wrapper around gnupg, tar, gzip and rdiff (rsync). It takes some manual work, but you can restore snapshots with nothing but those four programs (at least according to the docs; I haven't tried yet).
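That "nothing but those four programs" claim is easy to sanity-check at the tar layer: once a volume has been gpg-decrypted, what's left is an ordinary gzipped tar archive that standard tools (or Python's stdlib) can open. A toy sketch — the in-memory "volume" here is a stand-in; real duplicity volumes contain rdiff deltas, not plain files:

```python
# Build and read a fake gzipped-tar "volume" entirely in memory,
# standing in for a gpg-decrypted backup volume.
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello backup"
    info = tarfile.TarInfo(name="snapshot/a.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# "Manual restore": list members and extract one by hand.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
    content = tar.extractfile("snapshot/a.txt").read()

print(names, content)
```

The equivalent shell pipeline would be `gpg -d volume.gpg | tar tzf -`, which is the escape hatch if the duplicity binary itself is ever unavailable.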

There's also a C# reimplementation called Duplicati, if Windows compatibility is a concern. Not sure if the file formats are bit-for-bit compatible, though. I don't keep backup-worthy data on Windows partitions, so I'm not going to investigate further.

KnightExemplar, I'm not saying that you should switch to duplicity. You have a setup that works fine for you, stick with it. On the other hand, I'm thoroughly unimpressed by your solution; it solves problems I don't have, it imposes restrictions I don't want and it binds me to a specific service provider which doesn't even offer their services in my country. Duplicity isn't perfect, but it's a better fit for my existing setup, my needs and my paranoia.

Oh, and my favourite way to lose a partition was when I reinstalled a server, trying to use full disk encryption. I was almost done setting everything up, configuring the services and copying the data back over. I rebooted to make sure all the services started up correctly, and that's when I learned that you shouldn't keep the keyfile inside the encrypted disk.

Tub wrote:Oh, and my favourite way to lose a partition was when I reinstalled a server, trying to use full disk encryption. I was almost done setting everything up, configuring the services and copying the data back over. I rebooted to make sure all the services started up correctly, and that's when I learned that you shouldn't keep the keyfile inside the encrypted disk.

Tub wrote:No reliability guarantees AND you will have other people using your bandwidth just when you're trying to stream a movie or fight the boss in an online game? I like the premise, of course, but actual server hosting has its advantages over home servers on weak DSL connections.

What kind of reliability "guarantees" do you have with any provider? The amount of redundancy is just another setting in your config file, analogous to a different contract with your hosting company. And like I said, hosting stuff yourself is optional, but it makes it cheaper/free/"profitable" to use that network. Likewise, throttling the bandwidth or even taking your server offline while you play a game is also up to you. (You probably get penalized if some server tests your storage right then, but rightfully so.)

Tub wrote:and that's when I learned that you shouldn't keep the keyfile inside the encrypted disk.

Flumble wrote:What kind of reliability "guarantees" do you have with any provider? The amount of redundancy is just another setting in your config file, analogous to a different contract with your hosting company.

The provider has a proven record of having operated reliably for X years. That's not a guarantee that it'll stay that way, but I feel safer using those than a random computer of a random individual that may be disconnected at any moment.

Flumble wrote:Hahahaha! I take it it was late at night?

I wish I could claim that, but no. I was a bit distracted; there were lots of waiting periods, which I wasted on games. I just installed that thing again. With all the commands and configuration formats fresh in mind, it went faster than the first time.

KnightExemplar wrote:My question is: how long does that take? If you have a 150GB folder, I'm assuming that is going to take at minimum, 150GB of bandwidth before you can look at the folders and that data.

There is an encrypted manifest. For the filelist, you only need the manifest, which is ~500 KB even though there are ~50k files in my test backups. To actually restore a file, more is needed. The archives with the diffs are split into 200MB chunks (actual size is configurable), so duplicity might be smart enough to only grab the parts containing relevant data.
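The "only grab the relevant chunks" idea is simple arithmetic: if an index tells you a file's byte range within the archive, you can compute which fixed-size volumes to fetch. A sketch with invented numbers (this is the general technique, not duplicity's real index format):

```python
CHUNK = 200 * 1024 * 1024   # 200MB volumes, matching the default above

def chunks_needed(offset, length, chunk_size=CHUNK):
    """Return the volume numbers covering bytes [offset, offset+length).
    Assumes length >= 1."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))

# A 50MB file starting 390MB into the archive spans volumes 1 and 2,
# so only ~400MB of chunks need downloading -- not the whole archive.
print(chunks_needed(390 * 1024**2, 50 * 1024**2))  # -> [1, 2]
```

So a single-file restore from a 150GB backup set only costs a couple of chunk downloads, provided the tool actually does this range math instead of fetching everything.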

For me, my first approach to restoring is to restore from local backups. If those are gone, something really bad happened (bad enough to trash 2 disks at the same time), and I need to download a full copy anyway. I really don't see a situation where I'd want to download a partial backup from the remote storage, so I'm not going to set up a benchmark.
