Many of us have spent countless hours transferring old disks and tapes to image files, an important task in preserving our common scene history. And let's not forget party photos, scans of papermags, votesheets and other kinds of scene art either!

However, how do you handle this backed-up data? Does it sit on your desktop PC or laptop without any kind of redundancy? If so, you risk having to do the preservation all over again - or even worse, losing data for good if you no longer have the original sources handy or they have since fallen victim to bit-rot.

I for one am not too eager to revisit the close to 300 Turbo 250 tapes in my collection, just because of being sloppy...

The best way of ensuring that data is not lost is to avoid having a single point of failure. Spread your data to multiple individuals and sites. There is nothing more dangerous than unique data being stored in one location, in the care of one individual. If the storage medium breaks down, or something happens to that individual, the data could be lost forever.

That said, you might have data not suitable for anyone else. Unreleased demo-parts, unused snippets of code or old personal notes that you want to preserve solely for yourself.

You accept a greater risk by not spreading the data around, but there are of course actions you can take to reduce that risk, even if you wish to keep some (or all!) data to yourself.

How to make that kind of data a bit more secure is what this article is all about! This is by no means a complete guide, but hopefully you will at the very least get some tips on where to start in keeping your data a bit safer.

Backup plan and version handling

Everyone knows how important backups are. Still, many are careless and only learn after having lost some important data. Make sure you are not one of them!

Your first decision is whether to maintain your backups on your own or to back up to the cloud. Backing up to the cloud might be an attractive solution when it comes to cost, ease of use and accessibility, but consider the pros and cons carefully before you go down that route.

You might not want to store material that infringes on copyright on a random cloud service, and there is always the possibility of the service getting hacked to consider as well. Of course, getting hacked is a possibility when storing your data on premises too.

Either way, you should create and maintain a proper backup plan/policy. Depending on what type of data you are backing up, the plan will differ. If your data is an archive that doesn't change often, a few full backups each year might be enough. In contrast, if your data changes often you might want to run incremental backups weekly, daily or even several times a day.
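To make the difference between full and incremental backups concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: the function name incremental_backup is made up for this example, "changed" is judged purely by a file's modification time, and deleted files are not tracked at all. A real backup tool does a lot more.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest, since):
    """Copy every file under `source` modified after `since` (a Unix
    timestamp) into `dest`, keeping the directory layout.
    Returns the relative paths that were copied."""
    source, dest = Path(source), Path(dest)
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime <= since:
            continue  # unchanged since the previous run, skip it
        relative = path.relative_to(source)
        target = dest / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(relative)
    return copied
```

Calling it with since=0 copies everything, which amounts to a full backup. Passing the time of the previous run copies only what has changed since then.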

There are several ways of handling file versions that don't require a traditional backup solution, though. For instance, if your files are saved on a Windows Server 2008/2008 R2/2012 or 2016 machine with shadow copy enabled, you can easily revert to a previous version of a saved file.

Desktop OSes like Windows Vista/7/8 and 10 also contain shadow copy, but availability depends on the edition, and in several cases the feature is severely limited compared to the server operating systems.

Several NAS systems offer Dropbox-like functions to sync locally stored files to your NAS, which also support version handling. Examples of this are QNAP's Qsync and Synology's Cloud Station Drive. QNAP also has a nice solution called Cloud Drive Sync, which syncs your on-premises NAS with a cloud service of your liking, such as Dropbox, OneDrive or Google Drive.

For those who prefer building their own NAS boxes, there are of course other options too, like Syncthing coupled with FreeNAS.

However, be sure not to treat your NAS as a backup, unless you have your data backed up to more than one hardware device. Also, versioning is in no way a substitute for proper backups!

Backup media

Today I would not recommend backing up to any media other than a hard drive, or a good quality USB pendrive for smaller backups. Hard drives are reasonably cheap nowadays, with the exception of the larger SSD models. Considering we are focusing on backing up data related to the C64 and possibly other old computer formats, the space required is by no means massive anyway.

The days of backing up to tape or CD/DVD are gone, with the exception of tape for very large datasets where nothing can compete with the price/capacity ratio. This is the reason large data centers still hold a vast amount of their seldom accessed data on tape, which is then streamed to disk in realtime when requested. Slow, but a darn sight faster than tape used to be. There has been quite some innovation in this field as well, but nothing that would interest the average Joe doing backups at home, of course.

Depending on how much storage space you need for your backups, a mechanical disk is probably the wisest choice, but if the amount of data is small, SSDs might be an alternative. The mechanical WD Red series of hard drives is slow but durable and designed for 24/7 NAS use, and could be used for backups in other installations as well. Seagate has its own competing NAS drive too.

Backup location

Saving all your backups in the same physical building as the original data is generally a big no-no. While it is fine to retain a local backup on the same site, you should at least have a fairly recent archive copy in an off-site location. This could be at a friend's or relative's house, as long as it's someone you trust.

RAID - an extra layer of protection

As the old saying goes: RAID is not backup. That said, if you are storing your data on some kind of server (as mentioned before, a NAS is probably what is most commonly found in a home environment), you should employ RAID for fault tolerance. The most common levels are RAID 1 (gives you half the storage capacity), RAID 5 (which I recommend for any server with 3-5 disks) and RAID 6 (better protection but less available space).

RAID 1 works by writing the exact same data to several (usually two) hard drives. You can then lose one of the drives without losing any data, the penalty being that you lose half your disk capacity in exchange for the redundancy. RAID 1 normally sees a performance gain on random reads, while write performance is neither better nor worse. This type of RAID is usually the only available option on smaller 2-bay NAS systems.

RAID 5 employs striping of the data together with parity distributed among the drives in the array. One disk in a RAID 5 array can fail without you losing any data, as the data from a failed drive can be calculated from the data and parity on the other drives. You need at least three disks in your array to use RAID 5. Write performance takes a bit of a hit as parity needs to be calculated and written to disk. The more disks you have in the array, the smaller the share of storage space you lose - but more drives also means an increased risk of more than one drive going bad before you have the chance to swap the first damaged one out, resulting in data loss.
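The parity trick is easier to believe once you have seen it work. The following Python sketch is a simplified illustration, not how any particular RAID controller is implemented: it uses one fixed parity block and made-up block contents, whereas a real RAID 5 array rotates the parity between the drives. The helper name xor_blocks is invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives (made-up contents).
data = [b"C64!", b"SID.", b"VIC2"]

# The parity block, as if stored on a fourth drive.
parity = xor_blocks(data)

# The drive holding data[1] dies: rebuild its block from the
# surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Since XOR-ing a value with itself cancels it out, rebuilt ends up identical to data[1]. With two drives missing there is not enough information left to do this, which is exactly why a second failure is fatal for RAID 5.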

RAID 6 is very similar to RAID 5 but adds one extra layer of protection as it uses two parity blocks distributed over the array members instead of one. In practice, that means that you can survive two simultaneous disk failures without losing any data. Read speed is comparable to RAID 5, but writes are slower due to more parity data needing to be written.
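The space penalty of the three levels comes down to simple arithmetic, sketched below in Python. The function name usable_capacity is made up for this example, and it assumes identical disks and ignores file system overhead, so treat the numbers as rough estimates only.

```python
def usable_capacity(level, disks, disk_size_tb):
    """Rough usable space in TB for an array of identical disks.

    RAID 1 is treated as a plain mirror: every disk holds the same data.
    RAID 5 gives up one disk's worth of space to parity, RAID 6 two.
    """
    minimum_disks = {1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError("only RAID 1, 5 and 6 are covered here")
    if disks < minimum_disks[level]:
        raise ValueError("too few disks for this RAID level")
    if level == 1:
        return disk_size_tb
    parity_disks = 1 if level == 5 else 2
    return (disks - parity_disks) * disk_size_tb
```

For a 4-bay box with 4 TB disks, this gives 12 TB for RAID 5 and 8 TB for RAID 6, while a two-disk RAID 1 mirror of the same disks gives 4 TB.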

Several NAS vendors, such as Synology, have their own RAID versions which are mostly derived from the above, often supporting the bundling of different sizes of hard disks in the same RAID array (with a large penalty on the storage space as a result).

Data integrity - what file system should I use?

The most common file systems on a NAS today are ext3, ext4 and btrfs (since most of them run one flavour or another of Linux). Windows servers use NTFS or ReFS, and USB drives are also often pre-formatted as NTFS these days. One would think that the most widely employed file systems are all equally safe, but that is not always the case. For starters, if anyone is still using FAT32, just stop! FAT32 is way beyond its expiry date and should naturally be avoided at all costs - if you care about your data, that is.

NTFS and ReFS

NTFS (New Technology File System) is a proprietary file system from Microsoft, and not especially new technology any more. It has been available in every Microsoft server OS since Windows NT 3.1, and in every desktop OS since XP (although in different versions of course). It has some quirks, but even if the Linux guys will come after me with pitchforks and torches now, I must stress that it's a mature file system that is unlikely to give you much trouble in everyday use. However, note that NTFS doesn't do any data integrity checking and only relies on the built-in ECC checks in the hard drives of today.

Microsoft is planning to phase out NTFS in favour of their new file system ReFS (which has much more in common with BTRFS and ZFS - more on those later). ReFS has been available since Windows Server 2012 and Windows 8.1, but it's so far not widely used in home environments.

EXT3

Ext3 is seldom used today, but just a few years back more or less all NAS devices used this file system. Some older NAS models still sold might have ext3 even to this day. Limitations of ext3 are a maximum volume size of 16TB and a maximum file size of 2TB (which is not much of a problem considering we are mainly talking about storing C64 disk and tape images here). What's worse is that ext3 disks are prone to fragmentation, and it's also not the fastest file system around. Like NTFS, it does not do any data integrity checks on its own.

EXT4

Ext4 is very common in NAS devices, and while backwards compatible with ext3 it offers better speed and employs delayed allocation to keep drives free from fragmentation. Other improvements include support for larger volumes (a staggering 1 EB) and file sizes up to 16TB - again less of an issue for our needs. More importantly, checksums are used to protect the journal and, in newer versions, the file system metadata, which gives it an edge over both NTFS and ext3 for long term data storage. Note, however, that ext4 does not checksum the contents of your files.

BTRFS

BTRFS (B-tree file system, often pronounced "Butter FS" or "Better FS") is a more modern file system than ext4, but still not employed as widely. It has some interesting features such as de-duplication and compression, which save disk space. However, it's not as tried and tested, and many still consider it not ready for production environments. Too many stories about it being slow, using lots of system resources and even corrupting data make me, at least, stick to ext4 for the foreseeable future. Still, one of the larger NAS vendors (Synology) is offering this file system as a choice next to ext4 for their newer devices.
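For those wondering how de-duplication can save space, the basic idea can be shown in a few lines of Python. This is a toy illustration of the principle only, under the assumption that data arrives as fixed-size blocks; it says nothing about how BTRFS actually implements the feature, and the function name deduplicate is made up.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once and remember the order needed
    to rebuild the original data."""
    store = {}   # digest -> block contents, one entry per unique block
    layout = []  # digests in original order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        layout.append(digest)
    return store, layout


def rebuild(store, layout):
    """Reassemble the original data from the de-duplicated store."""
    return b"".join(store[digest] for digest in layout)
```

If the same block shows up ten times, it is stored once and merely referenced ten times. The savings obviously depend on how much duplicate data you actually have.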

ZFS

ZFS is a combined file system and logical volume manager designed by Sun Microsystems and released by them as open source in 2005. Since then Oracle have acquired Sun Microsystems and today there are several versions of ZFS, including closed source versions held captive by Oracle, GreenBytes and others, as well as the still open source OpenZFS.

ZFS is a very interesting file system, designed for maximum data integrity. It includes checksumming of ALL data (including metadata) to detect corruption, exactly what you need for long-term storage. This and other features protect against bitrot, phantom writes and other data corruption anomalies that most other file systems do not take into account. Power spikes and even cosmic radiation can cause such problems, and ZFS is second to none in handling them.
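If your file system does not checksum file contents, you can approximate the detection part yourself for an archive that rarely changes. The Python sketch below records a SHA-256 checksum per file and later reports files that no longer match. It is a rough stand-in under stated assumptions, not what ZFS does internally: ZFS works per block inside the file system and can repair damage from redundant copies, while this can only tell you that something changed. The function names are made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_damage(root, manifest):
    """Names from the manifest that are missing or no longer match."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

Build the manifest right after imaging your disks and tapes, keep a copy of it together with each backup, and run find_damage now and then. Note that a file you changed on purpose will be reported too, so this suits archives better than working directories.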

ZFS employs its own software RAID system, and it is not advisable to run it on top of any hardware RAID system. Designed for long term storage of data, it is an excellent choice of file system to host your precious data on.

However, one downside is that you cannot grow a ZFS RAID array one disk at a time, so you have to plan your array carefully before deploying it. If you decide to build your own NAS using FreeNAS, you will be using ZFS as the file system. Also, be prepared to stock up on plenty of RAM in your server: 8GB is a bare minimum to run ZFS somewhat efficiently, and it's kinda CPU intensive as well.

Summary

Making sure your data is always safe is not an easy task. At least in Sweden, some public archive laws stipulate that certain data needs to be accessible for all future eternity. Good luck with that...

That demand is impossible in so many ways it's hilarious, but we still need to do our best - especially when it comes to scene related data.

To sum things up: some kind of NAS or file server with RAID is a good thing - but make sure you back it up regularly!

As for file system choice, if you do not need the flexibility and have enough horsepower, go with ZFS for long term storage of your precious data. If you need the flexibility and have less horsepower, stick with ext4 or NTFS (preferably ext4).

Also make sure you have one of your backups in a different location and perhaps even in the hands of someone you trust.