Posted by timothy on Saturday August 25, 2012 @01:26PM
from the when-birdwatching-goes-too-far dept.

An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing for my workplace a particular sort of 'dynamic' file server for handling scientific data. We have all the hardware in place, but can't figure out what *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: we have sensor units scattered across a couple square miles of undeveloped land, each of which collects ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server has to boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out of the box without messing with manual recompiling? And why is documentation for *nix always so bad?"

There's no reason you need a union filesystem. Just mount the data at an appropriate point in a directory tree. Union file systems are designed to solve a different problem.
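If you do just mount the drives under one tree, it's only a few commands. A rough sketch (the device names and the /srv/field path are placeholders, and this only prints the commands instead of running them, since the real sled devices will vary):

```shell
#!/bin/sh
# Sketch only: /dev/sdb1 etc. and /srv/field are assumptions for
# illustration. Each sled gets its own subdirectory under one top-level
# tree; Samba then shares the top level and clients see a single tree.
top=/srv/field
i=1
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
  mnt="$top/drive$i"
  # print rather than execute, so the sketch is safe to run anywhere
  echo "mkdir -p $mnt && mount -o ro $dev $mnt"
  i=$((i+1))
done
```

Mounting read-only (`-o ro`) is a deliberate choice here: field drives are the master copies, so the share shouldn't be able to scribble on them.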

What you boot from has nothing to do w/ what you read the data from.

Samba is a really strange choice. Given the data volume I'd expect you to be using a large Linux cluster to process the data for which NFS would be more appropriate. It certainly sounds like microseismic data in which case the processing will benefit from making duplicate copies of the data and mounting read only via NFS so the first available server provides the data. Multiple ethernets are needed to get full benefit from doing that though.
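For the read-only NFS idea, the client-side fstab entries might look something like this - the hostnames, export paths, and timeout values are all assumptions, not a tested config:

```shell
#!/bin/sh
# Sketch: each compute node mounts duplicated copies of the data read-only.
# "ro" keeps clients from touching the masters; "soft" stops a dead NFS
# server from hanging jobs forever. Hostnames/paths are made up.
fstab_lines='nfs1:/export/seismic  /data/copy1  nfs  ro,soft,timeo=30  0 0
nfs2:/export/seismic  /data/copy2  nfs  ro,soft,timeo=30  0 0'
printf '%s\n' "$fstab_lines"
```

With two copies mounted, the processing jobs can fall back to whichever server answers first, which is where the duplicate-copy suggestion above pays off.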

*nix documentation is actually very good. But there is a lot of it, so you tend to have grey hair by the time you've read all of it.

After looking through your proposal, you need two pieces. You need a WORKSTATION to accept the drives, cleanse them (you are going to verify the data as non-malicious, right?), catalog the data, and shut down and boot up on command. Then you need a SERVER that hosts the data to be served. Thinking you are going to serve directly from the hotswaps is a bad idea.

For starters, I'm really tired of this /. "*NIX-is-too-hard" ranting all the time on 'Ask Slashdot' posts. Don't be a n00b douche; if you don't get it, then spend some time and get it. Don't blame the documentation; dig in and figure out something for yourself for once. Sometimes your Nintendo-and-Mt-Dew generation makes me want to throw up.

As for your solution, do not go with some installable appliance-type distro like FreeNAS; yes, it's *BSD under the hood, but you're at the mercy of whatever that 'focused' distro is going to provide for you. Since you're undecided, go with a full-blown distro so you have some flexibility to grow and augment the mission and purpose of this server you're hosting data on.

Since you're clearly a n00b when it comes to picking out a *NIX solution, go with anything Linux at this point, and set up the NAS services yourself (e.g. Samba/SMB, NFS, etc.). In turn, you'll get better community support, a more flexible OS configuration with room to grow, and you'll probably learn something to boot.
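Setting up the Samba side yourself really is just a few lines of smb.conf. This fragment is only a sketch - the share name, path, and group are made up for illustration:

```
[field-data]
   path = /srv/field
   read only = yes
   guest ok = no
   valid users = @researchers
```

That's the whole share definition; everything else (the per-drive mount points under /srv/field) is plain filesystem work.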

Also, you don't need a union filesystem. Simple udev rules, automounting the drives under the top-level structure you're sharing out with your NAS services, will do you just fine.
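A sketch of such a udev rule - the filesystem-label pattern and the helper script are hypothetical, so substitute whatever naming convention your field units actually use:

```
# /etc/udev/rules.d/99-field-drives.rules  (sketch, not a drop-in rule)
# When a block device with a matching filesystem label appears, hand it to
# a helper script that mkdirs a slot under the shared tree and mounts it.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_USAGE}=="filesystem", \
    ENV{ID_FS_LABEL}=="FIELD*", RUN+="/usr/local/bin/mount-field-drive %k"
```

Doing the actual mount in a small helper script (rather than inline in the rule) keeps the rule readable and gives you somewhere to log and reject drives that fail a sanity check.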

1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

> Better yet, tell us what you need to do

- Take a server that is off, and boot it remotely (via ethernet magic packet)
- Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
- Share out the unioned virtual tree in such a way that it's easily accessible to mac/win clients
- Do all this in under 30 seconds
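For what it's worth, the "magic packet" in the first item is trivial: 6 bytes of 0xFF followed by the target MAC repeated 16 times. A sketch in shell (the MAC is a placeholder, and in practice a tool like etherwake or wakeonlan does this for you):

```shell
#!/bin/sh
# Sketch: build a Wake-on-LAN magic packet as a hex string.
# The MAC address below is a placeholder.
mac="aa:bb:cc:dd:ee:ff"
machex=$(printf '%s' "$mac" | tr -d ':')   # 12 hex chars

packet="ffffffffffff"                       # 6 bytes of 0xFF
n=0
while [ $n -lt 16 ]; do                     # then the MAC, 16 times
  packet="${packet}${machex}"
  n=$((n+1))
done

# 102-byte payload; send it as a UDP broadcast (commonly port 9), e.g.:
#   printf '%s' "$packet" | xxd -r -p | \
#       socat - udp-datagram:255.255.255.255:9,broadcast
echo "payload bytes: $(( ${#packet} / 2 ))"
```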

I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get linux to do this.....

> 1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

Yes, I will ask why. Why booted, and not hibernated, for example, if part of the reason is that it has to be powered off? If the server does single-purpose file serving of huge files, read once, it does not benefit from huge amounts of RAM, and can hibernate/wake in a short amount of time, depending on which peripherals have to be restarted.

> 2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

Yes? While Microsoft usually sucks, it can still be the least sucky choice for specific tasks. And there are more alternatives than Linux out there too.

> 3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

What's the format on the drives? That can be a limiting factor. And what are the specifics of "sharing"? Must files be locked (or lockable) during access? Are there access restrictions on who can access what? For what it's worth, Windows Vista/7/2008R2 all come with Interix (as "Services for Unix") NFS support, so that's also an alternative.

> - Take a server that is off, and boot it remotely (via ethernet magic packet)

That you want to "wake" it does not imply that the server has to be shut off. It can be in low power mode, for example - Apple's "bonjour" (which is also available for Linux) has a way to "wake" services from low-power states.

> - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.

Why? Sharing a single directory under which all the drives are mounted would also give access to all the drives under a single mount point - no need for a union unless you really need to merge directories and for some reason cannot do the equivalent with symlinks ("junctions" in MS jargon).

Unions are much harder, as you will need to decide what to do when, inevitably, the same file exists on two drives (even inconspicuous files like "desktop.ini" created by people browsing the file systems).

Even copying the files to a common (and preferably RAIDed) area is generally safer - that way, you also don't kill the whole share if one drive is bad, and can reject a drive that comes in faulty.

But you seem to have made the choices beforehand, so I'm not sure why I answer.
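To make the collision point concrete, here's a tiny demo (scratch paths under /tmp, nothing to do with the real sleds) of why side-by-side mount points are easier to reason about than a merge:

```shell
#!/bin/sh
# Demo: two sleds both carry the same incidental filename.
mkdir -p /tmp/uniondemo/sledA /tmp/uniondemo/sledB
echo "from sled A" > /tmp/uniondemo/sledA/desktop.ini
echo "from sled B" > /tmp/uniondemo/sledB/desktop.ini

# Mounted side by side under one shared tree, both copies stay addressable:
cat /tmp/uniondemo/sledA/desktop.ini /tmp/uniondemo/sledB/desktop.ini

# A union of sledA and sledB would surface only one "desktop.ini"
# (whichever branch wins the merge policy), silently hiding the other.
```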

> - Do all this in under 30 seconds

You really should have designed the system with the 30 seconds as a deadline then.

If I were to do this, I would first try to get rid of the sneakernet requirement. 4G modems sending the data, for example. But if sneakernetting drives is impossible to get around, I'd choose a continuously running system with hotplug bays and automount rules.

Unless the data has to be there 30 seconds from when the drive arrives (this is not clear - from the above it appears that only the client access to the system has that limit), I'd also copy the data to a RAID before letting users access it.

Sure, Linux would do, but there's no particular flavour I'd recommend. Scientific Linux is a good one, but *shrug*. If you need support, Red Hat - but then you should also buy a system for which RHEL is certified.