Posted by timothy on Thursday December 10, 2009 @09:36PM
from the now-you-can-sleep-nights dept.

An anonymous reader writes "The long-time Linux kernel module for block replication over TCP, DRBD, has been accepted as part of the main Linux kernel. Amid much fanfare and some slight controversy, Linus has pulled the DRBD source into the 2.6.33 tree, expected to be released in February 2010. DRBD has existed as open source and been available in major distros for 10 years, but lived outside the main kernel tree in the hands of LINBIT, based in Vienna. Being accepted into the main kernel tree means better cooperation and wider user accessibility to HA data replication."

There are a lot of different ways to get similar results. You might say I'm cloudy on which of these are really equivalent, which are a good idea or the best way to do it, and which have good performance.

There is Gluster [gluster.org], which sits on top of any existing disk file system, via FUSE, I think. No kernel module needed; it only runs a daemon. I tried version 2 and it worked fine, though I didn't demand much of it. They've just come out with version 3.0, which doesn't need libfuse anymore.

Are some of the new file systems under development, such as btrfs, going to have distributed, networked operation as a basic feature? I recall hearing that ZFS has some ability along those lines.

Not exactly. ZFS has send/receive functions that let you copy a filesystem snapshot (full or incremental based off a previous snapshot) to another location. These functions just use stdin and stdout, relying on rsh/ssh/nc/whatever for network communication. It's designed more for remote backup purposes, rather than high availability.
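Something like this, roughly (the pool, dataset and host names are made up):

    zfs snapshot tank/data@monday
    zfs send tank/data@monday | ssh backuphost zfs receive backup/data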

The big problem with send and receive is that if you have any bit errors in the data stream, receive will back out everything. That makes it useless for long-term backup, where you might only need to get one file off a tape, since it's all or nothing. I think ZFS's biggest failure at this point is the lack of a way to do backups without modifying the metadata on the files.

If you are using zfs send/receive for backups, you should be using incremental replication. You take a snapshot on your live system, then use zfs send to replicate that snapshot on another system. For a long-term backup, you then dump the copied snapshot to a tape (reading a read-only snapshot doesn't modify anything). You don't want to use incremental backups for long-term backups because they multiply the chance of corruption.
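Roughly this workflow (dataset names and the tape device are illustrative):

    # on the live box: snapshot, then send only the delta since the previous one
    zfs snapshot tank/data@tue
    zfs send -i tank/data@mon tank/data@tue | ssh backuphost zfs receive backup/data
    # on the backup host: dump a full, non-incremental stream of the replicated
    # read-only snapshot to tape for the long-term copy
    zfs send backup/data@tue > /dev/nst0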

If you use ZFS on FreeBSD then it fits into the GEOM stack. You can use ggate to provide physical devices that are exported over the network, then add these to a ZFS pool. You can also do it the other way around and export ZVOLs via ggate and run some other filesystem on top of them, although that's probably less useful.
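A rough sketch of that setup, with made-up host and device names (see ggated(8) and ggatec(8) for the real details):

    # on the box exporting the disk
    echo "192.168.0.0/24 RW /dev/da1" > /etc/gg.exports
    ggated
    # on the ZFS host: attach the remote disk and mirror it with a local one
    ggatec create -o rw storagebox /dev/da1      # shows up as /dev/ggate0
    zpool create tank mirror /dev/ggate0 /dev/ada1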

I have searched high and low for something truly equivalent to DRBD, and cannot find it.

Not only does DRBD provide replicated storage that can be shared among multiple nodes with synchronous writes, but it also has HA features, like supporting failure and restoration of a node without a loss in service.
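For anyone who hasn't seen it, the setup is roughly this (DRBD 8.x; the hostnames, devices and addresses below are invented):

    # /etc/drbd.conf, identical on both nodes -- roughly:
    #   resource r0 {
    #     protocol C;    # fully synchronous replication
    #     on alpha { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.1:7788; meta-disk internal; }
    #     on beta  { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.2:7788; meta-disk internal; }
    #   }
    drbdadm create-md r0 && drbdadm up r0            # run on both nodes
    drbdadm -- --overwrite-data-of-peer primary r0   # on one node only: start the initial sync
    cat /proc/drbd                                   # watch the replication state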

About 15 years ago, I worked for a place that used Tru64. It offered very similar technology to this. Frankly, we found typical hardware solutions to work better. Software is better at some things, but for work like this, you want it done as much in hardware as is possible.

Yeah, 10 years ago I might have bought a handful of CMOS chips for a particular task. Now I just buy a couple of Atmel ATmega8 microcontrollers. It's cheaper than a couple of logic gates and a flip-flop.

Doing it in software for purely virtual hardware is useful. I know it's been used to sync disks across the network on Xen hosts, the idea being that if the local and remote copies of the disk are kept in close sync, you can migrate a virtual machine with very low latency. Should be able to do similar tricks with other Linuxy VMMs. Having software available to do this stuff makes it easy to configure this sort of thing quickly, especially if you're budget-constrained, hardware-wise.
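With the disks already in sync on both hosts the migration itself is a one-liner, something like this (guest and host names are made up, and xend relocation has to be enabled on the target):

    xm migrate --live guest1 otherhost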

You can achieve live migration with iSCSI and AoE too, and if you use a SAN you will probably continue to use one of these network block device protocols.

What DRBD does is make it relatively simple to set up a redundant SAN, using commodity hardware, from which you can export iSCSI devices etc.
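For example, with tgt the export is a few commands on top of the DRBD device (the IQN and tid are illustrative, and tgtd has to be running):

    tgtadm --lld iscsi --op new --mode target --tid 1 \
        --targetname iqn.2009-12.example.com:drbd.disk1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
        --backing-store /dev/drbd0
    tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL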

Of course, if you are going to use local storage for your VPSs it is just as easy to set DRBD up on those hosts and forgo any network block device layer on top of it. Dual-primary mode makes live migration straightforward in this setup.

I suspect that, like so many things, while there is room for the best way, there is a great deal of room for the "reasonably good and a whole lot cheaper" way.

A whole lot of progress in modern IT, especially on the server side, is less about exceeding the architectural sophistication of 70s-80s UNIX systems and mainframes, and more about making some of those capabilities available on sucktastic x86s.

I'm not about to dismiss your experience, but things have changed over the last 15 years so it might not be as relevant as it once was.

In that time processors have become much faster, memory has become much cheaper, commodity servers have also become much cheaper and a lot of software has become free. While that has happened hard disks have become only a little faster. As a result many people consider custom hardware for driving those disks to be unnecessary - generic hardware is more than fast enough and is significantly cheaper.

There might still be some compelling reasons to go with expensive redundant SAN equipment, but for many situations a couple of generic servers full of disks and running Linux and DRBD will do an admirable job. The bottleneck will most likely be the disks or the network, both of which can be addressed by spending some of the vast amount of money saved by not going with typical enterprise solutions.

"A little faster" is a bit of an exaggeration...see a 1994 hard drive [stason.org]. 13MB/sec vs. 2009's 6000 MB/sec on SAS. In 1994, people were running what, 50Mhz PCs? They haven't improved by the same amount, nor has the speed or quantity of RAM in the typical machine.

What are you talking about? Tru64 has nothing that functions like DRBD and never has. You need to re-read what DRBD actually does, because you're getting confused. Also, 15 years ago Tru64 was only 1 year old, only it wasn't Tru64 back then, it was DEC OSF/1, and it was really quite crude and buggy compared to the Tru64 in circulation today. So you would not have had a very spectacular experience with it.

We use DRBD for some very mission-critical servers that require total redundancy. Combined with Heartbeat I can fail over from one server to another without any single point of failure. We've been using it for more than 5 years, and never had any major issues with it. It will be great to have it in the mainline kernel.
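The glue is pretty small, too; a Heartbeat 1-style haresources entry for a DRBD-backed database looks roughly like this (node name, resource, mount point and VIP are all made up):

    # append the same line to /etc/ha.d/haresources on both nodes
    echo 'node1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysql IPaddr::192.168.0.50/24' >> /etc/ha.d/haresources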

We use it only for mirroring the databases; for mirroring files we use MogileFS and other methods. The problem with DRBD is that once the primary is down, checking both machines and deciding whether it is OK to resync the disks takes a lot of time, and only the DB needs low-latency mirroring in our case.

We have used DRBD 0.7 for some mission-critical servers, but it gave more headaches than a warm (or even cold) standby. The main problem is keeping your nodes synchronised for the disks that are NOT in the DRBD (e.g. /, /etc, /usr, etc.). We put our software on one DRBD disk and the database on another. However, when adding services, it is easy to 'forget' to add the startup script in /etc/ha.d, and the first failover results in not all services being started. Which leads to a support call.

That's why configuration management systems like Bcfg2, Puppet, Chef, Cfengine, etc. exist. They can guarantee that all the relevant configuration is identical across your systems.

As for services managed by the HA daemon, with the modern configuration of OpenAIS/Pacemaker (even in Heartbeat 2.0) there's a CIB (Cluster Information Base) that shares the configuration between all the cluster nodes. It makes it pretty much impossible not to have identical HA services configured cluster-wide.
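You can inspect the shared config from any node with the standard tools (output obviously depends on your cluster):

    cibadmin --query | less    # dump the cluster-wide CIB as XML
    crm_mon -1                 # one-shot summary of nodes and resources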

The servers are connected back to back by a direct gigabit ethernet link, and we use DRBD in protocol B (memory synchronous). Thus all transactions are guaranteed to hit the disk, we get fast performance, and excellent redundancy.

Just what we need, yet another networking module built into the kernel. Creating a fresh config with the 2.6 series kernels has become even more of a hassle since there are so many modules that are activated by default. To stop the insanity I have to go through and eliminate 90% of what's there so that 'make modules' doesn't take longer than the kernel proper. Most of them are targeted for special applications and don't need to be in a default build.

Maybe stop building kernels by hand and you'll be a lot happier, then, eh? Seriously, there's virtually no reason to build a custom kernel unless you have some pretty unusual requirements. So quit wasting your time. And if you insist on building kernels by hand for no particularly good reason, quit bitching. It's not like you don't have a choice.

People who build (and test) their own custom kernels are important. Sometimes, a bug won't show up except with some weird combination of kernel options, because some code path dependencies are missed with the fully configured kernels that the distros build for you.

Well, that's very noble. Nevertheless, those who make the choice to build their own kernels, as valuable as they may be, are still making a choice, and that choice means putting up with the tedium of configuring and building the kernel. Don't complain about it.

i'm sorry to say, but that's not a good attitude. and i'm being polite here.

developers need testers. some arrogant assholes might claim they don't, but then they're known as such. now, to attract testers you not only are polite to them, you also do not discourage them by breaking or ignoring things that hamper them (but might not concern casual users); you actually should build tools and other support functionality for testing. essentially, having fewer testers will impact the quality of the software for everybody else, so casual users should also want the project to have more testers.

But "make oldconfig" is there since years.It's not tedious at all to configure your new kernel when you have your old config file. Only the new options or the modified ones will show up.So the tools are already there for those that build their own kernel.

How much do you actually remove? I've not compiled Linux for almost a decade now, but I used to compile a custom FreeBSD kernel after install. Now all of the things that I want are compiled in or in modules by default and everything else is in modules. The stuff I'm not using just doesn't get loaded. The only overhead you get from modules that are not loaded is a small amount of disk space and a slightly longer kernel compile time (which doesn't matter if you're not compiling your own kernel).

I generally only change my kernel config when I buy a new PC or add new hardware. If I am building a new PC I start with a vanilla kernel source and then go through enabling just the functionality I need, and screw all those modules, I just build it in unless it has to be a module. This may result in a kernel that does not fit on a floppy disk, but why would I care? It doesn't fit on a punched card either.

Actually, custom kernels work better for most applications. They reduce the bloat of unwanted code that's been compiled in, and give you exactly what you want.

If you're trying to save a megabyte of RAM on a modern computer, you're a tool. Building your own kernel was totally mandatory back in the 386 days, but it's totally unnecessary for most users. They derive a lot more benefit from knowing that DKMS will function. With that said, I do have a laptop that I've pondered building a kernel for, because it's got a so-far-unsupported processor (Athlon 64 L110) and if I want Cool'n'Quiet I need a custom kernel. But as a user the best bet is to buy something already supported.

No, but you have to have some compiled in. Most of the stock distros I've gone looking through have things I don't want or need compiled in. The modules tend to be a little slower, and at the very least I have to wait for all of them to load before things work. Why should you have a delay while they load, when they can be put in the kernel right off and just work? On a one-off machine you don't really care, but when you have a network of hundreds or thousands of machines, you don't want that delay on every one of them.

You want "make localmodconfig", which I think was also added recently, possibly to 2.6.32 actually. This builds a kernel using a local.config file, except that it only compiles modules that show up in lsmod. So if you boot off your vendor kernel with a squillion modules, let it load the modules you actually *use* then do make localmodconfig, you can make a kernel that only contains those modules. I don't know what it does if module names etc change, maybe you'd need manual fixup then - should still be less work than you currently are doing though.

They are called modules for a reason: You can add or remove at will, including whether or not you bother to build them at all. To say modules are "built into the kernel" is incorrect; module code is included with the kernel source code, but the modules themselves are only built and used if you choose.
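E.g., taking the module from TFA as the example:

    modprobe drbd        # load it on demand
    lsmod | grep drbd    # confirm it's there
    modprobe -r drbd     # and unload it again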

As concerns the "insanity" of configuring a kernel, here again you have a choice: Use Ubuntu. But if you want a fast, lean, mean machine you really do want to craft your kernel to fit your specific needs.

I admin AIX systems for my day job... One thing that's really nice about AIX is that the filesystem and underlying block device is highly integrated. This means that to resize a volume you can run a single command that does it on the fly. For AIX admins who are new to Linux it seems a step backwards and they liken it to HP-UX or some earlier volume management...
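For the non-AIX folks, that single command is chfs; on a reasonably recent AIX it grows both the logical volume and the filesystem online, e.g.:

    chfs -a size=+1G /home    # grow /home (and its underlying LV) by 1 GB, live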

Ahh, but the beauty of having separate filesystem and block device layers is that it's so damn flexible. I can build an LVM volume group on iSCSI LUNs exported from another system. In that VG I can create a set of LUNs that I can use as the basis of my DRBD volume. In that DRBD volume I can carve out other disks. Or I can multipath them. Or create a software RAID.
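A rough sketch of that kind of stacking (every device, VG and size here is made up, and the iSCSI initiator login is assumed to have already happened):

    pvcreate /dev/sdc /dev/sdd                # the iSCSI LUNs as they appeared locally
    vgcreate remotevg /dev/sdc /dev/sdd
    lvcreate -L 100G -n drbdbacking remotevg
    # point a DRBD resource's "disk" at /dev/remotevg/drbdbacking, bring it up,
    # then format (or further carve up) /dev/drbd0:
    mkfs.ext3 /dev/drbd0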

Anyhoo, DRBD is a really cool technology. It gives the ability to create HA pairs on the cheap. You can put anything from a shared apache docroot there to the disks for Oracle RAC. With fast networking available for cheap, almost any shop can have the toys that were once only affordable to big companies...

Every time someone talks about how much they like some filesystems on Linux, someone pops up to tell us about how great ZFS is. Well, the license is shit, it was chosen specifically for GPL incompatibility, and sun can fuck off into the air. Stop trolling.

And it's not even a license problem, it's a patent problem. Remember how ZFS is like 6000 LoC? It would have been re-implemented by now if not for patents - heck, it's far more useful than many of the other odd filesystems Linux supports.

ZFS works great on Linux. Some of us don't care about the license, only if the software works. If you're a lawyer then I'm sure you love to get in a tizzy about the license, but for technical people with real work to do it's about whether the code works and is stable. It does and it is.

I think the point is somebody is going on and on about their awesome new buggy whip and somebody pops up and says, "dude, it's the 21st century, buy a fucking car."

But anyway, you identify the real problem well, and there's some hope that Oracle will liberate the talent and the code.

He was saying that in AIX it's all integrated and therefore easy and AIX admins tend to think of the way it's done in Linux as a step backwards, BUT with the Linux way of doing things it's much more flexible exactly because "every partition / [logical] volume can be partitioned again, and so on."

I don't like drbd (though i've used it for a while)... it's a massive, convoluted, complex mess, and fairly inflexible.

Personally, i'm hoping dm-replicator gets near completion sometime soon, though details of it are rather scarce (i do have a kernel built with the dm-replicator patches, but trying to do anything with it seems near impossible)...

I do a fair amount of work inside the storage world and drbd is just such a mess in so many ways.

I sound very critical of drbd, and that's not the way i mean to come across. What I am really trying to say is that it's bloated for the small amount of functionality it provides, and with a couple of minor tweaks could do much, MUCH more. It's a kewl piece of software, but like many FOSS projects it has a hideous, weighty config prone to confusion (something you just don't need with DR).

Don't hold your breath for dm-replicator, it's still a way off. And even when it does hit you'll only get active-passive replication. Active-active isn't even on the road map yet and DRBD has that today. In addition there is no support today for dm-replicator in any of the popular Linux cluster stacks, where DRBD is very well supported and has been for many years.

I'd love active-active for some of the systems I'm working on. However http://www.drbd.org/home/mirroring/ [drbd.org] seems to imply that it is currently complex, limited, and flaky. Did you find a better way, or are they just being cynical?

I'm not sure. It might just be that some pages on their web site are out of date. For example their roadmap page [drbd.org] says that 8.3 is a future release and features "Introducing mechanisms to better deal with temporary network failures for devices in primary-primary mode". But 8.3 is already out, and a yum search shows 8.3.2 is available for F12 if I want it.

I implemented a DRBD/heartbeat mail cluster for a client about six years ago. At the same time I implemented a half-baked user replication solution using Unison when we should have been using LDAP. I picked up DRBD and heartbeat easily under pressure and found the config logical and consistent once I understood the underlying concepts. Certainly not bloated. Unison on the other hand caused major headaches. So quite clearly, like LSD, DRBD affects different users in different ways, and perhaps you should stick with whatever works for you.

FreeBSD users have been doing it for 7 years with the default kernel. I guess that's one reason why it's more popular with companies that depend on HA, such as Bank of America. I love having ZFS as well, the combination is sooooo bad ass:-)

Those that run DRBD and want to try it can read this [74.125.77.132].

It is interesting to compare with what VMS offered 25 years ago [wikipedia.org]:
- VMS could have multiple nodes (can DRBD? It is not obvious from the web site.)
- All VMS nodes have read and write access to the file systems.
- The distributed lock manager [wikipedia.org] helps with file locking in this case.
- VMS has the concept of quorum [hp.com] to avoid the "split brain" syndrome mentioned on the web page.

Yes yes yes - but 99.9% of slashdot users have probably never seen VMS, never mind a VMS LAVC cluster, so they have no idea that even today their latest toys are still playing catch up. Hell, half of 'em probably weren't even born then.

It's a good thing the kernel supports modules, so that the 0.1% of users that use this feature can still have it supported without any performance or memory usage detriment to the other 99.9% of users.

I missed where it's a module and not "To Be Included in Linux Kernel" as the title implied. If it's just a module, that's fine by me. Just keep it as a module and don't compile it into the kernel. I do see the benefit of including the source as official Linux versus the previous third-party status.

"Personally" - you got a lotta nerve representing yourself as having a valid opinion about what does and does not constitute a useful feature.

A closed mouth gathers no foot.

I'm sorry, I'm not allowed to have an opinion? (Maybe I pissed you off by using bloat and linux in the same sentence?) Go back and read what I wrote. I did NOT say it was not a useful feature. I said the vast majority of Linux users don't need it.

Do you even know how linux works and what is meant by the kernel tree? Just type lsmod to see your modules, and do man modprobe to see how modules are loaded and unloaded.

Obviously the distribution must compile the entire kernel, with all modules, detect your requirements, and then load automatically those pieces that are needed for you at startup.

Yes I do. I am well aware that the kernel supports modules, and that most distros include that feature when they build their kernels. I am also aware that drivers can be included in the static kernel itself instead of as standalone loadable modules. You can compile a kernel without module support, trimmed down to just what you need. I see this quite a bit in embedded systems, and sometimes in higher-security systems, under the belief that disabling dynamic module loading improves security.

This isn't really something that the majority of Linux users need, or want compiled into their kernel.

So it's great that they don't need to, nor are they forced to, have this kernel module in their kernel. And it is also great that, as everyone (including you) has access to the source code, it is possible to cherry-pick which features to have in the OS kernel. Isn't linux awesome?

Just cruise Slashdot for examples of people picking on MS for included features and support that few folks need, particularly when they turn into vulnerabilities (it doesn't help when those features are enabled by default). I blame MS for that, as I get to go disable all those features like remote DCOM and POSIX support when I harden boxes. Or I simply use something like nLite to trim all the fluff away. Microsoft is getting better, mostly on the server side, about disabling or not installing all those features by default.

There was a local mid-sized company which recently migrated their workstations from Windows XP to Linux. Firefox, Thunderbird, OpenOffice... it did everything they needed to do, and it was free!

Productivity dropped sharply shortly after the migration. No prob, everybody thought, just a temporary result of the learning curve. Rolling out a standard backup image was a huge hassle because there were different brands and models of workstations.

Updates would break the entire operating system and the IT staff had to hire temps just to fix driver problems and roll the dice editing config files. Users were complaining about having to sit aside all day while their workstations were being "fixed". Users were becoming frustrated with not knowing how to do anything without getting "file permissions" errors, and some of them threatened to quit altogether after a training session showed them how to use the terminal to navigate to a word document and use sudo to open it, while the same action would have been only a double-click on Windows. It took 5 months before the computers were perfectly configured and everybody got the hang of using Linux, but it still didn't solve the problem of random OS lockups which caused a lot of lost data.

Why is Linux still locking up? Windows fixed that problem years ago with 2k/XP!

I've been using linux as my main OS for the past 6 to 7 years, and in all this time I never experienced any linux lockup - not even back in the beginning, when we couldn't do away with compiling software by hand (which is where the "you need to know how to program to use linux!" meme was born) and when the only way to make my DSL modem work was to run a weird, convoluted shell script from the command line. So that "lockup" accusation is, at the least, very odd, particularly in this day and age.

Moreover, that weird accusation of "file permissions errors" and the need to hire IT staff with the sole purpose of "fixing drivers" and "editing config files" also sounds like bullshit to me, especially in today's world, and even after the GP stated that their workstations worked with XP and win2k, a pair of OSes which are more problematic, less stable and have less extensive hardware support than today's popular linux distributions.

And of course, let's not forget that the GP made a point of launching that long-winded anti-linux troll while intentionally leaving out fundamental details, such as which linux distribution was supposedly installed, not to mention that it was posted anonymously. To put it in other words, the GP wrote that post intending to attack the entire linux world, insinuating that that sort of problem affects each and every distro and not a specific one, and did so intending to troll.

So, it would only be seen as "-1 truth hurts" if you didn't read the post and you also considered a "your mother is a whore" type of post as "-1 truth hurts". It's not; it is meant to insult, and it is completely devoid of any objective statement.

Why is Linux still locking up? Windows fixed that problem years ago with 2k/XP!

It isn't. In our mid/large company, we have hundreds of Linux workstations, and they've all been working for years without a single hitch, from day one. No permission problems, never had an update causing significant issues, don't even ALLOW users to get a command-line, etc. Vastly easier to debug when there is a problem, and has allowed the company to replace a large group of Windows experts with a small group of Linux experts, and the vastly improved productivity has allowed the company to significantly reduce the number of employees (or rather, just cease to replace them when there is turnover).

Just the other day I noticed the uptime on one of the Linux workstations was over a year at this point. No lockups. The few issues we've had with the systems have been directly traced to hardware problems.

If yours is a true story (which I seriously doubt) you should look at hiring at least one half-way decent Linux SysAdmin at a reasonable salary to fix the pathological issues with the installation which was likely done by minimum-wage idiots without a clue.

I think people think Linux is "unreliable" because they don't attribute lockups to their binary video card drivers. I've been using Linux as my main OS at home since late 1992, and have run my home PC 24x7 since 2000, and can only remember one kernel panic in all that time - but then again, I've never run a binary kernel module. If you think it is normal to run 3rd-party drivers, because you're used to that from the Windows world, and then you do so under Linux, and Linux fails, you're unlikely to attribute the crash to the driver rather than to Linux itself.