This web page is no longer maintained. The information presented here exists only to avoid breaking historical links. The project itself stays maintained and lives on: see the Linux-HA Reference Documentation.

General Issues

What is DRBD, to begin with?

DRBD, developed by Philipp Reisner and Lars Ellenberg, is a Distributed Replicated Block Device
for the Linux operating system. It lets you keep a realtime mirror of your local block devices on a remote machine. In conjunction with Heartbeat, it allows you to build HA (high availability) Linux clusters.

Which license conditions apply to DRBD?

DRBD is released under the GNU General Public License, Version 2, June 1991 (GPL). Thus, within the conditions of this license, it can be freely distributed and modified.

Can I mount the secondary at least readonly?

Short answer: no! But see also the next question/answer.

DRBD would not care, but most likely your filesystem will get confused, because it will not be aware of changes in the underlying device. In general this means it cannot work: not with ext2, ext3, ReiserFS, JFS, or XFS.

Thus, if you want to mount the secondary, make it the primary first. Mounting both devices at the same time does not work. Actually, DRBD v8 does support two primaries; see the next answer. If you need access to the data from both nodes, and from an arbitrary number of other clients, consider using HaNFS.

Why does DRBD not allow concurrent access from all nodes? I'd like to use it with GFS/OCFS2...

Actually, DRBD version 8.0.x and later support this.

If you need not just a mirrored but a shared filesystem, use OCFS2 or GFS2, for example. But these are much slower, and typically expect write access on all nodes in question. If more than one node concurrently modifies distributed devices, we get some "interesting" problems: deciding which part of the device is up-to-date on which node, and which blocks need to be resynchronized in which direction. These problems have been solved. You need net { allow-two-primaries; } in your configuration to activate this mode. But handling DRBD in "cluster fs mode" is still more complex and cumbersome than the "classical" one-node-at-a-time access.
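A minimal drbd.conf sketch of this setting (DRBD 8.0.x syntax; the resource name r0 and everything else in the resource section are illustrative):

```
resource r0 {
  net {
    allow-two-primaries;   # both nodes may be Primary at once (cluster fs mode)
  }
  # ... disk, syncer, and "on <host>" sections as usual ...
}
```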

Another option would be to have only one node active, export that device via iSCSI, and then run OCFS2 on top of iSCSI.

Can DRBD use two devices of different size?

Generally yes, but there are some issues to consider:

Locally, DRBD uses the configured disk-size, which has to be <= the physical size; if not given, it is set to the physical size. On connect, the device size is set to the minimum of both nodes. And here you could run into problems if you do things without common sense: if you first use DRBD on one node only, without disk-size configured properly, and later connect a node with a smaller device, then the DRBD device size shrinks at runtime. You should find a message like Your size hint is bogus, please change to <some value> in the syslog in that case. This will confuse the file system on top of your device. Thus, if your device sizes differ, set the size to be used by DRBD explicitly. DRBD-0.7 stores information about the peer's device size in its local meta data, therefore use of disk-size is deprecated (and is disallowed in the configuration file).

Can XFS be used with DRBD?

XFS uses a dynamic block size, so DRBD 0.7 or later is required.

How do the "local machine" and the "remote machine" need to be connected?

Can I put one machine in Los Angeles and the other machine in New York, connected only by a VPN link over the Internet? Or do they both need to be connected to the same local Ethernet network?

When I try to load the drbd module, I am getting the following error: compiled for kernel version ''some version'' while this kernel is ''some other version''

The settings of your running kernel and the .config of the kernel source against which drbd was built do not match. On SuSE Linux you can get the right config with the following commands: cd /usr/src/linux/ && make cloneconfig && make dep. Usually you do not have to recompile your kernel, just drbd. But read INSTALL in the drbd tgz to learn how to do it the proper way.

Can I use DRBD with LVM?

Yes. With LVM2, snapshots are writeable, so you can replay the journal on the snapshot. But see also:

What about Xen, DRBD and iSCSI?

Can I use DRBD with OpenVZ?

Operation Issues

drbdadm create-md fails with "Operation refused." - what can I do?

The actual error message looks like

Found $some filesystem which uses $somuch kB
current configuration leaves usable $less kB
Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
* use external meta data (recommended)
* shrink that filesystem first
* zero out the device (destroy the filesystem)
Operation refused.

which means

you created your filesystem before you created your DRBD resource, or

you created your filesystem on your backing device, rather than your DRBD,

neither of which is a problem by itself, except - as the error message tries to hint - you need to enlarge the device (e.g. lvextend), shrink the filesystem (e.g. resize2fs), or place the DRBD metadata somewhere else (external meta data).

DRBD tries to detect an existing use of the block device in question. E.g. if it detects an existing file system that uses all of the available space (as is the default for most filesystems), and you try to use DRBD with internal meta data, there is no room for the internal meta data - creating it would corrupt the last few MiB of the existing file system.

If re-creating the filesystem on the DRBD is an option, you can "zero out the device (destroy the filesystem)" and then recreate the filesystem on the DRBD device.
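A hedged sketch of that route (the device and resource names are illustrative, and the dd is destructive, so double-check them against your own setup first):

```shell
# Wipe the start of the backing device to destroy the old filesystem signature.
dd if=/dev/zero of=/dev/sdb1 bs=1M count=1
# Now metadata creation no longer finds a filesystem in the way.
drbdadm create-md r0
drbdadm up r0
# Recreate the filesystem -- on the DRBD device this time, not the backing device.
mkfs.ext3 /dev/drbd0
```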

"IMD" is internal meta data. Once created, it is of fixed size. With drbd 0.7 it was a fixed 128MB. With drbd 8.0 it is approximately [total storage of real device]/4/8/512/2 rounded up, +36k, rounded up to the next 4k.
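Taking that formula literally (all quantities in KiB; a back-of-the-envelope sketch, not the authoritative drbd computation):

```python
import math

def internal_meta_data_size_kib(total_storage_kib):
    """Approximate drbd 8.0 internal meta data size, per the rule above."""
    # bitmap part: total / 4 / 8 / 512 / 2, rounded up
    bitmap_kib = math.ceil(total_storage_kib / 4 / 8 / 512 / 2)
    # plus 36k fixed overhead, rounded up to the next 4k
    return math.ceil((bitmap_kib + 36) / 4) * 4

# e.g. a 100 GiB backing device needs roughly 3.2 MiB of internal meta data
print(internal_meta_data_size_kib(100 * 1024 * 1024))  # -> 3236
```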

If you did mkfs /real/device and later mount through DRBD, the file system either recognizes the size mismatch between its superblock and the actual block device size on the spot and refuses to mount (xfs does this, iirc).

Or the file system mounts all right, because it skips the check for the block device size (ext3, at least certain versions of it, apparently does this; it is ok for a file system to assume that its superblock contains valid data), and then thinks it could use the now unavailable space which is occupied by the IMD.

There are various ways to find out how much space your file system thinks it can use. For ext3, you can find out with e.g.
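One such way (an assumption on my part, since the original does not name a tool) is tune2fs -l, which reports "Block count" and "Block size"; their product is the space the filesystem believes it owns. A sketch parsing such output (the sample values are made up):

```python
# Sample lines as printed by `tune2fs -l /dev/drbd0` (values are illustrative).
sample = """\
Block count:              26214400
Block size:               4096"""

fields = {k.strip(): v.strip() for k, v in
          (line.split(":", 1) for line in sample.splitlines())}
fs_kib = int(fields["Block count"]) * int(fields["Block size"]) // 1024
print(fs_kib)  # -> 104857600, i.e. the filesystem thinks it owns 100 GiB
```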

As long as the file system does not want to use that area, it won't notice. If the file system eventually decides to use that area: whoops, surprise, it gets an access beyond end of device error. When the file system will start using that area is nearly impossible to predict. So it may appear to work fine for months, and then suddenly break again and again.

This is not a problem with drbd. It is a problem with using drbd incorrectly.

Why is Synchronization (SyncingAll) so slow?

Outdated, applies to drbd versions prior to drbd-0.6.4 only. For historical reasons, replication used to work backwards. Most physical devices have pretty slow throughput when writing data backwards.

How can I speed up the Synchronization performance?

Double-check the value of sync-max in the net {} section (drbd-0.6), or rate in the syncer {} section (drbd-0.7), respectively. Keep in mind that the default value is very low, and that the default unit is kByte/sec!

if you run on top of some local RAID, make sure it is not reconstructing at the same time

check whether DMA is enabled
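For the first point, a drbd-0.7 configuration sketch (the 10M value is illustrative; without a unit suffix the number is taken as kByte/sec):

```
syncer {
  rate 10M;   # resync bandwidth cap; the default is far lower
}
```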

How can I speed up write throughput?

First you need to find the bottleneck. This can be your local disk, the network, the remote disk, latency caused by excessive seeks, or the summed up latency of those components.

You may want to play with the values of protocol and sndbuf-size. If your NIC supports it, you may want to enable "jumbo frames" (increase the MTU). If nothing helps, ask the list for known good and performant setups...
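A hedged drbd.conf sketch of those two knobs (resource name and values are illustrative; measure throughput before and after changing them):

```
resource r0 {
  protocol C;          # try A or B if synchronous replication latency hurts
  net {
    sndbuf-size 512k;  # larger TCP send buffer helps on high-latency links
  }
  # ...
}
```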

Why is my "load average" that high?

Load average is defined as the average number of processes in the run queue during a given interval. A process is in the run queue if it is

not waiting for external events (e.g. select on some fd)

not waiting on its own (has not called "wait" explicitly)

not stopped

Note that all processes waiting for disk IO are counted as runnable! Therefore, if a lot of processes wait for disk IO, the "load average" goes straight up, though the system may actually be almost idle CPU-wise... E.g. crash your NFS server and start 100 ls /path/to/non-cached/dir/on/nfs/mount-point on a client: you get a "load average" of 100+ for as long as the NFS timeout lasts, which might be weeks... though the CPU does nothing. Verify your system load by other means, e.g. vmstat, sysstat/sar. This will give you an idea of the bottleneck of your system. Some ideas are using multiple disks (not just partitions!), or even a RAID with 10,000 rpm SCSI disks, and probably even Gigabit Ethernet. Even on a Fast Ethernet device you will rarely see more than 6 MByte per second (100 MBit/s is at most 12.5 MByte/s, minus protocol overhead, latency etc.).

What is the warning ''Return code 255 from /etc/ha.d/resource.d/datadisk'' telling me when using the datadisk script with heartbeat?

DRBD-0.6 only
Exit code 255 is most likely from a script-generated die, which includes a verbose error message. Capture the output of that script; this is the debugfile directive in your ha.cf, iirc. If that does not help, do it by hand and see what error message it gives. datadisk says something like cannot promote to primary, synchronization running, or fsck failed, or ...

When the node goes from secondary to primary the drbd device will not be mounted on the primary. Manually mounting works.

Feature ...
DRBD does not automatically mount the partition. The script datadisk (or drbddisk since 0.7) is made for that purpose. It is intended to be called by heartbeat.

See drbdadm suspend-io/resume-io. This is also set temporarily, implicitly, by fencing resource-and-stonith.

The next three characters show details of sync-after dependencies. They all read '-' if currently unset. If there is a resync running, but you have serialized the resync of your devices (because they share some resources, e.g. live on the same "spindle", or because you want some more important ones to be resynced first), there are certain ways to suspend this resync.

a: implicitly paused because of sync-after dependency on this node

p: implicitly paused because of sync-after dependency on the peer node

u: explicitly suspended by the user, see drbdadm pause-sync/resume-sync

cs

connection state

Unconfigured

Device waits for configuration.

StandAlone

Not trying to connect to peer, IO requests are only passed on locally.

Unconnected

Transitory state, while bind() blocks.

WFConnection

Device waits for configuration of other side.

WFReportParams

Transitory state, while waiting for first packet on a new TCP connection.

Connected

Everything is fine.

Timeout, BrokenPipe, NetworkFailure

Transitory states when connection was lost.

DRBD-0.6 specific

SyncingAll

All blocks of the primary node are being copied to the secondary node.

SyncingQuick

The secondary is updated by copying the blocks which were updated since the now-secondary node left the cluster.

SyncPaused

Sync of this device has paused while higher priority (lower sync-group value) device is resyncing.