I would like to mention to the mailing list that I just booked some
success. And because nobody is around to tell the wonderful news,
I would like to share my happiness here ;)
My setup:
2 NAS servers and 1 Supermicro blade server with 5 blades.
The NAS servers are running Openfiler 2.3.
Both NAS servers have:
1 Transcend 4 GB IDE flash card (on the IDE port of the mainboard)
3 Transcend 4 GB USB sticks
8 SATA disks
The IDE flash card is set up in a RAID-1 mirror (md0) with one USB
stick, providing the root FS for Openfiler.
The other 2 USB sticks have 5 partitions each: 4 x 500 MB and 1 x 2 GB.
Those are mirrored together with RAID-1 (md5 through md8 are the
500 MB partitions, and md9 is the 2 GB partition).
The 8 hard disks are also tied together in pairs as RAID-1 mirrors
(md1 through md4).
Then I used DRBD (8.2.7) to mirror the 4 disk RAID-1's (md1 through
md4) and the 2 GB mirror (md9) over the network to the other NAS
server (drbd1 through drbd4, and drbd0 for the 2 GB mirror).
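For reference, the RAID-1 pairs described above can be created with
mdadm along these lines (the /dev/sdX device and partition names here
are assumptions; check your own layout with fdisk -l first):

```shell
# Mirror the first pair of SATA disks as md1 (repeat for md2..md4):
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Mirror one 500 MB partition of each USB stick as md5 (repeat for
# md6..md8), and the two 2 GB partitions as md9:
mdadm --create /dev/md5 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md9 --level=1 --raid-devices=2 /dev/sdc5 /dev/sdd5
```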

The 500 MB RAID-1's are used to store the metadata of the 4 disk
RAID-1's. The 2 GB DRBD device (drbd0) has internal metadata; it is
mounted as ext3 on only one server at a time and is used to store all
kinds of Openfiler information that is needed on both NAS servers.
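A DRBD 8.2-style resource for one of the disk mirrors might look
roughly like this (hostnames, IP addresses and the choice of md5 for
the external metadata are assumptions, not my exact config):

```
resource r1 {
  protocol C;
  on nas1 {
    device             /dev/drbd1;
    disk               /dev/md1;
    address            10.0.0.1:7789;
    flexible-meta-disk /dev/md5;   # external metadata on a 500 MB mirror
  }
  on nas2 {
    device             /dev/drbd1;
    disk               /dev/md1;
    address            10.0.0.2:7789;
    flexible-meta-disk /dev/md5;
  }
}
```

drbd0 would look the same, except with meta-disk internal.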

Heartbeat makes sure that one NAS server is running all the software,
and if there are any problems, it can switch over very easily.

drbd1 through drbd4 are set up as LVM PV's and bound together in one
big VG.
From that VG, I created five 5 GB LV's to be used as root devices for
blade1 through blade5.
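The PV/VG/LV layering above boils down to something like this (vg0 and
the LV names match the device paths mentioned further down; the rest
is a sketch, not my literal commands):

```shell
# Turn the four DRBD devices into PV's and bind them into one VG:
pvcreate /dev/drbd1 /dev/drbd2 /dev/drbd3 /dev/drbd4
vgcreate vg0 /dev/drbd1 /dev/drbd2 /dev/drbd3 /dev/drbd4

# One 5 GB root LV per blade, striped over 2 PV's (-i = stripe count):
lvcreate -L 5G -i 2 -n blade1 vg0
```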

These LV's are striped across 2 PV's for speed (although that's still
my only bottleneck at the moment, but more about this later...).
These LV's are exported over iSCSI.

I also created one big LV of around 600GB, which can be mounted
through NFS.

Then a few more LV's are created (around 10 GB each, also iSCSI), one
for every VM I want.
For every iSCSI LV I create a separate target.
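Openfiler manages the targets through its web GUI; under the hood, an
IET-style target definition for one of the blade LV's looks roughly
like this (the IQN is a placeholder):

```
Target iqn.2009-01.com.example:blade1
    Lun 0 Path=/dev/vg0/blade1,Type=blockio
```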
The Supermicro blades can boot from an iSCSI device.
The exact iSCSI target is given through a DHCP option; I only set up
an initiator name in the iSCSI BIOS of the blade.
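The DHCP side can hand the target to the blade's iSCSI BIOS via the
root-path option; the usual format is
iscsi:<server>:<protocol>:<port>:<LUN>:<target>. A dhcpd.conf sketch
(MAC address, IP and IQN are placeholders):

```
host blade1 {
    hardware ethernet 00:11:22:33:44:55;
    option root-path "iscsi:10.0.0.1::3260:0:iqn.2009-01.com.example:blade1";
}
```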
On the blade LV's I installed CentOS 5.3 (latest updates), but with a
few modifications.
I changed a few things in the initrd to bind eth0 to br0 during the
Linux boot, before Linux takes over the iSCSI session from the BIOS.
When you have a Linux root on iSCSI and try to attach eth0 to br0,
you lose network connectivity for a moment, which can crash Linux,
because everything it uses comes from the network (iSCSI root).
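The initrd change amounts to creating the bridge before the iSCSI root
is logged in, so the address move happens while nothing depends on the
network yet. A minimal sketch, assuming the CentOS 5-era tools:

```shell
# Run inside the initrd, before the iSCSI login:
brctl addbr br0                # create the bridge
brctl addif br0 eth0           # enslave eth0 to it
ifconfig eth0 0.0.0.0 up       # eth0 no longer carries an address itself
ifconfig br0 up                # the IP gets configured on br0 instead
```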

I also added a little script to the initrd that calls iscsiadm with a
fixed iSCSI target, because unfortunately iscsiadm can't read the
iSCSI settings from DHCP or from the Supermicro firmware.
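The fixed-target login script in the initrd is essentially the
following (portal address and IQN are placeholders):

```shell
# Discover the targets on the NAS and log in to the hard-coded one,
# since iscsiadm cannot read the DHCP/firmware settings itself:
iscsiadm -m discovery -t sendtargets -p 10.0.0.1:3260
iscsiadm -m node -T iqn.2009-01.com.example:blade1 -p 10.0.0.1:3260 --login
```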
When the blades are booted, they all join one Red Hat cluster, which
needs 3 nodes for quorum.
Because I have 5 blades, two can fail before everything stops working.
Then I compiled the following software myself, because the versions in
the CentOS repo and the testing repo didn't function correctly:
libvirt 0.7.0 (./configure --prefix=/usr)
kvm-88 (./configure --prefix=/usr --disable-xen)

The /usr/share/cluster/vm.sh from the default CentOS repo is still
based on Xen.
I downloaded the latest from
https://bugzilla.redhat.com/show_bug.cgi?id=412911
but it appears that that one is not working correctly either, so I
made some changes myself.
And now it's all working together very nicely.
I just ran a VM on blade1, and while this VM was running bonnie++ on
an NFS mount to the NAS server, I live-migrated it about 10 times to
blade2 and back.
During this bonnie++ run and the live migrations, I pinged the VM.
Where the normal ping times are around 20-35 ms (I pinged through a
VPN line from my home to the data center), I only saw one or two pings
right around the end of a live migration that were around 40-60 ms.
But no drops, and no errors in bonnie++.

I will write some more information about the complete setup and post
it somewhere on my blog or something, but I just wanted to let
everybody know that it can be done ;)
If you have any questions, let me know.
The only 'problem' I still have is the speed to and from the disks.

When I update any settings on the blade server, I always do this on
blade1. Then I shut it down, and on the NAS server I copy the content
of the iSCSI LV to an image file on the ext3 LV.
Then I can power up blade1 again, wait until it re-enters the cluster,
and then, one by one, shut down each next blade, copy the image on the
NAS from the ext3 LV to that blade's LV, and start the blade again.
I use drbd1 through drbd4 as 4 PV's for a VG.
The speed (hdparm -t on the NAS) of all PV's is around 75 MB/sec
(except for one, which is 45 MB/sec).
The blade LV (/dev/vg0/blade1 for example) is striped over 2 PV's.
The speed (hdparm -t) of /dev/vg0/blade1 is 122 MB/sec.
The ext3 LV (/dev/vg0/data0) is striped over 4 PV's.
The speed (hdparm -t) of /dev/vg0/data0 is 227 MB/sec.
But when copying from the blade LV to the ext3 LV:
dd if=/dev/vg0/blade1 of=/mnt/vg0/data0/vm/image/blade_v2.7.img
it takes about 70 seconds, which is about 75 MB/sec.
But when copying back:
dd if=/mnt/vg0/data0/vm/image/blade_v2.7.img of=/dev/vg0/blade1
it takes about 390 seconds, which is about 13 MB/sec.
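Those throughput numbers are consistent with copying the full 5 GB
(5120 MB) blade LV; a quick shell sanity check:

```shell
# 5 GB blade LV, copied in 70 s one way and 390 s the other:
mb=5120
echo "LV -> image: $((mb / 70)) MB/s"    # matches the ~75 MB/sec reading
echo "image -> LV: $((mb / 390)) MB/s"   # matches the ~13 MB/sec reading
```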

I think it has something to do with the LVM striping over the 4 PV's.

----- "Wendell Dingus" <wendell bisonline com> wrote:
| Well, here's the entire list of blocks it ignored and the entire
| message section.
| Perhaps I'm just overlooking it but I'm not seeing anything in the
| messages
| that appears to be a block number. Maybe 1633350398 but if so it is
| not a match.
Your assumption is correct. The block number was 1633350398, which
is labeled "bh = " for some reason.

| Anyway, since you didn't specifically say a new/fixed version of
| fsck was imminent and that it would likely fix this we began plan B
| today. We
Yesterday I pushed a newer gfs_fsck and fsck.gfs2 to their appropriate
git source repositories. So you can build that version from source if
you need it right away. But it sounds like it wouldn't have helped
your problem anyway. What would really be nice is if there is a way
to recreate the problem in our lab. In theory, this error could be
caused by a hardware problem too.
| plugged in another drive, placed a GFS2 filesystem on it and am
| actively copying files off to it now. Fingers crossed that nothing
| will hit a disk block that

=> >> > I was designing a 2 node cluster and I was going to use 2
=> >> > DELL PowerEdge 1950 servers. I was going to buy a DRAC card
=> >> > to use for fencing but running several commands on the
=> >> > servers I have noticed that when I run