"... an engineer who is not only competent at the analytics and technologies of engineering, but can bring value to clients, team well, design well, foster adoptions of new technologies, position for innovations, cope with accelerating change and mentor other engineers" -- CACM 2014/12

Once the corosync and sheepdog services are configured and running, sheepdog only needs one more command: format the cluster. I used the
Erasure Code Support mechanism. The trick here is that the format command applies to the directory set in the
initialization scripts (by default '/var/lib/sheepdog').
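A sketch of that step; the '2:1' copies argument is my assumption for a RAID5-like layout across three nodes, and older releases name the CLI 'collie' rather than 'dog':

# run once, from any node, after all three nodes have joined the cluster
dog cluster format -c 2:1
# sanity checks: list nodes and show the redundancy scheme
dog node list
dog cluster info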

Following on from the previous article, this describes a corosync configuration for three appliances configured together in a 'triangle'. OSPF/BGP is running on each appliance. With this routing configuration, I am able to apply an IP address to the loopback interface and make each of those addresses mutually reachable from each appliance.

I think most corosync examples make the assumption that all nodes are within the same segment. This then suggests a multicast solution. As I am using routing between each appliance, I need a unicast solution.

The following is an example configuration file for the second of the three nodes / appliances. Notice that the bindnetaddr is the loopback address, and that there is a complete list of all three nodes taking part in the quorum. A 'mcastport' is listed, but because of 'transport: udpu', unicast is actually used on that port number.
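Since my actual file is specific to my appliances, here is a minimal sketch of the shape of such a configuration; the loopback addresses 10.255.0.1 through 10.255.0.3 and the cluster name are assumptions for illustration:

# /etc/corosync/corosync.conf on node 2 (sketch, not my actual file)
totem {
        version: 2
        cluster_name: sheep
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 10.255.0.2
                mcastport: 5405
        }
}

nodelist {
        node {
                ring0_addr: 10.255.0.1
                nodeid: 1
        }
        node {
                ring0_addr: 10.255.0.2
                nodeid: 2
        }
        node {
                ring0_addr: 10.255.0.3
                nodeid: 3
        }
}

quorum {
        provider: corosync_votequorum
}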

Monday, November 6. 2017

In my Sheepdog cluster, I have three nodes, with each node having two 1TB SSDs dedicated to a ZFS file system. Each node stripes the two drives together to gain some read performance, and then Sheepdog applies an Erasure Code redundancy scheme across the three nodes to provide a 2:1 erasure-coded fault-tolerant set (in this case similar to RAID5), which should yield about 4TB of useful storage space.

Creating the ZFS file system is a two-step process: create a simple zpool, then create the file system on it. This example uses two partitions on the same drive to prove the concept, but in real use, two whole drives should be used.
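A minimal sketch of those two steps; the pool name 'tank', the file system name 'tank/sheep', the partition names, and the mount point are assumptions for the proof of concept:

# step 1: create a simple (striped) pool from two partitions on the same drive
zpool create tank /dev/sdb1 /dev/sdb2
# in real use, stripe two whole drives instead:
#   zpool create tank /dev/sdb /dev/sdc
# step 2: create the file system; the mount point below is my guess at tying it to the sheepdog directory
zfs create -o mountpoint=/var/lib/sheepdog tank/sheep
zfs list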

Thursday, November 2. 2017

I have been looking at various distributed storage solutions, hoping to find something reliable in an open-source style of solution. Some names I've encountered (both open and closed source):

Ceph: by some accounts it seems to be resource heavy, but at the same time it appears to be well used in the industry.

Open vStorage: could be a strong contender for me, but I have a bias against Java-based applications.

Lustre: I've been watching this for quite some time, but the features didn't quite mesh with my desires

Zeta Systems: a mixture of proprietary and open solutions, which almost fits in with my perceptions, and uses ZFS as the underlying storage format.

SheepDog: I keep coming back to looking at this. With a version 1 release a little while ago, the developers indicate it satisfies their 'single point of nothing' criteria, which overlap with some of my own criteria. In addition, it appears to be resource light, horizontally scalable, and integrates with the tools I am trying to use: lxc, kvm, and libvirt.

As Debian doesn't have a very recent package, I build from scratch. Since my test environment is small, I use corosync rather than zookeeper for cluster management. Here are my build statements for a package build; I still need to add the requisite packages as well as the build statement itself:
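In the meantime, a sketch of the shape such a build takes; the dependency package names and the packaging step are from memory rather than from my notes, so verify them against the project's own INSTALL documentation:

# build prerequisites (an approximation; adjust for your Debian release)
apt install build-essential git autoconf automake libtool pkg-config \
    liburcu-dev libcorosync-common-dev libcpg-dev libcfg-dev libfuse-dev
# fetch and build sheepdog; corosync is, I believe, the default cluster driver
git clone https://github.com/sheepdog/sheepdog.git
cd sheepdog
./autogen.sh
./configure
make
make install     # or wrap this step in your preferred .deb packaging tool instead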

Sheepdog is Ready: distributed block storage is turning from experiment to production use. It has performance test scenarios and background on durability, scalability, manageability, and availability (it can be run with multipath SCSI targets).

On the Sheepdog mailing list, a mechanism other than sheepfs was described as a way to present a file system:

You can do it through qemu-nbd, formatting it and mounting it.
sheepdog -> qemu-nbd -> /dev/nbd{x} -> xfs/ext3/ext4/.. -> mount
modprobe nbd
qemu-nbd -c /dev/nbd1 sheepdog://localhost:7000/my_volume
# Optionally you can do the rest on a different machine using nbd-client on this step
mkfs.xfs /dev/nbd1
mount /dev/nbd1 /path/to/mount
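To undo the mapping later (my own addition, not part of the mailing-list recipe), the reverse steps would look something like:

umount /path/to/mount
qemu-nbd -d /dev/nbd1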

Tuesday, October 31. 2017

Due to various licensing compatibility issues, which are described in 'What does it mean that ZFS is in Debian' and 'On ZFS on Debian', only source packages are available for ZFS on Debian Linux. Binaries need to be 'self-built'. Here is my method for building those binaries as packages.

To start, add 'contrib' to /etc/apt/sources.list and run 'apt update'.
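For example (the mirror and release name below are placeholders; use whatever your sources.list already points at):

# /etc/apt/sources.list -- add 'contrib' to the existing line(s)
deb http://deb.debian.org/debian stretch main contrib
apt update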

There are two DKMS modules which need building: the ZFS kernel module and the Solaris Porting Layer (SPL) kernel module, which ZFS depends upon.

This process will need to be performed each time the kernel package gets updated or any of the related ZFS packages are updated. It builds the kernel modules, and could be performed on a 'build machine', as various extra packages get installed to support the process:
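A sketch of the package-driven build; the exact package set is my assumption (spl-dkms was still a separate package at the time, and is folded into zfs-dkms in later releases):

# headers for the running kernel, plus the DKMS source packages from contrib
apt install linux-headers-$(uname -r) spl-dkms zfs-dkms zfsutils-linux zfs-zed
# DKMS builds the modules as part of the install; confirm and load them
dkms status
modprobe zfs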

Sunday, September 24. 2017

Debian has a BTRFS Wiki. One item there, which affected me, is that kernel 4.11 has issues which can cause corruption. I am now on kernel 4.12. I'm not sure whether having duplicated metadata would have prevented some of the pain of recovery. To see if metadata is redundant:
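One way to check (the mount point below is a placeholder):

btrfs filesystem df /mnt/data
# a "Metadata, DUP: ..." line means metadata is duplicated; "Metadata, single: ..." means it is not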

Tuesday, August 8. 2017

With all the BTRFS bashing going on, there wasn't an easy alternative for those of us who care about checksummed data and metadata, even though caring about that is exactly what is recommended. That has now been solved.

ZFS is now (i.e., for some time now) available as a set of native (contrib) packages in Debian. I will need to give that a test now. [zfs-zed, zfsutils-linux, zfs-dkms, zfs-initramfs, zfsutils, zfs-dracut]. With the right set of packages and boot configuration, it is also possible to use ZFS for the boot partition.

Monday, April 6. 2015

I have had a couple of instances where the user interface (KDE) of my Linux workstation, which is based upon Debian Testing / Jessie, has become non-responsive, yet I was still able to SSH into the machine. I saw that systemd, some IRQ processes, and VirtualBox had high utilization. Both, or all three, times (I can't remember the count now) the issue occurred while debugging a program I've been writing which uses OpenGL. At the same time, I had a VirtualBox VM running Windows 10. So there were many things running, any of which might cause issues. It was probably OpenGL related, but I have not yet come up with a mechanism for proving this one way or another.

I am also running BTRFS on the machine. While looking at general BTRFS and NFS configurations, I saw the mailing list article
BTRFS hangs - possibly NFS related?. In that article, a couple of troubleshooting commands are shown.
They represent sysrq flags. I will have to examine them if/when my issue re-asserts itself:

echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg

The results may show 'SysRq : Show Blocked State' entries. These will be the places to examine further for issues.

In the same article, some other things to think about:

With the right tools, CPU/load can be categorized into several areas: low-priority/niced, normal, kernel, IRQ, soft-IRQ, IO-wait, steal, and guest.
Steal and guest are VM related (steal is CPU taken by the hypervisor or another guest when measured from within a guest, and thus not available to it;
guest is, of course, guests, when measured from the hypervisor) and will be zero if you're not running them.
IRQ and soft-IRQ won't show much either in the normal case, and of course niced doesn't show unless you're running something niced.

Or simply use the Alt-SysRq-w combo if you're on x86 and have it available; there's more about magic SysRq in the kernel's Documentation/sysrq.txt file.

If you don't have a tool that shows all that, one available tool that does is htop. It's a "better" top, ncurses/semi-GUI-based, so run it in a terminal window or a text-login VT.

Of course you can see, while you're at it, which threads are using all that CPU time, and which of the "load" isn't CPU at all.

Also check out iotop, to see what processes are actually doing IO and the total IO speed. Both these tools have manpages...

A workaround for the original poster's problem was to use:

btrfs filesystem sync /mnt/btrfs

He even went so far as to put that into his crontab and run it once a minute.
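A hypothetical crontab entry for that (the mount point and the path to btrfs are my assumptions; check 'which btrfs' on your system):

# crontab -e
* * * * * /bin/btrfs filesystem sync /mnt/btrfs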

Monday, April 30. 2012

Nexenta has a pretty good web interface on their SAN product, and when that isn't good enough, there is always their web console mode. But when even that isn't good enough, and you just have to see what is happening under the hood, there is something called expert mode.

Logging in as admin provides some good commands when dealing with the various file shares. When I
changed into root mode, I always wondered why it was so lacking. By reading someone else's site,
I now know why: Nexenta has disabled most of the root stuff, and you can only access it by
going a secret route:

option expert_mode=1
!bash

When finished, use exit and then run their resync command to make everything right.
