Description of problem:
UFO PUT throughput for 1-MB objects is 1/4 of XFS throughput, even though there is no network in use, no replication, and aggregate CPU utilization averages only 50%.
Version-Release number of selected component (if applicable):
RHS 2.0 RC1 (20120530)
How reproducible:
Just configure a single-brick volume on a 12-drive, 256-KB-stripe RAID6 LUN, install RC1, and run parallel curl threads within this one server. There is no network interaction.
Steps to Reproduce:
1. configure UFO on a single server with a local 1-brick volume, using 12 swift proxy-server and object-server worker threads. The host has 2 Westmere sockets, 15 GB of memory, and a MegaRAID controller.
2. run the workload below on it: 128 threads, each writing 256 1-MB objects to its own container (a minimal sketch of the workload follows these steps).
3. with the "top" utility, press "H" to see per-thread CPU consumption.
Actual results:
Throughput = 125 files/sec = 125 MB/s. The hottest thread is in the glusterfs process, consistently above 90% CPU, so we have a CPU bottleneck. The storage is not a bottleneck: XFS can deliver 525 files/sec, 4x the throughput, for the same workload using plain files instead of objects, and writing the same files to a gluster mountpoint delivers 400 MB/s, about 80% of XFS throughput. So there must be something UFO is doing differently that causes the glusterfs thread to work harder (xattrs?).
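As a sanity check on the numbers: 128 threads x 256 objects x 1 MB = 32,768 MB of data; over the measured 262.15 sec of elapsed time that is 32768 / 262.15 ≈ 125 MB/s, consistent with the benchmark output below.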
Expected results:
We want to see an I/O bottleneck. We are well below 10-GbE network speeds.
Additional info:
VERBOSE=1 ./parallel_curl.py PUT 128 256 1024 localhost localhost
benchmark available at
http://perf1.lab.bos.redhat.com/bengland/laptop/matte/ufo/parallel_curl.py
doing PUT 128 threads 256 objects/thread 1024 KB/object
clients: ['localhost']
servers: ['localhost']
authentication tokens done after 0.00 sec
upload files ready after 0.10 sec
container put finished after 42.67 sec
started to create curl object threads after 42.67 sec
curl threads started after 43.71 sec
curl threads stopped after 304.81 sec
thread startup time = 0.40%
elapsed time = 262.15 sec
throughput = 125.00 objs/sec
transfer rate = 125.00 MB/s
Will add more detail on what glusterfs thread is doing later.

As Avati mentioned, we should retry the test with client-io-threads enabled. If performance improves, that means the network transactions between the GlusterFS client and glusterfsd were the bottleneck, which io-threads should solve; otherwise we can conclude that FUSE is the bottleneck.
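For anyone reproducing this, it is an ordinary volume tunable; with a placeholder volume name it would be something like:

gluster volume set <VOLNAME> performance.client-io-threads on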

Created attachment 590497
strace of hot threads now that performance.client-io-threads is enabled
Now I think I know why UFO is slow. With performance.client-io-threads enabled, the behavior is different. When I run
./parallel_curl.py PUT 128 256 1024 gprfs013 gprfs013-10ge
I see 2 hot glusterfs threads instead of 1. One of them just makes system calls to read /proc/pid/status, and this has already been discussed elsewhere.
The other thread does mostly readv and writev; see the attachment. It should be idle, but it is working very hard because the I/O sizes are all small: the average size was 115 bytes over the 1000-message sample. Does this mean UFO is not aggregating data transfers to the filesystem? Can we get it to do bigger reads and writes? We know Gluster doesn't do well with tiny reads and writes.
Also, this thread is doing the same readv sequence every time:
readv(27, [{"\200\0\0$", 4}], 1) = 4
readv(27, [{"\0$\235r\0\0\0\1", 8}], 1) = 8
readv(27, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", N}], 1) = N
If we could just collapse the 4- and 8-byte reads together, this would improve efficiency, but I think the big problem is that the average transfer size from UFO to native FUSE client is too small.
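For what it's worth, that fixed 4-byte/8-byte pattern looks like ordinary ONC-RPC record marking on the glusterfs transport socket rather than anything UFO-specific: the first 4 bytes (\200\0\0$ = 0x80000024) would be the record-mark header (last-fragment bit plus a 36-byte fragment length), the next 8 bytes the RPC XID and message type (\0\0\0\1 = reply), and the final readv the rest of the 36-byte fragment. If that reading is right, collapsing those reads would only save a couple of tiny syscalls per RPC message.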

Correction to my previous statement -- the small I/O sizes might not be coming from the UFO server. When I looked at the object-server threads they were doing large enough writes (64 KB), but there were a ton of other system calls happening as well, which is what threw me off. Looks like I'll be spending some quality time with strace ;-)
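In case anyone wants to repeat this, the per-thread syscall mix can be captured with something along these lines (the PID being whichever object-server or glusterfs thread shows up hot in top):

strace -f -p <PID> -c                                              # summary counts per syscall
strace -f -p <PID> -e trace=readv,writev,read,write -s 64 -o /tmp/hot.strace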

Based on my current understanding of the code, there are still a lot of inefficiencies in the UFO code in terms of the system calls made. This is due in part to both the OpenStack Swift code and the UFO glue code that we wrote. It would be worth our time to make sure we can account for all the system calls being made by our UFO stack outside of GlusterFS before trying to nail down inefficiencies in the GlusterFS layer.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1262.html
