Created attachment 602973
extracts from log files, cli output, statedumps
Description of problem:
Client doesn't reconnect after server comes back online (but logs and volume status say it does). Requires manual remount.
We have one client and two servers. They are all running Debian Squeeze and we compiled glusterfs-3.3.0 from source on each one of them.
Version-Release number of selected component (if applicable):
GlusterFS 3.3.0
How reproducible:
100% reproducible in our lab setup
Steps to Reproduce:
1. Run ls -lR in mounted gluster folder on client to start directory listing
2. Pull network cable from server2 for one minute. Directory listing waits for gluster timeout and resumes with only one server.
3. Reconnect network cable to server2.
4. Create a new file in mounted gluster folder on client. Run ls -lR in mounted gluster folder on client and in the exported folder on the servers.
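The check in step 4 can be scripted so the result does not depend on reading ls -lR output by eye. The following is only a sketch: the helper names make_marker and check_present are hypothetical (they are not part of GlusterFS), and it assumes the directories passed in are reachable as local paths on the machine where it runs.

```shell
#!/bin/sh
# Hypothetical helpers for step 4; not part of GlusterFS.

# Create a uniquely named, empty marker file in the given directory
# and print its name on stdout.
make_marker() {
  dir=$1
  name="reconnect-test-$(date +%s)-$$"
  : > "$dir/$name" || return 1
  printf '%s\n' "$name"
}

# Return 0 if the named file exists in every listed directory.
# Otherwise print one line per missing copy and return 1.
check_present() {
  name=$1
  shift
  missing=0
  for dir in "$@"; do
    if [ ! -e "$dir/$name" ]; then
      printf 'MISSING: %s/%s\n' "$dir" "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

In our setup this would mean running make_marker /storage/asd on the client, then running check_present with the printed name and /data/exp on each server. In the failing state described below, the check would be expected to report the file as missing on server2.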
Actual results:
The new file only shows up on server1. glusterfsd process uses ~10% CPU time on server1 but ~0% on server2. Redundancy lost.
Expected results:
Client should properly connect to server2 when it comes back online. The new file should show up on both servers. glusterfsd process should be using CPU time on both servers. Redundancy should be reacquired.
Additional info:
Despite not connecting properly, the client log states that the client connects to the reconnected server: "Connected to 10.128.196.182:24010, attached to remote volume '/data/exp'." Both servers show all bricks connected when running gluster volume status or gluster volume info.
The same problem occurs when I unplug server2 during a dbench run on the gluster mount. It also occurs whichever of server1 and server2 I unplug.
When the client/servers are in this state I can manually unmount and mount the filesystem again (umount /storage/asd and then mount -a) to get a proper connection to both servers. If I instead unplug the remaining server from the network, I get "Transport endpoint is not connected" when trying to use the mounted gluster folder.
The client and the servers are virtual machines in VMware vSphere. I unplug the network by de-selecting "Connected" in the network settings of the virtual machine.
Attaching file:
gluster-connect-bug-2012-08-08.txt with extracts from log files, gluster commands output, and volume statedumps from the servers.

The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.
If there has been no update before 9 December 2014, this bug will be closed automatically.