While setting up my gigabit ethernet network, during simultaneous Windows 7 file transfers to the HTPC+B setup, I was disconnected on virtually every transfer. At that point, I lost connectivity not only with the HTPC+B but, most of the time, with the web as well. Here are the culprits and how we went about resolving them:

WINDOWS:
An error occurred while reconnecting Y: to \\192.168.0.14\HTPCFirstEntertainment Microsoft Windows Network: The local device name is already in use. This connection has not been restored.

However, the rather surprising and unexpected effect of this was that the network speed slowed down to roughly 1MB/s at most (from its 1Gb/s link speed, a theoretical maximum of 125MB/s). (Interestingly, we will see further down why that happened, but it is not because of turning off Checksum Offload.)

So I reverted everything on the two Windows 7 workstations back to the way it was, to see if the earlier, quicker speeds would return, but the speed remained at 1MB/s. One interesting thing to note here is that the disconnects only happen when two or more workstations are transferring files, never with a single one.

At this point I checked the errors / messages in /var/log/samba/*.logs to see if resolving them would fix the issue.
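One way to get an overview of those logs is to count how often each message occurs. The sketch below is not from the original session: `summarise_samba_logs` is a hypothetical helper name, and it assumes the usual Samba layout of a bracketed header line followed by the message text.

```shell
# Summarise the most frequent message lines across a set of Samba log files.
# Samba typically writes a bracketed header line ("[date time, level] source")
# before each message; dropping those leaves the message text to count.
summarise_samba_logs() {
    grep -h -v '^\[' "$@" \
        | sed 's/^[[:space:]]*//' \
        | grep -v '^$' \
        | sort | uniq -c | sort -rn | head -n 10
}

# Usage on the server:
#   summarise_samba_logs /var/log/samba/*
```

Messages that dominate the count are a reasonable place to start reading.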

The first error we notice in the samba logs is:

getpeername failed. Error was Transport endpoint is not connected

For this error, we check which global settings are available and / or what they are set to:
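A minimal sketch of such a check, using `testparm` to print the effective settings, defaults included. The helper names and the particular parameters filtered for are my assumptions, not necessarily the ones looked at here.

```shell
# Dump the effective Samba settings, defaults included (-v), without the
# interactive "press enter" pause (-s).
dump_samba_globals() {
    testparm -s -v 2>/dev/null
}

# Keep only a few network-related parameters (this selection is an assumption).
filter_net_params() {
    grep -E '^[[:space:]]*(smb ports|socket options|keepalive|deadtime)[[:space:]]*='
}

# Usage on the server:
#   dump_samba_globals | filter_net_params
```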

After tinkering around with the SMB config file, nothing appeared to change the Windows behaviour above. Looking a bit deeper, into the network itself, I noticed the link speed had been renegotiated to 10Mbit/s. (This explains the horrible download and upload times):
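To make the link between negotiated speed and transfer rate concrete, here is a small sketch that reads the `Speed:` line from `ethtool` output and converts it to a theoretical ceiling in MB/s (8 bits per byte). The helper names are hypothetical and `eth0` is an assumed interface name.

```shell
# Extract the negotiated speed (in Mbit/s) from `ethtool <iface>` output on stdin.
parse_speed_mbit() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*/\1/p'
}

# Theoretical ceiling in MB/s for a given Mbit/s link speed (divide by 8).
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.2f\n", mbit / 8 }'
}

# Usage on the server (eth0 is an assumed interface name):
#   speed=$(ethtool eth0 | parse_speed_mbit)
#   mbit_to_mbyte "$speed"
```

At 10Mbit/s the ceiling works out to 1.25MB/s, which lines up with the roughly 1MB/s seen from the Windows side; at 1000Mbit/s it is 125MB/s.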

Any alternate CAT network cable plugged into the system and then directly into the gigabit router gives the same result: 10 Mbit, half duplex, link ok.

So this is leading us to check the driver itself (before we actually go and buy a new NIC). For this we'll check whether a newer kernel version is available, potentially including an updated driver. (Let's try the easy stuff first.)

yum update kernel

This should show you if there is any new kernel available. If there is, try the upgrade:
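As a rough sketch, the steps can be wrapped like this. The function names are hypothetical; the `yum` and `uname` invocations are standard, but whether a newer kernel exists depends on your repositories.

```shell
# Show the kernel currently running, for comparison after the upgrade.
running_kernel() {
    uname -r
}

# List a pending kernel update without installing it.
# yum check-update exits with 100 when updates are available, 0 when none are.
kernel_update_available() {
    yum check-update kernel
}

# Install the update; the new kernel only takes effect after a reboot.
upgrade_kernel() {
    yum -y update kernel
}

# Usage (as root):
#   running_kernel
#   kernel_update_available
#   upgrade_kernel && reboot
```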

NOTE: If you are on a mission critical host, you may wish to check if the new kernel will break any other functionality first.

After all, we can see from kernel.org that there have been quite a few changes to the r8169 driver, so one of these might address the issue we are facing. This isn't always the case, however: even updated surrounding code not directly related to r8169 could be the culprit:
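Before and after a kernel change it can help to confirm which driver and version are actually bound to the NIC. A minimal sketch, assuming the interface is `eth0`; `parse_driver_info` is a hypothetical helper:

```shell
# Pull the driver name and version out of `ethtool -i <iface>` output on stdin.
parse_driver_info() {
    awk -F': ' '$1 == "driver" || $1 == "version" { print $1 "=" $2 }'
}

# Usage on the server (eth0 is an assumed interface name):
#   ethtool -i eth0 | parse_driver_info
#   lsmod | grep -E '^r816[89]'
```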

However, forcing the speed to 100Mb/s does give 100Mb/s, which is better. If the NIC can be set to 100Mb/s, this points to the cable again, because at least the NIC is trying. Well, at least some of the time we can set it to 100Mb/s:
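A sketch of forcing the speed with `ethtool -s`. The wrapper name, the `DRY_RUN` switch, and the `eth0` interface name are my assumptions; full duplex is also assumed.

```shell
# Force a fixed link speed with autonegotiation off.
# With DRY_RUN set, the command is printed instead of executed.
set_link_speed() {
    ${DRY_RUN:+echo} ethtool -s "$1" speed "$2" duplex full autoneg off
}

# Hand control back to autonegotiation.
restore_autoneg() {
    ${DRY_RUN:+echo} ethtool -s "$1" autoneg on
}

# Usage (as root; eth0 is an assumed interface name):
#   set_link_speed eth0 100
#   restore_autoneg eth0
```

Gigabit over copper generally relies on autonegotiation, so forcing is mainly useful for testing 10 and 100; turn autonegotiation back on before expecting 1000.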

Lights came on now. But ethtool still showed 10Mbit/s. Lastly, we try a simple cold reboot (allow the voltage to drain by unplugging the power cord after the system is shut down, or flip the on/off switch at the back of the power supply). When the system came back, we got 100Mbit/s:

And after plugging in another cable, this time we get 1000Mbit/s and our transfers from the Windows computers above are in the 60MB/s range (each).

Now the question is why the NIC kept the settings on the card for so long, even across reboots; it affected both the r8169 and r8168 drivers. One possibility, as hinted here, might be the retention of gain settings. Further reading is available about such settings being retained.

Sep 12 2012
So we now set Jumbo Frames to 8K to improve speed and turn all Checksum flags (on both client and server) back on, though leaving Checksum off should be fine as well:
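On the Linux side this could look roughly like the following. It is a sketch under assumptions: `eth0` as the interface name, 8000 bytes as the reading of "8K", and hypothetical wrapper names. The maximum MTU a given Realtek chip and driver accept varies, and every device on the path (router and clients) has to support the same frame size.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    ${DRY_RUN:+echo} "$@"
}

# Raise the MTU for jumbo frames (8000 is an assumed value for "8K").
set_jumbo_mtu() {
    run ip link set dev "$1" mtu "$2"
}

# Turn receive and transmit checksum offload back on.
enable_checksum_offload() {
    run ethtool -K "$1" rx on tx on
}

# Usage (as root; eth0 is an assumed interface name):
#   set_jumbo_mtu eth0 8000
#   enable_checksum_offload eth0
#   ethtool -k eth0    # lower-case -k lists the current offload flags
```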