Archive for March, 2019

On a server today I've found to have found a number of NFS mounts mounted through /etc/fstab file definitions that were hanging;

nfs-server:~# df -hT
…

command kept hanging as well as any attempt to access the mounted NFS directory was not possible.
The server with the hanged Network File System is running SLES (SuSE Enterprise Linux 12 SP3) a short investigation in the kernel logs (dmesg) as well as /var/log/messages reveales following errors:

It took me a short while to investigate and check the NetApp remote NFS storage filesystem and investigate the Virtual Machine that is running on top of OpenXen Hypervisor system.
The NFS storage permissions of the exported file permissions were checked and they were in a good shape, also a reexport of the NFS mount share was re-exported and on the Linux
mount host the following commands ran to remount the hanged Filesystems:

that fixed one of the hanged mount, but as I didn't wanted to manually remount each of the NFS FS-es, I've remounted them all with:

nfs-server:~# mount -a -t nfs

This solved it but, the fix seemed unpermanent as in a time while the issue started reoccuring and I've spend some time
in further investigation on the weird NFS hanging problem has led me to the following blog post where the same problem was described and it was pointed the root cause of it lays
in parameter for MTU which seems to be quite high MTU 9000 and this over the years has prooven to cause problems with NFS especially due to network router (switches) configurations
which seem to have a filters for MTU and are passing only packets with low MTU levels and using rsize / wzise custom mount NFS values in /etc/fstab could lead to this strange NFS hangs.

Below is a list of Maximum Transmission Unit (MTU) for Media Transport excerpt taken from wikipedia as of time of writting this article.

In my further research on the issue I've come across this very interesting article which explains a lot on "Large Internet" and Internet Performance

I've used tracepath command which is doing basicly the same as traceroute but could be run without root user and discovers hops (network routers) and shows MTU between path -> destionation.

Many network routers keeps the MTU to low as 1500 also because a higher values causes IP packet fragmentation when using NFS over UDP where IP packet fragmentation and packet
reassembly requires significant amount of CPU at both ends of the network connection.
Packet fragmentation also exposes network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason.
Any increase of RPC retransmissions, along with the possibility of increased timeouts, are the single worst impediment to performance for NFS over UDP.
This and many more is very well explained in Optimizing NFS Performance page (which is a must reading) for any sys admin that plans to use NFS frequently.

Even though lowering MTU (Maximum Transmission Union) value does solved my problem at some cases especially in a modern local LANs with Jumbo Frames, allowing and increasing the MTU to 9000 bytes
might be a good idea as this will increase the amount of packet size.and will raise network performance, however as always on distant networks with many router hops keeping MTU value as low as 1492 / 5000 is always a good idea.