NFS, TCP, retrans and timeo
I think there is a misunderstanding about the transport protocols and the NFS options when NFS is used over TCP.
NFS is an old protocol that was originally used over UDP, and when implementations changed the default transport to TCP, some interpretations mixed up the features of one with the other.
Well, first things first…
NFS is a protocol independent of UDP and TCP. retrans and timeo are NFS options, so they act at the NFS protocol level. timeo is given in tenths of a second, and in implementations using UDP the initial timeout was something like 0.6 or 0.7 seconds (the value is increased after each timeout, backing off up to a threshold). With UDP, the important point is that any problem results in a “retransmit” of the whole datagram, that is, a retransmission of the whole RPC request…
Well, if we use a more robust protocol like TCP, we get a better delivery service, but the NFS protocol still has a timeout for its requests. With TCP, losing an IP packet does not result in a whole RPC retransmission, because TCP is smart and can handle that isolated failure itself. This makes the NFS protocol much more efficient, because it does not need to handle all the “little” problems at the top level (but we need to use netstat or another tool to see how our network is behaving). So it’s better to wait for the TCP timeout.
OTOH, if the timeo option for NFS has a low value, NFS will interfere and resend the whole RPC call without waiting for the TCP timeout (yes, TCP has its own timeout, and it has nothing to do with the fstab options). So the default timeo for NFS/TCP shares should be 600 (60 seconds), because that should be sufficient for the TCP timeout.
Let’s say we mount an NFS share using timeo=30 (three seconds), and ifdown the IP alias on our NFS server. We can watch the RPC retransmissions on a GNU/Linux client using a simple script like:
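Since timeo is written in tenths of a second, it is easy to misread. A minimal sketch of the conversion, where timeo_to_seconds is a hypothetical helper name of mine and not part of any NFS tool:

```shell
# Convert an NFS timeo value (tenths of a second) to seconds.
# timeo_to_seconds is a hypothetical helper, not part of any NFS tool.
timeo_to_seconds() {
    printf '%d.%d\n' "$(( $1 / 10 ))" "$(( $1 % 10 ))"
}

timeo_to_seconds 7     # prints 0.7  (a typical initial timeout over UDP)
timeo_to_seconds 30    # prints 3.0
timeo_to_seconds 600   # prints 60.0
```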
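As a sketch, a mount with those values could look like the following. The server name, export path and mount point are made up for illustration; only the option names (proto, hard, timeo, retrans) are the real NFS mount options discussed here.

```shell
# Hypothetical server, export and mount point; adjust to your environment.
NFS_SOURCE="nfsserver:/export/home"
NFS_TARGET="/mnt/home"
NFS_OPTS="proto=tcp,hard,timeo=600,retrans=2"

# Print the command instead of running it, since a real mount needs root
# and a reachable server. Drop the "echo" to actually mount.
echo mount -t nfs -o "$NFS_OPTS" "$NFS_SOURCE" "$NFS_TARGET"

# The matching /etc/fstab line would be something like:
# nfsserver:/export/home  /mnt/home  nfs  proto=tcp,hard,timeo=600,retrans=2  0 0
```

After a real mount, nfsstat -m shows the options the client is actually using, which is worth checking because they are not always the ones you typed.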
# while true; do nfsstat -rc; sleep 5; done
Client rpc stats:
calls      retrans    authrefrsh
798523772  1805       0
Client rpc stats:
calls      retrans    authrefrsh
798523901  1806       0
Client rpc stats:
calls      retrans    authrefrsh
798524029  1807       0
Client rpc stats:
calls      retrans    authrefrsh
798524131  1808       0
Client rpc stats:
calls      retrans    authrefrsh
798524467  1809       0
Client rpc stats:
calls      retrans    authrefrsh
798524578  1810       0
Client rpc stats:
calls      retrans    authrefrsh
798525947  1811       0
Client rpc stats:
calls      retrans    authrefrsh
798526089  1812       0
...
As you can see, the NFS client is doing one retransmission every 3 seconds, exactly as we asked ;-)
I did an ifdown on my interface to make this easier, but keep in mind that after you send a request to the NFS server, you must account for the server’s processing of the request, the round trip, and the processing on your client machine. That can take much longer than you imagine… with a low timeo you can create a really bad loop, and with big rsize and wsize values, just wait for the evil.
Interesting to see that the manual page for nfs says that retrans is not used for TCP mounts. I think that is related to the failure handling that TCP does. But, as an NFS option, it says how many times the client will try to contact the NFS server without success (a retrans of the whole RPC, each one after a timeo) before it gives up (soft, returning an error to the application) or keeps trying (hard, logging: NFS Not responding, still trying).
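Reading the counter by eye gets tiring, so here is a small sketch that prints only the retrans value and how much it grew since the last sample. retrans_count and watch_retrans are hypothetical helper names, and the parsing assumes the three-line nfsstat -rc layout shown above (a header line starting with "calls", followed by the numbers).

```shell
# Read "nfsstat -rc" output on stdin and print the retrans counter,
# i.e. the second field of the line right after the "calls ..." header.
retrans_count() {
    awk '/^calls/ { getline; print $2; exit }'
}

# Sample the counter every $1 seconds and print the growth per interval.
# Runs until interrupted with Ctrl-C.
watch_retrans() {
    interval=$1
    prev=$(nfsstat -rc | retrans_count)
    while sleep "$interval"; do
        cur=$(nfsstat -rc | retrans_count)
        echo "$(date +%T) retrans=$cur (+$(( cur - prev )))"
        prev=$cur
    done
}
```

Calling watch_retrans 5 gives one line per sample, which makes it easier to relate the growth of the counter to the timeo you mounted with.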
If we mount an NFS share using timeo=600,retrans=2 we will have something like this:
# while true; do nfsstat -rc; sleep 30; done
Client rpc stats:
calls      retrans    authrefrsh
798602801  1835       0
Client rpc stats:
calls      retrans    authrefrsh
798604595  1836       0
Client rpc stats:
calls      retrans    authrefrsh
798607557  1836       0
Client rpc stats:
calls      retrans    authrefrsh
798608985  1837       0
Client rpc stats:
calls      retrans    authrefrsh
798610877  1837       0
Client rpc stats:
calls      retrans    authrefrsh
798612264  1838       0
Client rpc stats:
calls      retrans    authrefrsh
798614787  1838       0
Client rpc stats:
calls      retrans    authrefrsh
798615959  1838       0
After the two retransmissions we will see a message like:
nfs: server tired not responding, still trying
If you are using a “hard” NFS mount option, or:
nfs: server tired not responding, timed out
nfs: server tired not responding, timed out
One for each retransmission timeout, if you are using a “soft” NFS mount option (and the application trying to access the NFS share will receive an error). E.g.:
touch: cannot touch `/home/leal/test': Input/output error
So, what “prints” NFS Not Responding, still trying is the combination of the timeo and retrans NFS options. As a client, you tell how tolerant you are, but think twice before you get too demanding…
peace
Sir,
I have been searching for this topic for a long time. It is interesting, but I am unable to understand it. Will you please explain ‘timeo, retrans, retry’ with a simple example?
Venkatesh
> NFS is a protocol independent from UDP and TCP.
I disagree. NFSv3 rode on top of UDP and RPC, and NFSv4 rides on top of TCP.
LarsN wrote:
> I disagree. NFSv3 rode on top of UDP and RPC, and NFSv4 rides on top of TCP.
I disagree. NFSv3 can be used on top of either UDP or TCP (specified with proto=udp or proto=tcp in the mount command or /etc/fstab). TCP is actually the default, at least on Linux.
Good guide on tuning timeo or retrans (or using tcp so you don’t have to do either).
https://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch18_01.htm
No examples… *sniff*
And for visually impaired ppl like me the font size is way too small…
But I think it’s useful for pros and not for noobs ;)