Directory name lookup cache
The DNLC (Directory Name Lookup Cache) has generated a lot of discussion on the ZFS mailing list. It is one of the parts of Solaris that ZFS did not change, and (at least for me) it is a bit of a ghost. Let me try to explain…
ZFS dynamics are totally different from those of many other Solaris subsystems, and I guess those “dynamics” forced the “filesystem” to reach into many layers of the OS: the VM, the ARC, the “rampant layering violation”, and so on. So far so good, and I'm not in a position to argue with those decisions. Actually, it would be much simpler if everything else lived inside ZFS too… ;-) but life is not that simple: many layers of the OS still have to handle other filesystems, trade-offs, and so on.
Recently I was trying to understand a behaviour and could not fully explain it, because 90% of the code involved looked normal and did not account for the problem. I don't actually know whether the problem was specific to that server, but that server was at least involved and was showing unusual numbers. A few times I saw dnlc_nentries reach dnlc_max_nentries. Looking at the code, we can read this:
```c
/*
 * If dnlc_nentries hits dnlc_max_nentries (twice ncsize)
 * then this means the dnlc_reduce_cache() taskq is failing to
 * keep up. In this case we refuse to add new entries to the dnlc
 * until the taskq catches up.
 */
```
What I take from the comment above is that the code that normally runs and keeps things working simply quits its job. ;-) I don't think that is a good thing, because:
1 – When it is running, things are good;
2 – If it is running and things are still not good, I don't think things will get better by having it stop.
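To make the rule in the dnlc.c comment quoted above more concrete, here is a minimal counter-only model of it in C. It is a sketch, not the real implementation: struct dnlc_model, model_init(), model_enter() and model_reduce() are hypothetical names, no names are actually cached, and two details are assumptions of mine: that trimming is requested as soon as the count goes above ncsize, and that the background task trims straight back to ncsize (the real task may use a different target).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the DNLC admission rule: only the entry counter
 * is tracked, not the cached names themselves.
 */
struct dnlc_model {
    size_t ncsize;        /* target cache size */
    size_t max_nentries;  /* hard ceiling: twice ncsize, as in the comment */
    size_t nentries;      /* current number of entries */
    bool reduce_pending;  /* stands in for a queued dnlc_reduce_cache() task */
};

void model_init(struct dnlc_model *m, size_t ncsize)
{
    m->ncsize = ncsize;
    m->max_nentries = 2 * ncsize;
    m->nentries = 0;
    m->reduce_pending = false;
}

/* Try to add one entry; returns false while the cache is closed. */
bool model_enter(struct dnlc_model *m)
{
    if (m->nentries >= m->max_nentries)
        return false;             /* refuse until the background task catches up */
    m->nentries++;
    if (m->nentries > m->ncsize)
        m->reduce_pending = true; /* assumption: trimming is requested above ncsize */
    assert(m->nentries <= m->max_nentries);
    return true;
}

/* Background trim; assumption: it cuts straight back to ncsize. */
void model_reduce(struct dnlc_model *m)
{
    if (m->reduce_pending && m->nentries > m->ncsize)
        m->nentries = m->ncsize;
    m->reduce_pending = false;
}
```

In a toy run with ncsize = 4 where model_reduce() never gets to run, only 8 of 10 insertions succeed and every later one is refused; once model_reduce() runs, the count drops back to 4 and the cache accepts entries again. The cache stays closed for exactly as long as the background task lags behind.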
Using the D script dnlcstat (from the DTraceToolkit), I could see a constant 30% hit ratio, which is low and could explain the client's perception of bad performance. The solution was to wait until dnlc_reduce_cache() did its job and brought the cache back down to the right size (under ncsize); after that the hit ratio was fine again. Problem solved.
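The hit figure is the fraction of name lookups answered from the cache. As a small sketch (dnlc_hit_pct() is a hypothetical helper, and I am assuming dnlcstat's hit percentage is computed as hits / (hits + misses)), a constant 30% means roughly 7 out of every 10 lookups miss the DNLC and have to fall through to the filesystem:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hit ratio in percent over one sampling interval: hits / (hits + misses).
 * How the two counters are collected (DTrace probes, kstats) is outside
 * this sketch.
 */
double dnlc_hit_pct(uint64_t hits, uint64_t misses)
{
    uint64_t lookups = hits + misses;

    assert(lookups >= hits);  /* catch unsigned wrap-around of the sum */
    if (lookups == 0)
        return 0.0;           /* no lookups in the interval: report 0, avoid dividing by zero */
    return 100.0 * (double)hits / (double)lookups;
}
```

For example, 30 hits and 70 misses in an interval give 30.0, while 9 hits and 1 miss give 90.0.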
If my conclusion is right, the kernel developers knew what they were doing, but the cost was high (minutes, even hours). I don't know the reason, or more specifically the workload, that caused this denial of service on the DNLC algorithm, but the ARC is much more resilient. Solaris performance tuning literature gives some insight (e.g. when creating a very large number of files), but the solution does not seem so simple.
Summary: the DNLC was full (at dnlc_max_nentries), closed to new entries, and giving a 30% hit ratio. Back at half that size (the usual ncsize), it is much more efficient.