Do you need L2ARC redundancy? I do…
Hello there…
I think you will agree that storage’s real problem is READ requests, which are synchronous by nature. And, as I have said many times before, I think the solution for all problems (the answer to all questions ;-) is cache. Many levels, many flavors.
I have read many times about the recommended redundancy for ZFS slog devices. In the early days of ZFS we had a serious availability problem if we lost a slog device, so a mirror was the way to go.
But thinking about how the ZIL works, the slog is only read back after a crash; so it takes two failures in a row (the slog device dying and the system going down before the data reaches the pool) for a mirrored slog to make a difference. And we will lose IOPS by mirroring it…
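Just to put the two alternatives side by side (pool and device names here are hypothetical):

# one way: a single slog device, no redundancy
zpool add tank log c4t0d0

# the other way: a mirrored slog, survives the loss of either device
zpool add tank log mirror c4t0d0 c4t1d0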
On the other hand, the MTTR for a slog device is the best of the three, compared with a regular vdev or an L2ARC.
Everything is fine again the moment you replace the slog device (e.g., an SSD).
And the L2ARC? Here we need time… and believe me, it can be a long time.
We configure a 100GB SSD device, delivering a lot of IOPS and very good latencies… and crash! We lose it!
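For reference, putting that SSD to work as L2ARC is a one-liner (hypothetical pool and device names):

# add the SSD as a cache (L2ARC) device
zpool add tank cache c4t2d0
# it shows up under a separate “cache” section
zpool status tank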
Do you think the applications will be happy going back to SATA latencies? Will we have a performance problem, an availability problem, or no problem at all?
Well, as I said at the beginning of this post, nobody seems to think that the failure of a warmed L2ARC device is a big deal. I would like to agree, but I don’t. And since I really like the ZFS architecture, I would expect the vdev redundancy concept to be independent of the physical vdev type. So we should be able to mirror the L2ARC… but no, we can’t.
So, maybe I’m the only one who thinks about mirroring a cache device, but the fact that we cannot create a mirror (a logical vdev) on top of this kind of physical vdev seems like a ZFS bug to me.
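If you try it, ZFS simply refuses a mirrored cache vdev (hypothetical names again):

# this is what we can do today: two independent (striped) cache devices
zpool add tank cache c4t2d0 c4t3d0

# this is what I would like to do, but ZFS rejects a mirrored cache vdev
zpool add tank cache mirror c4t2d0 c4t3d0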
peace
Hi Marcelo,
great post, and a sharp analysis!
I agree that L2ARCs should be made mirrorable to avoid sharp declines in performance upon loss.
OTOH, what happens if you use 2 L2ARC devices, non-mirrored? Statistically speaking, cache data will end up equally distributed across both devices. Now if you lose one of them, the performance degradation will be 50% of what it would be with a single device, 33% for 3 L2ARCs, and so on.
Given that read-biased SSDs are cheaper than write-biased ones (you can get away with MLC, write performance isn’t as important, etc.), this may be an acceptable alternative. In extreme cases, for large servers, you could have 10 L2ARCs and then you’d hardly notice the 10% performance loss when one of them goes out.
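Something like this, just a sketch with made-up device names; zpool iostat then shows how the cached data spreads out:

# four independent cache devices; evicted ARC buffers get spread across all of them
zpool add tank cache c4t2d0 c4t3d0 c4t4d0 c4t5d0

# per-device view: each cache device should serve roughly a quarter of the cached reads
zpool iostat -v tank 5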
Cheers,
Constantin
+1
Agree fully. However, your post also acknowledges that you’re talking about a warmed l2arc. The same problem will hit the application at startup, when the arc is cold (until persistent l2arc lands).
Also, if you stripe across multiple l2arc devices, you spread the load across them, but you also spread the failure impact. If one of four l2arc devices disappears, you’ll only see a cache miss rate / performance impact of about 1/4 of the drop you’re concerned about, until steady state is reached again.
Thus, 4 cheaper l2arc devices with moderate IOPS may really be better than 1 or 2 faster ones.
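You can even rehearse that failure (hypothetical device names): pull one cache device out of four and the other three keep serving their share of the cache:

# simulate losing one of four cache devices
zpool remove tank c4t5d0
# the pool stays healthy, with the remaining three still listed under “cache”
zpool status tank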
I can imagine that losing one out of many (or two) L2ARC devices would result in just a partial performance loss. But is this a verified fact? I think there is a chance that removing one device out of an array, especially when striped, could force ZFS to re-warm the whole L2ARC.
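One way to check, at least on Solaris, would be to watch the L2ARC kstats before and after pulling one device: if l2_size only drops by roughly that device’s share and l2_hits keeps climbing, the rest of the cache survived. Just a sketch:

# total bytes currently held in the L2ARC
kstat -p zfs:0:arcstats:l2_size
# L2ARC hit/miss counters, sampled every 10 seconds
kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses 10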
The systems hit hardest by the situation you describe would be the bigger S7000 Unified Storage Systems. The SSDs containing the L2ARC are built into the storage heads, so a failover of a pool removes all of its cache SSDs. So far I have not found a way to provide a pool with any sort of L2ARC after a failover.
Hello uep!
Yes, we can spread the load, and that helps because we are dividing the problem. That is always good architecture in our job.
But the fact that the application will suffer at the beginning is a little different from what I’m saying here… If we never had an L2ARC device, the applications and the workload would be used to that reality. But with one, over time the applications and the workload can come to depend on a “fake” hit rate.
Hi Thomas,
That’s the point: the MTTR of a crashed L2ARC, and bringing the system back to a “normal” state… warming an L2ARC cache can take several hours, and when we are talking about many gigabytes of cache, that can have a big performance impact.
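That warm-up is easy to watch on Solaris, for example by sampling the L2ARC size kstat; it only grows as fast as the feed thread can copy eligible buffers from the ARC. Just a sketch:

# watch the L2ARC fill up again after a replacement, one sample per minute
kstat -p zfs:0:arcstats:l2_size 60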
Hello Constantin,
I’m glad you like it!
I have not seen anyone talking about this, just the argument: “we will not lose anything, the data is on the disks”. Well, that is exactly the problem!
In the current implementation, an L2ARC device can be like an “unfunded check” (a check that bounces): it promises a level of read performance that may not be there when you need it.
Don’t you agree? ;-)