MRSL.NONsharedDevice and Amnesia
Sun Cluster (SC) is great software, and implementing my first agent let me see that for real. Let’s take the amnesia scenario as an example:
As you know, MRSL.NONsharedDevice is a resource type that provides NFS/ZFS HA using non-shared storage. To do that, we need strong synchronization between the disks, and for that we use the AVS software.
So, imagine this sequence of events:
1) The resource is running on node “A”, and the AVS software is in replicating state (node “B” is just fine).
2) Node “B” fails, and the AVS software goes to “logging” mode automatically. Keep in mind that the data on the node “B” disks is now out of date, because the filesystem kept being used after node “B” failed.
3) Some time later (before node “B” comes back), node “A” goes down (crash).
In the above scenario, the SC software will not (by default) let the resources be started on node “B”, because it “knows” that there may be some kind of inconsistency (in user data or SC data). In our specific case (MRSL.NONsharedDevice), that is really true, and the resources managed by this agent “must” be started on the last online node (node “A”).
I know, I know, sometimes that is not possible, and we need to (and can) start on the other node (node “B”), because “something” is better than nothing. So you can tell SC to start anyway, but in that situation you know what you are doing, and SC has done its job by warning you.
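To make the sequence of events concrete, here is a small toy model in Python. It is only a sketch of the reasoning, not how SC or AVS are actually implemented: the class `ReplicatedPair`, its methods, and the `force` flag are all hypothetical names invented for this illustration, and no real SC or AVS command is being called.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedPair:
    """Toy model of two nodes with replicated, non-shared disks (hypothetical)."""
    primary: str = "A"
    online: set = field(default_factory=lambda: {"A", "B"})
    # Nodes whose disks contain every write made so far.
    current: set = field(default_factory=lambda: {"A", "B"})

    def mode(self) -> str:
        """'replicating' while both nodes are online, otherwise 'logging'."""
        return "replicating" if self.online == {"A", "B"} else "logging"

    def fail(self, node: str) -> None:
        """Mark a node as down."""
        self.online.discard(node)

    def write(self) -> None:
        """A write on the primary; nodes that are offline miss it and go stale."""
        if self.primary not in self.online:
            raise RuntimeError("primary is down, nothing can write")
        self.current &= self.online

    def can_start_on(self, node: str, force: bool = False) -> tuple[bool, str]:
        """Decide whether the resource may start on a node, judging only by its data."""
        if node in self.current:
            return True, "data is current"
        if force:
            # The administrator overrides the default and accepts stale data.
            return True, "WARNING: forced start on stale data"
        return False, "refused: node missed writes made after it failed"


pair = ReplicatedPair()          # step 1: resource on "A", replicating
pair.fail("B")                   # step 2: "B" fails, pair drops to logging
pair.write()                     # the filesystem keeps being used on "A"
pair.fail("A")                   # step 3: "A" crashes before "B" returns

print(pair.can_start_on("B"))              # default decision for node "B"
print(pair.can_start_on("B", force=True))  # administrator override
print(pair.can_start_on("A"))              # the last online node
```

The point of the model is that the decision depends on which node saw the last writes, not on which node happens to boot first. Node “B” is refused by default, and the forced start still succeeds but carries the warning with it.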