sndradm “nothing” output
I have been running some tests with the AVS software, and the sndradm command is not working very well…
I have a two-node cluster configuration: sndradm produces the expected output on the secondary node (node2), but returns nothing on the primary (node1).
Take a look at node2:
# sndradm -C local -P
/dev/rdsk/c2d0s0 <- node1:/dev/rdsk/c2d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
/dev/rdsk/c3d0s0 <- node1:/dev/rdsk/c3d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
Four lines of output:
# sndradm -C local -P | wc -l
4
Running the same command on node1:
# sndradm -C local -P
# echo $?
0
#
So I started looking for the problem. Take a look at this DTrace one-liner, first executed on node2:
# dtrace -n 'syscall::write:entry / pid == $target / { ustack(); }' -c "sndradm -C local -P"
dtrace: description 'syscall::write:entry ' matched 1 probe
/dev/rdsk/c2d0s0 <- node1:/dev/rdsk/c2d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
/dev/rdsk/c3d0s0 <- node1:/dev/rdsk/c3d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
dtrace: pid 28104 has exited
CPU     ID                    FUNCTION:NAME
  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x646
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x8ab
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x646
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x8ab
              sndradm`main+0xceb
              sndradm`_start+0x7a

  1 100087                      write:entry
              libc.so.1`_write+0x15
              libdscfg.so.1`cfg_writepid+0x5b
              libdscfg.so.1`cfg_enterpid+0x7d
              libdscfg.so.1`cfg_lockd_init+0x126
              libdscfg.so.1`cfg_open+0x40d
              sndradm`rdc_print+0x21f
              sndradm`main+0xceb
              sndradm`_start+0x7a
OK, but on node1:
# dtrace -n 'syscall::write:entry / pid == $target / { ustack(); }' -c "sndradm -C local -P"
dtrace: description 'syscall::write:entry ' matched 1 probe
dtrace: pid 3729 has exited
CPU     ID                    FUNCTION:NAME
  1  98411                      write:entry
              libc.so.1`_write+0x15
              libdscfg.so.1`cfg_writepid+0x5b
              libdscfg.so.1`cfg_enterpid+0x7d
              libdscfg.so.1`cfg_lockd_init+0x126
              libdscfg.so.1`cfg_open+0x40d
              sndradm`rdc_print+0x21f
              sndradm`main+0xceb
              sndradm`_start+0x7a
So the normal run shows five writes with rdc_print on the stack: four issued through printf, and one from cfg_writepid during cfg_open. The run on node1 shows only the cfg_writepid write! But from those stack traces we only know which function to look at...
Let's follow the entry/return of the rdc_print function:
# dtrace -n 'pid$target::rdc_print:entry /pid == $target/ { self->follow = 1; }
pid$target::rdc_print:return { self->follow = 0; trace(arg0); }
pid$target:::entry /self->follow/ {}
pid$target:::return /self->follow/ { trace(arg0); }' -c "sndradm -C local -P" | grep rdc_
dtrace: description 'pid$target::rdc_print:entry ' matched 11034 probes
  0 100881                  rdc_print:entry
  0 100930                  rdc_ioctl:entry
  0 106451                 rdc_ioctl:return               133
dtrace: pid 28113 has exited
  0 100891            rdc_print_state:entry
  0 106412           rdc_print_state:return               573
  0 100891            rdc_print_state:entry
  0 106412           rdc_print_state:return               573
  0 100882                 rdc_print:return              2370
On node1 (just the output):
dtrace: description 'pid$target::rdc_print:entry ' matched 11034 probes
  0  98991                  rdc_print:entry
  0  99040                  rdc_ioctl:entry
  0 104561                 rdc_ioctl:return               133
dtrace: pid 3745 has exited
  0  98992                 rdc_print:return              2370
As we can see, when the command is executed on node1 there is no call to the rdc_print_state function. Looking at the sndradm.c source code, we can see that rdc_print calls rdc_print_state to produce the state field at the end of lines 2 and 4 of a normal execution.
Here we can see it:
(void) printf(gettext(", state: %s"), rdc_print_state(urdc));
The execution on node1 is not calling rdc_print_state!
Let's confirm it:
# dtrace -n 'pid$target::rdc_print_state: {}' -c "sndradm -C local -P"
dtrace: description 'pid$target::rdc_print_state: ' matched 175 probes
dtrace: pid 3785 has exited
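For a broader cross-check, one could also count every function call inside the sndradm module on both nodes and diff the results; this is a generic pid-provider one-liner (not from the original session), and rdc_print_state should simply be absent from the node1 aggregation:

# dtrace -n 'pid$target:sndradm::entry { @calls[probefunc] = count(); }' -c "sndradm -C local -P"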
So I used a similar DTrace one-liner to trace every rdc_print instruction, and diffed the two runs..
node2:
# dtrace -n 'pid$target::rdc_print: {}' -c "sndradm -C local -P" | /usr/sfw/bin/ggrep -A 5 -B 6 280
dtrace: description 'pid$target::rdc_print: ' matched 702 probes
dtrace: pid 28156 has exited
CPU     ID                    FUNCTION:NAME
  0 101045                    rdc_print:253
  0 101046                    rdc_print:254
  0 101047                    rdc_print:259
  0 101048                    rdc_print:25c
  0 101049                    rdc_print:25f
  0 101057                    rdc_print:279
  0 101058                    rdc_print:280
  0 101059                    rdc_print:283
  0 101060                    rdc_print:286
  0 101061                    rdc_print:28c
  0 101062                    rdc_print:28f
  0 101063                    rdc_print:291
node1:
# dtrace -n 'pid$target::rdc_print: {}' -c "sndradm -C local -P" | /usr/sfw/bin/ggrep -A 5 -B 6 280
dtrace: description 'pid$target::rdc_print: ' matched 702 probes
dtrace: pid 3801 has exited
  0  99155                    rdc_print:253
  0  99156                    rdc_print:254
  0  99157                    rdc_print:259
  0  99158                    rdc_print:25c
  0  99159                    rdc_print:25f
  0  99167                    rdc_print:279
  0  99168                    rdc_print:280
  0  99169                    rdc_print:283
  0  99170                    rdc_print:286
  0  99650                    rdc_print:8c1
  0  99651                    rdc_print:8c4
  0  99652                    rdc_print:8c7
I think the important line is rdc_print:286. After that instruction, the same code takes different directions on node1 and node2. So we can use mdb to disassemble the code (we can run this on either node):
# echo rdc_print+271::dis | mdb /usr/sbin/sndradm
rdc_print+0x253:                pushl  %eax
rdc_print+0x254:                call   -0xcd8   <PLT:cfg_lock>
rdc_print+0x259:                addl   $0x8,%esp
rdc_print+0x25c:                cmpl   $0x0,%eax
rdc_print+0x25f:                jne    +0x1a    <rdc_print+0x279>
rdc_print+0x261:                pushl  $0x8061f58
rdc_print+0x266:                call   -0xd4a   <PLT:gettext>
rdc_print+0x26b:                addl   $0x4,%esp
rdc_print+0x26e:                pushl  %eax
rdc_print+0x26f:                pushl  $0x0
rdc_print+0x271:                call   +0xe87f  <rdc_err>
rdc_print+0x276:                addl   $0x8,%esp
rdc_print+0x279:                movl   $0x0,-0x18(%ebp)
rdc_print+0x280:                movl   -0x18(%ebp),%eax
rdc_print+0x283:                cmpl   -0x20(%ebp),%eax
rdc_print+0x286:                jge    +0x63b   <rdc_print+0x8c1>
rdc_print+0x28c:                movl   -0x18(%ebp),%ecx
rdc_print+0x28f:                movl   %ecx,%eax
rdc_print+0x291:                shll   $0x4,%eax
rdc_print+0x294:                movl   %ecx,%edx
rdc_print+0x296:                shll   $0x5,%edx
OK, let's go! The first important line here is:
rdc_print+0x254:                call   -0xcd8   <PLT:cfg_lock>
In the rdc_print function there is just one call to the cfg_lock function, on line 273. That call is executed on both nodes, and its result is fine, as we can see here:
rdc_print+0x25f:                jne    +0x1a    <rdc_print+0x279>
On both nodes the next executed instruction is rdc_print+0x279.
But the problem is in the next comparison, and in the jump that follows it:
rdc_print+0x286:                jge    +0x63b   <rdc_print+0x8c1>
Looking at the source code one more time, we can see that the next evaluation is on line 277:
for (i = 0; i < max; i++) {
The result of that evaluation is "wrong" on node1! For some reason the max variable must be zero (or corrupted), so the for loop is never entered, and that is why I got "nothing" as output. On line 263 we can see:
max = min(rdc_status->nset, rdc_maxsets);
So DTrace gave us the answer: the jump at rdc_print+0x286 is where the two runs diverge..
I think we can continue the debugging with mdb, or another tool, to find out the value of "max" (for example, something like the sketch below). But maybe the development team already knows about this problem, so I will post this on the OpenSolaris forums and see what I get...
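Based on the disassembly above, max appears to live at -0x20(%ebp), so a pid-provider probe placed on the comparison instruction could print its value directly. This is only a sketch: it assumes that word really is max and that the instruction offsets match the installed binary:

# dtrace -n 'pid$target::rdc_print:283 { trace(*(int *)copyin(uregs[R_EBP] - 0x20, sizeof (int))); }' -c "sndradm -C local -P"

On node2 that should print the number of configured sets (two here), while on node1 it would presumably print 0.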
That's all
Have you made sure that the configuration location for AVS is on a slice of a disk shared between the cluster nodes, with that location stored in /etc/dscfg_cluster using the dscfg and dscfgadm commands, as described in the Sun Cluster and AVS integration guide at https://docs.sun.com/app/docs/doc/819-6150-10?l=en ?
By default AVS installs for local use on the system where it is installed, and if the configuration database isn't shared you will not see matching sndradm output on all nodes.
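A quick way to compare the two nodes (just a sketch using commands already mentioned in this thread) is:

# cat /etc/dscfg_cluster    # should name the same shared did slice on every node
# dscfgadm -i               # are the services enabled and the database valid?
# sndradm -C local -P       # should then list the same sets on every node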
When configuring AVS (dscfgadm) in a Sun Cluster environment, the software asks for a shared disk for the cluster configuration. In my case the did device is /dev/did/rdsk/d2s1, and that information is stored in the /etc/dscfg_cluster file. What I forgot to mention is that the configuration "works" until a reboot. I mean, the dsstat command and the sndradm -P command work just fine! After the reboot the problem occurs.
I'm curious whether this was ever resolved. I'm hitting the same issue, with the dscfgadm configuration lost after a reboot in a Sun Cluster environment.
Apparently dscfgadm requires a shared did device in the cluster environment, but I’m trying to configure the sndr replication with no shared storage at all.
Is there a way to recover from this, or must one reinitialize the database and restart the replication services?
Hello Andrei!
Are you using the procedure I described with SC 3.2, Solaris 10, and the open-source AVS? Or are you using the AVS integrated into the OpenSolaris/Nevada release? Anyway, depending on which node reboots, and if you know the state of the disks (it was a controlled reboot), you can resume by doing a reconfiguration. In other cases the solution is a resync…
Leal.
Hi Leal,
I was using Nevada b97, with the AVS that came with the build and OpenSolaris Cluster Express. I was trying to configure Solaris Cluster in a VMware ESX environment on two ESX servers so I could share local disks using Solaris HA. I have no shared storage.
I realize that AVS requires shared storage in the SC environment. This is a showstopper, since I've observed that if a node is rebooted (it was not a planned reboot), the AVS configuration information is lost. The "sndradm -P" command runs but does not return anything on the rebooted node. The same command works fine on the node that was not rebooted (it stayed up, since I also have a quorum virtual machine).
Thank you for your reply. I've noticed you had the same issue with sndradm. Unfortunately I had to put the project on hold and cannot provide any more info at the moment…
Hello Andrei, how are you!?
This problem is really bad, and we got no answers… I think you should post it in the storage community. Just search for my posts about "AVS configuration lost" or "sndradm nothing output" and bring those discussions back to life. I talked to Jim Dunham many times, but with my totally hacked solution the answer was always the same.
So, with the bundled AVS, I think that is sufficient to confirm the bug and get somebody to work on the problem. Another thing is to file a CR to remove that silly AVS dependency when SNDR is in a cluster environment. It only matters if you will use failover for AVS, and that's not the case here, so it should be "optional". My project on OpenSolaris is out, and it would be nice to hear your comments/suggestions:
https://www.opensolaris.org/os/project/ohac-avs/
Thanks,
Leal
Hello Leal, how are you?
I am very interested in your non-shared disk AVS. I tried to follow your guide (Solaris 10 u3 – SC 3.2 ZFS/NFS HA with non-shared disks) to deploy a non-shared AVS on Solaris 10 u6. It works fine until I reboot.
After the reboot, sndradm -i or -p does not give me anything.
I have read this article and the [storage-discuss] "AVS configuration lost" threads (parts I, II, III). I also did a little research, but I have no idea how to fix it.
I have a few questions. I do not have a did device as the cluster configuration database. Is that OK, or do I still need one even though it is a non-shared disk AVS system?
And what is the hack to fix it?
Thanks very much.
I apologize for my poor English.
Thanks,
Minhao
Hello Minhao,
First, in a cluster environment AVS needs a shared disk for the shared cluster database. That is something I think we need to fix, but right now it is a reality.
The problem of "losing the AVS configuration" is really annoying, and I don't know for sure what the cause is. If you look at the manuals from Sun, you will see that one possible cause is a wrong configuration in the /etc/nsswitch.conf file. Take a look at it and check that the "files" directive appears before dns or any other source for hosts, netmasks, services, etc. (see the example below). If you change it and it solves your problem, lucky you. ;-)
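For example, the relevant entries would look something like this; it is only a generic illustration with "files" listed first (on a Sun Cluster node the hosts and netmasks lines may additionally start with a "cluster" entry):

hosts:      files dns
netmasks:   files
services:   files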
I have the same problem in some environments, and there is no way to fix it. Maybe it is a Solaris 10 or Solaris Express problem.
My cluster agent is meant to configure HA services with non-shared disks, but the AVS requirement is something we need to live with for now. You can use an iSCSI LUN for that; take a look at my presentation from the OHAC Summit :-)
Leal
Hello Leal, thanks for your help.
I already checked my nsswitch.conf, and then I tried saving the configuration on an iSCSI device, but it still does not work. It is unfortunate.
But thanks anyway for your help.
I'm having the same problem using OpenSolaris 2009.06… Any solutions??
Hello Chris,
This is an *old* problem with no fix that I know of. If you search Google for "AVS configuration lost", you will see a lot of emails from me. ;-)
You could ask again on the OHAC mailing list, with a CC to storage-discuss. I talked with Jim Dunham about this behaviour, without luck.
Thanks for your comment!
Did you ever get a solution to this? sndradm -P shows no output. I have a geographic cluster, and one cluster suffered multiple disk failures on the shared storage, including the volume that held the AVS configuration database. I rebuilt the disk sets and even restored the AVS config database using the one from the cluster that is up. When I validate my config using geopg validate it is OK, and the dscfgadm -i output shows the services are up and the cluster config database is valid, but I get no output from sndradm -P or dsstat. Regards