sndradm “nothing” output
I have been running some tests with the AVS software, and the sndradm command is not working very well…
I have a two-node cluster configuration: sndradm produces the expected output on the secondary node (node2), but returns nothing on the primary (node1).
Take a look at node2:
# sndradm -C local -P
/dev/rdsk/c2d0s0 <- node1:/dev/rdsk/c2d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
/dev/rdsk/c3d0s0 <- node1:/dev/rdsk/c3d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
Four lines of output:
# sndradm -C local -P | wc -l
4
Running the same command on node1:
# sndradm -C local -P
# echo $?
0
#
So I started looking for the problem. Take a look at this DTrace one-liner, first executed on node2:
# dtrace -n 'syscall::write:entry / pid == $target / { ustack(); }' -c "sndradm -C local -P"
dtrace: description 'syscall::write:entry ' matched 1 probe
/dev/rdsk/c2d0s0 <- node1:/dev/rdsk/c2d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
/dev/rdsk/c3d0s0 <- node1:/dev/rdsk/c3d0s0
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: B2007, state: logging
dtrace: pid 28104 has exited
CPU     ID                    FUNCTION:NAME
  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x646
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x8ab
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x646
              sndradm`main+0xceb
              sndradm`_start+0x7a

  0 100087                      write:entry
              libc.so.1`_write+0x15
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`printf+0xa8
              sndradm`rdc_print+0x8ab
              sndradm`main+0xceb
              sndradm`_start+0x7a

  1 100087                      write:entry
              libc.so.1`_write+0x15
              libdscfg.so.1`cfg_writepid+0x5b
              libdscfg.so.1`cfg_enterpid+0x7d
              libdscfg.so.1`cfg_lockd_init+0x126
              libdscfg.so.1`cfg_open+0x40d
              sndradm`rdc_print+0x21f
              sndradm`main+0xceb
              sndradm`_start+0x7a
OK, but on node1:
# dtrace -n 'syscall::write:entry / pid == $target / { ustack(); }' -c "sndradm -C local -P"
dtrace: description 'syscall::write:entry ' matched 1 probe
dtrace: pid 3729 has exited
CPU     ID                    FUNCTION:NAME
  1  98411                      write:entry
              libc.so.1`_write+0x15
              libdscfg.so.1`cfg_writepid+0x5b
              libdscfg.so.1`cfg_enterpid+0x7d
              libdscfg.so.1`cfg_lockd_init+0x126
              libdscfg.so.1`cfg_open+0x40d
              sndradm`rdc_print+0x21f
              sndradm`main+0xceb
              sndradm`_start+0x7a
So the normal run shows five writes with rdc_print on the stack: four issued through printf, and one from cfg_writepid during cfg_open. The run on node1 shows only the cfg_writepid write! But from those stack traces we only know which function to look at...
Let's follow the entry/return of the rdc_print function:
# dtrace -n 'pid$target::rdc_print:entry /pid == $target/ { self->follow = 1; }
pid$target::rdc_print:return { self->follow = 0; trace(arg0); }
pid$target:::entry /self->follow/ {}
pid$target:::return /self->follow/ { trace(arg0); }' -c "sndradm -C local -P" | grep rdc_
dtrace: description 'pid$target::rdc_print:entry ' matched 11034 probes
  0 100881                  rdc_print:entry
  0 100930                  rdc_ioctl:entry
  0 106451                 rdc_ioctl:return               133
dtrace: pid 28113 has exited
  0 100891            rdc_print_state:entry
  0 106412           rdc_print_state:return               573
  0 100891            rdc_print_state:entry
  0 106412           rdc_print_state:return               573
  0 100882                 rdc_print:return              2370
On node1 (just the output):
dtrace: description 'pid$target::rdc_print:entry ' matched 11034 probes
  0  98991                  rdc_print:entry
  0  99040                  rdc_ioctl:entry
  0 104561                 rdc_ioctl:return               133
dtrace: pid 3745 has exited
  0  98992                 rdc_print:return              2370
As we can see, when the command is executed on node1 there is no call to the rdc_print_state function. Looking at the sndradm.c source code, we can see that rdc_print calls rdc_print_state to produce the state field at the end of lines 2 and 4 of a normal execution.
Here we can see it:
(void) printf(gettext(", state: %s"), rdc_print_state(urdc));
The execution on node1 is not calling rdc_print_state!
Let's confirm it:
# dtrace -n 'pid$target::rdc_print_state: {}' -c "sndradm -C local -P"
dtrace: description 'pid$target::rdc_print_state: ' matched 175 probes
dtrace: pid 3785 has exited
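For a broader cross-check, one could also count every function call inside the sndradm module on both nodes and diff the results; this is a generic pid-provider one-liner (not from the original session), and rdc_print_state should simply be absent from the node1 aggregation:

# dtrace -n 'pid$target:sndradm::entry { @calls[probefunc] = count(); }' -c "sndradm -C local -P"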
So I used a similar DTrace one-liner to trace every rdc_print instruction, and diffed the two runs..
node2:
# dtrace -n 'pid$target::rdc_print: {}' -c "sndradm -C local -P" | /usr/sfw/bin/ggrep -A 5 -B 6 280
dtrace: description 'pid$target::rdc_print: ' matched 702 probes
dtrace: pid 28156 has exited
CPU     ID                    FUNCTION:NAME
  0 101045                    rdc_print:253
  0 101046                    rdc_print:254
  0 101047                    rdc_print:259
  0 101048                    rdc_print:25c
  0 101049                    rdc_print:25f
  0 101057                    rdc_print:279
  0 101058                    rdc_print:280
  0 101059                    rdc_print:283
  0 101060                    rdc_print:286
  0 101061                    rdc_print:28c
  0 101062                    rdc_print:28f
  0 101063                    rdc_print:291
node1:
# dtrace -n 'pid$target::rdc_print: {}' -c "sndradm -C local -P" | /usr/sfw/bin/ggrep -A 5 -B 6 280
dtrace: description 'pid$target::rdc_print: ' matched 702 probes
dtrace: pid 3801 has exited
  0  99155                    rdc_print:253
  0  99156                    rdc_print:254
  0  99157                    rdc_print:259
  0  99158                    rdc_print:25c
  0  99159                    rdc_print:25f
  0  99167                    rdc_print:279
  0  99168                    rdc_print:280
  0  99169                    rdc_print:283
  0  99170                    rdc_print:286
  0  99650                    rdc_print:8c1
  0  99651                    rdc_print:8c4
  0  99652                    rdc_print:8c7
I think the important line is rdc_print:286. After that instruction, the same code takes different directions on node1 and node2. So we can use mdb to disassemble the code (we can run this on either node):
# echo rdc_print+271::dis | mdb /usr/sbin/sndradm
rdc_print+0x253:                pushl  %eax
rdc_print+0x254:                call   -0xcd8   <PLT:cfg_lock>
rdc_print+0x259:                addl   $0x8,%esp
rdc_print+0x25c:                cmpl   $0x0,%eax
rdc_print+0x25f:                jne    +0x1a    <rdc_print+0x279>
rdc_print+0x261:                pushl  $0x8061f58
rdc_print+0x266:                call   -0xd4a   <PLT:gettext>
rdc_print+0x26b:                addl   $0x4,%esp
rdc_print+0x26e:                pushl  %eax
rdc_print+0x26f:                pushl  $0x0
rdc_print+0x271:                call   +0xe87f  <rdc_err>
rdc_print+0x276:                addl   $0x8,%esp
rdc_print+0x279:                movl   $0x0,-0x18(%ebp)
rdc_print+0x280:                movl   -0x18(%ebp),%eax
rdc_print+0x283:                cmpl   -0x20(%ebp),%eax
rdc_print+0x286:                jge    +0x63b   <rdc_print+0x8c1>
rdc_print+0x28c:                movl   -0x18(%ebp),%ecx
rdc_print+0x28f:                movl   %ecx,%eax
rdc_print+0x291:                shll   $0x4,%eax
rdc_print+0x294:                movl   %ecx,%edx
rdc_print+0x296:                shll   $0x5,%edx
OK, let's go! The first important line here is:
rdc_print+0x254:                call   -0xcd8   <PLT:cfg_lock>
In the rdc_print function there is just one call to the cfg_lock function, on line 273. That call is executed on both nodes, and its result is fine, as we can see here:
rdc_print+0x25f:                jne    +0x1a    <rdc_print+0x279>
On both nodes the next executed instruction is rdc_print+0x279.
But the problem is in the next comparison, and in the jump that follows it:
rdc_print+0x286:                jge    +0x63b   <rdc_print+0x8c1>
Looking at the source code one more time, we can see that the next evaluation is on line 277:
for (i = 0; i < max; i++) {
The result of that evaluation is "wrong" on node1! For some reason the max variable must be zero (or corrupted), so the for loop is never entered, and that is why I got "nothing" as output. On line 263 we can see:
max = min(rdc_status->nset, rdc_maxsets);
So DTrace gave us the answer: the jump at rdc_print+0x286 is where the two runs diverge..
I think we can continue the debugging with mdb, or another tool, to find out the value of "max" (for example, something like the sketch below). But maybe the development team already knows about this problem, so I will post this on the OpenSolaris forums and see what I get...
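Based on the disassembly above, max appears to live at -0x20(%ebp), so a pid-provider probe placed on the comparison instruction could print its value directly. This is only a sketch: it assumes that word really is max and that the instruction offsets match the installed binary:

# dtrace -n 'pid$target::rdc_print:283 { trace(*(int *)copyin(uregs[R_EBP] - 0x20, sizeof (int))); }' -c "sndradm -C local -P"

On node2 that should print the number of configured sets (two here), while on node1 it would presumably print 0.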
That's all
Have you made sure that the configuration location for AVS is on a slice of a disk shared between the cluster nodes, with that location stored in /etc/dscfg_cluster using the dscfg and dscfgadm commands, as described in the Sun Cluster and AVS integration guide at https://docs.sun.com/app/docs/doc/819-6150-10?l=en ?
By default AVS installs for local use on the system where it is installed, and if the configuration database isn't shared you will not see matching sndradm output on all nodes.
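A quick way to compare the two nodes (just a sketch using commands already mentioned in this thread) is:

# cat /etc/dscfg_cluster    # should name the same shared did slice on every node
# dscfgadm -i               # are the services enabled and the database valid?
# sndradm -C local -P       # should then list the same sets on every node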
When configuring AVS (dscfgadm) in a Sun Cluster environment, the software asks for a shared disk for the cluster configuration. In my case the did device is /dev/did/rdsk/d2s1, and that information is stored in the /etc/dscfg_cluster file. What I forgot to mention is that the configuration "works" until a reboot. I mean, the dsstat command and the sndradm -P command work just fine! After the reboot the problem occurs.
I'm curious whether this was ever resolved. I'm hitting the same issue, with the dscfgadm configuration lost after a reboot in a Sun Cluster environment.
Apparently dscfgadm requires a shared did device in the cluster environment, but I’m trying to configure the sndr replication with no shared storage at all.
Is there a way to recover from this, or must one reinitialize the database and restart the replication services?
Hello Andrei!
Are you using the procedure I described with SC 3.2, Solaris 10, and the open-source AVS? Or are you using the AVS integrated into the OpenSolaris/Nevada release? Anyway, depending on which node reboots, and if you know the state of the disks (it was a controlled reboot), you can resume by doing a reconfiguration. In other cases the solution is a resync…
Leal.
Hi Leal,
I was using Nevada b97, with the AVS that came with the build and OpenSolaris Cluster Express. I was trying to configure Solaris Cluster in a VMware ESX environment on two ESX servers so I could share local disks using Solaris HA. I have no shared storage.
I realize that AVS requires shared storage in the SC environment. This is a showstopper, since I've observed that if a node is rebooted (it was not a planned reboot), the AVS configuration information is lost. The "sndradm -P" command runs but does not return anything on the rebooted node. The same command works fine on the node that was not rebooted (it stayed up, since I also have a quorum virtual machine).
Thank you for your reply. I've noticed you had the same issue with sndradm. Unfortunately I had to put the project on hold and cannot provide any more info at the moment…
Hello Andrei, how are you!?
This problem is really bad, and we got no answers… I think you should post it in the storage community. Just search for my posts about "AVS configuration lost" or "sndradm nothing output" and bring those discussions back to life. I talked to Jim Dunham many times, but with my totally hacked solution the answer was always the same.
So, with the bundled AVS, I think that is sufficient to confirm the bug and get somebody to work on the problem. Another thing is to file a CR to remove that silly AVS dependency when SNDR is in a cluster environment. It only matters if you will use failover for AVS, and that's not the case here, so it should be "optional". My project on OpenSolaris is out, and it would be nice to hear your comments/suggestions:
https://www.opensolaris.org/os/project/ohac-avs/
Thanks,
Leal
Hello Leal, how are you?
I am very interested in your non-shared disk AVS. I tried to follow your guide (Solaris 10 u3 – SC 3.2 ZFS/NFS HA with non-shared disks) to deploy a non-shared AVS on Solaris 10 u6. It works fine until I reboot.
After the reboot, sndradm -i or -p does not give me anything.
I have read this article and the [storage-discuss] "AVS configuration lost" threads (parts I, II, III). I also did a little research, but I have no idea how to fix it.
I have a few questions. I do not have a did device as the cluster configuration database. Is that OK, or do I still need one even though it is a non-shared disk AVS system?
And what is the hack to fix it?
Thanks very much.
I apologize for my poor English.
Thanks,
Minhao
Hello Minhao,
First, in a cluster environment AVS needs a shared disk for the shared cluster database. That is something I think we need to fix, but right now it is a reality.
The problem of "losing the AVS configuration" is really annoying, and I don't know for sure what the cause is. If you look at the manuals from Sun, you will see that one possible cause is a wrong configuration in the /etc/nsswitch.conf file. Take a look at it and check that the "files" directive appears before dns or any other source for hosts, netmasks, services, etc. (see the example below). If you change it and it solves your problem, lucky you. ;-)
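For example, the relevant entries would look something like this; it is only a generic illustration with "files" listed first (on a Sun Cluster node the hosts and netmasks lines may additionally start with a "cluster" entry):

hosts:      files dns
netmasks:   files
services:   files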
I have the same problem in some environments, and there is no way to fix it. Maybe it is a Solaris 10 or Solaris Express problem.
My cluster agent is meant to configure HA services with non-shared disks, but the AVS requirement is something we need to live with for now. You can use an iSCSI LUN for that; take a look at my presentation from the OHAC Summit :-)
Leal
Hello Leal, thanks for your help.
I already checked my nsswitch.conf, and then I tried saving the configuration on an iSCSI device, but it still does not work. It is unfortunate.
But thanks anyway for your help.
I'm having the same problem using OpenSolaris 2009.06… Any solutions??
Hello Chris,
This is an *old* problem with no fix that I know of. If you search Google for "AVS configuration lost", you will see a lot of emails from me. ;-)
You could ask again on the OHAC mailing list, with a CC to storage-discuss. I talked with Jim Dunham about this behaviour, without luck.
Thanks for your comment!
Did you ever get a solution to this? sndradm -P shows no output. I have a geographic cluster, and one cluster suffered multiple disk failures on the shared storage, including the volume that held the AVS configuration database. I rebuilt the disk sets and even restored the AVS config database using the one from the cluster that is up. When I validate my config using geopg validate it is OK, and the dscfgadm -i output shows the services are up and the cluster config database is valid, but I get no output from sndradm -P or dsstat. Regards