ZFS Internals (part #11)
PLEASE BE AWARE THAT ANY INFORMATION YOU MAY FIND HERE MAY BE INACCURATE, AND COULD INCLUDE TECHNICAL INACCURACIES, TYPOGRAPHICAL ERRORS, AND EVEN SPELLING ERRORS.
From the MANUAL page: The zdb command is used by support engineers to diagnose failures and gather statistics. Since the ZFS file system is always consistent on disk and is self-repairing, zdb should only be run under the direction of a support engineer.
DO NOT TRY IT IN PRODUCTION. USE AT YOUR OWN RISK!
Hello there… well, after securing the title at #UFCRio, I will do something less aggressive. ;-) hehe, you don't know how many jokes I'm hearing about this…
In all these years working with ZFS, only a few times did I need to execute procedures like the ones I will describe in this post. And (luckily) all those times I could stress the filesystem and face these scenarios in the lab, while relying on a stable configuration in production. I hope that does not change!
This time I faced the "where-is-my-pool" scenario while trying to exercise the capabilities of ZFS as the "Zettabyte" filesystem (creating and using EB and ZB thin provisioned datasets). In my experience, a sparse and "really big" dataset is a good way to stress ZFS (actually, any FS I think). The demand for space keeps growing, you know… so the trigger this time was: destroying the dataset.
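Just to illustrate the kind of setup I mean (the pool and volume names here are only examples, and how far you can go depends on your system), a thin provisioned volume is simply a sparse one, created with the -s flag so no space is reserved up front:

# zfs create -s -V 1E mypool/bigvol
# zfs get volsize,refreservation mypool/bigvol

With refreservation left at none, the volume can be much bigger than the pool itself, which is exactly what makes it a nice stress test.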
BTW, from now on we are talking about the Illumos source code on this blog. Specifically, this time, build 151.
As has been the case since the beginning of this series, here we talk about the gems the ZFS hackers hid in the source code, which we mortals need to seek out. In the old days the procedure for what we will do in this post was much more complicated, and this is the first time I have needed to dig into it since the new features for recovering a faulted pool were integrated into the ZFS code.
After my tests, I ended up in a situation where the machine hung while deleting a dataset and, after a reset, froze again. A chaotic situation. Since I knew the txg rewind code was present in the new ZFS implementation, I took a look at the zpool command and could not find anything about it. So my next stop was the source code: zpool_main.c
The idea was to find a comment, so searching with "/rewind" in vi, the first match was this:
* -F Attempt rewind if necessary.
C'mon, why is that not in the zpool manual page?!
Oh, that reminds me of the You are not expected to understand this. Ok…
Well, I thought that would do the job: just issuing "-F". But as that comment section explicitly describes two undocumented options, I will put them here:
* -V    Import even in the presence of faulted vdevs. This is an
*       intentionally undocumented option for testing purposes, and
*       treats the pool configuration as complete, leaving any bad
*       vdevs in the FAULTED state. In other words, it does verbatim
*       import.
* ...
* -T    Specify a starting txg to use for import. This option is
*       intentionally undocumented option for testing purposes.
If you look at the "zpool -h" output, you will see a lot of options, but no way to learn anything about these…
The last step was to look at the code that actually handles the options. I think these two lines tell a lot:
/* check options */
while ((c = getopt(argc, argv, ":aCc:d:DEfFmnNo:rR:T:VX")) != -1) {
X? ;-)
No way to stop now; we need to look at the "case" statement… so here are some excerpts of the "case" code:
case 'F':
    do_rewind = B_TRUE;
    break;
case 'm':
    flags |= ZFS_IMPORT_MISSING_LOG;
    break;
case 'n':
    dryrun = B_TRUE;
    break;
...
case 'T':
    errno = 0;
    txg = strtoull(optarg, &endptr, 10);
    if (errno != 0 || *endptr != '\0') {
        (void) fprintf(stderr,
            gettext("invalid txg value\n"));
        usage(B_FALSE);
    }
    rewind_policy = ZPOOL_DO_REWIND | ZPOOL_EXTREME_REWIND;
    break;
case 'V':
    flags |= ZFS_IMPORT_VERBATIM;
    break;
case 'X':
    xtreme_rewind = B_TRUE;
    break;
...
After that reading, I had two bullets: -F and -X (for somebody who had none, that is a lot). ;-)
Let's use them both at once: zpool import -FX mypool
ps.: The -X only takes effect together with -F in the rewind code…
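If you want to be less brave than me, the dryrun flag we saw in the case code can be combined with -F first. Something like this (mypool is just the example name from above) should only tell you whether the rewind would work, without touching the pool:

# zpool import -Fn mypool

If the answer looks good, the same import without the -n does the real thing (adding -X if needed).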
The pool came back online and in good shape! The filesystem I had removed was not there, so the rewind landed after the deletion, and a scrub ran without any issues. But I think it could be easier if this information were public. We could talk about the -T, and for sure about the -m and -V options too, but this post is already very long.
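Just as a teaser about the -T: it takes a txg number to start the import from, so a sketch would be something like the lines below. The txg value is something you need to choose yourself, for example from the uberblock information zdb can print for an exported pool; do not take this as a recipe:

# zdb -e -u mypool
# zpool import -T <txg> mypool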
So, we could do some tests and explore that in another blog post. Hoping nobody has lost a pool (with real data) for not knowing about these options.
The bug? I'm trying to get involved and report it on the illumos mailing list, but it seems the people there are really busy.
peace
For what it’s worth, -F and -m are documented at least in the Solaris Express b151 (different 151) manpage.
-X isn't, but that just allows rolling back further than the first 10 (iirc) TXGs, and it can be found in the discussions about the feature when it was introduced.
Great blog! I've got a question maybe you can answer… how do you get a device to come online while in a failed (unavailable) zpool?
The -V will import the pool, but you cannot zpool online a device when the pool is not available. Likewise, it seems you cannot zpool online when the pool is not imported. What I'd like to be able to do is import the pool from the zdb labels on the offline disk, since they show all my disks online!
Any ideas?
Hello Steven,
I’m afraid I have not understood your question.
Did you try the above procedure? You are talking about offline devices, offline disks, offline pools… if some devices are unavailable and you want to access your pool, you can try the above procedure; depending on your protection level, you may or may not lose some data.
If the device is a cache or log device, zpool should import the pool without problems (if it is a log device and the device is gone, you may not be able to replay some transactions, and lose some data too; see the sketch below). But the filesystem on disk must be 100% consistent. Hope that helps.
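For the specific case of a log device that is really gone, the -m switch we saw in the case code above is the one to look at; a sketch (mypool is just an example name):

# zpool import -m mypool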
Leal
Hello Marcelo,
I've been googling but can't find a solution for my problem with "zpool import". I hope a ZFS guru could give some idea of what went wrong with my ZFS pool. My system had been upgraded to FreeBSD 9.1 for only 1-2 weeks. It suddenly hung, so I booted the system with a Live CD (FreeBSD 9.1 with ZPOOL version 28) and tried to do a zpool import, but it failed.
All the disks are showing ONLINE but the zpool is showing UNAVAIL with a "newer version" error. However, "zdb -l" shows on-disk format version 14, and ZPOOL version 28 is the latest version FreeBSD is using. Do you know what went wrong with the zpool? Is it possible a software bug corrupted the pool data? I've also tried "zpool import -fFX tank", but it still didn't help.
# zpool import
pool: tank
id: [removed]
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
see: https://illumos.org/msg/ZFS-8000-EY
config:
        tank             UNAVAIL  newer version
          mirror-0       ONLINE
            label/disk0  ONLINE
            label/disk1  ONLINE
# zpool import -fFX tank
cannot import ‘tank’: one or more devices is currently unavailable