ZFS Internals (part #5)
From the MANUAL page: The zdb command is used by support engineers to diagnose failures and gather statistics. Since the ZFS file system is always consistent on disk and is self-repairing, zdb should only be run under the direction by a support engineer.
Today we will see whether what we did with the ext3 filesystem can be done with ZFS. We start by creating a brand new filesystem and putting our file into it…
# mkfile 100m /var/fakedevices/disk0
# zpool create cow /var/fakedevices/disk0
# zfs create cow/fs01
# cp -pRf /root/bash_completion /cow/fs01/
# ls -i /cow/fs01/
         4 bash_completion
Ok, now we can start to play..
# zdb -dddddd cow/fs01 4
... snipped ...
        path    /bash_completion
        atime   Sun Sep 21 12:03:56 2008
        mtime   Sun Sep 21 12:03:56 2008
        ctime   Sun Sep 21 12:10:39 2008
        crtime  Sun Sep 21 12:10:39 2008
        gen     16
        mode    100644
        size    216071
        parent  3
        links   1
        xattr   0
        rdev    0x0000000000000000
Indirect blocks:
               0 L1  0:a000:400 0:120a000:400 4000L/400P F=2 B=16
               0  L0 0:40000:20000 20000L/20000P F=1 B=16
           20000  L0 0:60000:20000 20000L/20000P F=1 B=16

                segment [0000000000000000, 0000000001000000) size   16M
So now we have the DVAs of the two data blocks (0:40000 and 0:60000). Let’s get our data, unmount the filesystem (by exporting the pool), and try to put the data back in place. For that, we just need the first block…
# zdb -R 0:40000:20000:r 2> /tmp/file-part1
# head /tmp/file-part1
#   bash_completion - programmable completion functions for bash 3.x
#                     (backwards compatible with bash 2.05b)
#
#   $Id: bash_completion,v 1.872 2006/03/01 16:20:18 ianmacd Exp $
#
#   Copyright (C) Ian Macdonald
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2, or (at your option)
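The argument given to zdb -R is the DVA copied from the "Indirect blocks" listing: vdev, offset and size, all in hexadecimal. As a small sketch (the variable names are only illustrative), this is how the triple splits and what the numbers are in decimal, including a check that two blocks of that size are enough for a 216071-byte file:

```shell
#!/bin/sh
# Split a DVA triple as printed by zdb (vdev:offset:size, hexadecimal).
dva="0:40000:20000"        # first L0 block from the zdb listing

vdev=${dva%%:*}            # text before the first colon
rest=${dva#*:}             # "40000:20000"
offset_hex=${rest%%:*}     # middle field
size_hex=${rest#*:}        # last field

offset=$((0x$offset_hex))  # hex to decimal
size=$((0x$size_hex))

# Number of blocks of this size needed for the file (size 216071 in zdb)
file_size=216071
blocks=$(( (file_size + size - 1) / size ))

echo "vdev=$vdev offset=$offset size=$size blocks=$blocks"
```

The result agrees with the listing, which shows exactly two L0 blocks of 0x20000 bytes each.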
Let’s change the file’s content and put it back. But this time let’s make a change that is harder to catch (just one byte).
# vi /tmp/file-part1
  change
      # bash_completion - programmable completion functions for bash 3.x
  for
      # bash_completion - programmable completion functions for bash 1.x
# zpool export cow
# dd if=/var/fakedevices/disk0 of=/tmp/fs01-part1 bs=512 count=8704
# dd if=/var/fakedevices/disk0 of=/tmp/file-part1.orig bs=512 iseek=8704 count=256
# dd if=/var/fakedevices/disk0 of=/tmp/fs01-part2 bs=512 skip=8960
# dd if=/tmp/file-part1 of=/tmp/payload bs=131072 count=1
# cp -pRf /tmp/fs01-part1 /var/fakedevices/disk0
# cat /tmp/payload >> /var/fakedevices/disk0
# cat /tmp/fs01-part2 >> /var/fakedevices/disk0
# zpool import -d /var/fakedevices/ cow
# zpool status
  pool: cow
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        cow                        ONLINE       0     0     0
          /var/fakedevices//disk0  ONLINE       0     0     0

errors: No known data errors
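The sector numbers used with dd are not arbitrary. A DVA offset is counted from the start of the vdev's allocatable space, which begins 4 MB (0x400000 bytes) into the device, after the front labels and boot area. A short sketch of the arithmetic, with illustrative variable names:

```shell
#!/bin/sh
# Derive the dd parameters from the DVA 0:40000:20000.
label_area=$((0x400000))   # 4 MB reserved at the front of the vdev
dva_offset=$((0x40000))    # offset field of the DVA
block_size=$((0x20000))    # size field of the DVA (128 KB)
sector=512                 # bs= value used with dd

skip=$(( (label_area + dva_offset) / sector ))   # sectors before the block
count=$(( block_size / sector ))                 # sectors in the block
tail_skip=$(( skip + count ))                    # first sector after the block

echo "skip=$skip count=$count tail_skip=$tail_skip"
```

These are the same 8704, 256 and 8960 that appear in the dd commands.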
Ok, everything seems to be fine. So, let’s get our data…
# head /cow/fs01/bash_completion
#
Nothing? But our file is there…
# ls -l /cow/fs01
total 517
-rw-r--r--   1 root     root      216071 Sep 21 12:03 bash_completion
# ls -l /root/bash_completion
-rw-r--r--   1 root     root      216071 Sep 21 12:03 /root/bash_completion
Let’s see the zpool status command again…
# zpool status
  pool: cow
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        cow                        ONLINE       0     0     2
          /var/fakedevices//disk0  ONLINE       0     0     2

errors: 1 data errors, use '-v' for a list
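The CKSUM column counts reads where the data did not match its stored checksum. To illustrate why a one-byte edit is enough to trip that check, here is a rough sketch that uses cksum (a CRC) as a stand-in; ZFS uses its own checksum algorithms, so this only shows the principle, and the temporary file names are made up for the example:

```shell
#!/bin/sh
# Illustration only: cksum stands in for the checksum that ZFS keeps
# for each block. The real algorithm in ZFS is different.
tmp=$(mktemp -d)
printf '%s\n' '# bash_completion - programmable completion functions for bash 3.x' > "$tmp/good"

# Same one-byte edit as in the post: 3.x becomes 1.x
sed 's/bash 3\.x/bash 1.x/' "$tmp/good" > "$tmp/bad"

good_sum=$(cksum < "$tmp/good")   # what was recorded when the data was written
bad_sum=$(cksum < "$tmp/bad")     # what is computed when the data is read back

if [ "$good_sum" = "$bad_sum" ]; then
    result="match"
else
    result="mismatch"
fi

echo "stored:   $good_sum"
echo "computed: $bad_sum"
echo "result:   $result"
rm -rf "$tmp"
```

The two files have the same length, so a size comparison (like the ls -l above) sees no difference; only the checksum does.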
Oh, when we tried to access the file, ZFS found that the data block no longer matches the checksum stored in its block pointer. That’s why it is important to schedule a scrub: it traverses the entire pool looking for errors like that. In this example I used a pool with just one disk; in a real situation, don’t do that! If we had a mirror, for example, ZFS would fix the problem using a “good” copy (provided the bad guy did not mess with that one too). What can zdb show us?
# zdb -c cow

Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <21, 4, 0, 0> -- skipping

Error counts:

        errno  count
           50      1
leaked space: vdev 0, offset 0x40000, size 131072
block traversal size 339456 != alloc 470528 (leaked 131072)

        bp count:              53
        bp logical:        655360      avg:  12365
        bp physical:       207360      avg:   3912    compression:   3.16
        bp allocated:      339456      avg:   6404    compression:   1.93
        SPA allocated:     470528     used:  0.47%
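The "leaked" figure looks alarming, but it appears to be simply the block zdb skipped: it could not read the block at offset 0x40000, so that block was left out of the traversal total. A quick check of the numbers from the output (variable names are illustrative):

```shell
#!/bin/sh
# Numbers copied from the zdb -c output.
alloc=470528         # SPA allocated
traversed=339456     # block traversal size
leaked=$(( alloc - traversed ))
block=$((0x20000))   # size of the L0 block at 0:40000

echo "leaked=$leaked block=$block"
```

Both values come out to 131072 bytes, the size of the one data block we modified, and object 4 in <21, 4, 0, 0> is the same number that ls -i reported for the file.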
Ok, we have another copy (from trusted media ;)…
# cp -pRf /root/bash_completion /cow/fs01/bash_completion
# head /cow/fs01/bash_completion
#   bash_completion - programmable completion functions for bash 3.x
#                     (backwards compatible with bash 2.05b)
#
#   $Id: bash_completion,v 1.872 2006/03/01 16:20:18 ianmacd Exp $
#
#   Copyright (C) Ian Macdonald
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2, or (at your option)
Now everything is in good shape again…
see ya.