Elevator Algorithm II
In my last post I wrote about some issues related to disksort:
1) Performance (Latency)
2) Consistency
In my D script I’m printing the buf sector when the sd driver receives it, that is, before any sorting.
That’s important because I’m trying to understand why we have some label updates in the middle of my D script’s output, and it confirms that this has nothing to do (yet) with the sd.c disksort algorithm. I think one thing we can guarantee is that the sd driver is receiving the commands in that order.
– But is ZFS sending the label updates in the wrong order?
– Or do we have different groups of transactions in that report?
I need to figure it out, but I did some tests on a really idle server, and the commands are issued perfectly in sync with the spa_sync. Maybe we have commands being mixed somewhere and I’m seeing that at the sd driver layer. Anyway, I don’t have the right data to affirm anything… ;-)
Edited (13/sep/2010):
I realized I’m not separating reads from writes, so the commands issued after the label updates may be reads. Actually, that is what I think they are… What we need now is to print that information to confirm it, and if so, the only thing left to understand is the label update order (L0 and L2… L1 and L3).
From a performance perspective, as I mentioned in that post too, I solved a latency problem on some workloads by tuning the zfs_vdev_max_pending parameter from “35” to “10” (and that’s the new default value for ZFS anyway). But for latencies to be predictable, I think a FIFO algorithm would be a better approach, solving the problem as a whole (even with 10 on the waitq we still have sorting enabled).
So, I have disabled disksort on the disks, and here are the steps I followed to change disksort_disabled on a running system (I did not find this information anywhere, so I’m sharing it hoping it can be useful for somebody else):
OBS:
– This is for just one disk; with this method you need to repeat it for each instance (disk) you want. Here I’m doing it for instance 1, or sd1;
– I’m changing kernel variables on a running server. Do not do this in production, or use these instructions at your own risk;
– The changes made here do not persist across a server reboot;
– At the end of this post there are some scripts to help set this parameter on more than one disk;
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun"' | mdb -k
... a lot of output ...
So, let’s see just the disksort_disabled option:
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun"' | \
    mdb -k | grep disksort
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0
The .3 is a specific bit (bit 3) of the byte at address ffffff09090b77e7, so let’s see the whole byte:
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun"' | \
    mdb -k | grep ffffff09090b77e7
ffffff09090b77e7 unsigned un_f_pm_is_enabled :1 = 0
ffffff09090b77e7.1 unsigned un_f_watcht_stopped :1 = 0
ffffff09090b77e7.2 unsigned un_f_pkstats_enabled :1 = 0x1
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0
ffffff09090b77e7.4 unsigned un_f_lun_reset_enabled :1 = 0
ffffff09090b77e7.5 unsigned un_f_doorlock_supported :1 = 0
ffffff09090b77e7.6 unsigned un_f_start_stop_supported :1 = 0
ffffff09090b77e7.7 unsigned un_f_reserved1 :1 = 0
So, the Byte is: 0000 0100 (Decimal 4). Let’s confirm…
# echo '0xffffff09090b77e7/B' | mdb -k
0xffffff09090b77e7: 4
So, to change the disksort_disabled to 1, we need to change that Byte to: 0000 1100 (Decimal 12). Let’s do it!
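The byte arithmetic can be double-checked in the shell before touching the kernel. This is just a sketch: the helper name set_bit is mine (it is not part of mdb), and it assumes that the .N suffix in the mdb output counts bits from the least significant one, which is what the /B read above suggests (value 4 with only bit .2 set).

```shell
#!/bin/sh
# Hypothetical helper (not an mdb feature): pure arithmetic,
# nothing here is written to the kernel.

# set_bit VALUE BIT: print VALUE (decimal) with bit number BIT turned on.
set_bit() {
    echo $(( $1 | (1 << $2) ))
}

# The current byte is 0x4 (only un_f_pkstats_enabled, bit 2, is set).
# un_f_disksort_disabled is bit 3.
new=$(set_bit 0x4 3)
printf 'decimal: %d  hex: 0x%x\n' "$new" "$new"
```

For the byte above this gives decimal 12 (hex 0xc), the value written in the next step. Because the helper ORs the bit in, running it on a byte that already has bit 3 set changes nothing.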
# echo 'ffffff09090b77e7/v0t12' | mdb -kw
0xffffff09090b77e7: 0x4 = 0xc
Ok, let’s see if that works…
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1
We can print the whole Byte to confirm too:
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun"' | \
    mdb -k | grep ffffff09090b77e7
ffffff09090b77e7 unsigned un_f_pm_is_enabled :1 = 0
ffffff09090b77e7.1 unsigned un_f_watcht_stopped :1 = 0
ffffff09090b77e7.2 unsigned un_f_pkstats_enabled :1 = 0x1
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1
ffffff09090b77e7.4 unsigned un_f_lun_reset_enabled :1 = 0
ffffff09090b77e7.5 unsigned un_f_doorlock_supported :1 = 0
ffffff09090b77e7.6 unsigned un_f_start_stop_supported :1 = 0
ffffff09090b77e7.7 unsigned un_f_reserved1 :1 = 0
Here is a sample script if you need to configure this for more disks (with the same configuration):
x=1
while [ $x -lt $NRDISKS ]; do
    echo -n "sd$x: "
    # un_f_pm_is_enabled is bit 0 of the flags byte, so mdb prints its
    # address without a .N suffix: that is the byte address we write to.
    for y in `echo "*sd_state::softstate 0t$x | ::print -at 'struct sd_lun' un_f_pm_is_enabled" | \
        mdb -k | awk '{print $1}'`; do
        # 0t12 = 0000 1100; this assumes the byte is currently 4, as on sd1.
        echo "$y/v0t12" | mdb -kw
    done
    let x=x+1
done

sd1: 0xffffff09090b77e7: 0x4 = 0xc
sd2: 0xffffff09102d0e27: 0x4 = 0xc
sd3: 0xffffff0910780c27: 0x4 = 0xc
sd4: 0xffffff0910ba04e7: 0x4 = 0xc
sd5: 0xffffff0910998367: 0x4 = 0xc
sd6: 0xffffff0932189067: 0x4 = 0xc
sd7: 0xffffff0932188be7: 0x4 = 0xc
sd8: 0xffffff0932188767: 0x4 = 0xc
sd9: 0xffffff09321882e7: 0x4 = 0xc
sd10: 0xffffff0932300e27: 0x4 = 0xc
sd11: 0xffffff09323009a7: 0x4 = 0xc
sd12: 0xffffff0932300527: 0x4 = 0xc
sd13: 0xffffff09323000a7: 0x4 = 0xc
sd14: 0xffffff09322ffc27: 0x4 = 0xc
sd15: 0xffffff09322ff7a7: 0x4 = 0xc
sd16: 0xffffff09322ff327: 0x4 = 0xc
sd17: 0xffffff090f79f7a7: 0x4 = 0xc
sd18: 0xffffff090f79fc27: 0x4 = 0xc
sd19: 0xffffff090f79f327: 0x4 = 0xc
sd20: 0xffffff0930adb0e7: 0x4 = 0xc
sd21: 0xffffff09322fc9e7: 0x4 = 0xc
sd22: 0xffffff09322fc567: 0x4 = 0xc
sd23: 0xffffff09322fc0e7: 0x4 = 0xc
...
So you can check the new value:
x=1
while [ $x -lt $NRDISKS ]; do
    echo -n "sd$x: "
    echo "*sd_state::softstate 0t$x | ::print -at 'struct sd_lun' un_f_disksort_disabled" | \
        mdb -k
    let x=x+1
done

sd1: ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd2: ffffff09102d0e27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd3: ffffff0910780c27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd4: ffffff0910ba04e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd5: ffffff0910998367.3 unsigned un_f_disksort_disabled :1 = 0x1
sd6: ffffff0932189067.3 unsigned un_f_disksort_disabled :1 = 0x1
sd7: ffffff0932188be7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd8: ffffff0932188767.3 unsigned un_f_disksort_disabled :1 = 0x1
sd9: ffffff09321882e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd10: ffffff0932300e27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd11: ffffff09323009a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd12: ffffff0932300527.3 unsigned un_f_disksort_disabled :1 = 0x1
sd13: ffffff09323000a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd14: ffffff09322ffc27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd15: ffffff09322ff7a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd16: ffffff09322ff327.3 unsigned un_f_disksort_disabled :1 = 0x1
sd17: ffffff090f79f7a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd18: ffffff090f79fc27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd19: ffffff090f79f327.3 unsigned un_f_disksort_disabled :1 = 0x1
sd20: ffffff0930adb0e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd21: ffffff09322fc9e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd22: ffffff09322fc567.3 unsigned un_f_disksort_disabled :1 = 0x1
sd23: ffffff09322fc0e7.3 unsigned un_f_disksort_disabled :1 = 0x1
...
OBS: You just need to set the “NRDISKS” variable (i.e., how many disks you have). Note that the loops start at instance 1 and stop before reaching $NRDISKS.
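The loop above writes 12 to every disk, which is only right when the byte is currently 4 (as it was on all the disks in my output). A slightly more careful variation would read the byte first and just turn on bit 3. Here is a sketch of that idea: the function name disksort_write_cmd is made up for illustration, and it assumes the /B output format shown earlier in this post (address, colon, then the byte in hex). It only prints the mdb write command, so you can look at it before piping it to mdb -kw.

```shell
#!/bin/sh
# Sketch only. disksort_write_cmd is a hypothetical helper, not an mdb feature.
# Usage: disksort_write_cmd ADDR CURRENT_BYTE_HEX
# Prints the mdb command that sets bit 3 (un_f_disksort_disabled) and
# leaves the other seven flags in the byte untouched.
disksort_write_cmd() {
    addr=$1
    cur=${2#0x}                 # accept "4" or "0x4"
    new=$(( 0x$cur | 0x8 ))     # 0x8 == 1 << 3
    echo "$addr/v0t$new"
}

# Example with the byte read earlier in this post (value 4):
disksort_write_cmd ffffff09090b77e7 4

# On a live system the pieces could be glued together like this (untested,
# and it assumes the second field of the /B output is the byte in hex):
#   cur=`echo "$addr/B" | mdb -k | awk '{print $2}'`
#   disksort_write_cmd "$addr" "$cur" | mdb -kw
```

For the example byte this prints ffffff09090b77e7/v0t12, the same command used by hand above. The point of the OR is that a disk whose other flags differ from sd1 keeps them as they are.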
To make this permanent, you will need to reapply this method after every boot, or find a global /etc/system parameter for it (which is not recommended). To be continued…
peace