Discussion:
[OpenZFS Developer] Review Request 112: account for ashift when choosing buffers to be written to l2arc device
Andriy Gapon
2014-10-07 16:04:10 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/112/
-----------------------------------------------------------

Review request for OpenZFS Developer Mailing List, Matthew Ahrens and Saso Kiselkov.


Repository: illumos-gate


Description
-------

If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.

The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919

Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size. Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.

The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.


Diffs
-----

usr/src/uts/common/fs/zfs/arc.c 69d16af3b669458e3937fe0c6a4d91755bc6e2a7

Diff: https://reviews.csiden.org/r/112/diff/


Testing
-------


Thanks,

Andriy Gapon
Matthew Ahrens
2014-10-07 17:46:23 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/112/#review275
-----------------------------------------------------------


You will also need to file a bug and test on illumos.


usr/src/uts/common/fs/zfs/arc.c
<https://reviews.csiden.org/r/112/#comment229>

I see that you're keeping the meaning of these stats the same, but I think they were wrong before. What would you think about changing them to the (new, correct definition) write_asize?


- Matthew Ahrens
Post by Andriy Gapon
-----------------------------------------------------------
https://reviews.csiden.org/r/112/
-----------------------------------------------------------
(Updated Oct. 7, 2014, 4:04 p.m.)
Review request for OpenZFS Developer Mailing List, Matthew Ahrens and Saso Kiselkov.
Repository: illumos-gate
Description
-------
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919
Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size. Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.
The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.
Diffs
-----
usr/src/uts/common/fs/zfs/arc.c 69d16af3b669458e3937fe0c6a4d91755bc6e2a7
Diff: https://reviews.csiden.org/r/112/diff/
Testing
-------
Thanks,
Andriy Gapon
Andriy Gapon
2014-10-08 13:38:12 UTC
Permalink
Post by Andriy Gapon
Post by Matthew Ahrens
You will also need to file a bug and test on illumos.
The bug: https://www.illumos.org/issues/5219
I failed to mention that my testing was done on FreeBSD, so illumos testing is the harder part.


- Andriy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/112/#review275
-----------------------------------------------------------
Post by Andriy Gapon
-----------------------------------------------------------
https://reviews.csiden.org/r/112/
-----------------------------------------------------------
(Updated Oct. 7, 2014, 7:04 p.m.)
Review request for OpenZFS Developer Mailing List, Matthew Ahrens and Saso Kiselkov.
Repository: illumos-gate
Description
-------
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919
Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size. Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.
The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.
Diffs
-----
usr/src/uts/common/fs/zfs/arc.c 69d16af3b669458e3937fe0c6a4d91755bc6e2a7
Diff: https://reviews.csiden.org/r/112/diff/
Testing
-------
Thanks,
Andriy Gapon
Andriy Gapon
2014-10-08 13:54:55 UTC
Permalink
Post by Andriy Gapon
usr/src/uts/common/fs/zfs/arc.c, lines 4803-4804
<https://reviews.csiden.org/r/112/diff/1/?file=10378#file10378line4803>
I see that you're keeping the meaning of these stats the same, but I think they were wrong before. What would you think about changing them to the (new, correct definition) write_asize?
Yes, I agree that it would be better to update these (duplicate?) stats to the new meaning.
One issue is that they are increased in just this one place, but there are many places where they are decreased, so vdev_psize_to_asize() would have to be littered over a lot more code.

Also, there could be a semi-theorical discussion about what an actual size of a write is and what the used space actuall is. For example, let ashift be 12 (for 4KB physical sector size) and let's assume that we write a 1KB buffer (zio->io_size = 1024) and then skip 3KB so that the next buffer would be written at an aligned offset. A disk will have to do read-modify-write, so internally it writes 4KB, but for the OS the write is 1KB. And should we treat that skipped 3KB remainder of a physical block as a used space or as a some sort of a hole?

But then there is this issue https://www.illumos.org/issues/5220. I think that if that issue is fixed then the stats will automatically become correct and we will not actually need vdev_psize_to_asize() in l2arc_write_buffers() as all write sizes will be ashift aligned. The bug is about disks that can not emulate 512B sectors, however I think that it should make things less confusing for 512e kind of disks as well.


- Andriy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/112/#review275
-----------------------------------------------------------
Post by Andriy Gapon
-----------------------------------------------------------
https://reviews.csiden.org/r/112/
-----------------------------------------------------------
(Updated Oct. 7, 2014, 7:04 p.m.)
Review request for OpenZFS Developer Mailing List, Matthew Ahrens and Saso Kiselkov.
Repository: illumos-gate
Description
-------
If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.
The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919
Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size. Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.
The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size. This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.
Diffs
-----
usr/src/uts/common/fs/zfs/arc.c 69d16af3b669458e3937fe0c6a4d91755bc6e2a7
Diff: https://reviews.csiden.org/r/112/diff/
Testing
-------
Thanks,
Andriy Gapon
Loading...