"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "md.4" between
mdadm-4.1.tar.gz and mdadm-4.2.tar.gz

About: mdadm is a tool for creating, managing and monitoring device arrays using the "md" driver in Linux, also known as Software RAID arrays.

md.4 (mdadm-4.1) : md.4 (mdadm-4.2)
skipping to change at line 196
drives have been assigned one chunk. This collection of chunks forms a
.BR stripe .
Further chunks are gathered into stripes in the same way, and are
assigned to the remaining space in the drives.
If devices in the array are not all the same size, then once the
smallest device has been exhausted, the RAID0 driver starts
collecting chunks into smaller stripes that only span the drives which
still have remaining space.
A bug was introduced in Linux 3.14 which changed the layout of blocks in
a RAID0 beyond the region that is striped over all devices. This bug
does not affect an array with all devices the same size, but can affect
other RAID0 arrays.
Linux 5.4 (and some stable kernels to which the change was backported)
will not normally assemble such an array as it cannot know which layout
to use. There is a module parameter "raid0.default_layout" which can be
set to "1" to force the kernel to use the pre-3.14 layout or to "2" to
force it to use the 3.14-and-later layout. When creating a new RAID0
array,
.I mdadm
will record the chosen layout in the metadata in a way that allows newer
kernels to assemble the array without needing a module parameter.
To assemble an old array on a new kernel without using the module parameter,
use either the
.B "--update=layout-original"
option or the
.B "--update=layout-alternate"
option.
Once you have updated the layout you will not be able to mount the array
on an older kernel. If you need to revert to an older kernel, the
layout information can be erased with the
.B "--update=layout-unspecified"
option. If you use this option to
.B --assemble
while running a newer kernel, the array will NOT assemble, but the
metadata will be updated so that it can be assembled on an older kernel.
Note that setting the layout to "unspecified" removes protections against
this bug, and you must be sure that the kernel you use matches the
layout of the array.
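As an illustrative sketch of the two approaches described above (device and array names are hypothetical; run as root against your own devices):

```shell
# Force the pre-3.14 RAID0 layout for all arrays via the module
# parameter, either on the kernel command line (raid0.default_layout=1)
# or when loading the module:
modprobe raid0 default_layout=1

# Or record the layout permanently in the array metadata, so newer
# kernels can assemble it without any module parameter:
mdadm --assemble --update=layout-original /dev/md0 /dev/sda1 /dev/sdb1

# To go back to an older kernel, erase the layout information again:
mdadm --assemble --update=layout-unspecified /dev/md0 /dev/sda1 /dev/sdb1
```

These commands require root and real member devices; they are shown only to make the two mechanisms concrete.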
.SS RAID1
A RAID1 array is also known as a mirrored set (though mirrors tend to
provide reflected images, which RAID1 does not) or a plex.
Once initialised, each device in a RAID1 array contains exactly the
same data. Changes are written to all devices in parallel. Data is
read from any one device. The driver attempts to distribute read
requests across all devices to maximise performance.
skipping to change at line 875 (mdadm-4.1) : line 910 (mdadm-4.2)
that succeeds, the address will be removed from the list.
This allows an array to fail more gracefully - a few blocks on
different devices can be faulty without taking the whole array out of
action.
The list is particularly useful when recovering to a spare. If a few blocks
cannot be read from the other devices, the bulk of the recovery can
complete and those few bad blocks will be recorded in the bad block list.
.SS RAID WRITE HOLE
Due to the non-atomic nature of RAID write operations,
interruption of write operations (system crash, etc.) to a RAID456
array can lead to inconsistent parity and data loss (the so-called
RAID-5 write hole).
To plug the write hole, md supports two mechanisms described below.
.TP
DIRTY STRIPE JOURNAL
From Linux 4.4, md supports a write-ahead journal for RAID456.
When the array is created, an additional journal device can be added to
the array through the write-journal option. The RAID write journal works
similarly to file system journals. Before writing to the data
disks, md persists data AND parity of the stripe to the journal
device. After a crash, md searches the journal device for
incomplete write operations, and replays them to the data disks.
When the journal device fails, the RAID array is forced to run in
read-only mode.
.TP
PARTIAL PARITY LOG
From Linux 4.12, md supports Partial Parity Log (PPL) for RAID5 arrays only.
Partial parity for a write operation is the XOR of the stripe data chunks not
modified by the write. PPL is stored in the metadata region of the RAID member
drives; no additional journal drive is needed.
After a crash, if one of the unmodified data disks of
the stripe is missing, this partial parity can be used to recover its
data.
This mechanism is documented more fully in the file
Documentation/md/raid5-ppl.rst
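As a sketch, either mechanism can be enabled when the array is created with mdadm (device names are hypothetical):

```shell
# Dirty stripe journal: dedicate an extra device (ideally a fast
# SSD/NVMe partition) as a write-ahead journal for the RAID5 array.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 \
      --write-journal /dev/nvme0n1p1

# Partial Parity Log: no extra device is needed; the PPL lives in the
# members' metadata region.
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
      --consistency-policy=ppl \
      /dev/sdd1 /dev/sde1 /dev/sdf1
```

Both commands require root and real member devices; consult mdadm(8) for the full option descriptions.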
.SS WRITE-BEHIND
From Linux 2.6.14,
.I md
supports WRITE-BEHIND on RAID1 arrays.
This allows certain devices in the array to be flagged as
.IR write-mostly .
MD will only read from such devices if there is no
other option.
skipping to change at line 1040 (mdadm-4.1) : line 1089 (mdadm-4.2)
Each block device appears as a directory in
.I sysfs
(which is usually mounted at
.BR /sys ).
For MD devices, this directory will contain a subdirectory called
.B md
which contains various files for providing access to information about
the array.
This interface is documented more fully in the file
.B Documentation/admin-guide/md.rst
which is distributed with the kernel sources. That file should be
consulted for full documentation. The following are just a selection
of attribute files that are available.
.TP
.B md/sync_speed_min
This value, if set, overrides the system-wide setting in
.B /proc/sys/dev/raid/speed_limit_min
for this array only.
Writing the value
skipping to change at line 1101 (mdadm-4.1) : line 1150 (mdadm-4.2)
.TP
.B md/preread_bypass_threshold
This is only available on RAID5 and RAID6. This variable sets the
number of times MD will service a full-stripe-write before servicing a
stripe that requires some "prereading". For fairness this defaults to
1. Valid values are 0 to stripe_cache_size. Setting this to 0
maximizes sequential-write throughput at the cost of fairness to threads
doing small or random writes.
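A minimal sketch of inspecting and tuning these per-array attributes through sysfs (the array name /dev/md0 is hypothetical; run as root):

```shell
# Raise the per-array resync speed floor (KB/s), overriding the
# system-wide /proc/sys/dev/raid/speed_limit_min for this array only.
cat /sys/block/md0/md/sync_speed_min
echo 50000 > /sys/block/md0/md/sync_speed_min

# Favour sequential full-stripe writes over fairness to small writers.
cat /sys/block/md0/md/preread_bypass_threshold
echo 0 > /sys/block/md0/md/preread_bypass_threshold
```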
.TP
.B md/bitmap/backlog
The value stored in this file only has an effect on RAID1 when write-mostly
devices are active, and write requests to those devices proceed in the
background.
This variable sets a limit on the number of concurrent background writes.
Valid values are 0 to 16383; 0 means that write-behind is not allowed,
while any other number means it can happen. If there are more write requests
than this number, new writes will be synchronous.
.TP
.B md/bitmap/can_clear
This is for externally managed bitmaps, where the kernel writes the bitmap
itself, but metadata describing the bitmap is managed by mdmon or similar.
When the array is degraded, bits mustn't be cleared. When the array becomes
optimal again, bits can be cleared, but first the metadata needs to record
the current event count. So md sets this to 'false' and notifies mdmon,
then mdmon updates the metadata and writes 'true'.
There is no code in mdmon to actually do this, so maybe it doesn't even
work.
.TP
.B md/bitmap/chunksize
The bitmap chunksize can only be changed when no bitmap is active, and
the value should be a power of 2 and at least 512.
.TP
.B md/bitmap/location
This indicates where the write-intent bitmap for the array is stored.
It can be "none" or "file" or a signed offset from the array metadata
- measured in sectors. You cannot set a file by writing here - that can
only be done with the SET_BITMAP_FILE ioctl.
Writing 'none' to 'bitmap/location' will clear the bitmap, and the previous
location value must be written back to restore the bitmap.
.TP
.B md/bitmap/max_backlog_used
This keeps track of the maximum number of concurrent write-behind requests
for an md array; writing any value to this file will clear it.
.TP
.B md/bitmap/metadata
This can be 'internal' or 'clustered' or 'external'. 'internal' is set
by default, which means the metadata for the bitmap is stored in the first 256
bytes of the bitmap space. 'clustered' means separate bitmap metadata are
used for each cluster node. 'external' means that bitmap metadata is managed
externally to the kernel.
.TP
.B md/bitmap/space
This shows the space (in sectors) which is available at md/bitmap/location,
and allows the kernel to know when it is safe to resize the bitmap to match
a resized array. It should be big enough to contain the total bytes in the
bitmap.
For 1.0 metadata, the bitmap can use the space up to the superblock if it is
stored before the superblock, otherwise up to 4K beyond the superblock. For
other metadata versions, assume no change is possible.
.TP
.B md/bitmap/time_base
This shows the time (in seconds) between disk flushes, and is used when
looking for bits in the bitmap to be cleared.
The default value is 5 seconds, and it should be an unsigned long value.
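A minimal sketch of reading and adjusting the bitmap attributes listed above (array name /dev/md0 is hypothetical; run as root, bash assumed for the brace expansion):

```shell
# Show the current bitmap configuration for the array.
grep . /sys/block/md0/md/bitmap/{location,chunksize,backlog,metadata,space}

# Allow up to 256 concurrent write-behind requests to write-mostly devices.
echo 256 > /sys/block/md0/md/bitmap/backlog

# Reset the recorded high-water mark of concurrent write-behind requests.
echo 0 > /sys/block/md0/md/bitmap/max_backlog_used
```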
.SS KERNEL PARAMETERS
The md driver recognises several different kernel parameters.
.TP
.B raid=noautodetect
This will disable the normal detection of md arrays that happens at
boot time. If a drive is partitioned with MS-DOS style partitions,
then if any of the 4 main partitions has a partition type of 0xFD,
then that partition will normally be inspected to see if it is part of
an MD array, and if any full arrays are found, they are started. This
End of changes. 6 change blocks. 16 lines changed or deleted, 135 lines changed or added.
