.. SPDX-License-Identifier: GPL-2.0

Layout
------

The layout of a standard block group is approximately as follows (each
of these fields is discussed in a separate section below):

.. list-table::
   :widths: 1 1 1 1 1 1 1 1
   :header-rows: 1

   * - Group 0 Padding
     - ext4 Super Block
     - Group Descriptors
     - Reserved GDT Blocks
     - Data Block Bitmap
     - inode Bitmap
     - inode Table
     - Data Blocks
   * - 1024 bytes
     - 1 block
     - many blocks
     - many blocks
     - 1 block
     - 1 block
     - many blocks
     - many more blocks

For the special case of block group 0, the first 1024 bytes are unused,
to allow for the installation of x86 boot sectors and other oddities.
The superblock will start at offset 1024 bytes, whichever block that
happens to be (usually 0). However, if the block size is 1024 bytes,
then block 0 is marked in use and the superblock goes in block 1.
For all other block groups, there is no padding.

The ext4 driver primarily works with the superblock and the group
descriptors that are found in block group 0. Redundant copies of the
superblock and group descriptors are written to some of the block groups
across the disk in case the beginning of the disk gets trashed, though
not all block groups necessarily host a redundant copy (see following
paragraph for more details). If the group does not have a redundant
copy, the block group begins with the data block bitmap. Note also that
when the filesystem is freshly formatted, mkfs will allocate “reserve
GDT block” space after the block group descriptors and before the start
of the block bitmaps to allow for future expansion of the filesystem. By
default, a filesystem is allowed to increase in size by a factor of
1024x over the original filesystem size.

The location of the inode table is given by ``grp.bg_inode_table_*``. It
is a contiguous range of blocks large enough to contain
``sb.s_inodes_per_group * sb.s_inode_size`` bytes.

As for the ordering of items in a block group, it is generally
established that the super block and the group descriptor table, if
present, will be at the beginning of the block group. The bitmaps and
the inode table can be anywhere, and it is quite possible for the
bitmaps to come after the inode table, or for both to be in different
groups (flex_bg). Leftover space is used for file data blocks, indirect
block maps, extent tree blocks, and extended attributes.

Flexible Block Groups
---------------------

Starting in ext4, there is a new feature called flexible block groups
(flex_bg). In a flex_bg, several block groups are tied together as one
logical block group; the bitmap spaces and the inode table space in the
first block group of the flex_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex_bg. For example,
if the flex_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block group metadata close together for faster loading, and to enable
large files to be contiguous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex_bg is enabled. The number of block groups that make up a
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.

Meta Block Groups
-----------------

Without the option META_BG, for safety concerns, all block group
descriptor copies are kept in the first block group. Given the default
128MiB (2^27 bytes) block group size and 64-byte group descriptors, ext4
can have at most 2^27/64 = 2^21 block groups. This limits the entire
filesystem size to 2^21 * 2^27 = 2^48 bytes or 256TiB.

The solution to this problem is to use the metablock group feature
(META_BG), which is already in ext3 for all 2.6 releases. With the
META_BG feature, ext4 filesystems are partitioned into many metablock
groups. Each metablock group is a cluster of block groups whose group
descriptor structures can be stored in a single disk block. For ext4
filesystems with 4 KB block size, a single metablock group partition
includes 64 block groups, or 8 GiB of disk space. The metablock group
feature moves the location of the group descriptors from the congested
first block group of the whole filesystem into the first group of each
metablock group itself. The backups are in the second and last group of
each metablock group. This increases the 2^21 maximum block groups limit
to the hard limit 2^32, allowing support for a 512PiB filesystem.

The change in the filesystem format replaces the current scheme where
the superblock is followed by a variable-length set of block group
descriptors. Instead, the superblock and a single block group descriptor
block are placed at the beginning of the first, second, and last block
groups in a meta-block group. A meta-block group is a collection of
block groups which can be described by a single block group descriptor
block. Since the size of the block group descriptor structure is 64
bytes, a meta-block group contains 16 block groups for filesystems with
a 1KB block size, and 64 block groups for filesystems with a 4KB
blocksize. Filesystems can either be created using this new block group
descriptor layout, or existing filesystems can be resized on-line, and
the field s_first_meta_bg in the superblock will indicate the first
block group using this new layout.

Please see an important note about ``BLOCK_UNINIT`` in the section about
block and inode bitmaps.

Lazy Block Group Initialization
-------------------------------

New in ext4 are three block group descriptor flags that enable mkfs to
skip initializing parts of the block group metadata. Specifically, the
INODE_UNINIT and BLOCK_UNINIT flags mean that the inode and block
bitmaps for that group can be calculated and therefore the on-disk
bitmap blocks are not initialized. This is generally the case for an
empty block group or a block group containing only fixed-location block
group metadata. The INODE_ZEROED flag means that the inode table has
been initialized; mkfs will unset this flag and rely on the kernel to
initialize the inode tables in the background.
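
The three flags can be read from a group descriptor with a few bit
tests. The following is a minimal sketch, not kernel code. It assumes
the flags live in the descriptor's ``bg_flags`` field and uses the bit
values of the kernel's ``EXT4_BG_*`` definitions; the helper name
``decode_bg_flags`` is hypothetical and exists only for illustration.

```python
# Bit values of the block group descriptor flags (bg_flags).
EXT4_BG_INODE_UNINIT = 0x0001  # inode bitmap is not initialized on disk
EXT4_BG_BLOCK_UNINIT = 0x0002  # block bitmap is not initialized on disk
EXT4_BG_INODE_ZEROED = 0x0004  # inode table has been zeroed


def decode_bg_flags(bg_flags):
    """Return the names of the lazy-initialization flags set in bg_flags.

    Hypothetical helper: it only interprets the three bits discussed in
    the text and ignores any other bits.
    """
    names = []
    if bg_flags & EXT4_BG_INODE_UNINIT:
        names.append("INODE_UNINIT")
    if bg_flags & EXT4_BG_BLOCK_UNINIT:
        names.append("BLOCK_UNINIT")
    if bg_flags & EXT4_BG_INODE_ZEROED:
        names.append("INODE_ZEROED")
    return names


if __name__ == "__main__":
    # A group whose bitmaps are both uninitialized and whose inode
    # table has not been zeroed yet.
    print(decode_bg_flags(EXT4_BG_INODE_UNINIT | EXT4_BG_BLOCK_UNINIT))
```

A group reporting INODE_UNINIT and BLOCK_UNINIT without INODE_ZEROED is
the state the paragraph above describes for a group whose metadata mkfs
has skipped.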

By not writing zeroes to the bitmaps and inode table, mkfs time is
reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM,
but the dumpe2fs output prints this as “uninit_bg”. They are the same
thing.
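
The arithmetic quoted in the Layout, Flexible Block Groups, and Meta
Block Groups sections can be checked with a short calculation. This is a
sketch under stated assumptions: all function names are hypothetical,
and the defaults (128MiB block groups, 64-byte group descriptors) are
the ones used in the text rather than values read from a real
superblock.

```python
SUPERBLOCK_OFFSET = 1024  # bytes from the start of the filesystem


def superblock_block(block_size):
    """Block number that contains the primary superblock."""
    return SUPERBLOCK_OFFSET // block_size


def inode_table_blocks(inodes_per_group, inode_size, block_size):
    """Blocks needed for one group's inode table (rounded up)."""
    table_bytes = inodes_per_group * inode_size
    return -(-table_bytes // block_size)


def groups_per_flex(s_log_groups_per_flex):
    """Number of block groups in one flex_bg."""
    return 1 << s_log_groups_per_flex


def descriptors_per_block(block_size, desc_size=64):
    """Group descriptors per block, i.e. groups per metablock group."""
    return block_size // desc_size


def max_groups_without_meta_bg(group_bytes=2**27, desc_size=64):
    """Without META_BG, all descriptors must fit in the first group."""
    return group_bytes // desc_size


def meta_bg_descriptor_groups(meta_group, block_size, desc_size=64):
    """Groups holding the descriptor block of a metablock group:
    the first, second, and last group of that metablock group."""
    n = descriptors_per_block(block_size, desc_size)
    first = meta_group * n
    return [first, first + 1, first + n - 1]


if __name__ == "__main__":
    print("superblock block (4 KiB blocks):", superblock_block(4096))
    print("superblock block (1 KiB blocks):", superblock_block(1024))
    print("groups per flex_bg (log = 2):", groups_per_flex(2))
    limit = max_groups_without_meta_bg()
    print("max groups without META_BG:", limit)
    print("max size without META_BG (TiB):", limit * 2**27 // 2**40)
    print("descriptor groups of metablock group 1:",
          meta_bg_descriptor_groups(1, 4096))
```

With 4 KiB blocks, ``descriptors_per_block`` gives 64 groups per
metablock group, which at 128MiB per group is the 8 GiB figure quoted in
the Meta Block Groups section.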