Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser: e.g. the kernel doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does swabbing.  (See section `Block Size' below.)

<filesystem>:
    <superblock>
    <directory_structure>
    <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
    For each file:
        struct cramfs_inode (see cramfs_fs.h).
        Filename.  Not generally null-terminated, but it is
         null-padded to a multiple of 4 bytes.

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.

<data>:
    One <file_data> for each file that's either a symlink or a
     regular file of non-zero st_size.

<file_data>:
    nblocks * <block_pointer>
     (where nblocks = (st_size - 1) / blksize + 1)
    nblocks * <block>
    padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.

When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, each
<block_pointer>'s top bits may contain special flags as follows:

CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
    The block data is not compressed and should be copied verbatim.

CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
    The <block_pointer> stores the block's start offset (not its end),
    shifted right by 2 bits.  The block must therefore be aligned to a
    4-byte boundary.  The block size is blksize if
    CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified; otherwise the
    compressed data length is stored in the first 2 bytes of the block
    data.  This allows a discontiguous data layout and specific data
    block alignments, e.g. for XIP applications.


The order of <file_data> items is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data if the
corresponding CRAMFS_BLK_FLAG_UNCOMPRESSED <block_pointer> bit is not set;
otherwise it is the input data directly.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)

<block>s are merely byte-aligned, not generally u32-aligned.

When CRAMFS_BLK_FLAG_DIRECT_PTR is specified, the corresponding
<block> may be located anywhere and is not necessarily contiguous with
the previous/next blocks.  In that case it is at least u32-aligned.
If CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified, the size is always
blksize, except for the last block, which is limited by the file length.
If CRAMFS_BLK_FLAG_DIRECT_PTR is set and CRAMFS_BLK_FLAG_UNCOMPRESSED
is not set, the first 2 bytes of the block contain the size of the
remaining block data, as this cannot be determined from the placement of
logically adjacent blocks.


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.
Run mkcramfs with -z if you want it to create files that can have holes in them.


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_SIZE for cramfs_read_folio's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its own PAGE_SIZE, which
can be as large as 32KB on arm.  This discrepancy is a bug, though
it's not clear which side should be changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>.
Personally I don't like this option, but it does
require the least amount of change: just change
`#define PAGE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_read_folio read multiple blocks.

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they could.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get read_folio to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting read_folio to read into all the covered pages is harder.

The main advantage of option 3 over options 1 and 2 is better
compression.  The cost is greater complexity.
Probably not worth it, but I hope someone
will disagree.  (If it is implemented, then I'll re-use that code in
e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.