.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.

===============================
Kernel Memory Sanitizer (KMSAN)
===============================

KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.

An important note is that KMSAN is not intended for production use, because it
drastically increases kernel memory footprint and slows the whole system down.

Usage
=====

Building the kernel
-------------------

In order to build a kernel with KMSAN you will need a fresh Clang (14.0.6+).
Please refer to `LLVM documentation`_ for the instructions on how to build Clang.

Now configure and build the kernel with CONFIG_KMSAN enabled.

Example report
--------------

Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.

The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
uninitialized in the local variable, as well as the stack where the value was
copied to another memory location before use.

A use of uninitialized value ``v`` is reported by KMSAN in the following cases:

- in a condition, e.g. ``if (v) { ... }``;
- in an indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
- when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
- when it is passed as an argument to a function, and
  ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).

The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.

Disabling the instrumentation
-----------------------------

A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.

Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. that marked with ``noinstr`` which implicitly adds
``__no_sanitize_memory``).

This however comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.

As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.

It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
their code gets broken by KMSAN (e.g. runs at early boot time).

KMSAN checks can also be temporarily disabled for the current task using
``kmsan_disable_current()`` and ``kmsan_enable_current()`` calls. Each
``kmsan_enable_current()`` call must be preceded by a
``kmsan_disable_current()`` call; these call pairs may be nested.
One needs to
be careful with these calls, keeping the regions short and preferring other
ways to disable instrumentation, where possible.

Support
=======

In order for KMSAN to work the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.

The runtime library only supports x86_64 at the moment.

How KMSAN works
===============

KMSAN shadow memory
-------------------

KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning, marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.

When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized).
Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.

Compiler instrumentation also tracks the shadow values as they are used along
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.

The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.

Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
``c`` are uninitialized, while the lower byte is initialized.

Origin tracking
---------------

Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).

When an uninitialized variable is allocated on stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow. For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.

Example 1::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.

Several variables may share the same origin address, if they are stored in the
same four-byte chunk. In this case every write to either variable updates the
origin for all of them. We have to sacrifice precision in this case, because
storing origins for individual bits (and even bytes) would be too costly.

Example 2::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
0xffff0000, and the origin of the result would be the origin of ``b``.
``ret.s[0]`` would have the same origin, but it will never be used, because
that variable is initialized.

If both function arguments are uninitialized, only the origin of the second
argument is preserved.

Origin chaining
~~~~~~~~~~~~~~~

To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had.
This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.

Clang instrumentation API
-------------------------

The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.

Shadow manipulation
~~~~~~~~~~~~~~~~~~~

For every memory access the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)

The function name depends on the memory access size.

The compiler makes sure that for every loaded value its shadow and origin
values are read from memory.
When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.

Handling locals
~~~~~~~~~~~~~~~

A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of every instrumented function KMSAN inserts a call to
``__msan_get_context_state()``::

  kmsan_context_state *__msan_get_context_state(void)

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).

Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.

The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.

Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.

String functions
~~~~~~~~~~~~~~~~

The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions.
These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied alongside
the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
  void *__msan_memmove(void *dst, void *src, uintptr_t n)
  void *__msan_memset(void *dst, int c, uintptr_t n)

Error reporting
~~~~~~~~~~~~~~~

For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin)

``__msan_warning()`` causes the KMSAN runtime to print an error report.

Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

KMSAN instruments every inline assembly output with a call to::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)

, which unpoisons the memory region.

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics etc.

Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.


Runtime library
---------------

The code is located in ``mm/kmsan/``.

Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~

Every task_struct has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task counter disallowing KMSAN reports::

  struct kmsan_context {
    ...
    unsigned int depth;
    struct kmsan_context_state cstate;
    ...
  }

  struct task_struct {
    ...
    struct kmsan_context kmsan;
    ...
  }

KMSAN contexts
~~~~~~~~~~~~~~

When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.

But in the case the kernel is running in the interrupt, softirq or NMI context,
where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

Metadata allocation
~~~~~~~~~~~~~~~~~~~

There are several places in the kernel for which the metadata is stored.

1. Each ``struct page`` instance contains two pointers to its shadow and
   origin pages::

     struct page {
       ...
       struct page *shadow, *origin;
       ...
     };

   At boot-time, the kernel allocates shadow and origin pages for every available
   kernel page. This is done quite late, when the kernel address space is already
   fragmented, so normal data pages may arbitrarily interleave with the metadata
   pages.

   This means that in general for two contiguous memory pages their shadow/origin
   pages may not be contiguous. Consequently, if a memory access crosses the
   boundary of a memory block, accesses to shadow/origin memory may potentially
   corrupt other pages or read incorrect values from them.

   In practice, contiguous memory pages returned by the same ``alloc_pages()``
   call will have contiguous metadata, whereas if these pages belong to two
   different allocations their metadata pages can be fragmented.

   For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
   there also are no guarantees on metadata contiguity.

   In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
   pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::

     char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
     char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

   ``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
   All stores to ``dummy_store_page`` are ignored.

2. For vmalloc memory and modules, there is a direct mapping between the memory
   range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
   the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
   area contains shadow memory for the first quarter, the third one holds the
   origins.
   A small part of the fourth quarter contains shadow and origins for the
   kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
   more details.

When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.

References
==========

E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.

.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/