.. SPDX-License-Identifier: GPL-2.0

=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity. For example, a thread may request a VCPU to flush
its TLB with a VCPU request. The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request. This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance. To do so, an IPI is sent, forcing
a guest mode exit. However, a VCPU thread may not be in guest mode at the
time of the kick. Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take. All three actions
are listed below:

1) Send an IPI. This forces a guest mode exit.
2) Wake a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues. Waking them removes the threads from
   the waitqueues, allowing the threads to run again. This behavior
   may be suppressed; see KVM_REQUEST_NO_WAKEUP below.
3) Nothing. When the VCPU is not in guest mode and the VCPU thread is not
   sleeping, then there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
VCPU is currently running in guest mode, as well as some specific
outside-guest-mode states. The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements"). The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction. The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_VM_DEAD

  This request informs all VCPUs that the VM is dead and unusable, e.g.
  due to a fatal error or because the VM's state has been intentionally
  destroyed.

KVM_REQ_UNBLOCK

  This request informs the vCPU to exit kvm_vcpu_block(). It is used, for
  example, by timer handlers that run on the host on behalf of a vCPU,
  or in order to update the interrupt routing and ensure that assigned
  devices will wake up the vCPU.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()). When that
  event or interrupt arrives kvm_vcpu_block() makes the request. This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQ_OUTSIDE_GUEST_MODE

  This "request" ensures the target vCPU has exited guest mode prior to
  the sender of the request continuing on. No action need be taken by the
  target, and so no request is actually logged for the target.
  This request is similar to a "kick", but unlike a kick it guarantees
  the vCPU has actually exited guest mode. A kick only guarantees the
  vCPU will exit at some point in the future, e.g. a previous kick may
  have started the process, but there's no guarantee the to-be-kicked
  vCPU has fully exited guest mode.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops. This is because only the lower 8 bits are used to represent the
request's number. The upper bits are used as flags. Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode. That is, sleeping VCPUs do not need
  to be awakened for these requests. Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding. This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait. This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request. This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit. Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.

See scenario 3, Message and Flag, of [lwn-mb]_ and the kernel
documentation [memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request. We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode. This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI. One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU. With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check. This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``. Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts. This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.

IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any/all requests,
then they may be coalesced. This is easily done by having the first IPI
sending kick also change the VCPU mode to something !IN_GUEST_MODE. The
transitional state, EXITING_GUEST_MODE, is used for this purpose.

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts. To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.

Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that request-less
VCPU kicks are almost never correct. Without the assurance that a non-IPI
generating kick will still result in an action by the receiving VCPU, as
the final kvm_request_pending() check does for request-accompanying kicks,
the kick may not do anything useful at all. If, for instance, a
request-less kick was made to a VCPU that was just about to set its mode
to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU thread may
continue its entry without actually having done whatever it was the kick
was meant to initiate.

One exception is x86's posted interrupt mechanism. In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request. However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread are executed serially, such as when
they are the same thread, or when they are using some form of concurrency
control to temporarily execute synchronously, it's possible to know that
the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN. The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves. A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========

.. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/