.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each providing
  a view with a different privilege. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) with
  a pair of even/odd pages which provide commands to manage the
  source: to trigger, to EOI, to turn off the source for instance.

  3. Device pass-through

  When a device is passed-through into the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped() are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate.
  The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed-through or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Sync all the sources and queues and mark the EQ pages dirty. This
    is to make sure that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 ..  3 |  2 .. 0
    values:   |    eisn       | mask |  server  | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 ..  3 |  2 .. 0
    values:   |    unused     |  server  | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronize the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
  and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
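
When restoring the EQ configuration and the source targeting, userspace
has to rebuild the 64-bit values described in the Groups section. The
following is a minimal sketch of that packing, derived only from the bit
layouts given above. The helper names (xive_source_value(),
xive_source_config_value(), xive_eq_id()) are hypothetical and are not
part of the KVM API; the range checks are an assumption about what a
careful caller might want, not a description of what the kernel enforces.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: only the bit positions come from the
 * documentation above, the names and checks are illustrative.
 */

/* KVM_DEV_XIVE_GRP_SOURCE value: bit 0 = type (0:MSI 1:LSI), bit 1 = level. */
static inline uint64_t xive_source_value(int is_lsi, int level)
{
	return ((uint64_t)(level ? 1 : 0) << 1) | (uint64_t)(is_lsi ? 1 : 0);
}

/*
 * KVM_DEV_XIVE_GRP_SOURCE_CONFIG value:
 *   bits 2..0   priority
 *   bits 31..3  server
 *   bit  32     mask
 *   bits 63..33 eisn
 */
static inline uint64_t xive_source_config_value(uint32_t eisn, int masked,
						uint32_t server,
						uint8_t priority)
{
	assert(priority <= 7);           /* 3-bit field  */
	assert(server < (1u << 29));     /* 29-bit field */
	assert(eisn < (1u << 31));       /* 31-bit field */

	return ((uint64_t)eisn << 33) |
	       ((uint64_t)(masked ? 1 : 0) << 32) |
	       ((uint64_t)server << 3) |
	       (uint64_t)priority;
}

/*
 * KVM_DEV_XIVE_GRP_EQ_CONFIG identifier, a (server, priority) tuple:
 *   bits 2..0   priority
 *   bits 31..3  server
 */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
	assert(priority <= 7);
	assert(server < (1u << 29));

	return ((uint64_t)server << 3) | (uint64_t)priority;
}
```

In such a scheme, the result of xive_eq_id() would be used as the
attribute identifier of the KVM_DEV_XIVE_GRP_EQ_CONFIG group, while the
results of the other two helpers would be stored in the __u64 that
kvm_device_attr.addr points to.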