.. SPDX-License-Identifier: GPL-2.0

======================
Generic vcpu interface
======================

The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
kvm_device_attr as other devices, but targets VCPU-wide settings and controls.

The groups and attributes per virtual cpu, if any, are architecture specific.

1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
==================================

:Architectures: ARM64

1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
---------------------------------------

:Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
             pointer to an int

Returns:

  =======  ========================================================
  -EBUSY   The PMU overflow interrupt is already set
  -EFAULT  Error reading interrupt number
  -ENXIO   PMUv3 not supported or the overflow interrupt not set
           when attempting to get it
  -ENODEV  KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
  -EINVAL  Invalid PMU overflow interrupt number supplied or
           trying to set the IRQ number without using an in-kernel
           irqchip.
  =======  ========================================================

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
type must be the same for each vcpu. As a PPI, the interrupt number is the same
for all vcpus, while as an SPI it must be a separate number per vcpu.
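
As a minimal sketch of how userspace might drive this attribute, the block
below wraps KVM_SET_DEVICE_ATTR for a vcpu file descriptor. The helper names
``vcpu_set_attr`` and ``vcpu_set_pmu_irq`` are hypothetical and are not part of
any KVM API. The block assumes a Linux host with the UAPI headers installed;
the PMU wrapper only compiles where the arm64 headers define the group.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not a KVM API): fill a struct kvm_device_attr and
 * issue KVM_SET_DEVICE_ATTR on a vcpu file descriptor.
 * Returns 0 on success or a negative errno value on failure.
 */
static int vcpu_set_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *data)
{
    struct kvm_device_attr da;

    /* addr is a 64-bit field that carries a userspace pointer. */
    assert(sizeof(void *) <= sizeof(da.addr));

    memset(&da, 0, sizeof(da));
    da.group = group;
    da.attr = attr;
    da.addr = (uint64_t)(uintptr_t)data;

    if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &da) < 0)
        return -errno;
    return 0;
}

#ifdef KVM_ARM_VCPU_PMU_V3_CTRL /* only defined by the arm64 UAPI headers */
/*
 * Hypothetical wrapper: set the PMU overflow interrupt number for one vcpu.
 * addr points at an int, as described in the Parameters field above.
 */
static int vcpu_set_pmu_irq(int vcpu_fd, int irq)
{
    return vcpu_set_attr(vcpu_fd, KVM_ARM_VCPU_PMU_V3_CTRL,
                         KVM_ARM_VCPU_PMU_V3_IRQ, &irq);
}
#endif
```

A caller would invoke the wrapper once per vcpu and compare the negative
return value against the error table above, for example treating ``-EBUSY`` as
"interrupt already set".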

1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
---------------------------------------

:Parameters: no additional parameter in kvm_device_attr.addr

Returns:

  =======  ======================================================
  -EEXIST  Interrupt number already used
  -ENODEV  PMUv3 not supported or GIC not initialized
  -ENXIO   PMUv3 not supported, missing VCPU feature or interrupt
           number not set
  -EBUSY   PMUv3 already initialized
  =======  ======================================================

Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.

1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
-----------------------------------------

:Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
             pointer to a struct kvm_pmu_event_filter

:Returns:

  =======  ======================================================
  -ENODEV  PMUv3 not supported or GIC not initialized
  -ENXIO   PMUv3 not properly configured or in-kernel irqchip not
           configured as required prior to calling this attribute
  -EBUSY   PMUv3 already initialized or a VCPU has already run
  -EINVAL  Invalid filter range
  =======  ======================================================

Request the installation of a PMU event filter described as follows::

    struct kvm_pmu_event_filter {
            __u16   base_event;
            __u16   nevents;

    #define KVM_PMU_EVENT_ALLOW     0
    #define KVM_PMU_EVENT_DENY      1

            __u8    action;
            __u8    pad[3];
    };

A filter range is defined as the range [@base_event, @base_event + @nevents),
together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
first registered range defines the global policy (global ALLOW if the first
@action is DENY, global DENY if the first @action is ALLOW).
Multiple ranges
can be programmed, and must fit within the event space defined by the PMU
architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).

Note: "Cancelling" a filter by registering the opposite action for the same
range doesn't change the default action. For example, installing an ALLOW
filter for event range [0, 10) as the first filter and then applying a DENY
action for the same range will leave the whole range as disabled.

Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).

1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------

:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
             identifier.

:Returns:

  =======  ====================================================
  -EBUSY   PMUv3 already initialized, a VCPU has already run or
           an event filter has already been set
  -EFAULT  Error accessing the PMU identifier
  -ENXIO   PMU not found
  -ENODEV  PMUv3 not supported or GIC not initialized
  -ENOMEM  Could not allocate memory
  =======  ====================================================

Request that the VCPU uses the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source). This attribute is particularly useful on heterogeneous
systems where there are at least two CPU PMUs on the system. The PMU that is set
for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
if a PMU event filter is already present.
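
Reading the PMU identifier from its sysfs "type" file could be sketched as
follows. The helper name ``read_pmu_type`` is hypothetical, and the caller is
assumed to supply the full path to the "type" file, because the PMU instance
directory name depends on the hardware.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the integer PMU identifier from a sysfs
 * "type" file. On success, stores the identifier in *pmu_id and
 * returns 0. Returns -1 if the file cannot be opened or parsed.
 */
static int read_pmu_type(const char *type_path, int *pmu_id)
{
    FILE *f;
    int parsed;

    assert(type_path != NULL && pmu_id != NULL);

    f = fopen(type_path, "r");
    if (!f)
        return -1;

    /* The file holds a single decimal integer followed by a newline. */
    parsed = (fscanf(f, "%d", pmu_id) == 1);
    fclose(f);

    return parsed ? 0 : -1;
}
```

The resulting int would then be passed by address in kvm_device_attr.addr,
with group KVM_ARM_VCPU_PMU_V3_CTRL and attribute KVM_ARM_VCPU_PMU_V3_SET_PMU.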

Note that KVM will not make any attempts to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail and KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
and the cpu field to the processor id.

2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================

:Architectures: ARM64

2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER
-----------------------------------------------------------------------------

:Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
             pointer to an int

Returns:

  =======  ==================================
  -EINVAL  Invalid timer interrupt number
  -EBUSY   One or more VCPUs have already run
  =======  ==================================

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the
attribute overrides the default values (see below).

=============================  ==========================================
KVM_ARM_VCPU_TIMER_IRQ_VTIMER  The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER  The EL1 physical timer intid (default: 30)
=============================  ==========================================

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs.
Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.

.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

:Architectures: ARM64

3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA
--------------------------------------

:Parameters: 64-bit base address

Returns:

  =======  ======================================
  -ENXIO   Stolen time not implemented
  -EEXIST  Base address already set for this VCPU
  -EINVAL  Base address not 64-byte aligned
  =======  ======================================

Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64-byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.

4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
----------------------------------

:Parameters: 64-bit unsigned TSC offset

Returns:

  =======  ======================================
  -EFAULT  Error reading/writing the provided
           parameter address.
  -ENXIO   Attribute not supported
  =======  ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation::

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET

This attribute is useful to adjust the guest's TSC on live migration,
so that the TSC counts the time during which the VM was paused. The
following describes a possible algorithm to use for this purpose.

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
   kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
   (host_src).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (ofs_src[i]).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
   fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure.

   KVM will advance the VM's kvmclock to account for elapsed time since
   recording the clock values. Note that this will cause problems in
   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
   between the source and destination, and a reasonably short time passes
   between the source pausing the VMs and the destination executing
   steps 4-7.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
   kvmclock nanoseconds (guest_dest).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine::

     ofs_dst[i] = ofs_src[i] -
       (guest_src - guest_dest) * freq +
       (tsc_src - tsc_dest)

   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
   a time of 0 in kvmclock. The above formula ensures that it is the
   same on the destination as it was on the source). Because guest_src and
   guest_dest are in nanoseconds, freq must be expressed in TSC cycles per
   nanosecond here; KVM_GET_TSC_KHZ reports kHz, so scale it accordingly.

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.
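
The offset adjustment in step 6 can be sketched as a small pure function. The
name ``dest_tsc_offset`` and its parameter order are hypothetical. The unit
handling is an assumption of this sketch: kvmclock values are taken to be in
nanoseconds and the frequency in kHz, so the nanosecond delta is scaled by
``tsc_khz / 1000000`` to obtain cycles. The arithmetic is done modulo 2^64, in
line with the attribute being an unsigned 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper implementing the step 6 formula:
 *   ofs_dst = ofs_src - (guest_src - guest_dest) * freq + (tsc_src - tsc_dest)
 * guest_src and guest_dest are kvmclock nanoseconds; tsc_khz is the guest
 * TSC frequency in kHz (assumed units, see the lead-in).
 */
static uint64_t dest_tsc_offset(uint64_t ofs_src,
                                uint64_t guest_src, uint64_t guest_dest,
                                uint64_t tsc_src, uint64_t tsc_dest,
                                uint64_t tsc_khz)
{
    int64_t delta_ns, whole_ms, rem_ns, delta_cycles;

    assert(tsc_khz > 0);

    /* Signed kvmclock difference; usually negative on the destination. */
    delta_ns = (int64_t)(guest_src - guest_dest);

    /*
     * cycles = ns * kHz / 1e6. Split the delta into whole milliseconds
     * and a remainder so the intermediate product stays small.
     */
    whole_ms = delta_ns / 1000000;
    rem_ns = delta_ns % 1000000;
    delta_cycles = whole_ms * (int64_t)tsc_khz +
                   rem_ns * (int64_t)tsc_khz / 1000000;

    /* Unsigned arithmetic wraps modulo 2^64. */
    return ofs_src - (uint64_t)delta_cycles + (tsc_src - tsc_dest);
}
```

For instance, with a 2 GHz guest TSC (``tsc_khz = 2000000``), a destination
kvmclock that is 2 ms ahead of the recorded source value contributes
4,000,000 cycles to the new offset, on top of the raw host TSC difference.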