.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_ which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues the hcall with the necessary input operands. After
performing the privileged operation, the hypervisor returns a status code
and output operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the instruction **HVCS**, which expects the opcode for the hcall
to be set in *r3* and any in-arguments for the hcall to be provided in
registers *r4-r12*. If values have to be passed through a memory buffer, the
data stored in that buffer should be in Big-endian byte order.
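The byte-order rule for memory buffers can be illustrated with a minimal
user-space sketch. The helper names ``put_be64`` and ``get_be64`` are
hypothetical and exist only for this example; kernel code would normally
rely on the existing ``cpu_to_be64()`` and ``be64_to_cpu()`` helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit value into buf in Big-endian byte
 * order, regardless of the endianness of the CPU running this code.
 */
static void put_be64(uint8_t *buf, uint64_t val)
{
    assert(buf != NULL);
    for (size_t i = 0; i < 8; i++)
        buf[i] = (uint8_t)(val >> (56 - 8 * i)); /* most significant byte first */
}

/* Hypothetical helper: read a Big-endian 64-bit value back from buf. */
static uint64_t get_be64(const uint8_t *buf)
{
    uint64_t val = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < 8; i++)
        val = (val << 8) | buf[i];
    return val;
}
```

Writing the value byte by byte keeps the sketch independent of the guest's
own endianness, which matters because a pseries guest may itself run in
little-endian mode.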

Once control returns to the guest after the hypervisor has serviced the
'HVCS' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in Big-endian byte order.

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch-specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
|  Range   |(Y/N)     |                                           |
+==========+==========+===========================================+
| r0       |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
| r1       |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
| r2       |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
| r3       |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
| r4-r10   |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
| r11      |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
| r12      |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
| r13      |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
| r14-r31  |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
| LR       |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
| CTR      |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
| XER      |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
| CR0-1    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR2-4    |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR5-7    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| Others   |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor refers to shared hardware resources, such as PCI devices
and NVDIMMs, that are available for use by LPARs as Dynamic Resources (DR).
When a DR is allocated to an LPAR, PHYP creates a data structure called a
Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a
DRC via an opaque 32-bit number called the DRC-Index. The DRC-Index value is
provided to the LPAR via the device tree, where it is present as an attribute
in the device tree node associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*
indicating success or failure of the hcall. In case of a failure, an error
code indicates the cause of the error. These codes are defined and documented
in the arch-specific header [4]_.
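As a rough illustration of how a caller might act on the value returned in
*r3*, the sketch below sorts a return code into "done", "retry" or "failed".
The numeric values are illustrative copies of a few codes and should be
checked against the arch-specific header [4]_, which remains authoritative.
The ``hcall_outcome`` enum and ``classify_hcall_rc()`` are hypothetical names
introduced only for this example.

```c
#include <assert.h>

/* Illustrative copies of a few return codes; hvcall.h is authoritative. */
#define H_SUCCESS                   0
#define H_BUSY                      1
#define H_CONTINUE                 18
#define H_LONG_BUSY_START_RANGE  9900
#define H_LONG_BUSY_END_RANGE    9905
#define H_HARDWARE                 -1
#define H_PARAMETER                -4

/* Hypothetical three-way outcome used only in this sketch. */
enum hcall_outcome { HCALL_DONE, HCALL_RETRY, HCALL_FAILED };

/* Map a raw hcall return value (as found in r3) to an outcome. */
static enum hcall_outcome classify_hcall_rc(long rc)
{
    if (rc == H_SUCCESS)
        return HCALL_DONE;
    if (rc == H_BUSY || rc == H_CONTINUE ||
        (rc >= H_LONG_BUSY_START_RANGE && rc <= H_LONG_BUSY_END_RANGE))
        return HCALL_RETRY;   /* the hcall should be issued again */
    return HCALL_FAILED;      /* any other code is treated as an error here */
}

/* Small self-check exercising each branch of the classifier. */
static void classify_selfcheck(void)
{
    assert(classify_hcall_rc(H_SUCCESS) == HCALL_DONE);
    assert(classify_hcall_rc(H_BUSY) == HCALL_RETRY);
    assert(classify_hcall_rc(H_LONG_BUSY_START_RANGE) == HCALL_RETRY);
    assert(classify_hcall_rc(H_PARAMETER) == HCALL_FAILED);
    assert(classify_hcall_rc(H_HARDWARE) == HCALL_FAILED);
}
```

Treating every unrecognized code as a failure is a simplification; individual
hcalls document their own set of possible return values.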

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet
finished servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non-*H_CONTINUE*
return value.

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at the specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes to the metadata area
associated with it, at the specified offset and from the provided buffer.
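The *continue-token* convention described under HCALL Return-values, which
some of the long-running hcalls in this list rely on, can be sketched with a
mock hypervisor in plain C. Everything here is an assumption made for
illustration: ``mock_long_hcall()`` stands in for a real hcall wrapper, the
token happens to be a progress counter (a real token is opaque to the guest),
and the numeric return codes are illustrative copies to be checked against
the arch-specific header [4]_.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copies of return codes; hvcall.h is authoritative. */
#define H_SUCCESS    0
#define H_CONTINUE  18
#define H_PARAMETER -4

/* Hypothetical mock state: work that needs 'total' calls to complete. */
struct mock_hv {
    unsigned int total;  /* number of calls needed to finish */
    unsigned int done;   /* units of work completed so far */
    unsigned int calls;  /* how many times the hcall was issued */
};

/*
 * Hypothetical stand-in for a long-running hcall. It performs one unit of
 * work per invocation and hands back a token to pass to the next call.
 */
static long mock_long_hcall(struct mock_hv *hv, uint64_t token_in,
                            uint64_t *token_out)
{
    assert(hv != NULL && token_out != NULL);
    hv->calls++;

    /* The first call must carry 0; later calls must echo the last token. */
    if (token_in != hv->done)
        return H_PARAMETER;

    hv->done++;
    if (hv->done < hv->total) {
        *token_out = hv->done;
        return H_CONTINUE;   /* not finished: call again with the new token */
    }
    *token_out = 0;
    return H_SUCCESS;
}

/* Guest-side loop: start with token 0, re-issue while H_CONTINUE comes back. */
static long issue_until_done(struct mock_hv *hv)
{
    uint64_t token = 0;
    long rc;

    do {
        rc = mock_long_hcall(hv, token, &token);
    } while (rc == H_CONTINUE);

    return rc;
}
```

The guest never interprets the token; it only stores the value it was given
and passes it back unchanged on the next call.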

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM
blocks belonging to all NVDIMMs or all SCM blocks belonging to a single
NVDIMM identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and overall
health of the PMEM device. The asserted bits in the health-bitmap indicate one
or more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering (bit 0 is the most significant
bit); for example, a value of 0xC400000000000000 indicates that bits 0, 1,
and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
| Bit  | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output is to be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or another error is returned by the hypervisor.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture