.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR-compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux guests (termed Logical Partitions or LPARs). It supports
  the full PAPR specification.

- **QEMU/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On the PPC64 architecture, a guest kernel running on top of a PAPR hypervisor
is called a *pSeries guest*. A pSeries guest runs in supervisor mode (HV=0)
and must issue hypercalls to the hypervisor whenever it needs to perform an
action that is hypervisor privileged [3]_, or to use other services managed
by the hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pSeries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues an hcall with the necessary input operands. After
performing the privileged operation, the hypervisor returns a status code and
output operands back to the guest.

HCALL ABI
=========
The ABI specification for an hcall between a pSeries guest and a PAPR
hypervisor is covered in section 14.5.3 of ref [2]_. The switch to the
hypervisor context is done via the **HVSC** instruction (``sc 1``), which
expects the opcode for the hcall to be set in *r3* and any in-arguments for
the hcall to be provided in registers *r4-r12*. If values have to be passed
through a memory buffer, the data stored in that buffer should be in
big-endian byte order.
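
As a small illustration of the byte-order rule, the following user-space
sketch writes a 64-bit value into a buffer most-significant byte first and
reads it back, so the buffer layout is the same on big- and little-endian
hosts. The helpers ``put_be64()`` and ``get_be64()`` are hypothetical names;
kernel code would normally rely on ``cpu_to_be64()`` and ``be64_to_cpu()``
instead.

```c
#include <stdint.h>

/* Hypothetical helper: store v into buf[0..7], most significant byte first. */
static void put_be64(uint8_t *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Hypothetical helper: load a big-endian 64-bit value from buf[0..7]. */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}
```

For example, ``put_be64(buf, 0x0102030405060708)`` leaves ``buf[0] == 0x01``
and ``buf[7] == 0x08`` regardless of host endianness.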

Once control returns to the guest after the hypervisor has serviced the
**HVSC** instruction, the return value of the hcall is available in *r3* and
any out values are returned in registers *r4-r12*. As with in-arguments, any
out values stored in a memory buffer will be in big-endian byte order.

The powerpc architecture code provides convenient wrappers named
**plpar_hcall_xxx**, declared in an arch-specific header [4]_, for issuing
hcalls from a Linux kernel running as a pSeries guest.
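
To show how the register convention maps onto a wrapper of this kind, the
sketch below models r3-r12 as an array, loads an opcode and two in-arguments,
lets a stand-in "hypervisor" service the call, and copies r4-r7 into a return
buffer, loosely mirroring how ``plpar_hcall()`` fills its ``retbuf``. Every
``mock_``/``MOCK_`` name, including the opcode, is hypothetical, and the trap
into the hypervisor is replaced by a plain function call, so this is a
user-space illustration rather than the kernel implementation.

```c
#include <stdint.h>

#define MOCK_RETBUF_SIZE 4	/* same idea as PLPAR_HCALL_BUFSIZE */
#define MOCK_H_SUCCESS	 0
#define MOCK_H_PARAMETER (-4)
#define MOCK_OP_ADD	 0x1000	/* hypothetical opcode, not a real hcall */

/* Mock register file: r[0] is r3, r[1] is r4, ..., r[9] is r12. */
struct mock_regs {
	uint64_t r[10];
};

/* Stand-in "hypervisor": opcode in r3, in-arguments from r4 onwards;
 * the status goes back in r3 and out-values in r4 onwards. */
static void mock_hypervisor(struct mock_regs *regs)
{
	if (regs->r[0] == MOCK_OP_ADD) {
		regs->r[1] = regs->r[1] + regs->r[2];	/* out-value in r4 */
		regs->r[0] = MOCK_H_SUCCESS;
	} else {
		regs->r[0] = (uint64_t)(int64_t)MOCK_H_PARAMETER;
	}
}

/* Hypothetical two-argument wrapper in the style of plpar_hcall(). */
static long mock_hcall2(uint64_t opcode, uint64_t *retbuf,
			uint64_t arg1, uint64_t arg2)
{
	struct mock_regs regs = { { 0 } };

	regs.r[0] = opcode;	/* r3 */
	regs.r[1] = arg1;	/* r4 */
	regs.r[2] = arg2;	/* r5 */
	mock_hypervisor(&regs);	/* replaces the HVSC trap */
	for (int i = 0; i < MOCK_RETBUF_SIZE; i++)
		retbuf[i] = regs.r[1 + i];	/* copy r4-r7 */
	return (long)(int64_t)regs.r[0];
}
```

In real kernel code the equivalent call would look roughly like
``rc = plpar_hcall(opcode, retbuf, arg1, arg2);`` with
``unsigned long retbuf[PLPAR_HCALL_BUFSIZE];``, where the wrapper itself
performs the HVSC trap.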

Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|  r14-r31 |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor refers to shared hardware resources, such as PCI devices
and NVDIMMs, that are available for use by LPARs as Dynamic Resources (DRs).
When a DR is allocated to an LPAR, PHYP creates a data structure called a
Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a
DRC via an opaque 32-bit number called the DRC-Index. The DRC-Index value is
provided to the LPAR via the device tree, where it is present as an attribute
in the device-tree node associated with the DR.
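
Because the DRC-Index is stored in the device tree as a single 32-bit cell,
and device-tree cells are big-endian, decoding it from the raw property bytes
is a short operation. The sketch below assumes the property is named
``ibm,my-drc-index`` (an assumption used for illustration) and uses a
hypothetical function name; inside the kernel the same value would normally
be obtained with ``of_property_read_u32()``.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a DRC index from the raw value of a device-tree property such as
 * "ibm,my-drc-index" (property name assumed). Device-tree cells are 32-bit
 * big-endian, so the first byte is the most significant one.
 * Returns 0 on success, -1 if the property is not exactly one cell long. */
static int drc_index_from_prop(const uint8_t *prop, size_t len, uint32_t *drc)
{
	if (len != 4)
		return -1;
	*drc = ((uint32_t)prop[0] << 24) | ((uint32_t)prop[1] << 16) |
	       ((uint32_t)prop[2] << 8) | (uint32_t)prop[3];
	return 0;
}
```

The resulting value is opaque to the guest: it is only ever passed back to
the hypervisor as the *drcIndex* argument of hcalls such as the ones listed
below.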

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3* to
indicate success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in
the arch-specific header [4]_.

In some cases an hcall can take a long time and may need to be issued
multiple times in order to be completely serviced. Such hcalls usually
accept an opaque *continue-token* value in their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet
finished servicing the hcall.

To make such hcalls, the guest needs to set *continue-token == 0* for the
initial call and pass the *continue-token* value returned by the hypervisor
in each subsequent hcall, until the hypervisor returns a value other than
*H_CONTINUE*.
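
The continue-token protocol can be sketched as a simple retry loop. Here
``mock_long_hcall()`` is a hypothetical stand-in for a long-running hcall
that keeps its progress in the token, and ``MOCK_H_CONTINUE`` is a
placeholder value rather than the real ``H_CONTINUE`` constant from the
arch-specific header.

```c
#include <stdint.h>

#define MOCK_H_SUCCESS	0
#define MOCK_H_CONTINUE	1	/* hypothetical value; see hvcall.h for the real one */

/* Hypothetical long-running hcall: finishes `total` units of work, at most
 * `step` (> 0) units per call, and keeps its progress in the token. */
static long mock_long_hcall(uint64_t token_in, uint64_t *token_out,
			    uint64_t total, uint64_t step)
{
	uint64_t done = token_in + step;

	if (done >= total) {
		*token_out = 0;
		return MOCK_H_SUCCESS;
	}
	*token_out = done;
	return MOCK_H_CONTINUE;
}

/* Start with continue-token == 0 and feed back the returned token until the
 * hcall stops answering H_CONTINUE. `calls` counts the hcalls issued. */
static long issue_until_done(uint64_t total, uint64_t step, unsigned *calls)
{
	uint64_t token = 0;
	long rc;

	*calls = 0;
	do {
		rc = mock_long_hcall(token, &token, total, step);
		(*calls)++;
	} while (rc == MOCK_H_CONTINUE);
	return rc;
}
```

With ``total = 10`` and ``step = 4`` the loop issues three hcalls (tokens 0,
4 and 8) before the mock returns success. The guest treats the token as
opaque; only the mock interprets it.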

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values, please look into the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given the DRC Index of an NVDIMM, read *numBytesToRead* bytes from the
metadata area associated with it, starting at the specified offset, and copy
them to the provided buffer. The metadata area stores configuration
information such as label information, bad blocks, etc. The metadata area is
located out-of-band of the NVDIMM storage area, hence separate access
semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given the DRC Index of an NVDIMM, write *numBytesToWrite* bytes of *data* to
the metadata area associated with it, at the specified offset.
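
To make the bounds semantics of the two metadata hcalls concrete, here is a
hypothetical user-space model of a single NVDIMM's metadata area. It is an
assumption-laden sketch: the *drcIndex* argument is dropped because only one
device is modelled, the area size is arbitrary, and every out-of-range
request returns a generic ``MOCK_H_PARAMETER`` instead of the specific
*H_P2*/*H_P3*/*H_P4* codes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_META_SIZE	 256	/* arbitrary size for the sketch */
#define MOCK_H_SUCCESS	 0
#define MOCK_H_PARAMETER (-4)

/* Hypothetical stand-in for one NVDIMM's out-of-band metadata area. */
static uint8_t mock_meta[MOCK_META_SIZE];

/* Accept [offset, offset + len) only if it fits inside the area; written
 * so that offset + len is never computed and therefore cannot overflow. */
static int range_ok(uint64_t offset, uint64_t len)
{
	return offset <= MOCK_META_SIZE && len <= MOCK_META_SIZE - offset;
}

static long mock_read_metadata(uint64_t offset, void *buf, uint64_t len,
			       uint64_t *nread)
{
	if (!range_ok(offset, len))
		return MOCK_H_PARAMETER;
	memcpy(buf, mock_meta + offset, (size_t)len);
	*nread = len;	/* numBytesRead */
	return MOCK_H_SUCCESS;
}

static long mock_write_metadata(uint64_t offset, const void *data,
				uint64_t len)
{
	if (!range_ok(offset, len))
		return MOCK_H_PARAMETER;
	memcpy(mock_meta + offset, data, (size_t)len);
	return MOCK_H_SUCCESS;
}
```

The range check is written as ``len <= SIZE - offset`` rather than
``offset + len <= SIZE`` so that a huge offset or length cannot wrap around
and pass the check.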

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given the DRC-Index of an NVDIMM, map the contiguous range of SCM blocks
*[startingScmBlockIndex, startingScmBlockIndex + numScmBlocksToBind)* into the
guest physical address space at *targetLogicalMemoryAddress*. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor assigns a
target address for the guest. The HCALL can fail if the guest has an active
PTE for an SCM block being bound.
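
The H_Busy and continue-token handling of *H_SCM_BIND_MEM* can be sketched
as follows. The mock hypervisor binds at most ``chunk`` blocks per call and
picks an arbitrary address when asked for ``0xFFFFFFFF_FFFFFFFF``; all
``mock_``/``MOCK_`` names, the chosen address and the busy value are
hypothetical, and *startingScmBlockIndex* is left out for brevity.

```c
#include <stdint.h>

#define BIND_ANY_ADDR	UINT64_MAX	/* 0xFFFFFFFF_FFFFFFFF: hypervisor picks */
#define MOCK_H_SUCCESS	0
#define MOCK_H_BUSY	1		/* hypothetical value */
#define MOCK_CHOSEN_GPA	0x40000000000ULL	/* hypothetical address */

/* Out-values of one mock H_SCM_BIND_MEM call. */
struct bind_out {
	uint64_t token;	/* continue-token */
	uint64_t addr;	/* targetLogicalMemoryAddress */
};

/* Hypothetical hypervisor: binds at most `chunk` (> 0) blocks per call and
 * answers H_Busy, with a continue-token, until all blocks are bound. */
static long mock_bind(uint64_t nblocks, uint64_t target, uint64_t token,
		      uint64_t chunk, struct bind_out *out)
{
	uint64_t left = nblocks - token;
	uint64_t now = left < chunk ? left : chunk;

	out->addr = (target == BIND_ANY_ADDR) ? MOCK_CHOSEN_GPA : target;
	out->token = token + now;
	return out->token < nblocks ? MOCK_H_BUSY : MOCK_H_SUCCESS;
}

/* Retry while H_Busy, passing back the token and remembering the first
 * target address reported by the hypervisor. */
static long bind_all(uint64_t nblocks, uint64_t chunk, uint64_t *gpa,
		     unsigned *calls)
{
	struct bind_out out;
	uint64_t token = 0;
	long rc;

	*gpa = 0;
	*calls = 0;
	do {
		rc = mock_bind(nblocks, BIND_ANY_ADDR, token, chunk, &out);
		(*calls)++;
		token = out.token;
		if (*gpa == 0)
			*gpa = out.addr;
	} while (rc == MOCK_H_BUSY);
	return rc;
}
```

Binding 10 blocks with ``chunk = 4`` takes three calls, and the caller ends
up with the address the mock assigned.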

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given the DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks
starting at *startingScmLogicalMemoryAddress* from the guest physical address
space. The HCALL can fail if the guest has an active PTE for an SCM block
being unbound.
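
When *H_SCM_UNBIND_MEM* returns one of the long-busy codes, a natural policy
is to wait roughly the indicated time before retrying. The sketch below
replays a scripted sequence of return codes to show that policy; the
``MOCK_*`` constants are hypothetical placeholders for *H_Busy*,
*H_LongBusyOrder1mSec* and *H_LongBusyOrder10mSec*, whose real values are in
the arch-specific header, and the sleep is only counted, not performed.

```c
/* Hypothetical status codes for this sketch; the real values live in hvcall.h. */
#define MOCK_H_SUCCESS		0
#define MOCK_H_BUSY		1
#define MOCK_H_LONG_BUSY_1MS	2	/* stands for H_LongBusyOrder1mSec */
#define MOCK_H_LONG_BUSY_10MS	3	/* stands for H_LongBusyOrder10mSec */

/* Delay in milliseconds before reissuing the hcall: 0 retries at once
 * (plain H_Busy), -1 means the result is final and must not be retried. */
static int retry_delay_ms(long rc)
{
	switch (rc) {
	case MOCK_H_BUSY:
		return 0;
	case MOCK_H_LONG_BUSY_1MS:
		return 1;
	case MOCK_H_LONG_BUSY_10MS:
		return 10;
	default:
		return -1;
	}
}

/* Replays scripted return codes standing in for repeated H_SCM_UNBIND_MEM
 * calls; the script must end with a final status. Reports how many calls
 * were made and how long a caller would have slept in total. */
static long unbind_with_retry(const long *script, int *calls, int *waited_ms)
{
	long rc;
	int delay;

	*calls = 0;
	*waited_ms = 0;
	do {
		rc = script[(*calls)++];
		delay = retry_delay_ms(rc);
		if (delay > 0)
			*waited_ms += delay;	/* a real caller would sleep here */
	} while (delay >= 0);
	return rc;
}
```

For the script "10 ms, busy, 1 ms, success" the caller makes four calls and
would sleep for 11 ms in total.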

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address
to which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and the SCM block index
mapped to that address.
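
The two query hcalls are inverses of each other. Assuming a single
contiguous binding with a known block size, a simplification made only for
this sketch, the translations they perform reduce to the arithmetic below;
the structure and function names are hypothetical, and ``-1`` stands in for
*H_NotFound*.

```c
#include <stdint.h>

/* Hypothetical record of one contiguous binding made with H_SCM_BIND_MEM. */
struct scm_binding {
	uint32_t drc_index;
	uint64_t first_block;	/* startingScmBlockIndex */
	uint64_t nblocks;
	uint64_t base_gpa;	/* guest physical address of first_block */
	uint64_t block_size;	/* bytes per SCM block, assumed known and > 0 */
};

/* Block index -> guest physical address (cf. H_SCM_QUERY_BLOCK_MEM_BINDING). */
static int block_to_gpa(const struct scm_binding *b, uint64_t blk,
			uint64_t *gpa)
{
	if (blk < b->first_block || blk - b->first_block >= b->nblocks)
		return -1;
	*gpa = b->base_gpa + (blk - b->first_block) * b->block_size;
	return 0;
}

/* Guest physical address -> (drcIndex, block index)
 * (cf. H_SCM_QUERY_LOGICAL_MEM_BINDING). */
static int gpa_to_block(const struct scm_binding *b, uint64_t gpa,
			uint32_t *drc, uint64_t *blk)
{
	uint64_t off;

	if (gpa < b->base_gpa)
		return -1;
	off = (gpa - b->base_gpa) / b->block_size;
	if (off >= b->nblocks)
		return -1;
	*drc = b->drc_index;
	*blk = b->first_block + off;
	return 0;
}
```

With blocks 8-11 bound at 0x100000000 and a 256 MiB block size, block 10
maps to 0x120000000 and address 0x112345678 maps back to block 9.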

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap either all SCM blocks belonging to all
NVDIMMs, or all SCM blocks belonging to a single NVDIMM identified by its
drcIndex, from the LPAR memory.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on the predictive failure and overall
health of the PMEM device. The asserted bits in the health-bitmap indicate one
or more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering, i.e. bit 0 is the most
significant bit; for example, a value of 0xC400000000000000 indicates that
bits 0, 1, and 5 are valid.
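
With this reverse (big-endian) bit numbering, bit *n* corresponds to the
mask ``1 << (63 - n)``; the kernel's ``PPC_BIT()`` macro expresses the same
idea. A small sketch that lists the set bits of a bitmap in this numbering
(the function name is hypothetical):

```c
#include <stdint.h>

/* PAPR bit numbering: bit 0 is the most significant bit of the doubleword. */
#define PPC_BIT(b) (1ULL << (63 - (b)))

/* Store the PAPR bit numbers set in `v` into `bits` (at most `max` of them,
 * in ascending order) and return how many were stored. */
static int set_ppc_bits(uint64_t v, int *bits, int max)
{
	int n = 0;

	for (int b = 0; b < 64 && n < max; b++)
		if (v & PPC_BIT(b))
			bits[n++] = b;
	return n;
}
```

Applied to 0xC400000000000000 it reports bits 0, 1 and 5, matching the
example above.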

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit |               Definition                                              |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+
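
Combining the two output registers, one reasonable approach is to consider
only the bits that are both asserted in the health-bitmap and marked valid in
the health-bit-valid-bitmap. The following sketch does that and prints a
short label for each flag in the table; the labels paraphrase the table and
the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* PAPR bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(b) (1ULL << (63 - (b)))

/* Short labels for bits 0-9 of the health bitmap, paraphrasing the table. */
static const char *const health_flag_names[10] = {
	"unable to persist contents",
	"failed to persist contents",
	"contents restored from previous IPL",
	"contents not persisted from previous IPL",
	"life remaining critically low",
	"will be garded off next IPL",
	"cannot persist due to platform health",
	"may not persist in certain conditions",
	"encrypted",
	"erase or secure erase completed",
};

/* Print every condition that is both asserted and marked valid; bits 10-63
 * are reserved and ignored. Returns the number of conditions printed. */
static int report_health(uint64_t health, uint64_t valid)
{
	uint64_t effective = health & valid;
	int n = 0;

	for (int b = 0; b < 10; b++) {
		if (effective & PPC_BIT(b)) {
			printf("bit %d: %s\n", b, health_flag_names[b]);
			n++;
		}
	}
	return n;
}
```

For example, ``report_health(PPC_BIT(0) | PPC_BIT(4), PPC_BIT(0))`` prints
only the bit 0 condition, because bit 4 is not marked valid.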

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output must be passed in the argument list of each
subsequent hcall to the hypervisor until the hcall is completely serviced, at
which point the hypervisor returns H_SUCCESS or an error code.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
