==============================================
Scalable Vector Extension support for AArch64
==============================================

Author: Dave Martin <Dave.Martin@arm.com>

Date: 4 August 2017

This document outlines briefly the interface p
order to support use of the ARM Scalable Vecto
interactions with Streaming SVE mode added by
(SME).

This is an outline of the most important featu
intended to be exhaustive.

This document does not aim to describe the SVE
model. To aid understanding, a minimal descri
model features for SVE is included in Appendix


1. General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR and t
  tracked per-thread.

* In streaming mode FFR is not accessible unle
  in the system, when it is not supported and
  access streaming mode FFR is read and writte

* The presence of SVE is reported to userspace
  AT_HWCAP entry. Presence of this flag impli
  instructions and registers, and the Linux-sp
  described in this document. SVE is reported

* Support for the execution of SVE instruction
  detected by reading the CPU ID register ID_A
  instruction, and checking that the value of

  It does not guarantee the presence of the sy
  following sections: software that needs to v
  present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extension
  be reported in the AT_HWCAP2 aux vector entr
  optional extensions to SVE2 may be reported

        HWCAP2_SVE2
        HWCAP2_SVEAES
        HWCAP2_SVEPMULL
        HWCAP2_SVEBITPERM
        HWCAP2_SVESHA3
        HWCAP2_SVESM4
        HWCAP2_SVE2P1

  This list may be extended over time as the S

  These extensions are also reported via the C
  which userspace can read using an MRS instru
  cpu-feature-registers.txt for details.

* On hardware that supports the SME extensions
  reported in the AT_HWCAP2 aux vector entry.
  streaming mode which provides a subset of th
  separate SME vector length and the same Z/V
  for more details.

* Debuggers should restrict themselves to inte
  NT_ARM_SVE regset. The recommended way of d
  is to connect to a target process first and
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &i
  present and streaming SVE mode is in use the
  will be read via NT_ARM_SVE and NT_ARM_SVE w
  in the target.

* Whenever SVE scalable register values (Zn, P
  between userspace and the kernel, the regist
  an endianness-invariant layout, with bits [(
  byte offset i from the start of the memory r
  example the signal frame (struct sve_context
  (struct user_sve_header) and associated data

  Beware that on big-endian systems this resul
  for the FPSIMD V-registers, which are stored
  values, with bits [(127 - 8 * i) : (120 - 8
  byte offset i. (struct fpsimd_context, stru


2. Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is refe

To avoid confusion about the units used to exp
adopts the following conventions:

* Vector length (VL) = size of a Z-register in

* Vector quadwords (VQ) = size of a Z-register

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying
as in data structure definitions. In most oth
is used. This is consistent with the meaning
the SVE instruction set architecture.


3. System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as withou
  Z0..Z31 are preserved. All other bits of Z0
  become zero on return from a syscall.

* The SVE registers are not used to pass argum
  any syscall.

* All other SVE state of a thread, including t
  length, the state of the PR_SVE_VL_INHERIT f
  length (if any), is preserved across all sys
  exceptions for execve() described in section

  In particular, on return from a fork() or cl
  process or thread share identical SVE config
  parent before the call.


4. Signal handling
-------------------

* A new signal frame record sve_context encode
  delivery. [1]

* This record is supplementary to fpsimd_conte
  are only present in fpsimd_context. For con
  is duplicated between sve_context and fpsimd

* The record contains a flag field which inclu
  if set indicates that the thread is in strea
  and register data (if present) describe the
  length.

* The signal frame record for SVE always conta
  the thread's vector length (in sve_context.v

* The SVE registers may or may not be included
  whether the registers are live for the threa
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZ

* If the registers are present, the remainder
  size and layout. Macros SVE_SIG_* are defin
  the members.

* Each scalable register (Zn, Pn, FFR) is stor
  layout, with bits [(8 * i + 7) : (8 * i)] st
  start of the register's representation in me

* If the SVE context is too big to fit in sigc
  space is allocated on the stack, an extra_co
  __reserved[] referencing this space. sve_co
  extra space. Refer to [1] for further detai


5. Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the sig
  present but contains no register data as des
  then the SVE registers/bits become non-live

* If sve_context is present in the signal fram
  data, the SVE registers become live and are
  data.
  However, for backward compatibility r
  are always restored from the corresponding m
  and not from sve_context. The remaining bit

* Inclusion of fpsimd_context in the signal fr
  irrespective of whether sve_context is prese

* The vector length cannot be changed via sign
  the signal frame does not match the current
  attempt is treated as illegal, resulting in

* It is permitted to enter or leave streaming
  the SVE_SIG_FLAG_SM flag but applications sh
  when doing so sve_context.vl and any registe
  vector length in the new mode.


6. prctl extensions
--------------------

Some new prctl() calls are added to allow prog
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thre
    arg == vl | flags. Other threads of the c

    vl is the desired vector length, where sve

    flags:

        PR_SVE_VL_INHERIT

            Inherit the current vector length
            vector length is reset to the syst
            Section 9.)

        PR_SVE_SET_VL_ONEXEC

            Defer the requested vector length
            performed by this thread.

            The effect is equivalent to implic
            call immediately after the next ex

                prctl(PR_SVE_SET_VL, arg & ~PR

            This allows launching of a new pro
            length, while avoiding runtime sid


            Without PR_SVE_SET_VL_ONEXEC, the
            immediately.


    Return value: a nonnegative value on success, or
        EINVAL: SVE not supported, invalid vec
            invalid flags.


    On success:

    * Either the calling thread's vector lengt
      to be applied at the next execve() by th
      PR_SVE_SET_VL_ONEXEC is present in arg),
      supported by the system that is less tha
      SVE_VL_MAX, the value set will be the la
      system.

    * Any previously outstanding deferred vect
      thread is cancelled.

    * The returned value describes the resulti
      PR_SVE_GET_VL.
      The vector length report
      current vector length for this thread if
      present in arg; otherwise, the reported
      vector length that will be applied at th
      thread.

    * Changing the vector length causes all of
      Z0..Z31 except for Z0 bits [127:0] .. Z3
      unspecified. Calling PR_SVE_SET_VL with
      vector length, or calling PR_SVE_SET_VL
      flag, does not constitute a change to th


prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thre

    The following flag may be OR-ed into the r

        PR_SVE_VL_INHERIT

            Vector length will be inherited ac

    There is no way to determine whether there
    vector length change (which would only nor
    fork() or vfork() and the corresponding ex

    To extract the vector length from the resu
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on succe
        EINVAL: SVE not supported.


7. ptrace extensions
---------------------

* New regsets NT_ARM_SVE and NT_ARM_SSVE are d
  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_AR
  streaming mode SVE registers and NT_ARM_SVE
  non-streaming mode SVE registers.

  In this description a register set is referr
  the target is in the appropriate streaming o
  using data beyond the subset shared with the

  Refer to [2] for definitions.

The regset data starts with struct user_sve_he

    size

        Size of the complete regset, in bytes.
        This depends on vl and possibly on oth

        If a call to PTRACE_GETREGSET requests
        size, the caller can allocate a larger
        read the complete regset.

    max_size

        Maximum size in bytes that the regset
        thread. The regset won't grow bigger
        thread changes its vector length etc.

    vl

        Target thread's current vector length,

    max_vl

        Maximum possible vector length for the

    flags

        at most one of

            SVE_PT_REGS_FPSIMD

                SVE registers are not live (GE
                non-live (SETREGSET).

                The payload is of type struct
                meaning as for NT_PRFPREG, sta
                SVE_PT_FPSIMD_OFFSET from the

                Extra data might be appended i
                payload should be obtained usi

                vq should be obtained using sv

            or

            SVE_PT_REGS_SVE

                SVE registers are live (GETREG
                (SETREGSET).

                The payload contains the SVE r
                SVE_PT_SVE_OFFSET from the sta
                size SVE_PT_SVE_SIZE(vq, flags

        ... OR-ed with zero or more of the fol
        meaning and behaviour as the correspon

            SVE_PT_VL_INHERIT

            SVE_PT_VL_ONEXEC (SETREGSET only).

        If neither FPSIMD nor SVE flags are pr
        payload is available, this is only pos


* The effects of changing the vector length an
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET cal
  actually set by SETREGSET, unless it is know
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and la
  the header fields. The SVE_PT_SVE_*() macro
  access to the members.

* In either case, for SETREGSET it is permissi
  case only the vector length and flags are ch
  consequences of those changes).

* In systems supporting SME when in streaming
  NT_ARM_SVE will return only the user_sve_hea
  similarly a GETREGSET for NT_ARM_SSVE will n
  when not in streaming mode.

* A GETREGSET for NT_ARM_SSVE will never retur

* For SETREGSET, if an SVE_PT_REGS_SVE payload
  requested VL is not supported, the effect wi
  payload were omitted, except that an EIO err
  attempt is made to translate the payload dat
  for the vector length actually set. The thr
  preserved, but the remaining bits of the SVE
  unspecified.
  It is up to the caller to tran
  for the actual VL and retry.

* Where SME is implemented it is not possible
  state for normal SVE when in streaming mode,
  register state when in normal mode, regardle
  behaviour of the hardware for sharing data b

* Any SETREGSET of NT_ARM_SVE will exit stream
  streaming mode and any SETREGSET of NT_ARM_S
  if the target was not in streaming mode.

* The effect of writing a partial, incomplete


8. ELF coredump extensions
---------------------------

* NT_ARM_SVE and NT_ARM_SSVE notes will be add
  each thread of the dumped process. The cont
  data that would have been read if a PTRACE_G
  type were executed for each thread when the

9. System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of t
  mechanism is provided for administrators, di
  to set the default vector length for userspa

  /proc/sys/abi/sve_default_vector_length

    Writing the text representation of an inte
    default vector length to the specified val
    using the same rules as for setting vector

    The result can be determined by reopening
    contents.

    At boot, the default vector length is init
    supported vector length, whichever is smal
    vector length of the init process (PID 1).

    Reading this file returns the current syst

* At every execve() call, the new vector lengt
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_P
      calling thread, or

    * a deferred vector length change is pendi
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_

* Modifying the system default vector length d
  of any existing process or thread that does

10. Perf extensions
--------------------------------

* The arm64 specific DWARF standard [5] added
  at index 46.
  This register is used for DWARF
  SVE registers are pushed onto the stack.

* Its value is equivalent to the current SVE v
  by 64.

* The value is included in Perf samples in the
  PERF_SAMPLE_REGS_USER is set and the sample_

* The value is the current value at the time t
  change over time.

* If the system doesn't support SVE when perf_
  settings, the event will fail to open.

Appendix A. SVE programmer's model (informati
==============================================

This section provides a minimal description of
ARMv8-A programmer's model that are relevant t

Note: This section is for information only and
to replace any architectural specification.

A.1. Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31
  For each Zn, Zn bits [127:0] alias the ARMv8

  A register write using a Vn register name ze
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register

* a VL "pseudo-register" that determines the s

  The SVE instruction set architecture provide
  Instead, it can be modified only by EL1 and
  system registers.

* The value of VL can be configured at runtime
  16 <= VL <= VLmax, where VL must be a multip

* The maximum vector length is determined by t
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but per
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and
  operations in a similar way to the way in wh
  floating-point operations::

         8VL-1                       128
        +----          ////            -------
     Z0 |                              :
      :
     Z7 |                              :
     Z8 |                              :
      :
    Z15 |                              :
    Z16 |                              :
      :
    Z31 |                              :
        +----          ////            -------

         VL-1                  0
        +----       ////      --+          FPS
     P0 |                       |
      : |                       |         *FPC
    P15 |                       |
        +----       ////      --+
    FFR |                       |
        +----       ////      --+            V


(*) callee-save:
    This only applies to bits [63:0] of Z-/V-r
    FPCR contains callee-save and caller-save


A.2. Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is ex
the additional SVE register state:

* All SVE register bits that are not shared wi

* Z8 bits [63:0] .. Z15 bits [63:0] are callee

  This follows from the way these bits are map
  save in the base procedure call standard.


Appendix B. ARMv8-A FP/SIMD programmer's mode
==============================================

Note: This section is for information only and
to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point /

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

          127           0  bit index
         +---------------+
      V0 |               |
       : :               :
      V7 |               |
    * V8 |               |
    :  : :               :
    *V15 |               |
     V16 |               |
       : :               :
     V31 |               |
         +---------------+

           31    0
          +-------+
     FPSR |       |
          +-------+
    *FPCR |       |
          +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-regi
    FPCR contains a mixture of callee-save and


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arch/arm64/cpu-feature-regis

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.a
    http://infocenter.arm.com/help/topic/com.a
    Procedure Call Standard for the ARM 64-bit

[5] https://github.com/ARM-software/abi-aa/blo