.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

====================
Asynchronous VM_BIND
====================

Nomenclature:
=============

* ``VRAM``: On-device memory. Sometimes referred to as device local memory.

* ``gpu_vm``: A virtual GPU address space. Typically per process, but
  can be shared by multiple processes.

* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
  an IOCTL. The operations include mapping and unmapping system memory or
  VRAM.

* ``syncobj``: A container that abstracts synchronization objects. The
  synchronization objects can be either generic, like dma-fences, or
  driver specific. A syncobj typically indicates the type of the
  underlying synchronization object.

* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
  for these before starting.

* ``out-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation
  signals these when the bind operation is complete.

* ``dma-fence``: A cross-driver synchronization object.
  A basic understanding of dma-fences is required to digest this
  document. Please refer to the ``DMA Fences`` section of the
  :doc:`dma-buf doc </driver-api/dma-buf>`.

* ``memory fence``: A synchronization object, different from a dma-fence.
  A memory fence uses the value of a specified memory location to determine
  signaled status. A memory fence can be awaited and signaled by both
  the GPU and CPU. Memory fences are sometimes referred to as
  user-fences, userspace-fences or gpu futexes and do not necessarily obey
  the dma-fence rule of signaling within a "reasonable amount of time".
  The kernel should thus avoid waiting for memory fences with locks held.

* ``long-running workload``: A workload that may take more than the
  current stipulated dma-fence maximum signal delay to complete and
  which therefore needs to set the gpu_vm or the GPU execution context in
  a certain mode that disallows completion dma-fences.

* ``exec function``: An exec function is a function that revalidates all
  affected gpu_vmas, submits a GPU command batch and registers the
  dma_fence representing the GPU command's activity with all affected
  dma_resvs. For completeness, although not covered by this document,
  it's worth mentioning that an exec function may also be the
  revalidation worker that is used by some drivers in compute /
  long-running mode.

* ``bind context``: A context identifier used for the VM_BIND
  operation. VM_BIND operations that use the same bind context can be
  assumed, where it matters, to complete in order of submission. No such
  assumptions can be made for VM_BIND operations using separate bind contexts.

* ``UMD``: User-mode driver.

* ``KMD``: Kernel-mode driver.


Synchronous / Asynchronous VM_BIND operation
============================================

Synchronous VM_BIND
___________________
With Synchronous VM_BIND, the VM_BIND operations all complete before the
IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
out-fences.
Synchronous VM_BIND may block and wait for GPU operations;
for example swap-in or clearing, or even previous binds.

Asynchronous VM_BIND
____________________
Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
before modifying the GPU page-tables, and signal the out-syncobjs when
the modification is done in the sense that the next exec function that
waits for the out-syncobjs will see the change. Errors are reported
synchronously.
In low-memory situations the implementation may block, performing the
VM_BIND synchronously, because there might not be enough memory
immediately available for preparing the asynchronous operation.

If the VM_BIND IOCTL takes a list or an array of operations as an argument,
the in-syncobjs need to signal before the first operation starts to
execute, and the out-syncobjs signal after the last operation
completes. Operations in the operation list can be assumed, where it
matters, to complete in order.
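
The ordering rules for an operation list can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions:
``toy_syncobj``, ``toy_bind_op`` and ``toy_vm_bind_run()`` are hypothetical
names invented for illustration and do not correspond to any KMD or uAPI
code. A real implementation would defer the work asynchronously instead of
returning ``false`` while an in-syncobj is unsignaled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a syncobj: nothing but a signaled flag. */
struct toy_syncobj {
	bool signaled;
};

/* Hypothetical bind operation that records when it completed. */
struct toy_bind_op {
	int completion_index; /* -1 until the operation has completed */
};

static bool toy_all_signaled(const struct toy_syncobj *syncs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!syncs[i].signaled)
			return false;
	return true;
}

/*
 * Run a whole operation list. Nothing is touched while any in-syncobj is
 * unsignaled. Operations complete in list order, and the out-syncobjs are
 * signaled only after the last operation has completed.
 */
static bool toy_vm_bind_run(const struct toy_syncobj *in, size_t num_in,
			    struct toy_bind_op *ops, size_t num_ops,
			    struct toy_syncobj *out, size_t num_out)
{
	if (!toy_all_signaled(in, num_in))
		return false;

	for (size_t i = 0; i < num_ops; i++)
		ops[i].completion_index = (int)i;

	for (size_t i = 0; i < num_out; i++)
		out[i].signaled = true;

	return true;
}
```

Note that the model does not special-case an empty operation list: with
``num_ops == 0`` only the synchronization part is carried out.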

Since asynchronous VM_BIND operations may use dma-fences embedded in
out-syncobjs and internally in KMD to signal bind completion, any
memory fences given as VM_BIND in-fences need to be awaited
synchronously before the VM_BIND ioctl returns, since dma-fences,
required to signal in a reasonable amount of time, can never be made
to depend on memory fences that don't have such a restriction.

The purpose of an Asynchronous VM_BIND operation is for user-mode
drivers to be able to pipeline interleaved gpu_vm modifications and
exec functions. For long-running workloads, such pipelining of a bind
operation is not allowed and any in-fences need to be awaited
synchronously. The reason for this is twofold. First, any memory
fences gated by a long-running workload and used as in-syncobjs for the
VM_BIND operation will need to be awaited synchronously anyway (see
above).
Second, any dma-fences used as in-syncobjs for VM_BIND
operations for long-running workloads will not allow for pipelining
anyway since long-running workloads don't allow for dma-fences as
out-syncobjs, so while theoretically possible the use of them is
questionable and should be rejected until there is a valuable use-case.
Note that this is not a limitation imposed by dma-fence rules, but
rather a limitation imposed to keep KMD implementation simple. It does
not affect using dma-fences as dependencies for the long-running
workload itself, which is allowed by dma-fence rules, but rather for
the VM_BIND operation only.

An asynchronous VM_BIND operation may take substantial time to
complete and signal the out-fence, in particular if the operation is
deeply pipelined behind other VM_BIND operations and workloads
submitted using exec functions. In that case, UMD might want to avoid
having a subsequent VM_BIND operation queued behind the first one if
there are no explicit dependencies. To circumvent such a queue-up, a
VM_BIND implementation may allow for VM_BIND contexts to be
created.
For each context, VM_BIND operations will be guaranteed to
complete in the order they were submitted, but that is not the case
for VM_BIND operations executing on separate VM_BIND contexts. Instead
KMD will attempt to execute such VM_BIND operations in parallel, but
with no guarantee that they will actually be executed in
parallel. There may be internal implicit dependencies that only KMD knows
about, for example page-table structure changes. A way to attempt
to avoid such internal dependencies is to have different VM_BIND
contexts use separate regions of a VM.

Also, for VM_BINDs on long-running gpu_vms, the user-mode driver should
typically select memory fences as out-fences, since that gives greater
flexibility for the kernel-mode driver to inject other operations into the
bind / unbind operations, for example inserting breakpoints into batch
buffers. The workload execution can then easily be pipelined behind
the bind completion using the memory out-fence as the signal condition
for a GPU semaphore embedded by UMD in the workload.
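
As a rough illustration of how a memory out-fence can gate a workload, the
sketch below models a memory fence as a location plus a value to wait for.
Everything here is an assumption made for illustration: the ``toy_mem_fence``
type, its helpers, and in particular the "greater than or equal" comparison
are hypothetical, and real hardware semaphores or uAPI structures may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical memory fence: considered signaled once the value stored at
 * addr has reached wait_value. The comparison used is an assumption.
 */
struct toy_mem_fence {
	_Atomic uint64_t *addr;
	uint64_t wait_value;
};

/* What the bind completion would do: publish the value to memory. */
static void toy_mem_fence_signal(struct toy_mem_fence *f)
{
	atomic_store_explicit(f->addr, f->wait_value, memory_order_release);
}

/* What a GPU semaphore (or a CPU waiter) would poll. */
static bool toy_mem_fence_is_signaled(const struct toy_mem_fence *f)
{
	return atomic_load_explicit(f->addr, memory_order_acquire) >=
	       f->wait_value;
}
```

In this model, a workload pipelined behind the bind would simply not proceed
past its semaphore until ``toy_mem_fence_is_signaled()`` returns true. Nothing
bounds how long that takes, which is why such fences must not be waited on
with locks held.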

There is no difference in the operations supported or in
multi-operation support between asynchronous VM_BIND and synchronous VM_BIND.

Multi-operation VM_BIND IOCTL error handling and interrupts
===========================================================

The VM_BIND operations of the IOCTL may fail for various reasons, for
example due to a lack of resources to complete or due to interrupted
waits.
In these situations UMD should preferably restart the IOCTL after
taking suitable action.
If UMD has over-committed a memory resource, an -ENOSPC error will be
returned, and UMD may then unbind resources that are not used at the
moment and rerun the IOCTL. On -EINTR, UMD should simply rerun the
IOCTL and on -ENOMEM user-space may either attempt to free known
system memory resources or fail. If UMD decides to fail a
bind operation due to an error return, no additional action is needed
to clean up the failed operation, and the VM is left in the same state
as it was before the failing IOCTL.
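
The UMD-side restart policy described above could be sketched roughly as
follows. This is not taken from any real user-mode driver: ``toy_bind_ctx``,
``toy_vm_bind_retry()`` and the scripted demo backend are hypothetical, and
the callbacks merely stand in for issuing the IOCTL and for reclaiming
resources.

```c
#include <errno.h>

/* Hypothetical UMD callbacks; none of these are part of a real API. */
struct toy_bind_ctx {
	int (*submit)(void *data);        /* issue VM_BIND: 0 or -errno */
	int (*unbind_unused)(void *data); /* nonzero if something was unbound */
	int (*free_sysmem)(void *data);   /* nonzero if memory was freed */
	void *data;
};

/* Rerun the IOCTL after taking suitable action, or give up with the error. */
static int toy_vm_bind_retry(const struct toy_bind_ctx *ctx)
{
	for (;;) {
		int err = ctx->submit(ctx->data);

		switch (err) {
		case -EINTR:
			continue; /* interrupted wait: simply rerun */
		case -ENOSPC:
			if (ctx->unbind_unused(ctx->data))
				continue; /* over-commit relieved: rerun */
			return err;
		case -ENOMEM:
			if (ctx->free_sysmem(ctx->data))
				continue; /* system memory freed: rerun */
			return err;
		default:
			return err; /* success, or an error not retried here */
		}
	}
}

/* Demo backend: replays a fixed script of return codes, then succeeds. */
struct toy_script {
	const int *codes;
	int len;
	int pos;
	int reclaim_budget; /* number of times reclaim still succeeds */
};

static int toy_script_submit(void *data)
{
	struct toy_script *s = data;

	return s->pos < s->len ? s->codes[s->pos++] : 0;
}

static int toy_script_reclaim(void *data)
{
	struct toy_script *s = data;

	if (s->reclaim_budget <= 0)
		return 0;
	s->reclaim_budget--;
	return 1;
}
```

Because a failed IOCTL leaves the VM unchanged, the loop can return the error
to its caller without any cleanup step.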

Unbind operations are guaranteed not to return any errors due to
resource constraints, but may return errors due to, for example,
invalid arguments or the gpu_vm being banned.
If an unexpected error happens during the asynchronous bind
process, the gpu_vm will be banned, and attempts to use it after banning
will return -ENOENT.

Example: The Xe VM_BIND uAPI
============================

Starting with the VM_BIND operation struct, the IOCTL call can take
zero, one or many such operations. Zero operations means that only the
synchronization part of the IOCTL is carried out: an asynchronous
VM_BIND updates the syncobjs, whereas a sync VM_BIND waits for the
implicit dependencies to be fulfilled.

.. code-block:: c

   struct drm_xe_vm_bind_op {
           /**
            * @obj: GEM object to operate on, MBZ for MAP_USERPTR, MBZ for UNMAP
            */
           __u32 obj;

           /** @pad: MBZ */
           __u32 pad;

           union {
                   /**
                    * @obj_offset: Offset into the object for MAP.
                    */
                   __u64 obj_offset;

                   /** @userptr: user virtual address for MAP_USERPTR */
                   __u64 userptr;
           };

           /**
            * @range: Number of bytes from the object to bind to addr, MBZ for UNMAP_ALL
            */
           __u64 range;

           /** @addr: Address to operate on, MBZ for UNMAP_ALL */
           __u64 addr;

           /**
            * @tile_mask: Mask for which tiles to create binds for, 0 == All tiles,
            * only applies to creating new VMAs
            */
           __u64 tile_mask;

           /* Map (parts of) an object into the GPU virtual address range. */
   #define XE_VM_BIND_OP_MAP               0x0
           /* Unmap a GPU virtual address range */
   #define XE_VM_BIND_OP_UNMAP             0x1
           /*
            * Map a CPU virtual address range into a GPU virtual
            * address range.
            */
   #define XE_VM_BIND_OP_MAP_USERPTR       0x2
           /* Unmap a gem object from the VM. */
   #define XE_VM_BIND_OP_UNMAP_ALL         0x3
           /*
            * Make the backing memory of an address range resident if
            * possible. Note that this doesn't pin backing memory.
            */
   #define XE_VM_BIND_OP_PREFETCH          0x4

           /* Make the GPU map readonly. */
   #define XE_VM_BIND_FLAG_READONLY        (0x1 << 16)
           /*
            * Valid on a faulting VM only, do the MAP operation immediately rather
            * than deferring the MAP to the page fault handler.
            */
   #define XE_VM_BIND_FLAG_IMMEDIATE       (0x1 << 17)
           /*
            * When the NULL flag is set, the page tables are set up with a special
            * bit which indicates writes are dropped and all reads return zero. In
            * the future, the NULL flags will only be valid for XE_VM_BIND_OP_MAP
            * operations, the BO handle MBZ, and the BO offset MBZ. This flag is
            * intended to implement VK sparse bindings.
            */
   #define XE_VM_BIND_FLAG_NULL            (0x1 << 18)
           /** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */
           __u32 op;

           /** @region: Memory region to prefetch VMA to, instance not a mask */
           __u32 region;

           /** @reserved: Reserved */
           __u64 reserved[2];
   };


The VM_BIND IOCTL argument itself looks as follows. Note that for
synchronous VM_BIND, the num_syncs and syncs fields must be zero.
Here the ``exec_queue_id`` field is the VM_BIND context discussed previously
that is used to facilitate out-of-order VM_BINDs.

.. code-block:: c

   struct drm_xe_vm_bind {
           /** @extensions: Pointer to the first extension struct, if any */
           __u64 extensions;

           /** @vm_id: The ID of the VM to bind to */
           __u32 vm_id;

           /**
            * @exec_queue_id: exec_queue_id, must be of class DRM_XE_ENGINE_CLASS_VM_BIND
            * and exec queue must have same vm_id. If zero, the default VM bind engine
            * is used.
            */
           __u32 exec_queue_id;

           /** @num_binds: number of binds in this IOCTL */
           __u32 num_binds;

           /* If set, perform an async VM_BIND, if clear a sync VM_BIND */
   #define XE_VM_BIND_IOCTL_FLAG_ASYNC     (0x1 << 0)

           /** @flags: Flags controlling all operations in this ioctl. */
           __u32 flags;

           union {
                   /** @bind: used if num_binds == 1 */
                   struct drm_xe_vm_bind_op bind;

                   /**
                    * @vector_of_binds: userptr to array of struct
                    * drm_xe_vm_bind_op if num_binds > 1
                    */
                   __u64 vector_of_binds;
           };

           /** @num_syncs: amount of syncs to wait for or to signal on completion. */
           __u32 num_syncs;

           /** @pad2: MBZ */
           __u32 pad2;

           /** @syncs: pointer to struct drm_xe_sync array */
           __u64 syncs;

           /** @reserved: Reserved */
           __u64 reserved[2];
   };
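
Since the ``op`` field of ``struct drm_xe_vm_bind_op`` carries the operation
in its lower 16 bits and the flags in its upper 16 bits, packing and
unpacking it might look like the sketch below. The ``XE_VM_BIND_*`` values are
copied from the listing above. The ``toy_bind_op_*`` helpers are hypothetical
and not part of the uAPI, and ``uint32_t`` is used in place of ``__u32`` so
that the fragment builds outside the kernel tree.

```c
#include <stdint.h>

/* Values copied from the struct drm_xe_vm_bind_op listing. */
#define XE_VM_BIND_OP_MAP		0x0
#define XE_VM_BIND_OP_UNMAP		0x1
#define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
#define XE_VM_BIND_FLAG_NULL		(0x1 << 18)

/* Hypothetical helper: combine an operation and flags into one op word. */
static inline uint32_t toy_bind_op_pack(uint32_t op, uint32_t flags)
{
	return (op & 0xffffu) | (flags & 0xffff0000u);
}

/* Hypothetical helper: extract the operation (lower 16 bits). */
static inline uint32_t toy_bind_op_opcode(uint32_t packed)
{
	return packed & 0xffffu;
}

/* Hypothetical helper: extract the flags (upper 16 bits). */
static inline uint32_t toy_bind_op_flags(uint32_t packed)
{
	return packed & 0xffff0000u;
}
```

For example, a read-only mapping would use
``toy_bind_op_pack(XE_VM_BIND_OP_MAP, XE_VM_BIND_FLAG_READONLY)`` as the value
of ``op``.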