======================
Userspace verbs access
======================

The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
enables direct userspace access to IB hardware via "verbs," as
described in chapter 11 of the InfiniBand Architecture Specification.

To use the verbs, the libibverbs library, available from
https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
device-independent API for using the ib_uverbs interface.
libibverbs also requires the appropriate device-dependent kernel and
userspace drivers for your InfiniBand hardware. For example, to use
a Mellanox HCA, you will need the ib_mthca kernel module and the
libmthca userspace driver to be installed.

User-kernel communication
=========================

Userspace communicates with the kernel for slow path, resource
management operations via the /dev/infiniband/uverbsN character
devices. Fast path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.
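
As a minimal sketch of the slow-path entry point, the following C fragment
formats the name of the N-th uverbs character device and opens it. The helper
names ``uverbs_dev_path`` and ``uverbs_open`` are hypothetical and exist only
for illustration; real applications normally let libibverbs open the device
for them.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format the path of the N-th uverbs character device.
 * Returns 0 on success, -1 if the buffer is too small. */
static int uverbs_dev_path(char *buf, size_t len, unsigned int index)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/dev/infiniband/uverbs%u", index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: open the device that carries slow-path commands.
 * Returns a file descriptor, or -1 if the path does not fit, the node is
 * absent, or the caller lacks permission. */
static int uverbs_open(unsigned int index)
{
    char path[64];

    if (uverbs_dev_path(path, sizeof(path), index) != 0)
        return -1;
    return open(path, O_RDWR);
}
```

A caller would check the result of ``uverbs_open(0)`` and ``close()`` the
descriptor when done. On a machine without IB hardware the open simply fails.
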

Commands are sent to the kernel via write()s on these device files.
The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.

Resource management
===================

Since creation and destruction of all IB resources is done by
commands passed through a file descriptor, the kernel can keep track
of which resources are attached to a given userspace context. The
ib_uverbs module maintains idr tables that are used to translate
between kernel pointers and opaque userspace handles, so that kernel
pointers are never exposed to userspace and userspace cannot trick
the kernel into following a bogus pointer.

This also allows the kernel to clean up when a process exits and
prevent one process from touching another process's resources.
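
The handle translation described above can be illustrated with a small
userspace analogy. This is not the kernel's idr code: the fixed-size table,
the ``handle_*`` names and ``MAX_HANDLES`` are assumptions made for the
sketch. It shows only the idea that the untrusted side holds integer handles,
every lookup is validated, and a per-context table makes cleanup possible.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 16  /* arbitrary size chosen for this sketch */

/* Hypothetical per-context table: the slot index is the opaque handle. */
struct handle_table {
    void *objs[MAX_HANDLES];
};

/* Store obj and return a small integer handle, or -1 on failure. */
static int handle_alloc(struct handle_table *t, void *obj)
{
    if (obj == NULL)
        return -1;
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (t->objs[i] == NULL) {
            t->objs[i] = obj;
            return i;
        }
    }
    return -1;
}

/* Translate a handle from the untrusted side back into a pointer.
 * Out-of-range or unused handles yield NULL, never a bogus pointer. */
static void *handle_lookup(const struct handle_table *t, uint32_t handle)
{
    if (handle >= MAX_HANDLES)
        return NULL;
    return t->objs[handle];
}

/* Release one handle and return the object so the caller can destroy it. */
static void *handle_remove(struct handle_table *t, uint32_t handle)
{
    void *obj = handle_lookup(t, handle);

    if (obj != NULL)
        t->objs[handle] = NULL;
    return obj;
}

/* Release everything still registered, as when a context goes away.
 * Returns the number of objects released. */
static int handle_cleanup(struct handle_table *t, void (*destroy)(void *))
{
    int released = 0;

    for (uint32_t h = 0; h < MAX_HANDLES; h++) {
        void *obj = handle_remove(t, h);

        if (obj != NULL) {
            if (destroy != NULL)
                destroy(obj);
            released++;
        }
    }
    return released;
}
```

Because each context owns its own table, a handle that is valid in one
context resolves to nothing in another, which mirrors the isolation between
processes described above.
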

Memory pinning
==============

Direct userspace I/O requires that memory regions that are potential
I/O targets be kept resident at the same physical address. The
ib_uverbs module manages pinning and unpinning memory regions via
get_user_pages() and put_page() calls. It also accounts for the
amount of memory pinned in the process's pinned_vm, and checks that
unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

Pages that are pinned multiple times are counted each time they are
pinned, so the value of pinned_vm may be an overestimate of the
number of pages pinned by a process.

/dev files
==========

To create the appropriate character device files automatically with
udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

can be used. This will create device nodes named::

    /dev/infiniband/uverbs0

and so on. Since the InfiniBand userspace verbs should be safe for
use by non-privileged processes, it may be useful to add an
appropriate MODE or GROUP to the udev rule.
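
A non-privileged process depends on two of the settings described in this
document: read-write access to the device node and a sufficient
RLIMIT_MEMLOCK limit. The following is a minimal preflight sketch using only
standard calls (``access()`` and ``getrlimit()``). The function names
``node_usable`` and ``memlock_allows`` are hypothetical, and the limit check
ignores memory that is already pinned, so passing it does not guarantee that
a later registration succeeds.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Returns 1 if `bytes` fits under the soft RLIMIT_MEMLOCK limit, 0 if it
 * does not, and -1 if the limit cannot be read. Memory that is already
 * pinned is not taken into account. */
static int memlock_allows(unsigned long long bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return bytes <= (unsigned long long)rl.rlim_cur ? 1 : 0;
}

/* Returns 1 if the calling process may open the node read-write, else 0. */
static int node_usable(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}
```

For example, a program could call ``node_usable("/dev/infiniband/uverbs0")``
and ``memlock_allows(size)`` before attempting to register a buffer of
``size`` bytes, and print a hint about the udev MODE or GROUP setting or the
memlock limit when either check fails.
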