Linux/Documentation/process/botching-up-ioctls.rst


=================================
(How to avoid) Botching up ioctls
=================================

From: https://blog.ffwll.ch/2013/11/botching-up-ioctls.html

By: Daniel Vetter, Copyright © 2013 Intel Corporation

One clear insight kernel graphics hackers gained in the past few years is that
trying to come up with a unified interface to manage the execution units and
memory on completely different GPUs is a futile effort. So nowadays every
driver has its own set of ioctls to allocate memory and submit work to the GPU.
Which is nice, since there's no more insanity in the form of fake-generic, but
actually only used once interfaces. But the clear downside is that there's much
more potential to screw things up.

To avoid repeating all the same mistakes again I've written up some of the
lessons learned while botching the job for the drm/i915 driver. Most of these
only cover technicalities and not the big-picture issues like what the command
submission ioctl exactly should look like. Learning these lessons is probably
something every GPU driver has to do on its own.


Prerequisites
-------------

First the prerequisites. Without these you have already failed, because you
will need to add a 32-bit compat layer:

 * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
   the kernel has special types like __u32, __s64. Use them.

 * Align everything to the natural size and use explicit padding. 32-bit
   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
   64-bit platforms do. So we always need padding to the natural size to get
   this right.

 * Pad the entire struct to a multiple of 64-bits if the structure contains
   64-bit types - the structure size will otherwise differ on 32-bit versus
   64-bit. Having a different structure size hurts when passing arrays of
   structures to the kernel, or if the kernel checks the structure size, which
   e.g. the drm core does.

 * Pointers are __u64, cast from/to a uintptr_t on the userspace side and
   from/to a void __user * in the kernel. Try really hard not to delay this
   conversion or worse, fiddle the raw __u64 through your code since that
   diminishes the checking tools like sparse can provide. The macro
   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
   and pointers of different sizes.


Basics
------

With the joys of writing a compat layer avoided we can take a look at the basic
fumbles. Neglecting these will make backward and forward compatibility a real
pain. And since getting things wrong on the first attempt is guaranteed you
will have a second iteration or at least an extension for any given interface.

 * Have a clear way for userspace to figure out whether your new ioctl or ioctl
   extension is supported on a given kernel. If you can't rely on old kernels
   rejecting the new flags/modes or ioctls (since doing that was botched in the
   past) then you need a driver feature flag or revision number somewhere.

 * Have a plan for extending ioctls with new flags or new fields at the end of
   the structure. The drm core checks the passed-in size for each ioctl call
   and zero-extends any mismatches between kernel and userspace. That helps,
   but isn't a complete solution since newer userspace on older kernels won't
   notice that the newly added fields at the end get ignored. So this still
   needs a new driver feature flag.

 * Check all unused fields and flags and all the padding for whether it's 0,
   and reject the ioctl if that's not the case. Otherwise your nice plan for
   future extensions is going right down the gutters since someone will submit
   an ioctl struct with random stack garbage in the yet unused parts. Which
   then bakes in the ABI that those fields can never be used for anything else
   but garbage. This is also the reason why you must explicitly pad all
   structures, even if you never use them in an array - the padding the compiler
   might insert could contain garbage.

 * Have simple testcases for all of the above.


Fun with Error Paths
--------------------

Nowadays we don't have any excuse left any more for drm drivers being neat
little root exploits. This means we both need full input validation and solid
error handling paths - GPUs will die eventually in the oddmost corner cases
anyway:

 * The ioctl must check for array overflows. Also it needs to check for
   over/underflows and clamping issues of integer values in general. The usual
   example is sprite positioning values fed directly into the hardware with the
   hardware just having 12 bits or so. Works nicely until some odd display
   server doesn't bother with clamping itself and the cursor wraps around the
   screen.

 * Have simple testcases for every input validation failure case in your ioctl.
   Check that the error code matches your expectations. And finally make sure
   that you only test for one single error path in each subtest by submitting
   otherwise perfectly valid data. Without this an earlier check might reject
   the ioctl already and shadow the codepath you actually want to test, hiding
   bugs and regressions.

 * Make all your ioctls restartable. First X really loves signals and second
   this will allow you to test 90% of all error handling paths by just
   interrupting your main test suite constantly with signals. Thanks to X's
   love for signals you'll get an excellent base coverage of all your error
   paths pretty much for free for graphics drivers. Also, be consistent with
   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
   userspace library. The i915 driver botched this with the set_tiling ioctl,
   now we're stuck forever with some arcane semantics in both the kernel and
   userspace.

 * If you can't make a given codepath restartable make a stuck task at least
   killable. GPUs just die and your users won't like you more if you hang their
   entire box (by means of an unkillable X process). If the state recovery is
   still too tricky have a timeout or hangcheck safety net as a last-ditch
   effort in case the hardware has gone bananas.

 * Have testcases for the really tricky corner cases in your error recovery code
   - it's way too easy to create a deadlock between your hangcheck code and
   waiters.


Time, Waiting and Missing it
----------------------------

GPUs do most everything asynchronously, so we have a need to time operations and
wait for outstanding ones. This is really tricky business; at the moment none of
the ioctls supported by the drm/i915 get this fully right, which means there's
still tons more lessons to learn here.

 * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
   v4l use by default nowadays. But let userspace know which timestamps are
   derived from different clock domains like your main system clock (provided
   by the kernel) or some independent hardware counter somewhere else. Clocks
   will mismatch if you look close enough, but if performance measuring tools
   have this information they can at least compensate. If your userspace can
   get at the raw values of some clocks (e.g. through in-command-stream
   performance counter sampling instructions) consider exposing those also.

 * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
   convenient time specification, but it's mostly the standard.

 * Check that input time values are normalized and reject them if not. Note
   that the kernel native struct ktime has a signed integer for both seconds
   and nanoseconds, so beware here.

 * For timeouts, use absolute times. If you're a good fellow and made your
   ioctl restartable, relative timeouts tend to be too coarse and can
   indefinitely extend your wait time due to rounding on each restart.
   Especially if your reference clock is something really slow like the display
   frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
   always be extended - but users will surely hate you if their neat animations
   start to stutter due to this.

 * Consider ditching any synchronous wait ioctls with timeouts and just deliver
   an asynchronous event on a pollable file descriptor. It fits much better
   into event driven applications' main loop.

 * Have testcases for corner-cases, especially whether the return values for
   already-completed events, successful waits and timed-out waits are all sane
   and suiting to your needs.


Leaking Resources, Not
----------------------

A full-blown drm driver essentially implements a little OS, but specialized to
the given GPU platforms. This means a driver needs to expose tons of handles
for different objects and other resources to userspace. Doing that right
entails its own little set of pitfalls:

 * Always attach the lifetime of your dynamically created resources to the
   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
   needs to be shared across processes - fd-passing over unix domain sockets
   also simplifies lifetime management for userspace.

 * Always have O_CLOEXEC support.

 * Ensure that you have sufficient insulation between different clients. By
   default pick a private per-fd namespace which forces any sharing to be done
   explicitly. Only go with a more global per-device namespace if the objects
   are truly device-unique. One counterexample in the drm modeset interfaces is
   that the per-device modeset objects like connectors share a namespace with
   framebuffer objects, which mostly are not shared at all. A separate
   namespace, private by default, for framebuffers would have been more
   suitable.

 * Think about uniqueness requirements for userspace handles. E.g. for most drm
   drivers it's a userspace bug to submit the same object twice in the same
   command submission ioctl. But then if objects are shareable userspace needs
   to know whether it has seen an imported object from a different process
   already or not. I haven't tried this myself yet due to lack of a new class
   of objects, but consider using inode numbers on your shared file descriptors
   as unique identifiers - it's how real files are told apart, too.
   Unfortunately this requires a full-blown virtual filesystem in the kernel.


Last, but not Least
-------------------

Not every problem needs a new ioctl:

 * Think hard whether you really want a driver-private interface. Of course
   it's much quicker to push a driver-private interface than engaging in
   lengthy discussions for a more generic solution. And occasionally doing a
   private interface to spearhead a new concept is what's required. But in the
   end, once the generic interface comes around you'll end up maintaining two
   interfaces. Indefinitely.

 * Consider other interfaces than ioctls. A sysfs attribute is much better for
   per-device settings, or for child objects with fairly static lifetimes (like
   output connectors in drm with all the detection override attributes). Or
   maybe only your testsuite needs this interface, and then debugfs with its
   disclaimer of not having a stable ABI would be better.

Finally, the name of the game is to get it right on the first attempt, since if
your driver proves popular and your hardware platforms long-lived then you'll
be stuck with a given ioctl essentially forever. You can try to deprecate
horrible ioctls on newer iterations of your hardware, but generally it takes
years to accomplish this. And then again years until the last user able to
complain about regressions disappears, too.
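
As an illustration of the layout and validation rules from the Prerequisites
and Basics sections, here is a minimal sketch of an ioctl argument structure
and the checks a handler might run on it. Everything named ``example_*`` or
``EXAMPLE_*`` is hypothetical and not taken from any real driver. The sketch
uses ``<stdint.h>`` types so that it builds in userspace; a real uapi header
would use __u32 and __u64 instead, and kernel code would convert the pointer
field with u64_to_user_ptr rather than the userspace cast shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag set; any bit outside EXAMPLE_KNOWN_FLAGS is rejected. */
#define EXAMPLE_FLAG_SYNC   (1u << 0)
#define EXAMPLE_KNOWN_FLAGS (EXAMPLE_FLAG_SYNC)

/* Hypothetical ioctl argument: fixed-size integers only, 64-bit member
 * first, explicit padding so the size is a multiple of 64 bits. */
struct example_submit {
	uint64_t data_ptr; /* userspace pointer carried as a 64-bit integer */
	uint32_t data_len; /* number of bytes behind data_ptr */
	uint32_t flags;    /* only EXAMPLE_KNOWN_FLAGS may be set */
	uint32_t handle;   /* hypothetical object handle */
	uint32_t pad;      /* explicit padding, must be zero */
};

/* The layout must be identical for 32-bit and 64-bit userspace. */
static_assert(sizeof(struct example_submit) == 24, "unexpected struct size");
static_assert(sizeof(struct example_submit) % 8 == 0, "size not 64-bit padded");
static_assert(offsetof(struct example_submit, data_ptr) % 8 == 0,
	      "64-bit member not naturally aligned");

/* Userspace side: convert a pointer via uintptr_t before storing it. */
uint64_t example_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Handler side: reject unknown flags and non-zero padding so that both
 * remain available for future extensions. Returns 0 or -EINVAL. */
int example_submit_validate(const struct example_submit *arg)
{
	if (arg->flags & ~EXAMPLE_KNOWN_FLAGS)
		return -EINVAL;
	if (arg->pad != 0)
		return -EINVAL;
	return 0;
}
```

A caller would zero the whole structure first, then fill in ``data_ptr`` with
``example_ptr_to_u64(buffer)`` and set only the fields it needs. Testcases in
the spirit of the text would submit one invalid field at a time with otherwise
valid data and check for the expected error code.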

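The timeout advice from the Time, Waiting and Missing it section can be
sketched in a similar way. The structure and helpers below are hypothetical
names chosen for illustration, again with ``<stdint.h>`` types standing in for
__s64 and __u64. The sketch assumes both timestamps come from the same clock
(CLOCK_MONOTONIC in the text). Because the deadline is absolute, a restarted
ioctl recomputes the remaining time from the same deadline and the wait cannot
grow with each restart.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ull

/* Hypothetical time value: signed seconds plus unsigned nanoseconds. */
struct example_time {
	int64_t  sec;
	uint64_t nsec;
};

static_assert(sizeof(struct example_time) == 16, "unexpected struct size");

/* Reject values that are not normalized (nsec must be below one second). */
int example_time_validate(const struct example_time *t)
{
	if (t->nsec >= EXAMPLE_NSEC_PER_SEC)
		return -EINVAL;
	return 0;
}

/* Nanoseconds left until an absolute deadline, or 0 if it has passed.
 * Both arguments must be normalized. Saturates at UINT64_MAX instead of
 * overflowing for deadlines that are absurdly far away. */
uint64_t example_time_remaining_ns(const struct example_time *now,
				   const struct example_time *deadline)
{
	uint64_t secs;

	if (deadline->sec < now->sec ||
	    (deadline->sec == now->sec && deadline->nsec <= now->nsec))
		return 0;

	/* deadline->sec >= now->sec here, so the unsigned difference is exact. */
	secs = (uint64_t)deadline->sec - (uint64_t)now->sec;
	if (secs > UINT64_MAX / EXAMPLE_NSEC_PER_SEC - 1)
		return UINT64_MAX;

	return secs * EXAMPLE_NSEC_PER_SEC + deadline->nsec - now->nsec;
}
```

A wait loop built on this would validate the deadline once, then call
``example_time_remaining_ns`` with a fresh reading of the clock every time it
is entered or restarted.
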
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.


Differences between /Documentation/process/botching-up-ioctls.rst (Version linux-6.11.5) and /Documentation/process/botching-up-ioctls.rst (Version linux-5.18.19)
