~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/process/botching-up-ioctls.rst

Version: ~ [ linux-6.11.5 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.58 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.114 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.169 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.228 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.284 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.322 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.9 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /Documentation/process/botching-up-ioctls.rst (Version linux-6.11.5) and /Documentation/process/botching-up-ioctls.rst (Version linux-6.2.16)


  1 =================================                   1 =================================
  2 (How to avoid) Botching up ioctls                   2 (How to avoid) Botching up ioctls
  3 =================================                   3 =================================
  4                                                     4 
  5 From: https://blog.ffwll.ch/2013/11/botching-u      5 From: https://blog.ffwll.ch/2013/11/botching-up-ioctls.html
  6                                                     6 
  7 By: Daniel Vetter, Copyright © 2013 Intel Cor      7 By: Daniel Vetter, Copyright © 2013 Intel Corporation
  8                                                     8 
  9 One clear insight kernel graphics hackers gain      9 One clear insight kernel graphics hackers gained in the past few years is that
 10 trying to come up with a unified interface to      10 trying to come up with a unified interface to manage the execution units and
 11 memory on completely different GPUs is a futil     11 memory on completely different GPUs is a futile effort. So nowadays every
 12 driver has its own set of ioctls to allocate m     12 driver has its own set of ioctls to allocate memory and submit work to the GPU.
 13 Which is nice, since there's no more insanity      13 Which is nice, since there's no more insanity in the form of fake-generic, but
 14 actually only used once interfaces. But the cl     14 actually only used once interfaces. But the clear downside is that there's much
 15 more potential to screw things up.                 15 more potential to screw things up.
 16                                                    16 
 17 To avoid repeating all the same mistakes again     17 To avoid repeating all the same mistakes again I've written up some of the
 18 lessons learned while botching the job for the     18 lessons learned while botching the job for the drm/i915 driver. Most of these
 19 only cover technicalities and not the big-pict     19 only cover technicalities and not the big-picture issues like what the command
 20 submission ioctl exactly should look like. Lea     20 submission ioctl exactly should look like. Learning these lessons is probably
 21 something every GPU driver has to do on its ow     21 something every GPU driver has to do on its own.
 22                                                    22 
 23                                                    23 
 24 Prerequisites                                      24 Prerequisites
 25 -------------                                      25 -------------
 26                                                    26 
 27 First the prerequisites. Without these you hav     27 First the prerequisites. Without these you have already failed, because you
 28 will need to add a 32-bit compat layer:            28 will need to add a 32-bit compat layer:
 29                                                    29 
 30  * Only use fixed sized integers. To avoid con     30  * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
 31    the kernel has special types like __u32, __     31    the kernel has special types like __u32, __s64. Use them.
 32                                                    32 
 33  * Align everything to the natural size and us     33  * Align everything to the natural size and use explicit padding. 32-bit
 34    platforms don't necessarily align 64-bit va     34    platforms don't necessarily align 64-bit values to 64-bit boundaries, but
 35    64-bit platforms do. So we always need padd     35    64-bit platforms do. So we always need padding to the natural size to get
 36    this right.                                     36    this right.
 37                                                    37 
 38  * Pad the entire struct to a multiple of 64-b     38  * Pad the entire struct to a multiple of 64-bits if the structure contains
 39    64-bit types - the structure size will othe     39    64-bit types - the structure size will otherwise differ on 32-bit versus
 40    64-bit. Having a different structure size h     40    64-bit. Having a different structure size hurts when passing arrays of
 41    structures to the kernel, or if the kernel      41    structures to the kernel, or if the kernel checks the structure size, which
 42    e.g. the drm core does.                         42    e.g. the drm core does.
 43                                                    43 
 44  * Pointers are __u64, cast from/to a uintptr_ !!  44  * Pointers are __u64, cast from/to a uintprt_t on the userspace side and
 45    from/to a void __user * in the kernel. Try      45    from/to a void __user * in the kernel. Try really hard not to delay this
 46    conversion or worse, fiddle the raw __u64 t     46    conversion or worse, fiddle the raw __u64 through your code since that
 47    diminishes the checking tools like sparse c     47    diminishes the checking tools like sparse can provide. The macro
 48    u64_to_user_ptr can be used in the kernel t     48    u64_to_user_ptr can be used in the kernel to avoid warnings about integers
 49    and pointers of different sizes.                49    and pointers of different sizes.
 50                                                    50 
 51                                                    51 
 52 Basics                                             52 Basics
 53 ------                                             53 ------
 54                                                    54 
 55 With the joys of writing a compat layer avoide     55 With the joys of writing a compat layer avoided we can take a look at the basic
 56 fumbles. Neglecting these will make backward a     56 fumbles. Neglecting these will make backward and forward compatibility a real
 57 pain. And since getting things wrong on the fi     57 pain. And since getting things wrong on the first attempt is guaranteed you
 58 will have a second iteration or at least an ex     58 will have a second iteration or at least an extension for any given interface.
 59                                                    59 
 60  * Have a clear way for userspace to figure ou     60  * Have a clear way for userspace to figure out whether your new ioctl or ioctl
 61    extension is supported on a given kernel. I     61    extension is supported on a given kernel. If you can't rely on old kernels
 62    rejecting the new flags/modes or ioctls (si     62    rejecting the new flags/modes or ioctls (since doing that was botched in the
 63    past) then you need a driver feature flag o     63    past) then you need a driver feature flag or revision number somewhere.
 64                                                    64 
 65  * Have a plan for extending ioctls with new f     65  * Have a plan for extending ioctls with new flags or new fields at the end of
 66    the structure. The drm core checks the pass     66    the structure. The drm core checks the passed-in size for each ioctl call
 67    and zero-extends any mismatches between ker     67    and zero-extends any mismatches between kernel and userspace. That helps,
 68    but isn't a complete solution since newer u     68    but isn't a complete solution since newer userspace on older kernels won't
 69    notice that the newly added fields at the e     69    notice that the newly added fields at the end get ignored. So this still
 70    needs a new driver feature flags.               70    needs a new driver feature flags.
 71                                                    71 
 72  * Check all unused fields and flags and all t     72  * Check all unused fields and flags and all the padding for whether it's 0,
 73    and reject the ioctl if that's not the case     73    and reject the ioctl if that's not the case. Otherwise your nice plan for
 74    future extensions is going right down the g     74    future extensions is going right down the gutters since someone will submit
 75    an ioctl struct with random stack garbage i     75    an ioctl struct with random stack garbage in the yet unused parts. Which
 76    then bakes in the ABI that those fields can     76    then bakes in the ABI that those fields can never be used for anything else
 77    but garbage. This is also the reason why yo     77    but garbage. This is also the reason why you must explicitly pad all
 78    structures, even if you never use them in a     78    structures, even if you never use them in an array - the padding the compiler
 79    might insert could contain garbage.             79    might insert could contain garbage.
 80                                                    80 
 81  * Have simple testcases for all of the above.     81  * Have simple testcases for all of the above.
 82                                                    82 
 83                                                    83 
 84 Fun with Error Paths                               84 Fun with Error Paths
 85 --------------------                               85 --------------------
 86                                                    86 
 87 Nowadays we don't have any excuse left any mor     87 Nowadays we don't have any excuse left any more for drm drivers being neat
 88 little root exploits. This means we both need      88 little root exploits. This means we both need full input validation and solid
 89 error handling paths - GPUs will die eventuall     89 error handling paths - GPUs will die eventually in the oddmost corner cases
 90 anyway:                                            90 anyway:
 91                                                    91 
 92  * The ioctl must check for array overflows. A     92  * The ioctl must check for array overflows. Also it needs to check for
 93    over/underflows and clamping issues of inte     93    over/underflows and clamping issues of integer values in general. The usual
 94    example is sprite positioning values fed di     94    example is sprite positioning values fed directly into the hardware with the
 95    hardware just having 12 bits or so. Works n     95    hardware just having 12 bits or so. Works nicely until some odd display
 96    server doesn't bother with clamping itself      96    server doesn't bother with clamping itself and the cursor wraps around the
 97    screen.                                         97    screen.
 98                                                    98 
 99  * Have simple testcases for every input valid     99  * Have simple testcases for every input validation failure case in your ioctl.
100    Check that the error code matches your expe    100    Check that the error code matches your expectations. And finally make sure
101    that you only test for one single error pat    101    that you only test for one single error path in each subtest by submitting
102    otherwise perfectly valid data. Without thi    102    otherwise perfectly valid data. Without this an earlier check might reject
103    the ioctl already and shadow the codepath y    103    the ioctl already and shadow the codepath you actually want to test, hiding
104    bugs and regressions.                          104    bugs and regressions.
105                                                   105 
106  * Make all your ioctls restartable. First X r    106  * Make all your ioctls restartable. First X really loves signals and second
107    this will allow you to test 90% of all erro    107    this will allow you to test 90% of all error handling paths by just
108    interrupting your main test suite constantl    108    interrupting your main test suite constantly with signals. Thanks to X's
109    love for signal you'll get an excellent bas    109    love for signal you'll get an excellent base coverage of all your error
110    paths pretty much for free for graphics dri    110    paths pretty much for free for graphics drivers. Also, be consistent with
111    how you handle ioctl restarting - e.g. drm     111    how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
112    userspace library. The i915 driver botched     112    userspace library. The i915 driver botched this with the set_tiling ioctl,
113    now we're stuck forever with some arcane se    113    now we're stuck forever with some arcane semantics in both the kernel and
114    userspace.                                     114    userspace.
115                                                   115 
116  * If you can't make a given codepath restarta    116  * If you can't make a given codepath restartable make a stuck task at least
117    killable. GPUs just die and your users won'    117    killable. GPUs just die and your users won't like you more if you hang their
118    entire box (by means of an unkillable X pro    118    entire box (by means of an unkillable X process). If the state recovery is
119    still too tricky have a timeout or hangchec    119    still too tricky have a timeout or hangcheck safety net as a last-ditch
120    effort in case the hardware has gone banana    120    effort in case the hardware has gone bananas.
121                                                   121 
122  * Have testcases for the really tricky corner    122  * Have testcases for the really tricky corner cases in your error recovery code
123    - it's way too easy to create a deadlock be    123    - it's way too easy to create a deadlock between your hangcheck code and
124    waiters.                                       124    waiters.
125                                                   125 
126                                                   126 
127 Time, Waiting and Missing it                      127 Time, Waiting and Missing it
128 ----------------------------                      128 ----------------------------
129                                                   129 
130 GPUs do most everything asynchronously, so we     130 GPUs do most everything asynchronously, so we have a need to time operations and
131 wait for outstanding ones. This is really tric    131 wait for outstanding ones. This is really tricky business; at the moment none of
132 the ioctls supported by the drm/i915 get this     132 the ioctls supported by the drm/i915 get this fully right, which means there's
133 still tons more lessons to learn here.            133 still tons more lessons to learn here.
134                                                   134 
135  * Use CLOCK_MONOTONIC as your reference time,    135  * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
136    v4l use by default nowadays. But let usersp    136    v4l use by default nowadays. But let userspace know which timestamps are
137    derived from different clock domains like y    137    derived from different clock domains like your main system clock (provided
138    by the kernel) or some independent hardware    138    by the kernel) or some independent hardware counter somewhere else. Clocks
139    will mismatch if you look close enough, but    139    will mismatch if you look close enough, but if performance measuring tools
140    have this information they can at least com    140    have this information they can at least compensate. If your userspace can
141    get at the raw values of some clocks (e.g.     141    get at the raw values of some clocks (e.g. through in-command-stream
142    performance counter sampling instructions)     142    performance counter sampling instructions) consider exposing those also.
143                                                   143 
144  * Use __s64 seconds plus __u64 nanoseconds to    144  * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
145    convenient time specification, but it's mos    145    convenient time specification, but it's mostly the standard.
146                                                   146 
147  * Check that input time values are normalized    147  * Check that input time values are normalized and reject them if not. Note
148    that the kernel native struct ktime has a s    148    that the kernel native struct ktime has a signed integer for both seconds
149    and nanoseconds, so beware here.               149    and nanoseconds, so beware here.
150                                                   150 
151  * For timeouts, use absolute times. If you're    151  * For timeouts, use absolute times. If you're a good fellow and made your
152    ioctl restartable relative timeouts tend to    152    ioctl restartable relative timeouts tend to be too coarse and can
153    indefinitely extend your wait time due to r    153    indefinitely extend your wait time due to rounding on each restart.
154    Especially if your reference clock is somet    154    Especially if your reference clock is something really slow like the display
155    frame counter. With a spec lawyer hat on th    155    frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
156    always be extended - but users will surely     156    always be extended - but users will surely hate you if their neat animations
157    starts to stutter due to this.                 157    starts to stutter due to this.
158                                                   158 
159  * Consider ditching any synchronous wait ioct    159  * Consider ditching any synchronous wait ioctls with timeouts and just deliver
160    an asynchronous event on a pollable file de    160    an asynchronous event on a pollable file descriptor. It fits much better
161    into event driven applications' main loop.     161    into event driven applications' main loop.
162                                                   162 
163  * Have testcases for corner-cases, especially    163  * Have testcases for corner-cases, especially whether the return values for
164    already-completed events, successful waits     164    already-completed events, successful waits and timed-out waits are all sane
165    and suiting to your needs.                     165    and suiting to your needs.
166                                                   166 
167                                                   167 
168 Leaking Resources, Not                            168 Leaking Resources, Not
169 ----------------------                            169 ----------------------
170                                                   170 
171 A full-blown drm driver essentially implements    171 A full-blown drm driver essentially implements a little OS, but specialized to
172 the given GPU platforms. This means a driver n    172 the given GPU platforms. This means a driver needs to expose tons of handles
173 for different objects and other resources to u    173 for different objects and other resources to userspace. Doing that right
174 entails its own little set of pitfalls:           174 entails its own little set of pitfalls:
175                                                   175 
176  * Always attach the lifetime of your dynamica    176  * Always attach the lifetime of your dynamically created resources to the
177    lifetime of a file descriptor. Consider usi    177    lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
178    needs to be shared across processes -  fd-p    178    needs to be shared across processes -  fd-passing over unix domain sockets
179    also simplifies lifetime management for use    179    also simplifies lifetime management for userspace.
180                                                   180 
181  * Always have O_CLOEXEC support.                 181  * Always have O_CLOEXEC support.
182                                                   182 
183  * Ensure that you have sufficient insulation     183  * Ensure that you have sufficient insulation between different clients. By
184    default pick a private per-fd namespace whi    184    default pick a private per-fd namespace which forces any sharing to be done
185    explicitly. Only go with a more global per-    185    explicitly. Only go with a more global per-device namespace if the objects
186    are truly device-unique. One counterexample    186    are truly device-unique. One counterexample in the drm modeset interfaces is
187    that the per-device modeset objects like co    187    that the per-device modeset objects like connectors share a namespace with
188    framebuffer objects, which mostly are not s    188    framebuffer objects, which mostly are not shared at all. A separate
189    namespace, private by default, for framebuf    189    namespace, private by default, for framebuffers would have been more
190    suitable.                                      190    suitable.
191                                                   191 
192  * Think about uniqueness requirements for use    192  * Think about uniqueness requirements for userspace handles. E.g. for most drm
193    drivers it's a userspace bug to submit the     193    drivers it's a userspace bug to submit the same object twice in the same
194    command submission ioctl. But then if objec    194    command submission ioctl. But then if objects are shareable userspace needs
195    to know whether it has seen an imported obj    195    to know whether it has seen an imported object from a different process
196    already or not. I haven't tried this myself    196    already or not. I haven't tried this myself yet due to lack of a new class
197    of objects, but consider using inode number    197    of objects, but consider using inode numbers on your shared file descriptors
198    as unique identifiers - it's how real files    198    as unique identifiers - it's how real files are told apart, too.
199    Unfortunately this requires a full-blown vi    199    Unfortunately this requires a full-blown virtual filesystem in the kernel.
200                                                   200 
201                                                   201 
202 Last, but not Least                               202 Last, but not Least
203 -------------------                               203 -------------------
204                                                   204 
205 Not every problem needs a new ioctl:              205 Not every problem needs a new ioctl:
206                                                   206 
207  * Think hard whether you really want a driver    207  * Think hard whether you really want a driver-private interface. Of course
208    it's much quicker to push a driver-private     208    it's much quicker to push a driver-private interface than engaging in
209    lengthy discussions for a more generic solu    209    lengthy discussions for a more generic solution. And occasionally doing a
210    private interface to spearhead a new concep    210    private interface to spearhead a new concept is what's required. But in the
211    end, once the generic interface comes aroun !! 211    end, once the generic interface comes around you'll end up maintainer two
212    interfaces. Indefinitely.                      212    interfaces. Indefinitely.
213                                                   213 
214  * Consider other interfaces than ioctls. A sy    214  * Consider other interfaces than ioctls. A sysfs attribute is much better for
215    per-device settings, or for child objects w    215    per-device settings, or for child objects with fairly static lifetimes (like
216    output connectors in drm with all the detec    216    output connectors in drm with all the detection override attributes). Or
217    maybe only your testsuite needs this interf    217    maybe only your testsuite needs this interface, and then debugfs with its
218    disclaimer of not having a stable ABI would    218    disclaimer of not having a stable ABI would be better.
219                                                   219 
220 Finally, the name of the game is to get it rig    220 Finally, the name of the game is to get it right on the first attempt, since if
221 your driver proves popular and your hardware p    221 your driver proves popular and your hardware platforms long-lived then you'll
222 be stuck with a given ioctl essentially foreve    222 be stuck with a given ioctl essentially forever. You can try to deprecate
223 horrible ioctls on newer iterations of your ha    223 horrible ioctls on newer iterations of your hardware, but generally it takes
224 years to accomplish this. And then again years    224 years to accomplish this. And then again years until the last user able to
225 complain about regressions disappears, too.       225 complain about regressions disappears, too.
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php