=====================
The errseq_t datatype
=====================

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether it has changed since a previous
point where it was sampled.

The initial use case for this is tracking errors for file
synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
but it may be usable in other situations.

It's implemented as an unsigned 32-bit value.  The low order bits are
designated to hold an error code (between 1 and MAX_ERRNO).  The upper bits
are used as a counter.  This is done with atomics instead of locking so that
these functions can be called from any context.

Note that there is a risk of collisions if new errors are being recorded
frequently, since we have so few bits to use as a counter.

To mitigate this, the bit between the error value and counter is used as
a flag to tell whether the value has been sampled since a new value was
recorded.  That allows us to avoid bumping the counter if no one has
sampled it since the last time an error was recorded.

Thus we end up with a value that looks something like this:

+--------------------------------------+----+------------------------+
| 31..13                               | 12 | 11..0                  |
+--------------------------------------+----+------------------------+
| counter                              | SF | errno                  |
+--------------------------------------+----+------------------------+
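
The layout in the table can be decoded with a handful of masks.  What
follows is a minimal userspace sketch, not a copy of lib/errseq.c: the
``model_`` and ``MODEL_`` names are hypothetical, and the value 4095 for
the errno mask is an assumption taken from the 12-bit errno field shown
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; they follow the table, not kernel headers. */
#define MODEL_MAX_ERRNO	4095u		/* bits 11..0: errno */
#define MODEL_SEEN	(1u << 12)	/* bit 12: the "SF" (seen) flag */
#define MODEL_CTR_INC	(1u << 13)	/* bits 31..13: one counter step */

typedef uint32_t model_errseq_t;

static inline unsigned int model_errno(model_errseq_t v)
{
	return v & MODEL_MAX_ERRNO;
}

static inline int model_seen(model_errseq_t v)
{
	return (v & MODEL_SEEN) != 0;
}

static inline uint32_t model_counter(model_errseq_t v)
{
	return v / MODEL_CTR_INC;
}

/* Assemble a value from its three fields, for illustration only. */
static inline model_errseq_t model_pack(uint32_t counter, int seen, unsigned int err)
{
	assert(err <= MODEL_MAX_ERRNO);
	return counter * MODEL_CTR_INC | (seen ? MODEL_SEEN : 0u) | err;
}
```

For example, ``model_pack(3, 1, 5)`` yields 0x7005: counter 3 in the upper
bits, the flag set, and errno 5 in the low bits.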

The general idea is for "watchers" to sample an errseq_t value and keep
it as a running cursor.  That value can later be used to tell whether
any new errors have occurred since that sampling was done, and atomically
record the state at the time that it was checked.  This allows us to
record errors in one place, and then have a number of "watchers" that
can tell whether the value has changed since they last checked it.

A new errseq_t should always be zeroed out.  An errseq_t value of all zeroes
is the special (but common) case where there has never been an error. An all
zero value thus serves as the "epoch" if one wishes to know whether there
has ever been an error set since it was first initialized.
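
The way the flag keeps the counter from being bumped needlessly can be
modeled in userspace with C11 atomics.  This is only a sketch under
stated assumptions: ``model_set`` and ``model_mark_seen`` are hypothetical
names, the bit positions are taken from the table above, and the kernel
function in lib/errseq.c differs in detail (for instance, it warns on a
bad error code where this sketch asserts).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(1u << 12)
#define MODEL_CTR_INC	(1u << 13)

typedef _Atomic uint32_t model_errseq_t;	/* zero-initialize: the "epoch" */

/* Record a negative errno such as -EIO; returns the value now stored. */
static inline uint32_t model_set(model_errseq_t *eseq, int err)
{
	uint32_t old = atomic_load(eseq);

	assert(err < 0 && -err <= (int)MODEL_MAX_ERRNO);
	for (;;) {
		/* Keep the counter, clear the flag, install the new code. */
		uint32_t new = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)-err;

		/* Bump the counter only if someone sampled the old value. */
		if (old & MODEL_SEEN)
			new += MODEL_CTR_INC;
		/* Same error, still unseen: nothing to store. */
		if (new == old)
			return old;
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(eseq, &old, new))
			return new;
	}
}

/* Mark the current value as seen, as a watcher taking a sample would. */
static inline uint32_t model_mark_seen(model_errseq_t *eseq)
{
	return atomic_fetch_or(eseq, MODEL_SEEN) | MODEL_SEEN;
}
```

Starting from zero, two calls to ``model_set`` in a row leave the counter
at 0, because nobody looked in between.  Only after ``model_mark_seen``
does the next ``model_set`` advance the counter.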

API usage
=========

Let me tell you a story about a worker drone.  Now, he's a good worker
overall, but the company is a little...management heavy.  He has to
report to 77 supervisors today, and tomorrow the "big boss" is coming in
from out of town and he's sure to test the poor fellow too.

They're all handing him work to do -- so much he can't keep track of who
handed him what, but that's not really a big problem.  The supervisors
just want to know when he's finished all of the work they've handed him so
far and whether he made any mistakes since they last asked.

He might have made the mistake on work they didn't actually hand him,
but he can't keep track of things at that level of detail; all he can
remember is the most recent mistake that he made.

Here's our worker_drone representation::

        struct worker_drone {
                errseq_t        wd_err; /* for recording errors */
        };

Every day, the worker_drone starts out with a blank slate::

        struct worker_drone wd;

        wd.wd_err = (errseq_t)0;

The supervisors come in and get an initial read for the day.  They
don't care about anything that happened before their watch begins::

        struct supervisor {
                errseq_t        s_wd_err; /* private "cursor" for wd_err */
                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
        };

        struct supervisor       su;

        su.s_wd_err = errseq_sample(&wd.wd_err);
        spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do.  Every few minutes they ask him to
finish up all of the work they've handed him so far.  Then they ask him
whether he made any mistakes on any of it::

        spin_lock(&su.s_wd_err_lock);
        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
        spin_unlock(&su.s_wd_err_lock);

Up to this point, that just keeps returning 0.

Now, the owners of this company are quite miserly and have given him
substandard equipment with which to do his job. Occasionally it
glitches and he makes a mistake.  He sighs a heavy sigh, and marks it
down::

        errseq_set(&wd.wd_err, -EIO);

...and then gets back to work.  The supervisors eventually poll again
and they each get the error when they next check.  Subsequent calls will
return 0, until another error is recorded, at which point it's reported
to each of them once.

Note that the supervisors can't tell how many mistakes he made, only
whether one was made since they last checked, and the latest value
recorded.
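
The story so far can be replayed in userspace.  The sketch below is a
single-threaded model, so it uses a plain integer instead of atomics and
needs no lock.  The ``model_`` functions are hypothetical stand-ins for
errseq_set, errseq_sample and errseq_check_and_advance, written from the
description in this document rather than copied from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(1u << 12)
#define MODEL_CTR_INC	(1u << 13)

typedef uint32_t model_errseq_t;	/* plain integer: single-threaded model only */

static inline void model_set(model_errseq_t *eseq, int err)
{
	model_errseq_t new;

	assert(err < 0 && -err <= (int)MODEL_MAX_ERRNO);
	new = (*eseq & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)-err;
	if (*eseq & MODEL_SEEN)
		new += MODEL_CTR_INC;
	*eseq = new;
}

static inline model_errseq_t model_sample(const model_errseq_t *eseq)
{
	/* An unseen error still counts as new for a fresh watcher: start at 0. */
	return (*eseq & MODEL_SEEN) ? *eseq : 0;
}

static inline int model_check_and_advance(model_errseq_t *eseq, model_errseq_t *since)
{
	if (*eseq == *since)
		return 0;
	*eseq |= MODEL_SEEN;
	*since = *eseq;
	return -(int)(*since & MODEL_MAX_ERRNO);
}

/* Two supervisors watch one drone; returns 0 if every step behaves as described. */
static inline int model_supervisors_demo(void)
{
	model_errseq_t wd_err = 0;
	model_errseq_t su1 = model_sample(&wd_err);
	model_errseq_t su2 = model_sample(&wd_err);

	if (model_check_and_advance(&wd_err, &su1) != 0)
		return 1;
	model_set(&wd_err, -EIO);
	model_set(&wd_err, -ENOSPC);	/* second mistake before anyone polls */
	/* Each supervisor gets only the latest error, exactly once. */
	if (model_check_and_advance(&wd_err, &su1) != -ENOSPC)
		return 2;
	if (model_check_and_advance(&wd_err, &su2) != -ENOSPC)
		return 3;
	if (model_check_and_advance(&wd_err, &su1) != 0)
		return 4;
	if (model_check_and_advance(&wd_err, &su2) != 0)
		return 5;
	return 0;
}
```

Note how the -EIO is never reported on its own: it was overwritten before
any supervisor polled, which is the "latest value recorded" limitation
described above.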

Occasionally the big boss comes in for a spot check and asks the worker
to do a one-off job for him. He's not really watching the worker
full-time like the supervisors, but he does need to know whether a
mistake occurred while his job was processing.

He can just sample the current errseq_t in the worker, and then use that
to tell whether an error has occurred later::

        errseq_t since = errseq_sample(&wd.wd_err);
        /* submit some work and wait for it to complete */
        err = errseq_check(&wd.wd_err, since);

Since he's just going to discard "since" after that point, he doesn't
need to advance it here. He also doesn't need any locking since it's
not usable by anyone else.
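
The spot check can be modeled the same way.  This is again a
single-threaded userspace sketch with hypothetical ``model_`` names; the
starting state (an old -EIO that some supervisor has already seen) is an
assumption chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(1u << 12)
#define MODEL_CTR_INC	(1u << 13)

typedef uint32_t model_errseq_t;	/* plain integer: single-threaded model only */

static inline void model_set(model_errseq_t *eseq, int err)
{
	model_errseq_t new;

	assert(err < 0 && -err <= (int)MODEL_MAX_ERRNO);
	new = (*eseq & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)-err;
	if (*eseq & MODEL_SEEN)
		new += MODEL_CTR_INC;
	*eseq = new;
}

static inline model_errseq_t model_sample(const model_errseq_t *eseq)
{
	return (*eseq & MODEL_SEEN) ? *eseq : 0;
}

/* Report a change since 'since' without modifying either value. */
static inline int model_check(const model_errseq_t *eseq, model_errseq_t since)
{
	if (*eseq == since)
		return 0;
	return -(int)(*eseq & MODEL_MAX_ERRNO);
}

/* The big boss's spot check; returns 0 if every step behaves as described. */
static inline int model_big_boss_demo(void)
{
	/* Assumed starting state: an old -EIO that was already seen. */
	model_errseq_t wd_err = MODEL_SEEN | (uint32_t)EIO;
	model_errseq_t since = model_sample(&wd_err);

	/* The job completes cleanly: the old error is not reported. */
	if (model_check(&wd_err, since) != 0)
		return 1;
	/* A mistake during a second job shows up... */
	model_set(&wd_err, -ENOSPC);
	if (model_check(&wd_err, since) != -ENOSPC)
		return 2;
	/* ...and keeps showing up, because "since" was never advanced. */
	if (model_check(&wd_err, since) != -ENOSPC)
		return 3;
	return 0;
}
```

The repeated report in the last step is harmless for a throwaway cursor,
which is why the non-advancing check is enough here.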

Serializing errseq_t cursor updates
===================================

Note that the errseq_t API does not protect the errseq_t cursor during a
check_and_advance operation. Only the canonical error code is handled
atomically.  In a situation where more than one task might be using the
same errseq_t cursor at the same time, it's important to serialize
updates to that cursor.

If that's not done, then it's possible for the cursor to go backward
in which case the same error could be reported more than once.
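
One way to see the problem is to replay a bad interleaving by hand.  The
sketch below is a deterministic, single-threaded userspace model: it
splits an unserialized check-and-advance into "compute the new cursor"
and "store the new cursor", and lets a second task run in between.  All
``model_`` names are hypothetical, and the interleaving is just one
illustrative schedule.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(1u << 12)
#define MODEL_CTR_INC	(1u << 13)

typedef uint32_t model_errseq_t;	/* plain integer: the schedule is replayed by hand */

static inline void model_set(model_errseq_t *eseq, int err)
{
	model_errseq_t new;

	assert(err < 0 && -err <= (int)MODEL_MAX_ERRNO);
	new = (*eseq & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)-err;
	if (*eseq & MODEL_SEEN)
		new += MODEL_CTR_INC;
	*eseq = new;
}

/* First half of check-and-advance: compare, mark seen, compute the new cursor. */
static inline int model_check_begin(model_errseq_t *eseq, model_errseq_t since,
				    model_errseq_t *new_cursor)
{
	*new_cursor = since;
	if (*eseq == since)
		return 0;
	*eseq |= MODEL_SEEN;
	*new_cursor = *eseq;
	return -(int)(*eseq & MODEL_MAX_ERRNO);
}

/* Replays one bad interleaving; returns how many times -ENOSPC is reported. */
static inline int model_race_demo(void)
{
	model_errseq_t wd_err = 0, cursor = 0, a_new, b_new;
	int reports = 0;

	model_set(&wd_err, -EIO);
	/* Task A runs the first half, then stalls before storing the cursor. */
	model_check_begin(&wd_err, cursor, &a_new);
	/* Meanwhile a new error lands and task B checks and advances fully. */
	model_set(&wd_err, -ENOSPC);
	if (model_check_begin(&wd_err, cursor, &b_new) == -ENOSPC)
		reports++;
	cursor = b_new;
	/* Task A wakes up and stores its stale value: the cursor goes backward. */
	cursor = a_new;
	/* The next poll sees a mismatch and reports -ENOSPC a second time. */
	if (model_check_begin(&wd_err, cursor, &b_new) == -ENOSPC)
		reports++;
	return reports;
}
```

In this schedule ``model_race_demo`` counts two reports of the same
-ENOSPC.  Holding a lock across both halves rules the interleaving out,
since task A would store its cursor before task B could start.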

Because of this, it's often advantageous to first do an errseq_check to
see if anything has changed, and only later do an
errseq_check_and_advance after taking the lock. e.g.::

        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
                /* su.s_wd_err is protected by s_wd_err_lock */
                spin_lock(&su.s_wd_err_lock);
                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
                spin_unlock(&su.s_wd_err_lock);
        }

That avoids the spinlock in the common case where nothing has changed
since the last time it was checked.
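
The check-then-lock pattern can also be tried out in userspace.  The
sketch below makes several assumptions: a C11 ``atomic_flag`` stands in
for spinlock_t, the ``model_`` functions are hypothetical approximations
of errseq_check and errseq_check_and_advance, the cursor starts at the
all-zero epoch, and ``s_locked_polls`` exists only so the demo can show
how often the lock was actually taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(1u << 12)

typedef _Atomic uint32_t model_errseq_t;

struct model_supervisor {
	model_errseq_t	s_wd_err;	/* cursor; written only under s_lock */
	atomic_flag	s_lock;		/* toy spinlock standing in for spinlock_t */
	unsigned int	s_locked_polls;	/* demo counter: times the lock was taken */
};

static inline void model_supervisor_init(struct model_supervisor *su)
{
	atomic_init(&su->s_wd_err, 0);
	atomic_flag_clear(&su->s_lock);
	su->s_locked_polls = 0;
}

static inline int model_check(model_errseq_t *eseq, uint32_t since)
{
	uint32_t cur = atomic_load(eseq);

	return cur == since ? 0 : -(int)(cur & MODEL_MAX_ERRNO);
}

static inline int model_check_and_advance(model_errseq_t *eseq, model_errseq_t *since)
{
	uint32_t cur = atomic_load(eseq);

	if (cur == atomic_load(since))
		return 0;
	/* Mark the value seen, then move the cursor up to it. */
	cur = atomic_fetch_or(eseq, MODEL_SEEN) | MODEL_SEEN;
	atomic_store(since, cur);
	return -(int)(cur & MODEL_MAX_ERRNO);
}

/* Lockless peek first; take the lock only when something has changed. */
static inline int model_poll(model_errseq_t *wd_err, struct model_supervisor *su)
{
	int err = 0;

	assert(wd_err && su);
	if (model_check(wd_err, atomic_load(&su->s_wd_err))) {
		while (atomic_flag_test_and_set(&su->s_lock))
			;	/* spin until the lock is free */
		su->s_locked_polls++;
		err = model_check_and_advance(wd_err, &su->s_wd_err);
		atomic_flag_clear(&su->s_lock);
	}
	return err;
}
```

Polling an unchanged value leaves ``s_locked_polls`` untouched; it only
goes up on the poll that actually picks up a new error.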

Functions
=========

.. kernel-doc:: lib/errseq.c
