Linux/lib/rcuref.c

// SPDX-License-Identifier: GPL-2.0-only

/*
 * rcuref - A scalable reference count implementation for RCU managed objects
 *
 * rcuref is provided to replace open coded reference count implementations
 * based on atomic_t. It protects explicitly RCU managed objects which can
 * be visible even after the last reference has been dropped and the object
 * is heading towards destruction.
 *
 * A common usage pattern is:
 *
 * get()
 *      rcu_read_lock();
 *      p = get_ptr();
 *      if (p && !atomic_inc_not_zero(&p->refcnt))
 *              p = NULL;
 *      rcu_read_unlock();
 *      return p;
 *
 * put()
 *      if (!atomic_dec_return(&p->refcnt)) {
 *              remove_ptr(p);
 *              kfree_rcu(p, rcu);
 *      }
 *
 * atomic_inc_not_zero() is implemented with a try_cmpxchg() loop which has
 * O(N^2) behaviour under contention with N concurrent operations.
 *
 * rcuref uses atomic_add_negative_relaxed() for the fast path, which scales
 * better under contention.
 *
 * Why not refcount?
 * =================
 *
 * In principle it should be possible to make refcount use the rcuref
 * scheme, but the destruction race described below cannot be prevented
 * unless the protected object is RCU managed.
 *
 * Theory of operation
 * ===================
 *
 * rcuref uses an unsigned integer reference counter. As long as the
 * counter value is greater than or equal to RCUREF_ONEREF and not larger
 * than RCUREF_MAXREF the reference is alive:
 *
 * ONEREF   MAXREF                SATURATED             RELEASED   DEAD       NOREF
 * 0        0x7FFFFFFF 0x80000000 0xA0000000 0xBFFFFFFF 0xC0000000 0xE0000000 0xFFFFFFFF
 * <---valid --------> <-------saturation zone--------> <----------dead zone----------->
 *
 * The get() and put() operations do unconditional increments and
 * decrements. The result is checked after the operation. This optimizes
 * for the fast path.
 *
 * If the reference count is saturated or dead, then the increments and
 * decrements are not harmful as the reference count still stays in the
 * respective zones and is always set back to SATURATED or DEAD,
 * respectively. The zones have room for 2^28 racing operations in each
 * direction, which makes it practically impossible to escape the zones.
 *
 * Once the last reference is dropped the reference count becomes
 * RCUREF_NOREF which forces rcuref_put() into the slowpath operation. The
 * slowpath then tries to set the reference count from RCUREF_NOREF to
 * RCUREF_DEAD via a cmpxchg(). This opens a small window where a
 * concurrent rcuref_get() can acquire the reference count and bring it
 * back to RCUREF_ONEREF or even drop the reference again and mark it DEAD.
 *
 * If the cmpxchg() succeeds then a concurrent rcuref_get() will result in
 * DEAD + 1, which is inside the dead zone. If that happens the reference
 * count is put back to DEAD.
 *
 * The actual race is possible due to the unconditional increment and
 * decrements in rcuref_get() and rcuref_put():
 *
 *      T1                              T2
 *      get()                           put()
 *                                      if (atomic_add_negative(-1, &ref->refcnt))
 *              succeeds->                      atomic_cmpxchg(&ref->refcnt, NOREF, DEAD);
 *
 *      atomic_add_negative(1, &ref->refcnt);   <- Elevates refcount to DEAD + 1
 *
 * As the result of T1's add is negative, the get() goes into the slow path
 * and observes refcnt being in the dead zone which makes the operation fail.
 *
 * Possible critical states:
 *
 *      Context Counter References      Operation
 *      T1      0       1               init()
 *      T2      1       2               get()
 *      T1      0       1               put()
 *      T2     -1       0               put() tries to mark dead
 *      T1      0       1               get()
 *      T2      0       1               put() mark dead fails
 *      T1     -1       0               put() tries to mark dead
 *      T1    DEAD      0               put() mark dead succeeds
 *      T2    DEAD+1    0               get() fails and puts it back to DEAD
 *
 * Of course there are more complex scenarios, but the above illustrates
 * the working principle. The rest is left to the imagination of the
 * reader.
 *
 * Deconstruction race
 * ===================
 *
 * The release operation must be protected by prohibiting a grace period in
 * order to prevent a possible use after free:
 *
 *      T1                              T2
 *      put()                           get()
 *      // ref->refcnt = ONEREF
 *      if (!atomic_add_negative(-1, &ref->refcnt))
 *              return false;                           <- Not taken
 *
 *      // ref->refcnt == NOREF
 *      --> preemption
 *                                      // Elevates ref->refcnt to ONEREF
 *                                      if (!atomic_add_negative(1, &ref->refcnt))
 *                                              return true;                    <- taken
 *
 *                                      if (put(&p->ref)) { <-- Succeeds
 *                                              remove_pointer(p);
 *                                              kfree_rcu(p, rcu);
 *                                      }
 *
 *              RCU grace period ends, object is freed
 *
 *      atomic_cmpxchg(&ref->refcnt, NOREF, DEAD);      <- UAF
 *
 * This is prevented by disabling preemption around the put() operation,
 * which in most kernel configurations is cheaper than an rcu_read_lock() /
 * rcu_read_unlock() pair and in many cases even a NOOP. In any case it
 * prevents the grace period from ending, which keeps the object alive
 * until all put() operations complete.
 *
 * Saturation protection
 * =====================
 *
 * The reference count has a saturation limit RCUREF_MAXREF (INT_MAX).
 * Once this is exceeded the reference count becomes stale by setting it
 * to RCUREF_SATURATED, which will cause a memory leak, but it prevents
 * wrap arounds which obviously cause worse problems than a memory
 * leak. When saturation is reached a warning is emitted.
 *
 * Race conditions
 * ===============
 *
 * All reference count increment/decrement operations are unconditional and
 * only verified after the fact. This optimizes for the good case and takes
 * the occasional race vs. a dead or already saturated refcount into
 * account. The saturation and dead zones are large enough to accommodate
 * that.
 *
 * Memory ordering
 * ===============
 *
 * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
 * and provide only what is strictly required for refcounts.
 *
 * The increments are fully relaxed; these will not provide ordering. The
 * rationale is that whatever is used to obtain the object to increase the
 * reference count on will provide the ordering. For locked data
 * structures, it's the lock acquire, for RCU/lockless data structures it's
 * the dependent load.
 *
 * rcuref_get() provides a control dependency ordering future stores which
 * ensures that the object is not modified when acquiring a reference
 * fails.
 *
 * rcuref_put() provides release order, i.e. all prior loads and stores
 * will be issued before. It also provides a control dependency ordering
 * against the subsequent destruction of the object.
 *
 * If rcuref_put() successfully dropped the last reference and marked the
 * object DEAD it also provides acquire ordering.
 */

#include <linux/export.h>
#include <linux/rcuref.h>

/**
 * rcuref_get_slowpath - Slowpath of rcuref_get()
 * @ref:        Pointer to the reference count
 *
 * Invoked when the reference count is outside of the valid zone.
 *
 * Return:
 *      False if the reference count was already marked dead
 *
 *      True if the reference count is saturated, which prevents the
 *      object from being deconstructed ever.
 */
bool rcuref_get_slowpath(rcuref_t *ref)
{
        unsigned int cnt = atomic_read(&ref->refcnt);

        /*
         * If the reference count was already marked dead, undo the
         * increment so it stays in the middle of the dead zone and return
         * fail.
         */
        if (cnt >= RCUREF_RELEASED) {
                atomic_set(&ref->refcnt, RCUREF_DEAD);
                return false;
        }

        /*
         * If it was saturated, warn and mark it so. In case the increment
         * was already on a saturated value restore the saturation
         * marker. This keeps it in the middle of the saturation zone and
         * prevents the reference count from overflowing. This leaks the
         * object memory, but prevents the obvious reference count overflow
         * damage.
         */
        if (WARN_ONCE(cnt > RCUREF_MAXREF, "rcuref saturated - leaking memory"))
                atomic_set(&ref->refcnt, RCUREF_SATURATED);
        return true;
}
EXPORT_SYMBOL_GPL(rcuref_get_slowpath);

/**
 * rcuref_put_slowpath - Slowpath of __rcuref_put()
 * @ref:        Pointer to the reference count
 *
 * Invoked when the reference count is outside of the valid zone.
 *
 * Return:
 *      True if this was the last reference with no future references
 *      possible. This signals the caller that it can safely schedule the
 *      object, which is protected by the reference counter, for
 *      deconstruction.
 *
 *      False if there are still active references or the put() raced
 *      with a concurrent get()/put() pair. Caller is not allowed to
 *      deconstruct the protected object.
 */
bool rcuref_put_slowpath(rcuref_t *ref)
{
        unsigned int cnt = atomic_read(&ref->refcnt);

        /* Did this drop the last reference? */
        if (likely(cnt == RCUREF_NOREF)) {
                /*
                 * Carefully try to set the reference count to RCUREF_DEAD.
                 *
                 * This can fail if a concurrent get() operation has
                 * elevated it again or the corresponding put() even marked
                 * it dead already. Both are valid situations and do not
                 * require a retry. If this fails the caller is not
                 * allowed to deconstruct the object.
                 */
                if (!atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))
                        return false;

                /*
                 * The caller can safely schedule the object for
                 * deconstruction. Provide acquire ordering.
                 */
                smp_acquire__after_ctrl_dep();
                return true;
        }

        /*
         * If the reference count was already in the dead zone, then this
         * put() operation is imbalanced. Warn, put the reference count back to
         * DEAD and tell the caller to not deconstruct the object.
         */
        if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
                atomic_set(&ref->refcnt, RCUREF_DEAD);
                return false;
        }

        /*
         * This is a put() operation on a saturated refcount. Restore the
         * mean saturation value and tell the caller to not deconstruct the
         * object.
         */
        if (cnt > RCUREF_MAXREF)
                atomic_set(&ref->refcnt, RCUREF_SATURATED);
        return false;
}
EXPORT_SYMBOL_GPL(rcuref_put_slowpath);
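
The zone layout from the "Theory of operation" section of the header comment can be modelled outside the kernel. The following is a user-space sketch for illustration only, not kernel code. The constant values are copied from the zone diagram in that comment; `enum rcuref_zone` and `rcuref_zone_of()` are hypothetical names introduced here and are not part of the rcuref API.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the zone diagram in the header comment. */
#define RCUREF_ONEREF     0x00000000U
#define RCUREF_MAXREF     0x7FFFFFFFU
#define RCUREF_SATURATED  0xA0000000U
#define RCUREF_RELEASED   0xC0000000U
#define RCUREF_DEAD       0xE0000000U
#define RCUREF_NOREF      0xFFFFFFFFU

/* The valid zone is exactly the set of values that are non-negative as int. */
static_assert(RCUREF_MAXREF == (uint32_t)INT32_MAX, "MAXREF must equal INT_MAX");

/* Hypothetical classification used only in this sketch. */
enum rcuref_zone { ZONE_VALID, ZONE_SATURATED, ZONE_DEAD, ZONE_NOREF };

static enum rcuref_zone rcuref_zone_of(uint32_t cnt)
{
	if (cnt <= RCUREF_MAXREF)
		return ZONE_VALID;	/* ONEREF .. MAXREF: reference is alive */
	if (cnt < RCUREF_RELEASED)
		return ZONE_SATURATED;	/* 0x80000000 .. 0xBFFFFFFF */
	if (cnt == RCUREF_NOREF)
		return ZONE_NOREF;	/* last reference was just dropped */
	return ZONE_DEAD;		/* RELEASED .. NOREF - 1 */
}
```

A value such as DEAD + 1, left behind by a get() that raced with the final put(), still classifies as dead, which is why the slow path can reset it to DEAD without further checks.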
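
The interplay of the unconditional fast paths and the two slow paths can be tried out with C11 atomics. This is a minimal user-space sketch under stated assumptions, not the kernel implementation: `rcuref_model_t`, `add_negative()` and the `model_*` functions are hypothetical stand-ins, the fast paths are written from the description in the header comment (an add whose result is checked for being negative), WARN_ONCE() is dropped, and neither RCU nor the preemption protection around put() is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RCUREF_MAXREF     0x7FFFFFFFU
#define RCUREF_SATURATED  0xA0000000U
#define RCUREF_RELEASED   0xC0000000U
#define RCUREF_DEAD       0xE0000000U
#define RCUREF_NOREF      0xFFFFFFFFU

/* Hypothetical user-space stand-in for rcuref_t. */
typedef struct { atomic_uint refcnt; } rcuref_model_t;

/* The counter is biased by one: a stored 0 means one reference (ONEREF). */
static void model_init(rcuref_model_t *r, unsigned int refs)
{
	atomic_init(&r->refcnt, refs - 1U);
}

/* Add delta and report whether the result is negative when viewed as int. */
static bool add_negative(atomic_uint *v, unsigned int delta, memory_order mo)
{
	return atomic_fetch_add_explicit(v, delta, mo) + delta > RCUREF_MAXREF;
}

static bool model_get_slowpath(rcuref_model_t *r)
{
	unsigned int cnt = atomic_load_explicit(&r->refcnt, memory_order_relaxed);

	if (cnt >= RCUREF_RELEASED) {	/* dead: undo the increment and fail */
		atomic_store_explicit(&r->refcnt, RCUREF_DEAD, memory_order_relaxed);
		return false;
	}
	if (cnt > RCUREF_MAXREF)	/* saturated: pin the marker, object leaks */
		atomic_store_explicit(&r->refcnt, RCUREF_SATURATED, memory_order_relaxed);
	return true;
}

static bool model_put_slowpath(rcuref_model_t *r)
{
	unsigned int cnt = atomic_load_explicit(&r->refcnt, memory_order_relaxed);

	if (cnt == RCUREF_NOREF) {
		/* Last reference dropped: try NOREF -> DEAD once, never retry. */
		if (!atomic_compare_exchange_strong_explicit(&r->refcnt, &cnt,
				RCUREF_DEAD, memory_order_release, memory_order_relaxed))
			return false;
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	if (cnt >= RCUREF_RELEASED) {	/* imbalanced put() on a dead count */
		atomic_store_explicit(&r->refcnt, RCUREF_DEAD, memory_order_relaxed);
		return false;
	}
	if (cnt > RCUREF_MAXREF)	/* put() on a saturated count */
		atomic_store_explicit(&r->refcnt, RCUREF_SATURATED, memory_order_relaxed);
	return false;
}

/* Returns true if a reference was acquired. */
static bool model_get(rcuref_model_t *r)
{
	if (!add_negative(&r->refcnt, 1U, memory_order_relaxed))
		return true;
	return model_get_slowpath(r);
}

/* Returns true only if the caller may schedule the object for destruction. */
static bool model_put(rcuref_model_t *r)
{
	if (!add_negative(&r->refcnt, (unsigned int)-1, memory_order_release))
		return false;
	return model_put_slowpath(r);
}
```

Run single-threaded, an init with one reference followed by get(), put(), put() walks the counter through 0, 1, 0 and NOREF before it is marked DEAD, and a later get() fails while leaving the counter at DEAD. This corresponds to the uncontended rows of the "Possible critical states" table.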
