~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/Documentation/core-api/xarray.rst

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

  1 .. SPDX-License-Identifier: GPL-2.0+
  2 
  3 ======
  4 XArray
  5 ======
  6 
  7 :Author: Matthew Wilcox
  8 
  9 Overview
 10 ========
 11 
 12 The XArray is an abstract data type which behaves like a very large array
 13 of pointers.  It meets many of the same needs as a hash or a conventional
 14 resizable array.  Unlike a hash, it allows you to sensibly go to the
 15 next or previous entry in a cache-efficient manner.  In contrast to a
 16 resizable array, there is no need to copy data or change MMU mappings in
 17 order to grow the array.  It is more memory-efficient, parallelisable
 18 and cache friendly than a doubly-linked list.  It takes advantage of
 19 RCU to perform lookups without locking.
 20 
 21 The XArray implementation is efficient when the indices used are densely
 22 clustered; hashing the object and using the hash as the index will not
 23 perform well.  The XArray is optimised for small indices, but still has
 24 good performance with large indices.  If your index can be larger than
 25 ``ULONG_MAX`` then the XArray is not the data type for you.  The most
 26 important user of the XArray is the page cache.
 27 
 28 Normal pointers may be stored in the XArray directly.  They must be 4-byte
 29 aligned, which is true for any pointer returned from kmalloc() and
 30 alloc_page().  It isn't true for arbitrary user-space pointers,
 31 nor for function pointers.  You can store pointers to statically allocated
 32 objects, as long as those objects have an alignment of at least 4.
 33 
 34 You can also store integers between 0 and ``LONG_MAX`` in the XArray.
 35 You must first convert it into an entry using xa_mk_value().
 36 When you retrieve an entry from the XArray, you can check whether it is
 37 a value entry by calling xa_is_value(), and convert it back to
 38 an integer by calling xa_to_value().
 39 
 40 Some users want to tag the pointers they store in the XArray.  You can
 41 call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer()
 42 to turn a tagged entry back into an untagged pointer and xa_pointer_tag()
 43 to retrieve the tag of an entry.  Tagged pointers use the same bits that
 44 are used to distinguish value entries from normal pointers, so you must
 45 decide whether they want to store value entries or tagged pointers in
 46 any particular XArray.
 47 
 48 The XArray does not support storing IS_ERR() pointers as some
 49 conflict with value entries or internal entries.
 50 
 51 An unusual feature of the XArray is the ability to create entries which
 52 occupy a range of indices.  Once stored to, looking up any index in
 53 the range will return the same entry as looking up any other index in
 54 the range.  Storing to any index will store to all of them.  Multi-index
 55 entries can be explicitly split into smaller entries, or storing ``NULL``
 56 into any entry will cause the XArray to forget about the range.
 57 
 58 Normal API
 59 ==========
 60 
 61 Start by initialising an XArray, either with DEFINE_XARRAY()
 62 for statically allocated XArrays or xa_init() for dynamically
 63 allocated ones.  A freshly-initialised XArray contains a ``NULL``
 64 pointer at every index.
 65 
 66 You can then set entries using xa_store() and get entries
 67 using xa_load().  xa_store will overwrite any entry with the
 68 new entry and return the previous entry stored at that index.  You can
 69 use xa_erase() instead of calling xa_store() with a
 70 ``NULL`` entry.  There is no difference between an entry that has never
 71 been stored to, one that has been erased and one that has most recently
 72 had ``NULL`` stored to it.
 73 
 74 You can conditionally replace an entry at an index by using
 75 xa_cmpxchg().  Like cmpxchg(), it will only succeed if
 76 the entry at that index has the 'old' value.  It also returns the entry
 77 which was at that index; if it returns the same entry which was passed as
 78 'old', then xa_cmpxchg() succeeded.
 79 
 80 If you want to only store a new entry to an index if the current entry
 81 at that index is ``NULL``, you can use xa_insert() which
 82 returns ``-EBUSY`` if the entry is not empty.
 83 
 84 You can copy entries out of the XArray into a plain array by calling
 85 xa_extract().  Or you can iterate over the present entries in the XArray
 86 by calling xa_for_each(), xa_for_each_start() or xa_for_each_range().
 87 You may prefer to use xa_find() or xa_find_after() to move to the next
 88 present entry in the XArray.
 89 
 90 Calling xa_store_range() stores the same entry in a range
 91 of indices.  If you do this, some of the other operations will behave
 92 in a slightly odd way.  For example, marking the entry at one index
 93 may result in the entry being marked at some, but not all of the other
 94 indices.  Storing into one index may result in the entry retrieved by
 95 some, but not all of the other indices changing.
 96 
 97 Sometimes you need to ensure that a subsequent call to xa_store()
 98 will not need to allocate memory.  The xa_reserve() function
 99 will store a reserved entry at the indicated index.  Users of the
100 normal API will see this entry as containing ``NULL``.  If you do
101 not need to use the reserved entry, you can call xa_release()
102 to remove the unused entry.  If another user has stored to the entry
103 in the meantime, xa_release() will do nothing; if instead you
104 want the entry to become ``NULL``, you should use xa_erase().
105 Using xa_insert() on a reserved entry will fail.
106 
107 If all entries in the array are ``NULL``, the xa_empty() function
108 will return ``true``.
109 
110 Finally, you can remove all entries from an XArray by calling
111 xa_destroy().  If the XArray entries are pointers, you may wish
112 to free the entries first.  You can do this by iterating over all present
113 entries in the XArray using the xa_for_each() iterator.
114 
115 Search Marks
116 ------------
117 
118 Each entry in the array has three bits associated with it called marks.
119 Each mark may be set or cleared independently of the others.  You can
120 iterate over marked entries by using the xa_for_each_marked() iterator.
121 
122 You can enquire whether a mark is set on an entry by using
123 xa_get_mark().  If the entry is not ``NULL``, you can set a mark on it
124 by using xa_set_mark() and remove the mark from an entry by calling
125 xa_clear_mark().  You can ask whether any entry in the XArray has a
126 particular mark set by calling xa_marked().  Erasing an entry from the
127 XArray causes all marks associated with that entry to be cleared.
128 
129 Setting or clearing a mark on any index of a multi-index entry will
130 affect all indices covered by that entry.  Querying the mark on any
131 index will return the same result.
132 
133 There is no way to iterate over entries which are not marked; the data
134 structure does not allow this to be implemented efficiently.  There are
135 not currently iterators to search for logical combinations of bits (eg
136 iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2``
137 set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2``
138 set).  It would be possible to add these if a user arises.
139 
140 Allocating XArrays
141 ------------------
142 
143 If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
144 initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
145 the XArray changes to track whether entries are in use or not.
146 
147 You can call xa_alloc() to store the entry at an unused index
148 in the XArray.  If you need to modify the array from interrupt context,
149 you can use xa_alloc_bh() or xa_alloc_irq() to disable
150 interrupts while allocating the ID.
151 
152 Using xa_store(), xa_cmpxchg() or xa_insert() will
153 also mark the entry as being allocated.  Unlike a normal XArray, storing
154 ``NULL`` will mark the entry as being in use, like xa_reserve().
155 To free an entry, use xa_erase() (or xa_release() if
156 you only want to free the entry if it's ``NULL``).
157 
158 By default, the lowest free entry is allocated starting from 0.  If you
159 want to allocate entries starting at 1, it is more efficient to use
160 DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``.  If you want to
161 allocate IDs up to a maximum, then wrap back around to the lowest free
162 ID, you can use xa_alloc_cyclic().
163 
164 You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
165 is used to track whether an entry is free or not.  The other marks are
166 available for your use.
167 
168 Memory allocation
169 -----------------
170 
171 The xa_store(), xa_cmpxchg(), xa_alloc(),
172 xa_reserve() and xa_insert() functions take a gfp_t
173 parameter in case the XArray needs to allocate memory to store this entry.
174 If the entry is being deleted, no memory allocation needs to be performed,
175 and the GFP flags specified will be ignored.
176 
177 It is possible for no memory to be allocatable, particularly if you pass
178 a restrictive set of GFP flags.  In that case, the functions return a
179 special value which can be turned into an errno using xa_err().
180 If you don't need to know exactly which error occurred, using
181 xa_is_err() is slightly more efficient.
182 
183 Locking
184 -------
185 
186 When using the Normal API, you do not have to worry about locking.
187 The XArray uses RCU and an internal spinlock to synchronise access:
188 
189 No lock needed:
190  * xa_empty()
191  * xa_marked()
192 
193 Takes RCU read lock:
194  * xa_load()
195  * xa_for_each()
196  * xa_for_each_start()
197  * xa_for_each_range()
198  * xa_find()
199  * xa_find_after()
200  * xa_extract()
201  * xa_get_mark()
202 
203 Takes xa_lock internally:
204  * xa_store()
205  * xa_store_bh()
206  * xa_store_irq()
207  * xa_insert()
208  * xa_insert_bh()
209  * xa_insert_irq()
210  * xa_erase()
211  * xa_erase_bh()
212  * xa_erase_irq()
213  * xa_cmpxchg()
214  * xa_cmpxchg_bh()
215  * xa_cmpxchg_irq()
216  * xa_store_range()
217  * xa_alloc()
218  * xa_alloc_bh()
219  * xa_alloc_irq()
220  * xa_reserve()
221  * xa_reserve_bh()
222  * xa_reserve_irq()
223  * xa_destroy()
224  * xa_set_mark()
225  * xa_clear_mark()
226 
227 Assumes xa_lock held on entry:
228  * __xa_store()
229  * __xa_insert()
230  * __xa_erase()
231  * __xa_cmpxchg()
232  * __xa_alloc()
233  * __xa_set_mark()
234  * __xa_clear_mark()
235 
236 If you want to take advantage of the lock to protect the data structures
237 that you are storing in the XArray, you can call xa_lock()
238 before calling xa_load(), then take a reference count on the
239 object you have found before calling xa_unlock().  This will
240 prevent stores from removing the object from the array between looking
241 up the object and incrementing the refcount.  You can also use RCU to
242 avoid dereferencing freed memory, but an explanation of that is beyond
243 the scope of this document.
244 
245 The XArray does not disable interrupts or softirqs while modifying
246 the array.  It is safe to read the XArray from interrupt or softirq
247 context as the RCU lock provides enough protection.
248 
249 If, for example, you want to store entries in the XArray in process
250 context and then erase them in softirq context, you can do that this way::
251 
252     void foo_init(struct foo *foo)
253     {
254         xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
255     }
256 
257     int foo_store(struct foo *foo, unsigned long index, void *entry)
258     {
259         int err;
260 
261         xa_lock_bh(&foo->array);
262         err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
263         if (!err)
264             foo->count++;
265         xa_unlock_bh(&foo->array);
266         return err;
267     }
268 
269     /* foo_erase() is only called from softirq context */
270     void foo_erase(struct foo *foo, unsigned long index)
271     {
272         xa_lock(&foo->array);
273         __xa_erase(&foo->array, index);
274         foo->count--;
275         xa_unlock(&foo->array);
276     }
277 
278 If you are going to modify the XArray from interrupt or softirq context,
279 you need to initialise the array using xa_init_flags(), passing
280 ``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
281 
282 The above example also shows a common pattern of wanting to extend the
283 coverage of the xa_lock on the store side to protect some statistics
284 associated with the array.
285 
286 Sharing the XArray with interrupt context is also possible, either
287 using xa_lock_irqsave() in both the interrupt handler and process
288 context, or xa_lock_irq() in process context and xa_lock()
289 in the interrupt handler.  Some of the more common patterns have helper
290 functions such as xa_store_bh(), xa_store_irq(),
291 xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
292 and xa_cmpxchg_irq().
293 
294 Sometimes you need to protect access to the XArray with a mutex because
295 that lock sits above another mutex in the locking hierarchy.  That does
296 not entitle you to use functions like __xa_erase() without taking
297 the xa_lock; the xa_lock is used for lockdep validation and will be used
298 for other purposes in the future.
299 
300 The __xa_set_mark() and __xa_clear_mark() functions are also
301 available for situations where you look up an entry and want to atomically
302 set or clear a mark.  It may be more efficient to use the advanced API
303 in this case, as it will save you from walking the tree twice.
304 
305 Advanced API
306 ============
307 
308 The advanced API offers more flexibility and better performance at the
309 cost of an interface which can be harder to use and has fewer safeguards.
310 No locking is done for you by the advanced API, and you are required
311 to use the xa_lock while modifying the array.  You can choose whether
312 to use the xa_lock or the RCU lock while doing read-only operations on
313 the array.  You can mix advanced and normal operations on the same array;
314 indeed the normal API is implemented in terms of the advanced API.  The
315 advanced API is only available to modules with a GPL-compatible license.
316 
317 The advanced API is based around the xa_state.  This is an opaque data
318 structure which you declare on the stack using the XA_STATE() macro.
319 This macro initialises the xa_state ready to start walking around the
320 XArray.  It is used as a cursor to maintain the position in the XArray
321 and let you compose various operations together without having to restart
322 from the top every time.  The contents of the xa_state are protected by
323 the rcu_read_lock() or the xas_lock().  If you need to drop whichever of
324 those locks is protecting your state and tree, you must call xas_pause()
325 so that future calls do not rely on the parts of the state which were
326 left unprotected.
327 
328 The xa_state is also used to store errors.  You can call
329 xas_error() to retrieve the error.  All operations check whether
330 the xa_state is in an error state before proceeding, so there's no need
331 for you to check for an error after each call; you can make multiple
332 calls in succession and only check at a convenient point.  The only
333 errors currently generated by the XArray code itself are ``ENOMEM`` and
334 ``EINVAL``, but it supports arbitrary errors in case you want to call
335 xas_set_err() yourself.
336 
337 If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
338 will attempt to allocate more memory using the specified gfp flags and
339 cache it in the xa_state for the next attempt.  The idea is that you take
340 the xa_lock, attempt the operation and drop the lock.  The operation
341 attempts to allocate memory while holding the lock, but it is more
342 likely to fail.  Once you have dropped the lock, xas_nomem()
343 can try harder to allocate more memory.  It will return ``true`` if it
344 is worth retrying the operation (i.e. that there was a memory error *and*
345 more memory was allocated).  If it has previously allocated memory, and
346 that memory wasn't used, and there is no error (or some error that isn't
347 ``ENOMEM``), then it will free the memory previously allocated.
348 
349 Internal Entries
350 ----------------
351 
352 The XArray reserves some entries for its own purposes.  These are never
353 exposed through the normal API, but when using the advanced API, it's
354 possible to see them.  Usually the best way to handle them is to pass them
355 to xas_retry(), and retry the operation if it returns ``true``.
356 
357 .. flat-table::
358    :widths: 1 1 6
359 
360    * - Name
361      - Test
362      - Usage
363 
364    * - Node
365      - xa_is_node()
366      - An XArray node.  May be visible when using a multi-index xa_state.
367 
368    * - Sibling
369      - xa_is_sibling()
370      - A non-canonical entry for a multi-index entry.  The value indicates
371        which slot in this node has the canonical entry.
372 
373    * - Retry
374      - xa_is_retry()
375      - This entry is currently being modified by a thread which has the
376        xa_lock.  The node containing this entry may be freed at the end
377        of this RCU period.  You should restart the lookup from the head
378        of the array.
379 
380    * - Zero
381      - xa_is_zero()
382      - Zero entries appear as ``NULL`` through the Normal API, but occupy
383        an entry in the XArray which can be used to reserve the index for
384        future use.  This is used by allocating XArrays for allocated entries
385        which are ``NULL``.
386 
387 Other internal entries may be added in the future.  As far as possible, they
388 will be handled by xas_retry().
389 
390 Additional functionality
391 ------------------------
392 
393 The xas_create_range() function allocates all the necessary memory
394 to store every entry in a range.  It will set ENOMEM in the xa_state if
395 it cannot allocate memory.
396 
397 You can use xas_init_marks() to reset the marks on an entry
398 to their default state.  This is usually all marks clear, unless the
399 XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
400 and all other marks are clear.  Replacing one entry with another using
401 xas_store() will not reset the marks on that entry; if you want
402 the marks reset, you should do that explicitly.
403 
404 The xas_load() will walk the xa_state as close to the entry
405 as it can.  If you know the xa_state has already been walked to the
406 entry and need to check that the entry hasn't changed, you can use
407 xas_reload() to save a function call.
408 
409 If you need to move to a different index in the XArray, call
410 xas_set().  This resets the cursor to the top of the tree, which
411 will generally make the next operation walk the cursor to the desired
412 spot in the tree.  If you want to move to the next or previous index,
413 call xas_next() or xas_prev().  Setting the index does
414 not walk the cursor around the array so does not require a lock to be
415 held, while moving to the next or previous index does.
416 
417 You can search for the next present entry using xas_find().  This
418 is the equivalent of both xa_find() and xa_find_after();
419 if the cursor has been walked to an entry, then it will find the next
420 entry after the one currently referenced.  If not, it will return the
421 entry at the index of the xa_state.  Using xas_next_entry() to
422 move to the next present entry instead of xas_find() will save
423 a function call in the majority of cases at the expense of emitting more
424 inline code.
425 
426 The xas_find_marked() function is similar.  If the xa_state has
427 not been walked, it will return the entry at the index of the xa_state,
428 if it is marked.  Otherwise, it will return the first marked entry after
429 the entry referenced by the xa_state.  The xas_next_marked()
430 function is the equivalent of xas_next_entry().
431 
432 When iterating over a range of the XArray using xas_for_each()
433 or xas_for_each_marked(), it may be necessary to temporarily stop
434 the iteration.  The xas_pause() function exists for this purpose.
435 After you have done the necessary work and wish to resume, the xa_state
436 is in an appropriate state to continue the iteration after the entry
437 you last processed.  If you have interrupts disabled while iterating,
438 then it is good manners to pause the iteration and reenable interrupts
439 every ``XA_CHECK_SCHED`` entries.
440 
441 The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require
442 the xa_state cursor to have been moved to the appropriate location in the
443 XArray; they will do nothing if you have called xas_pause() or xas_set()
444 immediately before.
445 
446 You can call xas_set_update() to have a callback function
447 called each time the XArray updates a node.  This is used by the page
448 cache workingset code to maintain its list of nodes which contain only
449 shadow entries.
450 
451 Multi-Index Entries
452 -------------------
453 
454 The XArray has the ability to tie multiple indices together so that
455 operations on one index affect all indices.  For example, storing into
456 any index will change the value of the entry retrieved from any index.
457 Setting or clearing a mark on any index will set or clear the mark
458 on every index that is tied together.  The current implementation
459 only allows tying ranges which are aligned powers of two together;
460 eg indices 64-127 may be tied together, but 2-6 may not be.  This may
461 save substantial quantities of memory; for example tying 512 entries
462 together will save over 4kB.
463 
464 You can create a multi-index entry by using XA_STATE_ORDER()
465 or xas_set_order() followed by a call to xas_store().
466 Calling xas_load() with a multi-index xa_state will walk the
467 xa_state to the right location in the tree, but the return value is not
468 meaningful, potentially being an internal entry or ``NULL`` even when there
469 is an entry stored within the range.  Calling xas_find_conflict()
470 will return the first entry within the range or ``NULL`` if there are no
471 entries in the range.  The xas_for_each_conflict() iterator will
472 iterate over every entry which overlaps the specified range.
473 
474 If xas_load() encounters a multi-index entry, the xa_index
475 in the xa_state will not be changed.  When iterating over an XArray
476 or calling xas_find(), if the initial index is in the middle
477 of a multi-index entry, it will not be altered.  Subsequent calls
478 or iterations will move the index to the first index in the range.
479 Each entry will only be returned once, no matter how many indices it
480 occupies.
481 
482 Using xas_next() or xas_prev() with a multi-index xa_state is not
483 supported.  Using either of these functions on a multi-index entry will
484 reveal sibling entries; these should be skipped over by the caller.
485 
486 Storing ``NULL`` into any index of a multi-index entry will set the
487 entry at every index to ``NULL`` and dissolve the tie.  A multi-index
488 entry can be split into entries occupying smaller ranges by calling
489 xas_split_alloc() without the xa_lock held, followed by taking the lock
490 and calling xas_split().
491 
492 Functions and structures
493 ========================
494 
495 .. kernel-doc:: include/linux/xarray.h
496 .. kernel-doc:: lib/xarray.c

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php