==========================
Extensible Scheduler Class
==========================

sched_ext is a scheduler class whose behavior can be defined by a set of BPF
programs - the BPF scheduler.

* sched_ext exports a full scheduling interface so that any scheduling
  algorithm can be implemented on top.

* The BPF scheduler can group CPUs however it sees fit and schedule them
  together, as tasks aren't tied to specific CPUs at the time of wakeup.

* The BPF scheduler can be turned on and off dynamically anytime.

* The system integrity is maintained no matter what the BPF scheduler does.
  The default scheduling behavior is restored anytime an error is detected,
  a runnable task stalls, or on invoking the SysRq key sequence
  :kbd:`SysRq-S`.

* When the BPF scheduler triggers an error, debug information is dumped to
  aid debugging. The debug dump is passed to and printed out by the
  scheduler binary. The debug dump can also be accessed through the
  `sched_ext_dump` tracepoint. The SysRq key sequence :kbd:`SysRq-D`
  triggers a debug dump. This doesn't terminate the BPF scheduler and can
  only be read through the tracepoint.

Switching to and from sched_ext
===============================

``CONFIG_SCHED_CLASS_EXT`` is the config option to enable sched_ext and
``tools/sched_ext`` contains the example schedulers. The following config
options should be enabled to use sched_ext:

.. code-block:: none

    CONFIG_BPF=y
    CONFIG_SCHED_CLASS_EXT=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_BPF_JIT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y
    CONFIG_PAHOLE_HAS_SPLIT_BTF=y
    CONFIG_PAHOLE_HAS_BTF_TAG=y

sched_ext is used only when the BPF scheduler is loaded and running.

If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
loaded.

When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
``SCHED_EXT`` tasks are scheduled by sched_ext.

However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
``SCHED_IDLE`` policies are scheduled by CFS.

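The policy rules above can be summarized as a small decision function. The
following is a user-space sketch for illustration only, not kernel code: the
``policy`` and ``sched_class`` enums and ``pick_class()`` are hypothetical
names that merely model the behavior described in this section.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Policies named in the text; the enum itself is illustrative. */
    enum policy { POLICY_NORMAL, POLICY_BATCH, POLICY_IDLE, POLICY_EXT };

    /* Which class ends up scheduling the task in this model. */
    enum sched_class { CLASS_CFS, CLASS_EXT };

    /*
     * Model of the rules above:
     *  - no BPF scheduler loaded: everything, including SCHED_EXT, is on CFS;
     *  - loaded without SCX_OPS_SWITCH_PARTIAL: all four policies use
     *    sched_ext;
     *  - loaded with SCX_OPS_SWITCH_PARTIAL: only SCHED_EXT uses sched_ext.
     */
    static enum sched_class pick_class(enum policy pol, bool bpf_loaded,
                                       bool switch_partial)
    {
            assert(pol <= POLICY_EXT);

            if (!bpf_loaded)
                    return CLASS_CFS;
            if (switch_partial && pol != POLICY_EXT)
                    return CLASS_CFS;
            return CLASS_EXT;
    }

In this model, ``pick_class(POLICY_BATCH, true, true)`` yields ``CLASS_CFS``,
while ``pick_class(POLICY_BATCH, true, false)`` yields ``CLASS_EXT``.
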
Terminating the sched_ext scheduler program, triggering :kbd:`SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
BPF scheduler and reverts all tasks back to CFS.

.. code-block:: none

    # make -j16 -C tools/sched_ext
    # tools/sched_ext/build/bin/scx_simple
    local=0 global=3
    local=5 global=24
    local=9 global=44
    local=13 global=56
    local=17 global=72
    ^CEXIT: BPF scheduler unregistered

The current status of the BPF scheduler can be determined as follows:

.. code-block:: none

    # cat /sys/kernel/sched_ext/state
    enabled
    # cat /sys/kernel/sched_ext/root/ops
    simple

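The same check can be made from a program. The following is a minimal
user-space sketch, assuming only that the state file contains a single line
of text as shown above; ``read_first_line()`` and ``scx_is_enabled()`` are
hypothetical helper names, not part of any kernel or library API.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Read the first line of a small text file such as
     * /sys/kernel/sched_ext/state into buf, stripping the trailing newline.
     * Returns 0 on success, -1 if the file cannot be opened or is empty.
     */
    static int read_first_line(const char *path, char *buf, size_t len)
    {
            FILE *f;

            assert(buf && len > 0);
            buf[0] = '\0';

            f = fopen(path, "r");
            if (!f)
                    return -1;      /* e.g. the file does not exist */

            if (!fgets(buf, (int)len, f)) {
                    fclose(f);
                    return -1;
            }
            fclose(f);

            buf[strcspn(buf, "\n")] = '\0';
            return 0;
    }

    /* Returns 1 if the state file reads "enabled", 0 otherwise. */
    static int scx_is_enabled(const char *state_path)
    {
            char state[64];

            if (read_first_line(state_path, state, sizeof(state)) < 0)
                    return 0;
            return strcmp(state, "enabled") == 0;
    }

A caller would pass ``/sys/kernel/sched_ext/state`` as ``state_path``. A
missing file is reported as "not enabled" here, which is a design choice of
this sketch rather than documented kernel behavior.
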
You can check if any BPF scheduler has ever been loaded since boot by examining
this monotonically incrementing counter (a value of zero indicates that no BPF
scheduler has been loaded):

.. code-block:: none

    # cat /sys/kernel/sched_ext/enable_seq
    1

``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
detailed information:

.. code-block:: none

    # tools/sched_ext/scx_show_state.py
    ops           : simple
    enabled       : 1
    switching_all : 1
    switched_all  : 1
    enable_state  : enabled (2)
    bypass_depth  : 0
    nr_rejected   : 0
    enable_seq    : 1

If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
be determined as follows:

.. code-block:: none

    # grep ext /proc/self/sched
    ext.enabled                                  :                    1

The Basics
==========

Userspace can implement an arbitrary BPF scheduler by loading a set of BPF
programs that implement ``struct sched_ext_ops``. The only mandatory field
is ``ops.name`` which must be a valid BPF object name. All operations are
optional. The following modified excerpt is from
``tools/sched_ext/scx_simple.bpf.c`` showing a minimal global FIFO scheduler.

.. code-block:: c

    /*
     * Decide which CPU a task should be migrated to before being
     * enqueued (either at wakeup, fork time, or exec time). If an
     * idle core is found by the default ops.select_cpu() implementation,
     * then dispatch the task directly to SCX_DSQ_LOCAL and skip the
     * ops.enqueue() callback.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.select_cpu implementation. The behavior of the scheduler
     * would be exactly the same if the implementation just didn't define
     * the simple_select_cpu() struct_ops prog.
     */
    s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
            s32 cpu;
            /* Need to initialize or the BPF verifier will reject the program */
            bool direct = false;

            cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);

            if (direct)
                    scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

            return cpu;
    }

    /*
     * Do a direct dispatch of a task to the global DSQ. This ops.enqueue()
     * callback will only be invoked if we failed to find a core to dispatch
     * to in ops.select_cpu() above.
     *
     * Note that this implementation has exactly the same behavior as the
     * default ops.enqueue implementation, which just dispatches the task
     * to SCX_DSQ_GLOBAL. The behavior of the scheduler would be exactly the
     * same if the implementation just didn't define the simple_enqueue
     * struct_ops prog.
     */
    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
    {
            scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
    }

    s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
    {
            /*
             * By default, all SCHED_EXT, SCHED_OTHER, SCHED_IDLE, and
             * SCHED_BATCH tasks should use sched_ext.
             */
            return 0;
    }

    void BPF_STRUCT_OPS(simple_exit, struct scx_exit_info *ei)
    {
            exit_type = ei->type;
    }

    SEC(".struct_ops")
    struct sched_ext_ops simple_ops = {
            .select_cpu             = (void *)simple_select_cpu,
            .enqueue                = (void *)simple_enqueue,
            .init                   = (void *)simple_init,
            .exit                   = (void *)simple_exit,
            .name                   = "simple",
    };

Dispatch Queues
---------------

To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.

A CPU always executes a task from its local DSQ. A task is "dispatched" to a
DSQ. A non-local DSQ is "consumed" to transfer a task to the consuming CPU's
local DSQ.

When a CPU is looking for the next task to run, if the local DSQ is not
empty, the first task is picked. Otherwise, the CPU tries to consume the
global DSQ. If that doesn't yield a runnable task either, ``ops.dispatch()``
is invoked.

Scheduling Cycle
----------------

The following briefly shows how a waking task is scheduled and executed.

1. When a task is waking up, ``ops.select_cpu()`` is the first operation
   invoked. This serves two purposes: it provides a CPU selection
   optimization hint, and it wakes up the selected CPU if idle.

   The CPU selected by ``ops.select_cpu()`` is an optimization hint and not
   binding. The actual decision is made at the last step of scheduling.
   However, there is a small performance gain if the CPU
   ``ops.select_cpu()`` returns matches the CPU the task eventually runs on.

   A side-effect of selecting a CPU is waking it up from idle. While a BPF
   scheduler can wake up any CPU using the ``scx_bpf_kick_cpu()`` helper,
   using ``ops.select_cpu()`` judiciously can be simpler and more efficient.

   A task can be immediately dispatched to a DSQ from ``ops.select_cpu()`` by
   calling ``scx_bpf_dispatch()``. If the task is dispatched to
   ``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be dispatched to the
   local DSQ of whichever CPU is returned from ``ops.select_cpu()``.
   Additionally, dispatching directly from ``ops.select_cpu()`` will cause the
   ``ops.enqueue()`` callback to be skipped.

   Note that the scheduler core will ignore an invalid CPU selection, for
   example, if it's outside the allowed cpumask of the task.

2. Once the target CPU is selected, ``ops.enqueue()`` is invoked (unless the
   task was dispatched directly from ``ops.select_cpu()``). ``ops.enqueue()``
   can make one of the following decisions:

   * Immediately dispatch the task to either the global or local DSQ by
     calling ``scx_bpf_dispatch()`` with ``SCX_DSQ_GLOBAL`` or
     ``SCX_DSQ_LOCAL``, respectively.

   * Immediately dispatch the task to a custom DSQ by calling
     ``scx_bpf_dispatch()`` with a DSQ ID which is smaller than 2^63.

   * Queue the task on the BPF side.

3. When a CPU is ready to schedule, it first looks at its local DSQ. If
   empty, it then looks at the global DSQ. If there still isn't a task to
   run, ``ops.dispatch()`` is invoked which can use the following two
   functions to populate the local DSQ.

   * ``scx_bpf_dispatch()`` dispatches a task to a DSQ. Any target DSQ can
     be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
     ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dispatch()``
     currently can't be called with BPF locks held, this is being worked on
     and will be supported. ``scx_bpf_dispatch()`` schedules dispatches
     rather than performing them immediately. There can be up to
     ``ops.dispatch_max_batch`` pending tasks.

   * ``scx_bpf_consume()`` transfers a task from the specified non-local DSQ
     to the dispatching DSQ. This function cannot be called with any BPF
     locks held. ``scx_bpf_consume()`` flushes the pending dispatched tasks
     before trying to consume the specified DSQ.

4. After ``ops.dispatch()`` returns, if there are tasks in the local DSQ,
   the CPU runs the first one. If empty, the following steps are taken:

   * Try to consume the global DSQ. If successful, run the task.

   * If ``ops.dispatch()`` has dispatched any tasks, retry #3.

   * If the previous task is an SCX task and still runnable, keep executing
     it (see ``SCX_OPS_ENQ_LAST``).

   * Go idle.

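The lookup order in steps 3 and 4 can be illustrated with a toy user-space
model. This is a simplified sketch, not the kernel implementation: the
``fifo``, ``toy_cpu`` and ``toy_sched`` types and all ``toy_*`` functions are
hypothetical, the model is single-threaded, and its ``toy_dispatch()`` always
fills the local DSQ directly, so the "retry #3" loop is omitted.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define QCAP            8
    #define TASK_NONE       (-1)

    /* Fixed-capacity FIFO of task IDs; a stand-in for a DSQ. */
    struct fifo {
            int items[QCAP];
            size_t head, len;
    };

    static void fifo_push(struct fifo *q, int task)
    {
            assert(q->len < QCAP);
            q->items[(q->head + q->len) % QCAP] = task;
            q->len++;
    }

    static int fifo_pop(struct fifo *q)
    {
            int task;

            if (!q->len)
                    return TASK_NONE;
            task = q->items[q->head];
            q->head = (q->head + 1) % QCAP;
            q->len--;
            return task;
    }

    struct toy_cpu {
            struct fifo local;      /* models this CPU's SCX_DSQ_LOCAL */
    };

    struct toy_sched {
            struct fifo global;     /* models SCX_DSQ_GLOBAL */
            struct fifo bpf_side;   /* tasks the BPF scheduler queued itself */
    };

    /* Models ops.dispatch(): move one BPF-side task to the local DSQ. */
    static void toy_dispatch(struct toy_sched *s, struct toy_cpu *cpu)
    {
            int task = fifo_pop(&s->bpf_side);

            if (task != TASK_NONE)
                    fifo_push(&cpu->local, task);
    }

    /*
     * Pick the next task: local DSQ, then global DSQ, then ops.dispatch(),
     * then local and global again, then the still-runnable previous task
     * (prev, or TASK_NONE if there is none), and finally idle.
     */
    static int toy_pick_next(struct toy_sched *s, struct toy_cpu *cpu, int prev)
    {
            int task;

            if ((task = fifo_pop(&cpu->local)) != TASK_NONE)
                    return task;
            if ((task = fifo_pop(&s->global)) != TASK_NONE)
                    return task;

            toy_dispatch(s, cpu);
            if ((task = fifo_pop(&cpu->local)) != TASK_NONE)
                    return task;
            /* Cannot succeed in this toy; kept only to mirror step 4. */
            if ((task = fifo_pop(&s->global)) != TASK_NONE)
                    return task;

            return prev;    /* TASK_NONE here means the CPU goes idle */
    }

With task 1 on the local DSQ, task 2 on the global DSQ and task 3 queued on
the BPF side, successive calls to ``toy_pick_next()`` return 1, 2 and 3 in
that order, after which the previous task (if any) keeps running.
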
Note that the BPF scheduler can always choose to dispatch tasks immediately
in ``ops.enqueue()`` as illustrated in the above simple example. If only the
built-in DSQs are used, there is no need to implement ``ops.dispatch()`` as
a task is never queued on the BPF scheduler and both the local and global
DSQs are consumed automatically.

``scx_bpf_dispatch()`` queues the task on the FIFO of the target DSQ. Use
``scx_bpf_dispatch_vtime()`` for the priority queue. Internal DSQs such as
``SCX_DSQ_LOCAL`` and ``SCX_DSQ_GLOBAL`` do not support priority-queue
dispatching, and must be dispatched to with ``scx_bpf_dispatch()``. See the
function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c`` for
more information.

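The difference between FIFO and priority-queue dispatching can be sketched
with a toy queue in user space. This is an illustrative model only: the
``toy_dsq`` type and ``toy_*`` functions are hypothetical, and the sketch
assumes that a lower vtime is served first and that tasks with equal vtime
keep their arrival order.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define DSQ_CAP 16

    struct toy_task {
            int id;
            uint64_t vtime;
    };

    /* Array-backed queue; index 0 is the next task to run. */
    struct toy_dsq {
            struct toy_task tasks[DSQ_CAP];
            size_t len;
    };

    /* FIFO insert: append at the tail, ignoring vtime. */
    static void toy_dispatch_fifo(struct toy_dsq *q, int id)
    {
            assert(q->len < DSQ_CAP);
            q->tasks[q->len++] = (struct toy_task){ .id = id, .vtime = 0 };
    }

    /* Priority insert: keep the queue sorted by ascending vtime. */
    static void toy_dispatch_vtime(struct toy_dsq *q, int id, uint64_t vtime)
    {
            size_t pos = q->len;

            assert(q->len < DSQ_CAP);
            /* Shift tasks with a larger vtime one slot towards the tail. */
            while (pos > 0 && q->tasks[pos - 1].vtime > vtime) {
                    q->tasks[pos] = q->tasks[pos - 1];
                    pos--;
            }
            q->tasks[pos] = (struct toy_task){ .id = id, .vtime = vtime };
            q->len++;
    }

    /* Remove and return the ID at the head, or -1 if the queue is empty. */
    static int toy_consume(struct toy_dsq *q)
    {
            int id;
            size_t i;

            if (!q->len)
                    return -1;
            id = q->tasks[0].id;
            for (i = 1; i < q->len; i++)
                    q->tasks[i - 1] = q->tasks[i];
            q->len--;
            return id;
    }

In this model, tasks inserted with ``toy_dispatch_fifo()`` come out in
insertion order, while tasks inserted with ``toy_dispatch_vtime()`` come out
in vtime order regardless of when they were inserted.
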
Where to Look
=============

* ``include/linux/sched/ext.h`` defines the core data structures, ops table
  and constants.

* ``kernel/sched/ext.c`` contains sched_ext core implementation and helpers.
  The functions prefixed with ``scx_bpf_`` can be called from the BPF
  scheduler.

* ``tools/sched_ext/`` hosts example BPF scheduler implementations.

  * ``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a
    custom DSQ.

  * ``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five
    levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``.

ABI Instability
===============

The APIs provided by sched_ext to BPF scheduler programs have no stability
guarantees. This includes the ops table callbacks and constants defined in
``include/linux/sched/ext.h``, as well as the ``scx_bpf_`` kfuncs defined in
``kernel/sched/ext.c``.

While we will attempt to provide a relatively stable API surface when
possible, these APIs are subject to change without warning between kernel
versions.
