.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. [see the bottom of this file for redistribution information]

Reporting regressions
+++++++++++++++++++++

"*We don't cause regressions*" is the first rule of Linux kernel development;
Linux founder and lead developer Linus Torvalds established it himself and
ensures it's obeyed.

This document describes what the rule means for users and how the Linux kernel's
development model ensures all reported regressions are addressed; aspects
relevant for kernel developers are left to
Documentation/process/handling-regressions.rst.


The important bits (aka "TL;DR")
================================

#. It's a regression if something running fine with one Linux kernel works worse
   or not at all with a newer version. Note, the newer kernel has to be compiled
   using a similar configuration; the detailed explanations below describe this
   and other fine print.

#. Report your issue as outlined in
   Documentation/admin-guide/reporting-issues.rst; it already covers all aspects
   important for regressions, which are repeated below for convenience. Two of
   them are especially important: start your report's subject with
   "[REGRESSION]" and CC or forward it to `the regression mailing list
   <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev).

#. Optional, but recommended: when sending or forwarding your report, make the
   Linux kernel regression tracking bot "regzbot" track the issue by specifying
   when the regression started like this::

       #regzbot introduced: v5.13..v5.14-rc1


All the details on Linux kernel regressions relevant for users
==============================================================


The important basics
--------------------


What is a "regression" and what is the "no regressions" rule?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a regression if some application or practical use case running fine with
one Linux kernel works worse or not at all with a newer version compiled using a
similar configuration. The "no regressions" rule forbids this from taking place;
if it happens by accident, the developers who caused it are expected to quickly
fix the issue.

It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with
5.14 doesn't work at all, works significantly slower, or misbehaves somehow.
It's also a regression if a perfectly working application suddenly shows erratic
behavior with a newer kernel version; such issues can be caused by changes in
procfs, sysfs, or one of the many other interfaces Linux provides to userland
software. But keep in mind, as mentioned earlier: 5.14 in this example needs to
be built from a configuration similar to the one from 5.13. This can be achieved
using ``make olddefconfig``, as explained in more detail below.

Note the "practical use case" in the first sentence of this section: despite the
"no regressions" rule, developers are free to change any aspect of the kernel
and even APIs or ABIs to userland, as long as no existing application or use
case breaks.

Also be aware the "no regressions" rule covers only interfaces the kernel
provides to the userland. It thus does not apply to kernel-internal interfaces
like the module API, which some externally developed drivers use to hook into
the kernel.

How do I report a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Just report the issue as outlined in
Documentation/admin-guide/reporting-issues.rst; it already describes the
important points. The following aspects outlined there are especially relevant
for regressions:

* When checking for existing reports to join, also search the `archives of the
  Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and
  `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

* Start your report's subject with "[REGRESSION]".

* In your report, clearly mention the last kernel version that worked fine and
  the first broken one. Ideally try to find the exact change causing the
  regression using a bisection, as explained below in more detail.

* Remember to let the Linux regressions mailing list
  (regressions@lists.linux.dev) know about your report:

  * If you report the regression by mail, CC the regressions list.

  * If you report your regression to some bug tracker, forward the submitted
    report by mail to the regressions list while CCing the maintainer and the
    mailing list for the subsystem in question.

  If it's a regression within a stable or longterm series (e.g.
  v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list
  <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org).

If you performed a successful bisection, also CC everyone the culprit's commit
message mentions in lines starting with "Signed-off-by:".

When CCing or forwarding your report to the list, consider directly telling the
aforementioned Linux kernel regression tracking bot about your report. To do
that, include a paragraph like this in your mail::

       #regzbot introduced: v5.13..v5.14-rc1

Regzbot will then consider your mail a report for a regression introduced in the
specified version range. In the above case Linux v5.13 still worked fine and
Linux v5.14-rc1 was the first version where you encountered the issue. If you
performed a bisection to find the commit that caused the regression, specify the
culprit's commit-id instead::

       #regzbot introduced: 1f2e3d4c5d

Placing such a "regzbot command" is in your interest, as it will ensure the
report won't fall through the cracks unnoticed. If you omit this, the Linux
kernel's regression tracker will take care of telling regzbot about your
regression, as long as you send a copy to the regressions mailing list. But the
regression tracker is just one human who sometimes has to rest or occasionally
might even enjoy some time away from computers (as crazy as that might sound).
Relying on this person thus will result in an unnecessary delay before the
regression is mentioned `on the list of tracked and unresolved Linux
kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the
weekly regression reports sent by regzbot. Such delays can result in Linus
Torvalds being unaware of important regressions when deciding between "continue
development or call this finished and release the final?".

Are really all regressions fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Nearly all of them are, as long as the change causing the regression (the
"culprit commit") is reliably identified. Some regressions can be fixed without
this, but often it's required.

Who needs to find the root cause of a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers of the affected code area should try to locate the culprit on their
own. But for them that's often impossible to do with reasonable effort, as quite
a lot of issues only occur in a particular environment outside the developer's
reach -- for example, a specific hardware platform, firmware, Linux distro,
system configuration, or application. That's why in the end it's often up to
the reporter to locate the culprit commit; sometimes users might even need to
run additional tests afterwards to pinpoint the exact root cause. Developers
should offer advice and reasonably help where they can, to make this process
relatively easy and achievable for typical users.

How can I find the culprit?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perform a bisection, as roughly outlined in
Documentation/admin-guide/reporting-issues.rst and described in more detail by
Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but
in many cases it finds the culprit relatively quickly. If it's hard or
time-consuming to reliably reproduce the issue, consider teaming up with other
affected users to narrow down the search range together.

Who can I ask for advice when it comes to regressions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Send a mail to the regressions mailing list (regressions@lists.linux.dev) while
CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the
issue might better be dealt with in private, feel free to omit the list.


Additional details about regressions
------------------------------------


What is the goal of the "no regressions" rule?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users should feel safe when updating kernel versions and not have to worry
something might break. The kernel developers have an interest in making
updating attractive: they don't want users to stay on stable or longterm Linux
series that are either abandoned or more than one and a half years old. That's
in everybody's interest, as `those series might have known bugs, security
issues, or other problematic aspects already fixed in later versions
<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_.
Additionally, the kernel developers want to make it simple and appealing for
users to test the latest pre-release or regular release. That's also in
everybody's interest, as it's a lot easier to track down and fix problems if
they are reported shortly after being introduced.

Is the "no regressions" rule really adhered to in practice?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's taken really seriously, as can be seen by many mailing list posts from
Linux creator and lead developer Linus Torvalds, some of which are quoted in
Documentation/process/handling-regressions.rst.

Exceptions to this rule are extremely rare; in the past developers almost always
turned out to be wrong when they assumed a particular situation warranted an
exception.

Who ensures the "no regressions" rule is actually followed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The subsystem maintainers should take care of that; they are watched and
supported by the tree maintainers -- e.g. Linus Torvalds for mainline and
Greg Kroah-Hartman et al. for various stable/longterm series.

All of them are helped by people trying to ensure no regression report falls
through the cracks. One of them is Thorsten Leemhuis, who's currently acting as
the Linux kernel's "regressions tracker"; to facilitate this work he relies on
regzbot, the Linux kernel regression tracking bot. That's why you want to get
your report on the radar of these people by CCing or forwarding each report to
the regressions mailing list, ideally with a "regzbot command" in your mail to
get it tracked immediately.

How quickly are regressions normally fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers should fix any reported regression as quickly as possible, to provide
affected users with a solution in a timely manner and prevent more users from
running into the issue; nevertheless developers need to take enough time and
care to ensure regression fixes do not cause additional damage.

The answer thus depends on various factors like the impact of a regression, its
age, or the Linux series in which it occurs. In the end though, most regressions
should be fixed within two weeks.

Is it a regression, if the issue can be avoided by updating some software?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Almost always: yes. If a developer tells you otherwise, ask the regression
tracker for advice as outlined above.

Is it a regression, if a newer kernel works slower or consumes more energy?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, but the difference has to be significant. A five percent slow-down in a
micro-benchmark thus is unlikely to qualify as a regression, unless it also
influences the results of a broad benchmark by more than one percent. If in
doubt, ask for advice.

Is it a regression, if an external kernel module breaks when updating Linux?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, as the "no regressions" rule is about interfaces and services the Linux
kernel provides to the userland. It thus does not cover building or running
externally developed kernel modules, as they run in kernel-space and hook into
the kernel using internal interfaces that are occasionally changed.

How are regressions handled that are caused by security fixes?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In extremely rare situations security issues can't be fixed without causing
regressions; in those cases the security fixes take precedence, as they are the
lesser evil in the end. Luckily this dilemma can almost always be avoided, as
key developers for the affected area and often Linus Torvalds himself try very
hard to fix security issues without causing regressions.

If you nevertheless face such a case, check the mailing list archives to see
whether people tried their best to avoid the regression. If not, report it; if
in doubt, ask for advice as outlined above.

What happens if fixing a regression is impossible without causing another?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sadly these things happen, but luckily not very often; if they occur, expert
developers of the affected code area should look into the issue to find a fix
that avoids regressions or at least minimizes their impact. If you run into such
a situation, do what was outlined already for regressions caused by security
fixes: check earlier discussions to see whether people already tried their best
and ask for advice if in doubt.

A quick note while at it: these situations could be avoided if people regularly
gave mainline pre-releases (say v5.15-rc1 or -rc3) from each development cycle a
test run. This is best explained by imagining a change integrated between Linux
v5.14 and v5.15-rc1 which causes a regression, but at the same time is a hard
requirement for some other improvement applied for 5.15-rc1. All these changes
often can simply be reverted and the regression thus solved, if someone finds
and reports it before 5.15 is released. A few days or weeks later this solution
can become impossible, as some software might have started to rely on aspects
introduced by one of the follow-up changes: reverting all changes would then
cause a regression for users of said software and thus is out of the question.

Is it a regression, if some feature I relied on was removed months ago?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is, but often it's hard to fix such regressions due to the aspects outlined
in the previous section. It hence needs to be dealt with on a case-by-case
basis. This is another reason why it's in everybody's interest to regularly test
mainline pre-releases.

Does the "no regression" rule apply if I seem to be the only affected person?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It does, but only for practical usage: the Linux developers want to be free to
remove support for hardware that can only be found in attics and museums these
days.

Note, sometimes regressions can't be avoided to make progress -- and the latter
is needed to prevent Linux from stagnation. Hence, if only very few users seem
to be affected by a regression, it might, for the greater good, be in their and
everyone else's interest to let things pass. Especially if there is an
easy way to circumvent the regression somehow, for example by updating some
software or using a kernel parameter created just for this purpose.

Does the regression rule apply for code in the staging tree as well?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not according to the `help text for the configuration option covering all
staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_,
which since its early days states::

       Please note that these drivers are under heavy development, may or
       may not work, and may contain userspace interfaces that most likely
       will be changed in the near future.

The staging developers nevertheless often adhere to the "no regressions" rule,
but sometimes bend it to make progress. That's for example why some users had to
deal with (often negligible) regressions when a WiFi driver from the staging
tree was replaced by a totally different one written from scratch.

Why do later versions have to be "compiled with a similar configuration"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because the Linux kernel developers sometimes integrate changes known to cause
regressions, but make them optional and disable them in the kernel's default
configuration. This trick allows progress, as the "no regressions" rule
otherwise would lead to stagnation.

Consider for example a new security feature blocking access to some kernel
interfaces often abused by malware, which at the same time are required to run a
few rarely used applications. The outlined approach makes both camps happy:
people using these applications can leave the new security feature off, while
everyone else can enable it without running into trouble.

How to create a configuration similar to the one of an older kernel?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start your machine with a known-good kernel and configure the newer Linux
version with ``make olddefconfig``. This makes the kernel's build scripts pick
up the configuration file (the ".config" file) from the running kernel as base
for the new one you are about to compile; afterwards they set all new
configuration options to their default value, which should disable new features
that might cause regressions.

Can I report a regression I found with pre-compiled vanilla kernels?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You need to ensure the newer kernel was compiled with a configuration file
similar to the one used for the older kernel (see above), as those who built
them might have enabled some known-to-be incompatible feature for the newer
kernel. If in doubt, report the matter to the kernel's provider and ask for
advice.


More about regression tracking with "regzbot"
---------------------------------------------

What is regression tracking and why should I care about it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rules like "no regressions" need someone to ensure they are followed, otherwise
they are broken either accidentally or on purpose. History has shown this to be
true for Linux kernel development as well. That's why Thorsten Leemhuis, the
Linux kernel's regression tracker, and some other people try to ensure all
regressions are fixed by keeping an eye on them until they are resolved. None of
them are paid for this; that's why the work is done on a best-effort basis.

Why and how are Linux kernel regressions tracked using a bot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tracking regressions completely manually has proven to be quite hard due to the
distributed and loosely structured nature of the Linux kernel development
process. That's why the Linux kernel's regression tracker developed regzbot to
facilitate the work, with the long-term goal to automate regression tracking as
much as possible for everyone involved.

Regzbot works by watching for replies to reports of tracked regressions.
Additionally, it's looking out for posted or committed patches referencing such
reports with "Link:" tags; replies to such patch postings are tracked as well.
Combined, this data provides good insight into the current state of the fixing
process.

How to see which regressions regzbot tracks currently?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

What kind of issues are supposed to be tracked by regzbot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bot is meant to track regressions, hence please don't involve regzbot for
regular issues. But it's okay for the Linux kernel's regression tracker if you
involve regzbot to track severe issues, like reports about hangs, corrupted
data, or internal errors (Panic, Oops, BUG(), warning, ...).

How to change aspects of a tracked regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By using a 'regzbot command' in a direct or indirect reply to the mail with the
report. The easiest way to do that: find the report in your "Sent" folder or the
mailing list archive and reply to it using your mailer's "Reply-all" function.
In that mail, use one of the following commands in a stand-alone paragraph (IOW:
use blank lines to separate one or multiple of these commands from the rest of
the mail's text).

* Update when the regression started to happen, for example after performing a
  bisection::

       #regzbot introduced: 1f2e3d4c5d

* Set or update the title::

       #regzbot title: foo

* Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of
  the issue or a fix are discussed::

       #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
       #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

* Point to a place with further details of interest, like a mailing list post
  or a ticket in a bug tracker that is slightly related, but about a different
  topic::

       #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

* Mark a regression as invalid::

       #regzbot invalid: wasn't a regression, problem has always existed

Regzbot supports a few other commands primarily used by developers or people
tracking regressions. They and more details about the aforementioned regzbot
commands can be found in the `getting started guide
<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and
the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_
for regzbot.

..
   end-of-content
..
   This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
   of the file. If you want to distribute this text under CC-BY-4.0 only,
   please use "The Linux kernel developers" for author attribution and link
   this as source:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst
..
   Note: Only the content of this RST file as found in the Linux kernel sources
   is available under CC-BY-4.0, as versions of this text that were processed
   (for example by the kernel's build system) might contain content taken from
   files which use a more restrictive license.