.. _codingstyle:

Linux kernel coding style
=========================

This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't **force** my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.

First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.

Anyway, here goes:


1) Indentation
--------------

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.
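
As a rough illustration of the nesting warning above, the sketch below keeps
a loop body shallow by returning early and by moving the per-item test into a
helper. It is plain userspace C, and ``entry_is_active()`` and
``count_active()`` are hypothetical names invented for this example, not
kernel functions.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if the entry exists and holds a positive value. */
static int entry_is_active(const int *entry)
{
	return entry && *entry > 0;
}

/* Hypothetical example: count the active entries without deep nesting. */
static size_t count_active(const int *const *entries, size_t n)
{
	size_t count = 0;
	size_t i;

	/* Bail out early instead of wrapping the loop in another block. */
	if (!entries)
		return 0;

	for (i = 0; i < n; i++) {
		/* Skip uninteresting entries rather than nesting the work. */
		if (!entry_is_active(entries[i]))
			continue;
		count++;
	}

	return count;
}
```

The deepest statement here sits at two levels of indentation; written with
nested ``if`` blocks instead, the same logic would need four.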

The preferred way to ease multiple indentation levels in a switch statement is
to align the ``switch`` and its subordinate ``case`` labels in the same column
instead of ``double-indenting`` the ``case`` labels. E.g.:

.. code-block:: c

	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}

Don't put multiple statements on a single line unless you have
something to hide:

.. code-block:: c

	if (condition) do_this;
	  do_something_everytime;

Don't put multiple assignments on a single line either. Kernel coding style
is super simple. Avoid tricky expressions.

Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.

Get a decent editor and don't leave whitespace at the end of lines.


2) Breaking long lines and strings
----------------------------------

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a strongly
preferred limit.

Statements longer than 80 columns will be broken into sensible chunks, unless
exceeding 80 columns significantly increases readability and does not hide
information. Descendants are always substantially shorter than the parent and
are placed substantially to the right. The same applies to function headers
with a long argument list. However, never break user-visible strings such as
printk messages, because that breaks the ability to grep for them.
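
The sketch below shows one way these rules might look in practice: the
function header and the call are broken, with the continuation lines shorter
and placed to the right, while the format string is left whole so that it can
still be found with grep. It uses ``snprintf()`` in userspace as a stand-in
for printk, and ``format_link_report()`` and its parameters are hypothetical
names chosen for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a link status message into buf and
 * return whatever snprintf() returns.
 */
static int format_link_report(char *buf, size_t len, const char *ifname,
			      unsigned int speed_mbps, int full_duplex)
{
	/* The user-visible string stays on a single line. */
	return snprintf(buf, len,
			"%s: link is up, %u Mbps, %s duplex\n",
			ifname, speed_mbps, full_duplex ? "full" : "half");
}
```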


3) Placing Braces and Spaces
----------------------------

The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

.. code-block:: c

	if (x is true) {
		we do y
	}

This applies to all non-function statement blocks (if, switch, for,
while, do). E.g.:

.. code-block:: c

	switch (action) {
	case KOBJ_ADD:
		return "add";
	case KOBJ_REMOVE:
		return "remove";
	case KOBJ_CHANGE:
		return "change";
	default:
		return NULL;
	}

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

.. code-block:: c

	int function(int x)
	{
		body of function
	}

Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are **right** and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).

Note that the closing brace is empty on a line of its own, **except** in
the cases where it is followed by a continuation of the same statement,
i.e. a ``while`` in a do-statement or an ``else`` in an if-statement, like
this:

.. code-block:: c

	do {
		body of do-loop
	} while (condition);

and

.. code-block:: c

	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.

Do not unnecessarily use braces where a single statement will do.

.. code-block:: c

	if (condition)
		action();

and

.. code-block:: none

	if (condition)
		do_this();
	else
		do_that();

This does not apply if only one branch of a conditional statement is a single
statement; in the latter case use braces in both branches:

.. code-block:: c

	if (condition) {
		do_this();
		do_that();
	} else {
		otherwise();
	}

3.1) Spaces
***********

Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage. Use a space after (most) keywords. The
notable exceptions are sizeof, typeof, alignof, and __attribute__, which look
somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: ``sizeof info`` after
``struct fileinfo info;`` is declared).

So use a space after these keywords::

	if, switch, case, for, do, while

but not with sizeof, typeof, alignof, or __attribute__. E.g.,

.. code-block:: c


	s = sizeof(struct file);

Do not add spaces around (inside) parenthesized expressions. This example is
**bad**:

.. code-block:: c


	s = sizeof( struct file );

When declaring pointer data or a function that returns a pointer type, the
preferred use of ``*`` is adjacent to the data name or function name and not
adjacent to the type name. Examples:

.. code-block:: c


	char *linux_banner;
	unsigned long long memparse(char *ptr, char **retptr);
	char *match_strdup(substring_t *s);

Use one space around (on each side of) most binary and ternary operators,
such as any of these::

	=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

but no space after unary operators::

	&  *  +  -  ~  !  sizeof  typeof  alignof  __attribute__  defined

no space before the postfix increment & decrement unary operators::

	++  --

no space after the prefix increment & decrement unary operators::

	++  --

and no space around the ``.`` and ``->`` structure member operators.

Do not leave trailing whitespace at the ends of lines. Some editors with
``smart`` indentation will insert whitespace at the beginning of new lines as
appropriate, so you can start typing the next line of code right away.
However, some such editors do not remove the whitespace if you end up not
putting a line of code there, such as if you leave a blank line. As a result,
you end up with lines containing trailing whitespace.

Git will warn you about patches that introduce trailing whitespace, and can
optionally strip the trailing whitespace for you; however, if applying a series
of patches, this may make later patches in the series fail by changing their
context lines.


4) Naming
---------

C is a Spartan language, and so should your naming be.
Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable ``tmp``, which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function ``foo`` is a
shooting offense.

GLOBAL variables (to be used only if you **really** need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
``count_active_users()`` or similar, you should **not** call it ``cntusr()``.

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer. No wonder MicroSoft
makes buggy programs.

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called ``i``.
Calling it ``loop_counter`` is non-productive, if there is no chance of it
being mis-understood. Similarly, ``tmp`` can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).


5) Typedefs
-----------

Please don't use things like ``vps_t``.
It's a **mistake** to use typedef for structures and pointers. When you see a

.. code-block:: c


	vps_t a;

in the source, what does it mean?
In contrast, if it says

.. code-block:: c

	struct virtual_container *a;

you can actually tell what ``a`` is.

Lots of people think that typedefs ``help readability``. Not so. They are
useful only for:

 (a) totally opaque objects (where the typedef is actively used to **hide**
     what the object is).

     Example: ``pte_t`` etc. opaque objects that you can only access using
     the proper accessor functions.

     .. note::

       Opaqueness and ``accessor functions`` are not good in themselves.
       The reason we have them for things like pte_t etc. is that there
       really is absolutely **zero** portably accessible information there.

 (b) Clear integer types, where the abstraction **helps** avoid confusion
     whether it is ``int`` or ``long``.

     u8/u16/u32 are perfectly fine typedefs, although they fit into
     category (d) better than here.

     .. note::

       Again - there needs to be a **reason** for this. If something is
       ``unsigned long``, then there's no reason to do

	typedef unsigned long myflags_t;

     but if there is a clear reason for why it under certain circumstances
     might be an ``unsigned int`` and under other configurations might be
     ``unsigned long``, then by all means go ahead and use a typedef.

 (c) when you use sparse to literally create a **new** type for
     type-checking.

 (d) New types which are identical to standard C99 types, in certain
     exceptional circumstances.

     Although it would only take a short amount of time for the eyes and
     brain to become accustomed to the standard types like ``uint32_t``,
     some people object to their use anyway.

     Therefore, the Linux-specific ``u8/u16/u32/u64`` types and their
     signed equivalents which are identical to standard types are
     permitted -- although they are not mandatory in new code of your
     own.

     When editing existing code which already uses one or the other set
     of types, you should conform to the existing choices in that code.

 (e) Types safe for use in userspace.

     In certain structures which are visible to userspace, we cannot
     require C99 types and cannot use the ``u32`` form above. Thus, we
     use __u32 and similar types in all structures which are shared
     with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should **never** be a typedef.


6) Functions
------------

Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function. So, if you have a
conceptually simple function that is just one long (but simple)
case-statement, where you have to do lots of small things for a lot of
different cases, it's OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely.
Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.

In source files, separate functions with one blank line. If the function is
exported, the **EXPORT** macro for it should follow immediately after the
closing function brace line. E.g.:

.. code-block:: c

	int system_is_up(void)
	{
		return system_state == SYSTEM_RUNNING;
	}
	EXPORT_SYMBOL(system_is_up);

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.


7) Centralized exiting of functions
-----------------------------------

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done. If there is no
cleanup needed then just return directly.

Choose label names which say what the goto does or why the goto exists. An
example of a good name could be ``out_free_buffer:`` if the goto frees ``buffer``.
Avoid using GW-BASIC names like ``err1:`` and ``err2:``, as you would have to
renumber them if you ever add or remove exit paths, and they make correctness
difficult to verify anyway.

The rationale for using gotos is:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
  modifications are prevented
- saves the compiler work to optimize redundant code away ;)

.. code-block:: c

	int fun(int a)
	{
		int result = 0;
		char *buffer;

		buffer = kmalloc(SIZE, GFP_KERNEL);
		if (!buffer)
			return -ENOMEM;

		if (condition1) {
			while (loop1) {
				...
			}
			result = 1;
			goto out_free_buffer;
		}
		...
	out_free_buffer:
		kfree(buffer);
		return result;
	}

A common type of bug to be aware of is ``one err bugs`` which look like this:

.. code-block:: c

	err:
		kfree(foo->bar);
		kfree(foo);
		return ret;

The bug in this code is that on some exit paths ``foo`` is NULL. Normally the
fix for this is to split it up into two error labels ``err_free_bar:`` and
``err_free_foo:``:

.. code-block:: c

	 err_free_bar:
		kfree(foo->bar);
	 err_free_foo:
		kfree(foo);
		return ret;

Ideally you should simulate errors to test all exit paths.


8) Commenting
-------------

Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the **working** is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.

When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-documentation.rst and scripts/kernel-doc
for details.

The preferred style for long (multi-line) comments is:

.. code-block:: c

	/*
	 * This is the preferred style for multi-line
	 * comments in the Linux kernel source code.
	 * Please use it consistently.
	 *
	 * Description: A column of asterisks on the left side,
	 * with beginning and ending almost-blank lines.
	 */

For files in net/ and drivers/net/ the preferred style for long (multi-line)
comments is a little different.

.. code-block:: c

	/* The preferred comment style for files in net/ and drivers/net
	 * looks like this.
	 *
	 * It is nearly the same as the generally preferred comment style,
	 * but there is no initial almost-blank line.
	 */

It's also important to comment data, whether they are basic types or derived
types. To this end, use just one data declaration per line (no commas for
multiple data declarations). This leaves you room for a small comment on each
item, explaining its use.


9) You've made a mess of it
---------------------------

That's OK, we all do.
You've probably been told by your long-time Unix
user helper that ``GNU emacs`` automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:

.. code-block:: none

  (defun c-lineup-arglist-tabs-only (ignored)
    "Line up argument lists by tabs, not spaces"
    (let* ((anchor (c-langelem-pos c-syntactic-element))
           (column (c-langelem-2nd-pos c-syntactic-element))
           (offset (- (1+ column) anchor))
           (steps (floor offset c-basic-offset)))
      (* (max steps 1)
         c-basic-offset)))

  (add-hook 'c-mode-common-hook
            (lambda ()
              ;; Add kernel style
              (c-add-style
               "linux-tabs-only"
               '("linux" (c-offsets-alist
                          (arglist-cont-nonempty
                           c-lineup-gcc-asm-reg
                           c-lineup-arglist-tabs-only))))))

  (add-hook 'c-mode-hook
            (lambda ()
              (let ((filename (buffer-file-name)))
                ;; Enable kernel mode for the appropriate files
                (when (and filename
                           (string-match (expand-file-name "~/src/linux-trees")
                                         filename))
                  (setq indent-tabs-mode t)
                  (setq show-trailing-whitespace t)
                  (c-set-style "linux-tabs-only")))))

This will make emacs go better with the kernel coding style for C
files below ``~/src/linux-trees``.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use ``indent``.

Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options ``-kr -i8`` (stands for ``K&R, 8 character indents``), or use
``scripts/Lindent``, which indents in the latest style.

``indent`` has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: ``indent`` is not a fix for bad programming.


10) Kconfig configuration files
-------------------------------

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different. Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces. Example::

  config AUDIT
	bool "Auditing support"
	depends on NET
	help
	  Enable auditing infrastructure that can be used with another
	  kernel subsystem, such as SELinux (which requires this for
	  logging of avc messages output). Does not do system-call
	  auditing without CONFIG_AUDITSYSCALL.

Seriously dangerous features (such as write support for certain
filesystems) should advertise this prominently in their prompt string::

  config ADFS_FS_RW
	bool "ADFS write support (DANGEROUS)"
	depends on ADFS_FS
	...

For full documentation on the configuration files, see the file
Documentation/kbuild/kconfig-language.txt.


11) Data structures
-------------------

Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts.
In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely **have** to reference count all your uses.

Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.

Note that locking is **not** a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.

Many data structures can indeed have two levels of reference counting,
when there are users of different ``classes``. The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.

Examples of this kind of ``multi-level-reference-counting`` can be found in
memory management (``struct mm_struct``: mm_users and mm_count), and in
filesystem code (``struct super_block``: s_count and s_active).

Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.


12) Macros, Enums and RTL
-------------------------

Names of macros defining constants and labels in enums are capitalized.

.. code-block:: c

	#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.
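
A minimal sketch of the naming rules above, assuming a made-up set of link
states: a stand-alone constant is a capitalized macro, several related
constants become an enum with capitalized labels, and a macro that resembles
a function is named in lower case. ``LINK_MAX_RETRIES``, ``enum link_state``,
``link_is_usable()`` and ``link_state_name()`` are all hypothetical names
used only for illustration.

```c
/* A single constant: capitalized macro name. */
#define LINK_MAX_RETRIES 5

/* Several related constants: an enum, with capitalized labels. */
enum link_state {
	LINK_DOWN,
	LINK_NEGOTIATING,
	LINK_UP,
};

/* A macro resembling a function may be named in lower case. */
#define link_is_usable(state) ((state) == LINK_UP)

/* Map a state to a printable name. */
static const char *link_state_name(enum link_state state)
{
	switch (state) {
	case LINK_DOWN:
		return "down";
	case LINK_NEGOTIATING:
		return "negotiating";
	case LINK_UP:
		return "up";
	default:
		return "unknown";
	}
}
```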

Generally, inline functions are preferable to macros resembling functions.

Macros with multiple statements should be enclosed in a do - while block:

.. code-block:: c

	#define macrofun(a, b, c)	\
		do {	\
			if (a == 5)	\
				do_this(b, c);	\
		} while (0)

Things to avoid when using macros:

1) macros that affect control flow:

.. code-block:: c

	#define FOO(x)	\
		do {	\
			if (blah(x) < 0)	\
				return -EBUGGERED;	\
		} while (0)

is a **very** bad idea. It looks like a function call but exits the ``calling``
function; don't break the internal parsers of those who will read the code.

2) macros that depend on having a local variable with a magic name:

.. code-block:: c

	#define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.

.. code-block:: c

	#define CONSTANT 0x4000
	#define CONSTEXP (CONSTANT | 3)

5) namespace collisions when defining local variables in macros resembling
functions:

.. code-block:: c

	#define FOO(x)	\
	({	\
		typeof(x) ret;	\
		ret = calc_ret(x);	\
		(ret);	\
	})

ret is a common name for a local variable - __foo_ret is less likely
to collide with an existing variable.

The cpp manual deals with macros exhaustively.
The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.


13) Printing kernel messages
----------------------------

Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like ``dont``; use ``do not`` or ``don't`` instead. Make the messages
concise, clear, and unambiguous.

Kernel messages do not have to be terminated with a period.

Printing numbers in parentheses (%d) adds no value and should be avoided.

There are a number of driver model diagnostic macros in <linux/device.h>
which you should use to make sure messages are matched to the right device
and driver, and are tagged with the right level: dev_err(), dev_warn(),
dev_info(), and so forth. For messages that aren't associated with a
particular device, <linux/printk.h> defines pr_notice(), pr_info(),
pr_warn(), pr_err(), etc.

Coming up with good debugging messages can be quite a challenge; and once
you have them, they can be a huge help for remote troubleshooting. However
debug message printing is handled differently than printing other non-debug
messages. While the other pr_XXX() functions print unconditionally,
pr_debug() does not; it is compiled out by default, unless either DEBUG is
defined or CONFIG_DYNAMIC_DEBUG is set. That is true for dev_dbg() also,
and a related convention uses VERBOSE_DEBUG to add dev_vdbg() messages to
the ones already enabled by DEBUG.

Many subsystems have Kconfig debug options to turn on -DDEBUG in the
corresponding Makefile; in other cases specific files #define DEBUG.
And when a debug message should be unconditionally printed, such as if it is
already inside a debug-related #ifdef section, printk(KERN_DEBUG ...) can be
used.


14) Allocating memory
---------------------

The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kmalloc_array(), kcalloc(), vmalloc(), and
vzalloc(). Please refer to the API documentation for further information
about them.

The preferred form for passing a size of a struct is the following:

.. code-block:: c

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.

Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.

The preferred form for allocating an array is the following:

.. code-block:: c

	p = kmalloc_array(n, sizeof(...), ...);

The preferred form for allocating a zeroed array is the following:

.. code-block:: c

	p = kcalloc(n, sizeof(...), ...);

Both forms check for overflow on the allocation size n * sizeof(...),
and return NULL if that occurred.


15) The inline disease
----------------------

There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called ``inline``. While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not.
Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
that can go into these 5 milliseconds.

A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. An exception to this rule is the case where
a parameter is known to be a compile-time constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this latter case, see
the kmalloc() inline function.

Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.


16) Function return values and names
------------------------------------

Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed. Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a ``succeeded`` boolean (0 = failure,
non-zero = success).

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs.
If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us... but it doesn't. To help prevent such bugs, always follow this
convention::

	If the name of a function is an action or an imperative command,
	the function should return an error-code integer. If the name
	is a predicate, the function should return a "succeeded" boolean.

For example, ``add work`` is a command, and the add_work() function returns 0
for success or -EBUSY for failure. In the same way, ``PCI device present`` is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn't.

All EXPORTed functions must respect this convention, and so should all
public functions. Private (static) functions need not, but it is
recommended that they do.

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule. Generally they indicate failure by returning some out-of-range
result. Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.


17) Don't re-invent the kernel macros
-------------------------------------

The header file include/linux/kernel.h contains a number of macros that
you should use, rather than explicitly coding some variant of them yourself.
For example, if you need to calculate the length of an array, take advantage
of the macro

.. code-block:: c

	#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

Similarly, if you need to calculate the size of some structure member, use

.. code-block:: c

	#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))

There are also min() and max() macros that do strict type checking if you
need them. Feel free to peruse that header file to see what else is already
defined that you shouldn't reproduce in your code.


18) Editor modelines and other cruft
------------------------------------

Some editors can interpret configuration information embedded in source files,
indicated with special markers. For example, emacs interprets lines marked
like this:

.. code-block:: c

	-*- mode: c -*-

Or like this:

.. code-block:: c

	/*
	Local Variables:
	compile-command: "gcc -DMAGIC_DEBUG_FLAG foo.c"
	End:
	*/

Vim interprets markers that look like this:

.. code-block:: c

	/* vim:set sw=8 noet */

Do not include any of these in source files. People have their own personal
editor configurations, and your source files should not override them. This
includes markers for indentation and mode configuration. People may use their
own custom mode, or may have some other magic method for making indentation
work correctly.


19) Inline assembly
-------------------

In architecture-specific code, you may need to use inline assembly to interface
with CPU or platform functionality. Don't hesitate to do so when necessary.
However, don't use inline assembly gratuitously when C can do the job. You can
and should poke hardware from C when possible.

Consider writing simple helper functions that wrap common bits of inline
assembly, rather than repeatedly writing them with slight variations. Remember
that inline assembly can use C parameters.
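
As a minimal sketch of such a wrapper, the hypothetical helper below keeps one
asm statement in a single place and feeds a C parameter straight into it.
Neither ``hide_value()`` nor its caller ``scale_by_four()`` is a kernel API;
both names are invented for illustration. The asm template is deliberately
empty so that the sketch builds on any architecture GCC or Clang targets; a
real helper would contain actual instructions.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: the one place where this
 * particular piece of inline assembly is written.
 *
 * The template string is empty, so no instruction is emitted. The "+r"
 * constraint hands the C parameter to the asm in a register and tells
 * the compiler the asm may have modified it, so the compiler cannot
 * assume anything about the value afterwards.
 *
 * __asm__ is the GNU alternate spelling of asm; it is accepted even
 * when the compiler runs in a strict ISO C mode.
 */
static inline uint32_t hide_value(uint32_t val)
{
	__asm__ ("" : "+r" (val));
	return val;
}

/* Hypothetical caller: uses the wrapper like any other C function. */
static uint32_t scale_by_four(uint32_t x)
{
	return hide_value(x) << 2;
}
```

Callers only ever see a plain C function, so a later change to the constraints
or the instruction sequence touches one definition instead of every call site.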

Large, non-trivial assembly functions should go in .S files, with corresponding
C prototypes defined in C header files. The C prototypes for assembly
functions should use ``asmlinkage``.

You may need to mark your asm statement as volatile, to prevent GCC from
removing it if GCC doesn't notice any side effects. You don't always need to
do so, though, and doing so unnecessarily can limit optimization.

When writing a single inline assembly statement containing multiple
instructions, put each instruction on a separate line in a separate quoted
string, and end each string except the last with \n\t to properly indent the
next instruction in the assembly output:

.. code-block:: c

	asm ("magic %reg1, #42\n\t"
	     "more_magic %reg2, %reg3"
	     : /* outputs */ : /* inputs */ : /* clobbers */);


20) Conditional Compilation
---------------------------

Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c
files; doing so makes code harder to read and logic harder to follow. Instead,
use such conditionals in a header file defining functions for use in those .c
files, providing no-op stub versions in the #else case, and then call those
functions unconditionally from .c files. The compiler will avoid generating
any code for the stub calls, producing identical results, but the logic will
remain easy to follow.

Prefer to compile out entire functions, rather than portions of functions or
portions of expressions. Rather than putting an ifdef in an expression, factor
out part or all of the expression into a separate helper function and apply the
conditional to that function.
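
The header-stub pattern described above might look like the following sketch.
``CONFIG_FOO_TRACE``, ``foo_trace()`` and ``foo_start()`` are invented names
used only for illustration; in a real tree the first half would sit in a
header file and the second half in a .c file.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Header part (a hypothetical foo_trace.h). The preprocessor
 * conditional lives here and selects between a real function and a
 * no-op stub. CONFIG_FOO_TRACE is a made-up option.
 */
#ifdef CONFIG_FOO_TRACE
static inline void foo_trace(const char *msg)
{
	printf("foo: %s\n", msg);
}
#else
/* No-op stub: the compiler generates no code for calls to it. */
static inline void foo_trace(const char *msg)
{
	(void)msg;
}
#endif /* CONFIG_FOO_TRACE */

/*
 * .c file part (a hypothetical foo.c). There is no #ifdef at the call
 * site; the function reads the same in every configuration.
 */
static int foo_start(int unit)
{
	if (unit < 0)
		return -EINVAL;

	foo_trace("starting");
	return 0;
}
```

Building with ``-DCONFIG_FOO_TRACE`` switches in the printing version without
touching ``foo_start()``.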

If you have a function or variable which may potentially go unused in a
particular configuration, and the compiler would warn about its definition
going unused, mark the definition as __maybe_unused rather than wrapping it in
a preprocessor conditional. (However, if a function or variable *always* goes
unused, delete it.)

Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
symbol into a C boolean expression, and use it in a normal C conditional:

.. code-block:: c

	if (IS_ENABLED(CONFIG_SOMETHING)) {
		...
	}

The compiler will constant-fold the conditional away, and include or exclude
the block of code just as with an #ifdef, so this will not add any runtime
overhead. However, this approach still allows the C compiler to see the code
inside the block, and check it for correctness (syntax, types, symbol
references, etc). Thus, you still have to use an #ifdef if the code inside the
block references symbols that will not exist if the condition is not met.

At the end of any non-trivial #if or #ifdef block (more than a few lines),
place a comment after the #endif on the same line, noting the conditional
expression used. For instance:

.. code-block:: c

	#ifdef CONFIG_SOMETHING
	...
	#endif /* CONFIG_SOMETHING */


Appendix I) References
----------------------

The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).

The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.

GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/

WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/

Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/