.. SPDX-License-Identifier: GPL-2.0

========================
Linux and the Devicetree
========================

The Linux usage model for device tree data

:Author: Grant Likely <grant.likely@secretlab.ca>

This article describes how Linux uses the device tree. An overview of
the device tree data format can be found on the device tree usage page
at devicetree.org\ [1]_.

.. [1] https://www.devicetree.org/specifications/

The "Open Firmware Device Tree", or simply Devicetree (DT), is a data
structure and language for describing hardware. More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data. A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties. Be wary,
however, of creating a new binding without first doing some homework
about what already exists. There are currently two different,
incompatible, bindings for i2c busses that came about because the new
binding was created without first investigating how i2c devices were
already being enumerated in existing systems.

1.
History
----------
The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system). An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup and the merge of
32-bit and 64-bit support, the decision was made to require DT support
on all powerpc platforms, regardless of whether or not they used Open
Firmware. To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation. U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time. DT was
also added to the PowerPC boot wrapper (``arch/powerpc/boot/*``) so that
a dtb could be wrapped up with the kernel image to support booting
existing non-DT aware firmware.

Some time later, FDT infrastructure was generalized to be usable by
all architectures. At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

2. Data Model
-------------
If you haven't already read the Device Tree Usage\ [1]_ page,
then go read it now. It's okay, I'll wait....

2.1 High Level View
-------------------
The most important thing to understand is that the DT is simply a data
structure that describes the hardware.
There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away. What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter). Using
it allows board and device support to become data driven; to make
setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:

1) platform identification,
2) runtime configuration, and
3) device population.

2.2 Platform Identification
---------------------------
First and foremost, the kernel will use data in the DT to identify the
specific machine. In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC. On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devtree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data. It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc (which is defined in
arch/arm/include/asm/mach/arch.h if you're curious).

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
boards it is compatible with sorted from most compatible to least. For
example, the root compatible properties for the TI BeagleBoard and its
successor, the BeagleBoard xM board might look like, respectively::

    compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
    compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Where "ti,omap3-beagleboard-xm" specifies the exact model, it also
claims that it is compatible with the OMAP 3450 SoC, and the omap3
family of SoCs in general. You'll notice that the list is sorted from
most specific (exact board) to least specific (SoC family).

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board. However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another. For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another. The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values. Any string used in a compatible
property must be documented as to what it indicates. Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine.
After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc based
on which entry in the compatible property each machine_desc matches
against. If no matching machine_desc is found, then it returns NULL.

The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs. However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list. In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450". If a bug was discovered on the original beagleboard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".

PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning TRUE is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.
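
As a rough illustration of the ARM-style 'most compatible' selection
described in this section, the following userspace sketch scores each
candidate by the position of the first root 'compatible' entry it
matches; a lower position means a more specific match. This is not the
kernel implementation: the struct machine_desc here is a hypothetical
two-field stand-in, and match_score(), pick_machine(), and the example
tables are names invented for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct machine_desc. */
struct machine_desc {
    const char *name;
    const char *const *dt_compat;   /* NULL-terminated list */
};

/*
 * Return the index of the first root 'compatible' entry that also
 * appears in dt_compat, or INT_MAX if nothing matches.  The root list
 * is sorted most specific first, so a lower index is a better match.
 */
static int match_score(const char *const *root_compat,
                       const char *const *dt_compat)
{
    for (int i = 0; root_compat[i]; i++)
        for (int j = 0; dt_compat[j]; j++)
            if (strcmp(root_compat[i], dt_compat[j]) == 0)
                return i;
    return INT_MAX;
}

/* Return the best-scoring entry in the table, or NULL if none matches. */
static const struct machine_desc *
pick_machine(const struct machine_desc *table, size_t n,
             const char *const *root_compat)
{
    const struct machine_desc *best = NULL;
    int best_score = INT_MAX;

    assert(table && root_compat);
    for (size_t k = 0; k < n; k++) {
        int score = match_score(root_compat, table[k].dt_compat);

        if (score < best_score) {
            best_score = score;
            best = &table[k];
        }
    }
    return best;
}

/* Example data: a generic OMAP3 entry plus a board-specific workaround. */
static const char *const generic_compat[] = { "ti,omap3", "ti,omap3450", NULL };
static const char *const beagle_fixup_compat[] = { "ti,omap3-beagleboard", NULL };

static const struct machine_desc machines[] = {
    { "Generic OMAP3", generic_compat },
    { "BeagleBoard workaround", beagle_fixup_compat },
};

static const char *const beagle_root[] = {
    "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3", NULL
};
static const char *const beagle_xm_root[] = {
    "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3", NULL
};
static const char *const unknown_root[] = { "acme,widget", NULL };
```

With these tables, the original BeagleBoard selects the workaround
entry because it matches the first (most specific) string, the xM falls
back to the generic entry, and a tree with no matching string yields
NULL, mirroring the behaviour described for setup_machine_fdt().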

2.3 Runtime configuration
-------------------------
In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it also gets used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this::

    chosen {
        bootargs = "console=ttyS0,115200 loglevel=8";
        initrd-start = <0xc8000000>;
        initrd-end = <0xc8200000>;
    };

The bootargs property contains the kernel arguments, and the initrd-*
properties define the address and size of an initrd blob. Note that
initrd-end is the first address after the initrd image, so this doesn't
match the usual semantic of struct resource. The chosen node may also
optionally contain an arbitrary number of additional properties for
platform-specific configuration data.

During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up. The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot. Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.

On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

2.4 Device population
---------------------
After the board has been identified, and after the early configuration data
has been parsed, then kernel initialization can proceed in the normal
way.
At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM. The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any
machine-specific setup that needs to be executed early in the boot
process, and .init_irq() is used to set up interrupt handling. Using a
DT doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform. Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en masse in .init_machine(). When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices. A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later).
While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example. Here is part of the
device tree for the NVIDIA Tegra board::

    /{
        compatible = "nvidia,harmony", "nvidia,tegra20";
        #address-cells = <1>;
        #size-cells = <1>;
        interrupt-parent = <&intc>;

        chosen { };
        aliases { };

        memory {
            device_type = "memory";
            reg = <0x00000000 0x40000000>;
        };

        soc {
            compatible = "nvidia,tegra20-soc", "simple-bus";
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;

            intc: interrupt-controller@50041000 {
                compatible = "nvidia,tegra20-gic";
                interrupt-controller;
                #interrupt-cells = <1>;
                reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >;
            };

            serial@70006300 {
                compatible = "nvidia,tegra20-uart";
                reg = <0x70006300 0x100>;
                interrupts = <122>;
            };

            i2s1: i2s@70002800 {
                compatible = "nvidia,tegra20-i2s";
                reg = <0x70002800 0x100>;
                interrupts = <77>;
                codec = <&wm8903>;
            };

            i2c@7000c000 {
                compatible = "nvidia,tegra20-i2c";
                #address-cells = <1>;
                #size-cells = <0>;
                reg = <0x7000c000 0x100>;
                interrupts = <70>;

                wm8903: codec@1a {
                    compatible = "wlf,wm8903";
                    reg = <0x1a>;
                    interrupts = <347>;
                };
            };
        };

        sound {
            compatible = "nvidia,harmony-sound";
            i2s-controller = <&i2s1>;
            i2s-codec = <&wm8903>;
        };
    };

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all.
The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device). The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem. I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property. First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller. For
example, each i2c_client is a child of an i2c_master. Each spi_device
is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB,
etc. The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree. Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree. The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL), a
parent struct device (the last NULL), and we're not using a match
table (yet). For a board that only needs to register devices,
.init_machine() can be completely empty except for the
of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node? Shouldn't they be registered
as platform devices too? For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time. So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children. The board support code would allocate and register an SoC
device, a (theoretical) SoC device driver could bind to the SoC device,
and register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right?

Actually, it turns out that registering children of some
platform_devices as more platform_devices is a common pattern, and the
device tree support code reflects that and makes the above example
simpler. The second argument to of_platform_populate() is an
of_device_id table, and any node that matches an entry in that table
will also get its child nodes registered. In the Tegra case, the code
can look something like this::

    static void __init harmony_init_machine(void)
    {
        /* ...
        */
        of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
    }

"simple-bus" is defined in the Devicetree Specification as a compatible
value meaning a simple memory mapped bus, so the of_platform_populate()
code could be written to just assume simple-bus compatible nodes will
always be traversed. However, we pass it in as an argument so that
board support code can always override the default behaviour.

[Need to add discussion of adding i2c/spi/etc child devices]

Appendix A: AMBA devices
------------------------

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which include some support for hardware detection and power
management. In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices. However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device. This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive. If a node is compatible with "arm,primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
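
To make the population rules concrete, here is a small userspace sketch
of the decisions described in section 2.4 and in this appendix: skip
nodes without a 'compatible' property, register "arm,primecell" nodes
as AMBA devices, register everything else as platform devices, and
descend only into nodes that match the bus table. It is a toy model,
not the kernel code. struct node, struct registry, populate(),
populate_from(), and bus_table are names invented for the sketch, the
PL011 node is a hypothetical addition to the Tegra example, and not
descending below a Primecell node is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical, simplified node; not the kernel's struct device_node. */
struct node {
    const char *name;
    const char *const *compatible;              /* NULL-terminated, or NULL */
    const struct node *children[MAX_CHILDREN];  /* NULL-terminated */
};

/* Counts of devices that would be registered, by bus type. */
struct registry {
    int platform;
    int amba;
};

static bool is_compatible(const struct node *n, const char *s)
{
    for (int i = 0; n->compatible && n->compatible[i]; i++)
        if (strcmp(n->compatible[i], s) == 0)
            return true;
    return false;
}

static bool matches_any(const struct node *n, const char *const *table)
{
    for (int i = 0; table && table[i]; i++)
        if (is_compatible(n, table[i]))
            return true;
    return false;
}

/* Walk the children of 'parent' and count what would be registered. */
static void populate(const struct node *parent,
                     const char *const *bus_matches, struct registry *reg)
{
    assert(parent && reg);
    for (int i = 0; i < MAX_CHILDREN && parent->children[i]; i++) {
        const struct node *child = parent->children[i];

        if (!child->compatible)
            continue;           /* informational node, e.g. /chosen */
        if (is_compatible(child, "arm,primecell")) {
            reg->amba++;        /* assumption: no descent below it */
            continue;
        }
        reg->platform++;
        if (matches_any(child, bus_matches))
            populate(child, bus_matches, reg);  /* e.g. simple-bus */
    }
}

static struct registry populate_from(const struct node *start,
                                     const char *const *bus_matches)
{
    struct registry reg = { 0, 0 };

    populate(start, bus_matches, &reg);
    return reg;
}

/* The Tegra example tree, plus one hypothetical Primecell UART. */
static const char *const c_gic[]   = { "nvidia,tegra20-gic", NULL };
static const char *const c_uart[]  = { "nvidia,tegra20-uart", NULL };
static const char *const c_i2s[]   = { "nvidia,tegra20-i2s", NULL };
static const char *const c_i2c[]   = { "nvidia,tegra20-i2c", NULL };
static const char *const c_codec[] = { "wlf,wm8903", NULL };
static const char *const c_pl011[] = { "arm,pl011", "arm,primecell", NULL };
static const char *const c_soc[]   = { "nvidia,tegra20-soc", "simple-bus", NULL };
static const char *const c_sound[] = { "nvidia,harmony-sound", NULL };
static const char *const c_root[]  = { "nvidia,harmony", "nvidia,tegra20", NULL };

static const struct node chosen  = { "chosen", NULL, { NULL } };
static const struct node aliases = { "aliases", NULL, { NULL } };
static const struct node memory  = { "memory", NULL, { NULL } };
static const struct node intc    = { "interrupt-controller@50041000", c_gic, { NULL } };
static const struct node serial  = { "serial@70006300", c_uart, { NULL } };
static const struct node i2s     = { "i2s@70002800", c_i2s, { NULL } };
static const struct node codec   = { "codec@1a", c_codec, { NULL } };
static const struct node i2c     = { "i2c@7000c000", c_i2c, { &codec, NULL } };
static const struct node pl011   = { "hypothetical-uart", c_pl011, { NULL } };
static const struct node soc     = { "soc", c_soc,
                                     { &intc, &serial, &i2s, &i2c, &pl011, NULL } };
static const struct node sound   = { "sound", c_sound, { NULL } };
static const struct node root    = { "/", c_root,
                                     { &chosen, &aliases, &memory, &soc, &sound, NULL } };

/* Hypothetical stand-in for of_default_bus_match_table. */
static const char *const bus_table[] = { "simple-bus", NULL };
```

Calling populate_from(&root, NULL) counts only /soc and /sound as
platform devices, matching the all-NULL of_platform_populate() case in
the text. Passing bus_table additionally descends into /soc, so its
four Tegra children become platform devices and the hypothetical PL011
node is counted as an AMBA device. The codec under the i2c node is
never visited, since registering it is left to the i2c bus driver.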