=====================
PHY Abstraction Layer
=====================

Purpose
=======

Most network devices consist of a set of registers which provide an interface
to a MAC layer, which communicates with the physical connection through a
PHY. The PHY concerns itself with negotiating link parameters with the link
partner on the other side of the network connection (typically, an ethernet
cable), and provides a register interface to allow drivers to determine what
settings were chosen, and to configure what settings are allowed.

While these devices are distinct from the network devices, and conform to a
standard layout for the registers, it has been common practice to integrate
the PHY management code with the network driver. This has resulted in large
amounts of redundant code. Also, on embedded systems with multiple (and
sometimes quite different) ethernet controllers connected to the same
management bus, it is difficult to ensure safe use of the bus.

Since the PHYs are devices, and the management busses through which they are
accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
In doing so, it has these goals:

#. Increase code-reuse
#. Increase overall code-maintainability
#. Speed development time for new network drivers, and for new systems

Basically, this layer is meant to provide an interface to PHY devices which
allows network driver writers to write as little code as possible, while
still providing a full feature set.

The MDIO bus
============

Most network devices are connected to a PHY by means of a management bus.
Different devices use different busses (though some share common interfaces).
In order to take advantage of the PAL, each bus interface needs to be
registered as a distinct device.

#. Read and write functions must be implemented.
   Their prototypes are::

     int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
     int read(struct mii_bus *bus, int mii_id, int regnum);

   mii_id is the address on the bus for the PHY, and regnum is the register
   number. These functions are guaranteed not to be called from interrupt
   context, so it is safe for them to block, waiting for an interrupt to
   signal that the operation is complete.

#. A reset function is optional. This is used to return the bus to an
   initialized state.

#. A probe function is needed. This function should set up anything the bus
   driver needs, set up the mii_bus structure, and register with the PAL using
   mdiobus_register. Similarly, there's a remove function to undo all of
   that (use mdiobus_unregister).

#. Like any driver, the device_driver structure must be configured, and init
   and exit functions are used to register the driver.

#. The bus must also be declared somewhere as a device, and registered.

As an example of how one driver implemented an MDIO bus driver, see
drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
for one of the users (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/").

(RG)MII/electrical interface considerations
===========================================

The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
electrical signal interface using a synchronous 125MHz clock signal and several
data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
between the clock line (RXC or TXC) and the data lines to let the PHY (clock
sink) have a large enough setup and hold time to sample the data lines
correctly. The PHY library offers different types of PHY_INTERFACE_MODE_RGMII*
values to let the PHY driver, and optionally the MAC driver, implement the
required delay.

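As a back-of-the-envelope check of the 2ns figure: at gigabit speed RGMII
transfers data on both edges of the 125MHz clock over four data lines per
direction (TXD[3:0] or RXD[3:0]), which is what yields 1000Mbit/s, so each bit
occupies only half a clock period. A clock edge centered in that bit window
therefore lies a quarter period after the data transition. These are nominal
numbers only; they ignore skew, jitter and the setup/hold requirements of any
particular device:

.. math::

   4 \times 2 \times 125\,\mathrm{MHz} = 1000\,\mathrm{Mbit/s}

   T_{clk} = \frac{1}{125\,\mathrm{MHz}} = 8\,\mathrm{ns}, \qquad
   t_{bit} = \frac{T_{clk}}{2} = 4\,\mathrm{ns}, \qquad
   t_{delay} = \frac{t_{bit}}{2} = \frac{T_{clk}}{4} = 2\,\mathrm{ns}
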
The values of phy_interface_t must be understood from the perspective of the
PHY device itself, leading to the following:

* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
  internal delay by itself; it assumes that either the Ethernet MAC (if capable)
  or the PCB traces insert the correct 1.5-2ns delay

* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
  for the transmit data lines (TXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
  for the receive data lines (RXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
  both transmit AND receive data lines from/to the PHY device

Whenever possible, use the PHY side RGMII delay for these reasons:

* PHY devices may offer sub-nanosecond granularity in how they allow a
  receiver/transmitter side delay (e.g. 0.5, 1.0, 1.5ns) to be specified. Such
  precision may be required to account for differences in PCB trace lengths

* PHY devices are typically qualified for a large range of applications
  (industrial, medical, automotive...), and they provide a constant and
  reliable delay across temperature/pressure/voltage ranges

* Since PHY device drivers in PHYLIB are reusable by nature, being able to
  correctly configure a specified delay allows more designs with similar delay
  requirements to operate correctly

For cases where the PHY is not capable of providing this delay, but the
Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
configured correctly in order to provide the required transmit and/or receive
side delay from the perspective of the PHY device.

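The mode definitions above, combined with the MAC-side rule just described,
can be summarised as two small decision functions. This is a user-space sketch
rather than kernel code: enum rgmii_mode, struct rgmii_delays,
phy_side_delays() and mac_side_delays() are hypothetical names, and the
pcb_adds_delay flag is an assumption standing in for board knowledge that a
real driver would obtain elsewhere (for example from firmware or platform
data). The MAC function also keeps MAC-level delays off for every mode other
than plain RGMII.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical local stand-in for the four RGMII values of the kernel's
 * phy_interface_t. Only the meaning of each mode comes from the text.
 */
enum rgmii_mode {
	MODE_RGMII,		/* PHY inserts no delay */
	MODE_RGMII_ID,		/* PHY delays both RX and TX */
	MODE_RGMII_RXID,	/* PHY delays RX only */
	MODE_RGMII_TXID,	/* PHY delays TX only */
};

/* Which data paths get a delay, seen from the PHY's perspective. */
struct rgmii_delays {
	bool rx;
	bool tx;
};

/* Delays the PHY itself must insert for a given mode. */
struct rgmii_delays phy_side_delays(enum rgmii_mode mode)
{
	struct rgmii_delays d = { false, false };

	switch (mode) {
	case MODE_RGMII_ID:
		d.rx = true;
		d.tx = true;
		break;
	case MODE_RGMII_RXID:
		d.rx = true;
		break;
	case MODE_RGMII_TXID:
		d.tx = true;
		break;
	case MODE_RGMII:
		/* The delay is expected from the MAC or the PCB traces. */
		break;
	}
	return d;
}

/*
 * Delays a MAC should enable. Only plain RGMII asks the MAC to help, and
 * only if it is capable and the board does not already delay the signals
 * (pcb_adds_delay is an assumed, board-specific input). For every other
 * RGMII mode the MAC-level delays stay disabled.
 */
struct rgmii_delays mac_side_delays(enum rgmii_mode mode, bool mac_capable,
				    bool pcb_adds_delay)
{
	struct rgmii_delays d = { false, false };

	if (mode == MODE_RGMII && mac_capable && !pcb_adds_delay) {
		d.rx = true;
		d.tx = true;
	}
	return d;
}
```

For example, with PHY_INTERFACE_MODE_RGMII_ID the PHY covers both directions
and a delay-capable MAC must leave its own delays off, whereas with plain
PHY_INTERFACE_MODE_RGMII on a board without trace delays the MAC provides both.
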
Conversely, if the Ethernet MAC driver looks at the phy_interface_t value, for
any other mode but PHY_INTERFACE_MODE_RGMII, it should make sure that the
MAC-level delays are disabled.

If neither the Ethernet MAC nor the PHY is capable of providing the required
delays, as defined per the RGMII standard, several options may be available:

* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
  set of pins' strength, delays, and voltage, which may be a suitable way to
  insert the expected 2ns RGMII delay.

* Modifying the PCB design to include a fixed delay (e.g. using a specifically
  designed serpentine trace), which may not require software configuration at
  all.

Common problems with RGMII delay mismatch
-----------------------------------------

When there is an RGMII delay mismatch between the Ethernet MAC and the PHY, this
will most likely result in the clock and data line signals being unstable when
the PHY or MAC takes a snapshot of these signals to translate them into logical
1 or 0 states and reconstruct the data being transmitted/received. Typical
symptoms include:

* Transmission/reception partially works, and there is frequent or occasional
  packet loss observed

* Ethernet MAC may report some or all packets ingressing with an FCS/CRC error,
  or just discard them all

* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
  (since there is enough setup/hold time in that case)

Connecting to a PHY
===================

Sometime during startup, the network driver needs to establish a connection
between the PHY device and the network device. By this time, the PHY's bus
and drivers must all have been loaded, so that it is ready for the connection.
At this point, there are several ways to connect to the PHY:

#. The PAL handles everything, and only calls the network driver when
   the link state changes, so it can react.

#. The PAL handles everything except interrupts (usually because the
   controller has the interrupt registers).

#. The PAL handles everything, but checks in with the driver every second,
   allowing the network driver to react first to any changes before the PAL
   does.

#. The PAL serves only as a library of functions, with the network device
   manually calling functions to update status, and configure the PHY.


Letting the PHY Abstraction Layer do Everything
===============================================

If you choose option 1 (the hope is that every driver can, but this document
still aims to be useful to drivers that can't), connecting to the PHY is
simple:

First, you need a function to react to changes in the link state. This
function follows this protocol::

   static void adjust_link(struct net_device *dev);

Next, you need to know the device name of the PHY connected to this device.
The name will look something like "0:00", where the first number is the
bus id, and the second is the PHY's address on that bus. Typically,
the bus is responsible for making its ID unique.

Now, to connect, just call this function::

   phydev = phy_connect(dev, phy_name, &adjust_link, interface);

*phydev* is a pointer to the phy_device structure which represents the PHY.
If phy_connect() is successful, it will return the pointer; on failure it
returns an error pointer, which should be checked with IS_ERR(). dev, here, is
the pointer to your net_device. Once done, this function will have started the
PHY's software state machine, and registered for the PHY's interrupt, if it
has one. The phydev structure will be populated with information about the
current state, though the PHY will not yet be truly operational at this
point.

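A common way to write the adjust_link() callback is to cache the last link,
speed and duplex values and only reprogram the MAC when one of them changes.
The sketch below models that pattern in plain user-space C so it can be
compiled and tested outside the kernel; struct link_state, struct mac_priv and
model_adjust_link() are hypothetical, and a real handler would read these
values from the phy_device attached to the net_device instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the link fields an adjust_link() handler reads. */
struct link_state {
	bool link;	/* carrier up? */
	int speed;	/* Mb/s */
	int duplex;	/* 0 = half, 1 = full */
};

/* Hypothetical driver-private data: the state the MAC was last set up for. */
struct mac_priv {
	struct link_state last;
	int reprogram_count;	/* number of MAC reconfigurations so far */
};

/*
 * Model of an adjust_link() handler: compare the PHY's current state with
 * the cached one and reconfigure the MAC only when something changed.
 * Speed and duplex are ignored while the link is down.
 * Returns true if the MAC was reprogrammed.
 */
bool model_adjust_link(struct mac_priv *priv, const struct link_state *now)
{
	assert(priv != NULL && now != NULL);

	bool changed = now->link != priv->last.link ||
		       (now->link && (now->speed != priv->last.speed ||
				      now->duplex != priv->last.duplex));
	if (!changed)
		return false;

	/* A real driver would program its MAC speed/duplex registers here. */
	priv->last = *now;
	priv->reprogram_count++;
	return true;
}
```

Because the PAL may invoke the callback without any relevant change, the
comparison keeps repeated calls cheap; real handlers often also call
phy_print_status() when they detect a change.
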
PHY-specific flags should be set in phydev->dev_flags prior to the call
to phy_connect() such that the underlying PHY driver can check for flags
and perform specific operations based on them.
This is useful if the system has put hardware restrictions on
the PHY/controller, of which the PHY needs to be aware.

*interface* is a phy_interface_t value which specifies the connection type
used between the controller and the PHY. Examples are GMII, MII,
RGMII, and SGMII. See "PHY interface modes" below. For a full
list, see include/linux/phy.h

Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
for these bitfields. Note that you should not SET any bits, except the
SUPPORTED_Pause and SUPPORTED_Asym_Pause bits (see below), or the PHY may get
put into an unsupported state.

Lastly, once the controller is ready to handle network traffic, you call
phy_start(phydev). This tells the PAL that you are ready, and configures the
PHY to connect to the network. If the MAC interrupt of your network driver
also handles PHY status changes, just set phydev->irq to PHY_MAC_INTERRUPT
before you call phy_start() and use phy_mac_interrupt() from the network
driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL.
phy_start() enables the PHY interrupts (if applicable) and starts the
phylib state machine.

When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev). This function also stops the phylib state machine and
disables PHY interrupts.

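The pruning step can be illustrated on plain integers. This user-space sketch
assumes the legacy u32 layout of the SUPPORTED_* flags referred to above (the
bit values are copied from include/uapi/linux/ethtool.h, and only the ones
used here are defined); prune_for_10_100_mac() and GIGABIT_MODES are
hypothetical, and newer kernels keep phydev->supported and phydev->advertising
as link-mode bitmaps rather than u32 masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy SUPPORTED_* bit values, as in include/uapi/linux/ethtool.h. */
#define SUPPORTED_10baseT_Half		(1u << 0)
#define SUPPORTED_10baseT_Full		(1u << 1)
#define SUPPORTED_100baseT_Half		(1u << 2)
#define SUPPORTED_100baseT_Full		(1u << 3)
#define SUPPORTED_1000baseT_Half	(1u << 4)
#define SUPPORTED_1000baseT_Full	(1u << 5)
#define SUPPORTED_Autoneg		(1u << 6)
#define SUPPORTED_Pause			(1u << 13)
#define SUPPORTED_Asym_Pause		(1u << 14)

/* Gigabit modes a 10/100-only MAC has to mask off. */
#define GIGABIT_MODES (SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full)

/*
 * Hypothetical helper for a 10/100 MAC: clear the gigabit modes and, if
 * the MAC handles pause frames, set the two pause bits, which are the only
 * bits the driver is allowed to add. Apply it to both the supported and
 * the advertising mask.
 */
uint32_t prune_for_10_100_mac(uint32_t mask, bool mac_does_pause)
{
	mask &= ~(uint32_t)GIGABIT_MODES;
	if (mac_does_pause)
		mask |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
	return mask;
}
```

Every other bit the PHY reported (autoneg, 10/100 modes) passes through
unchanged, which is the "prune, don't set" rule from the paragraph on
phydev->supported and phydev->advertising.
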
PHY interface modes
===================

The PHY interface mode supplied in the phy_connect() family of functions
defines the initial operating mode of the PHY interface. This is not
guaranteed to remain constant; there are PHYs which dynamically change
their interface mode without software interaction depending on the
negotiation results.

Some of the interface modes are described below:

``PHY_INTERFACE_MODE_SMII``
    This is serial MII, clocked at 125MHz, supporting 100M and 10M speeds.
    Some details can be found in
    https://opencores.org/ocsvn/smii/smii/trunk/doc/SMII.pdf

``PHY_INTERFACE_MODE_1000BASEX``
    This defines the 1000BASE-X single-lane serdes link as defined by the
    802.3 standard Clause 36. The link operates at a fixed bit rate of
    1.25Gbaud using an 8B/10B encoding scheme, resulting in an underlying
    data rate of 1Gbps. Embedded in the data stream is a 16-bit control
    word which is used to negotiate the duplex and pause modes with the
    remote end. This does not include "up-clocked" variants such as 2.5Gbps
    speeds (see below).

``PHY_INTERFACE_MODE_2500BASEX``
    This defines a variant of 1000BASE-X which is clocked 2.5 times as fast
    as the 802.3 standard, giving a fixed bit rate of 3.125Gbaud.

``PHY_INTERFACE_MODE_SGMII``
    This is used for Cisco SGMII, which is a modification of 1000BASE-X
    as defined by the 802.3 standard. The SGMII link consists of a single
    serdes lane running at a fixed bit rate of 1.25Gbaud with 8B/10B
    encoding. The underlying data rate is 1Gbps, with the slower speeds of
    100Mbps and 10Mbps being achieved through replication of each data symbol.
    The 802.3 control word is re-purposed to send the negotiated speed and
    duplex information from the PHY to the MAC, and for the MAC to acknowledge
    receipt. This does not include "up-clocked" variants such as 2.5Gbps
    speeds.

    Note: mismatched SGMII vs 1000BASE-X configuration on a link can
    successfully pass data in some circumstances, but the 16-bit control
    word will not be correctly interpreted, which may cause mismatches in
    duplex, pause or other settings. This is dependent on the MAC and/or
    PHY behaviour.

``PHY_INTERFACE_MODE_5GBASER``
    This is the IEEE 802.3 Clause 129 defined 5GBASE-R protocol. It is
    identical to the 10GBASE-R protocol defined in Clause 49, with the
    exception that it operates at half the frequency. Please refer to the
    IEEE standard for the definition.

``PHY_INTERFACE_MODE_10GBASER``
    This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
    various media. Please refer to the IEEE standard for a definition of
    this.

    Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
    XFI and SFI permit multiple protocols over a single SERDES lane, and
    also define the electrical characteristics of the signals with a host
    compliance board plugged into the host XFP/SFP connector. Therefore,
    XFI and SFI are not PHY interface types in their own right.

``PHY_INTERFACE_MODE_10GKR``
    This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
    autonegotiation. Please refer to the IEEE standard for further
    information.

    Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
    use of this definition.

``PHY_INTERFACE_MODE_25GBASER``
    This is the IEEE 802.3 PCS Clause 107 defined 25GBASE-R protocol.
    The PCS is identical to 10GBASE-R, i.e. 64B/66B encoded, running
    2.5 times as fast, giving a fixed bit rate of 25.78125Gbaud.
    Please refer to the IEEE standard for further information.

``PHY_INTERFACE_MODE_100BASEX``
    This defines IEEE 802.3 Clause 24. The link operates at a fixed bit
    rate of 125Mbaud using a 4B/5B encoding scheme, resulting in an
    underlying data rate of 100Mbps.

``PHY_INTERFACE_MODE_QUSGMII``
    This defines the Cisco Quad USGMII mode, which is the Quad variant of
    the USGMII (Universal SGMII) link. It's very similar to QSGMII, but uses
    a Packet Control Header (PCH) instead of the 7-byte preamble to carry not
    only the port id, but also so-called "extensions". The only documented
    extension so far in the specification is the inclusion of timestamps, for
    PTP-enabled PHYs. This mode isn't compatible with QSGMII, but offers the
    same capabilities in terms of link speed and negotiation.

``PHY_INTERFACE_MODE_1000BASEKX``
    This is 1000BASE-X as defined by IEEE 802.3 Clause 36 with Clause 73
    autonegotiation. Generally, it will be used with a Clause 70 PMD. To
    contrast with the 1000BASE-X phy mode used for Clause 38 and 39 PMDs, this
    interface mode has different autonegotiation and only supports full duplex.

``PHY_INTERFACE_MODE_PSGMII``
    This is the Penta SGMII mode. It is similar to QSGMII, but combines 5
    SGMII lines into a single link, compared to 4 on QSGMII.

``PHY_INTERFACE_MODE_10G_QXGMII``
    Represents the 10G-QXGMII PHY-MAC interface as defined by the Cisco USXGMII
    Multiport Copper Interface document. It supports 4 ports over a
    10.3125Gbaud SerDes lane, each port having speeds of 2.5G / 1G / 100M / 10M
    achieved through symbol replication. The PCS expects the standard USXGMII
    code word.

Pause frames / flow control
===========================

The PHY does not participate directly in flow control/pause frames except by
making sure that the SUPPORTED_Pause and SUPPORTED_Asym_Pause bits are set in
MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
controller supports such a thing.
Since flow control/pause frame generation involves the Ethernet MAC driver, it
is recommended that this driver take care of properly indicating advertisement
and support for such features by setting the SUPPORTED_Pause and
SUPPORTED_Asym_Pause bits accordingly. This can be done either before or after
phy_connect() and/or as a result of implementing the ethtool::set_pauseparam
feature.


Keeping Close Tabs on the PAL
=============================

It is possible that the PAL's built-in state machine needs a little help to
keep your network device and the PHY properly in sync. If so, you can
register a helper function when connecting to the PHY, which will be called
every second before the state machine reacts to any changes. To do this, you
need to manually call phy_attach() and phy_prepare_link(), and then call
phy_start_machine() with the second argument set to point to your special
handler.

Currently there are no examples of how to use this functionality, and testing
on it has been limited because the author does not have any drivers which use
it (they all use option 1). So Caveat Emptor.

Doing it all yourself
=====================

There's a remote chance that the PAL's built-in state machine cannot track
the complex interactions between the PHY and your network device. If this is
so, you can simply call phy_attach(), and not call phy_start_machine() or
phy_prepare_link(). This will mean that phydev->state is entirely yours to
handle (phy_start() and phy_stop() toggle between some of the states, so you
might need to avoid them).

An effort has been made to make sure that useful functionality can be
accessed without the state-machine running, and most of these functions are
descended from functions which did not interact with a complex state-machine.
However, again, no effort has been made so far to test running without the
state machine, so tryer beware.

Here is a brief rundown of the functions::

    int phy_read(struct phy_device *phydev, u16 regnum);
    int phy_write(struct phy_device *phydev, u16 regnum, u16 val);

Simple read/write primitives. They invoke the bus's read/write function
pointers.
::

    void phy_print_status(struct phy_device *phydev);

A convenience function to print out the PHY status neatly.
::

    void phy_request_interrupt(struct phy_device *phydev);

Requests the IRQ for the PHY interrupts.
::

    struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
                                   phy_interface_t interface);

Attaches a network device to a particular PHY, binding the PHY to a generic
driver if none was found during bus initialization.
::

    int phy_start_aneg(struct phy_device *phydev);

Using variables inside the phydev structure, either configures advertising
and resets autonegotiation, or disables autonegotiation, and configures
forced settings.
::

    static inline int phy_read_status(struct phy_device *phydev);

Fills the phydev structure with up-to-date information about the current
settings in the PHY.
::

    int phy_ethtool_ksettings_set(struct phy_device *phydev,
                                  const struct ethtool_link_ksettings *cmd);

Ethtool convenience functions.
::

    int phy_mii_ioctl(struct phy_device *phydev,
                      struct mii_ioctl_data *mii_data, int cmd);

The MII ioctl. Note that this function will completely screw up the state
machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
use this only to write registers which are not standard, and don't set off
a renegotiation.

PHY Device Drivers
==================

With the PHY Abstraction Layer, adding support for new PHYs is
quite easy.
In some cases, no work is required at all! However,
many PHYs require a little hand-holding to get up-and-running.

Generic PHY driver
------------------

If the desired PHY doesn't have any errata, quirks, or special
features you want to support, then it may be best to not add
support, and let the PHY Abstraction Layer's Generic PHY Driver
do all of the work.

Writing a PHY driver
--------------------

If you do need to write a PHY driver, the first thing to do is
make sure it can be matched with an appropriate PHY device.
This is done during bus initialization by reading the device's
UID (stored in registers 2 and 3), then comparing it to each
driver's phy_id field by ANDing it with each driver's
phy_id_mask field. Also, it needs a name. Here's an example::

    static struct phy_driver dm9161_driver = {
            .phy_id         = 0x0181b880,
            .name           = "Davicom DM9161E",
            .phy_id_mask    = 0x0ffffff0,
            ...
    };

Next, you need to specify what features (speed, duplex, autoneg,
etc.) your PHY device and driver support. Most PHYs support
PHY_BASIC_FEATURES, but you can look in include/linux/phy.h for other
features.

Each driver consists of a number of function pointers, documented
in include/linux/phy.h under the phy_driver structure.

Of these, only config_aneg and read_status are required to be
assigned by the driver code. The rest are optional. Also, it is
preferred to use the generic phy driver's versions of these two
functions if at all possible: genphy_read_status and
genphy_config_aneg. If this is not possible, it is likely that
you only need to perform some actions before and after invoking
these functions, and so your functions will wrap the generic
ones.

Feel free to look at the Marvell, Cicada, and Davicom drivers in
drivers/net/phy/ for examples (the lxt and qsemi drivers have
not been tested as of this writing).

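The UID matching described at the start of this section can be checked with a
few lines of user-space C. The sketch assumes the standard Clause 22 layout in
which register 2 holds the upper 16 bits of the UID and register 3 the lower
16; struct model_phy_driver, phy_uid_from_regs() and model_phy_match() are
hypothetical names, and the IDs reuse the dm9161_driver example, with a made-up
register 3 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register 2 holds the upper 16 bits of the UID, register 3 the lower 16. */
uint32_t phy_uid_from_regs(uint16_t reg2, uint16_t reg3)
{
	return ((uint32_t)reg2 << 16) | reg3;
}

/* Hypothetical trimmed-down driver entry: just the two matching fields. */
struct model_phy_driver {
	uint32_t phy_id;
	uint32_t phy_id_mask;
};

/* A driver matches when the UID and phy_id agree on every bit in the mask. */
bool model_phy_match(const struct model_phy_driver *drv, uint32_t uid)
{
	return (uid & drv->phy_id_mask) == (drv->phy_id & drv->phy_id_mask);
}
```

With the dm9161_driver values, a PHY reporting 0x0181 and 0xb881 yields the
UID 0x0181b881 and still matches, because the mask 0x0ffffff0 clears the low
four bits, which in the Clause 22 ID layout hold the revision number.
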
The PHY's MMD register accesses are handled by the PAL framework
by default, but can be overridden by a specific PHY driver if
required. This could be the case if a PHY was released for
manufacturing before the MMD PHY register definitions were
standardized by the IEEE. Most modern PHYs will be able to use
the generic PAL framework for accessing the PHY's MMD registers.
An example of such usage is for Energy Efficient Ethernet support,
implemented in the PAL. This support uses the PAL to access MMD
registers for EEE query and configuration if the PHY supports
the IEEE standard access mechanisms, or can use the PHY's specific
access interfaces if overridden by the specific PHY driver. See
the Micrel driver in drivers/net/phy/ for an example of how this
can be implemented.

Board Fixups
============

Sometimes the specific interaction between the platform and the PHY requires
special handling. For instance, the platform may need to change where the
PHY's clock input is, or add a delay to account for latency issues in the data
path. In order to support such contingencies, the PHY Layer allows platform
code to register fixups to be run when the PHY is brought up (or subsequently
reset).

When the PHY Layer brings up a PHY it checks to see if there are any fixups
registered for it, matching based on UID (contained in the PHY device's phy_id
field) and the bus identifier (contained in phydev->dev.bus_id). Both must
match; however, two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
wildcards for the bus ID and UID, respectively.

When a match is found, the PHY layer will invoke the run function associated
with the fixup. This function is passed a pointer to the phy_device of
interest. It should therefore only operate on that PHY.

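The matching rule can be modelled as below. This is a user-space sketch:
struct model_fixup, model_fixup_matches() and the MODEL_ANY_ID/MODEL_ANY_UID
wildcards are hypothetical stand-ins for the PHY layer's internal fixup list
and its PHY_ANY_ID/PHY_ANY_UID constants, whose real values should be taken
from include/linux/phy.h. Bus identifiers follow the "0:01" style shown
earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wildcards standing in for PHY_ANY_ID and PHY_ANY_UID;
 * the values are chosen for this sketch only. */
#define MODEL_ANY_ID	"any-bus-id"
#define MODEL_ANY_UID	0xffffffffu

/* Hypothetical fixup record: the matching criteria only. */
struct model_fixup {
	const char *bus_id;	/* e.g. "0:01", or MODEL_ANY_ID */
	uint32_t phy_uid;	/* or MODEL_ANY_UID */
	uint32_t phy_uid_mask;
};

/* Both criteria must match; either one can be wildcarded on its own. */
bool model_fixup_matches(const struct model_fixup *f,
			 const char *bus_id, uint32_t phy_id)
{
	bool id_ok = strcmp(f->bus_id, MODEL_ANY_ID) == 0 ||
		     strcmp(f->bus_id, bus_id) == 0;
	bool uid_ok = f->phy_uid == MODEL_ANY_UID ||
		      (f->phy_uid & f->phy_uid_mask) ==
		      (phy_id & f->phy_uid_mask);

	return id_ok && uid_ok;
}
```

If a fixup matches, the PHY layer then calls its run function with the
phy_device; that step is left out of the sketch.
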
The platform code can either register the fixup using phy_register_fixup()::

    int phy_register_fixup(const char *phy_id,
                           u32 phy_uid, u32 phy_uid_mask,
                           int (*run)(struct phy_device *));

Or using one of the two stubs, phy_register_fixup_for_uid() and
phy_register_fixup_for_id()::

    int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
                                   int (*run)(struct phy_device *));
    int phy_register_fixup_for_id(const char *phy_id,
                                  int (*run)(struct phy_device *));

The stubs set one of the two matching criteria, and set the other one to
match anything.

When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module load
time, the module needs to unregister the fixup and free allocated memory when
it's unloaded.

Call one of the following functions before unloading the module::

    int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
    int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
    int phy_unregister_fixup_for_id(const char *phy_id);

Standards
=========

IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf

RGMII v1.3:
http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf

RGMII v2.0:
http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf