Linux/Documentation/networking/netdevices.rst

.. SPDX-License-Identifier: GPL-2.0

=====================================
Network Devices, the Kernel, and You!
=====================================


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device lifetime rules
================================
Network device structures need to persist even after the module is unloaded
and must be allocated with alloc_netdev_mqs() and friends.
If the device has been registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathological case cleanly
(example: ``rmmod mydriver </sys/class/net/myeth/mtu``)

alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv()) then it is up to the module exit handler to free that.

There are two groups of APIs for registering struct net_device.
The first group can be used in normal contexts where ``rtnl_lock`` is not
already held: register_netdev(), unregister_netdev().
The second group can be used when ``rtnl_lock`` is already held:
register_netdevice(), unregister_netdevice(), free_netdev().

Simple drivers
--------------

Most drivers (especially device drivers) handle the lifetime of struct
net_device in a context where ``rtnl_lock`` is not held (e.g. driver probe
and remove paths).

In that case the struct net_device registration is done using
the register_netdev() and unregister_netdev() functions:

.. code-block:: c

  int probe()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    dev = alloc_netdev_mqs(...);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* ... do all device setup before calling register_netdev() ...
     */

    err = register_netdev(dev);
    if (err)
      goto err_undo;

    /* net_device is visible to the user! */
    return 0;

  err_undo:
    /* ... undo the device setup ... */
    free_netdev(dev);
    return err;
  }

  void remove()
  {
    /* dev is the net_device that probe() registered */
    unregister_netdev(dev);
    free_netdev(dev);
  }

Note that after calling register_netdev() the device is visible in the system.
Users can open it and start sending / receiving traffic immediately,
or run any other callback, so all initialization must be done prior to
registration.

unregister_netdev() closes the device and waits for all users to be done
with it. The memory of struct net_device itself may still be referenced
by sysfs but all operations on that device will fail.

free_netdev() can be called after unregister_netdev() returns or when
register_netdev() failed.

Device management under RTNL
----------------------------

Registering struct net_device while in a context which already holds
the ``rtnl_lock`` requires extra care. In those scenarios most drivers
will want to make use of struct net_device's ``needs_free_netdev``
and ``priv_destructor`` members for freeing of state.

Example flow of netdev handling under ``rtnl_lock``:

.. code-block:: c

  static void my_setup(struct net_device *dev)
  {
    dev->needs_free_netdev = true;
  }

  static void my_destructor(struct net_device *dev)
  {
    struct my_device_priv *priv = netdev_priv(dev);

    some_obj_destroy(priv->obj);
    some_uninit(priv);
  }

  int create_link()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    ASSERT_RTNL();

    dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* Implicit constructor */
    err = some_init(priv);
    if (err)
      goto err_free_dev;

    priv->obj = some_obj_create();
    if (!priv->obj) {
      err = -ENOMEM;
      goto err_some_uninit;
    }
    /* End of constructor, set the destructor: */
    dev->priv_destructor = my_destructor;

    err = register_netdevice(dev);
    if (err)
      /* register_netdevice() calls destructor on failure */
      goto err_free_dev;

    /* If anything fails now unregister_netdevice() (or unregister_netdev())
     * will take care of calling my_destructor and free_netdev().
     */

    return 0;

  err_some_uninit:
    some_uninit(priv);
  err_free_dev:
    free_netdev(dev);
    return err;
  }

If struct net_device.priv_destructor is set it will be called by the core
some time after unregister_netdevice(). It will also be called if
register_netdevice() fails. The callback may be invoked with or without
``rtnl_lock`` held.

There is no explicit constructor callback; the driver "constructs" the private
netdev state after allocating it and before registration.

Setting struct net_device.needs_free_netdev makes the core call free_netdev()
automatically after unregister_netdevice() when all references to the device
are gone. It only takes effect after a successful call to register_netdevice(),
so if register_netdevice() fails the driver is responsible for calling
free_netdev().

free_netdev() is safe to call on error paths right after unregister_netdevice()
or when register_netdevice() fails. Parts of the netdev (de)registration process
happen after ``rtnl_lock`` is released, therefore in those cases free_netdev()
will defer some of the processing until ``rtnl_lock`` is released.

Devices spawned from struct rtnl_link_ops should never free the
struct net_device directly.

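The rules for ``priv_destructor`` and ``needs_free_netdev`` stated in this
section can be pictured with a small userspace model. This is only a sketch
of those rules, not kernel code: every ``toy_*`` name is made up for
illustration, reference counting is left out, and "freeing" is recorded in a
flag instead of releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct net_device fields discussed. */
struct toy_netdev {
  bool needs_free_netdev;
  bool registered;
  bool freed;
  int destructor_calls;
  void (*priv_destructor)(struct toy_netdev *dev);
};

/* Example destructor: only counts how often it ran. */
static void toy_destructor(struct toy_netdev *dev)
{
  dev->destructor_calls++;
}

/* Models free_netdev(): must run exactly once per device. */
static void toy_free_netdev(struct toy_netdev *dev)
{
  assert(!dev->freed);
  dev->freed = true;
}

/* Models register_netdevice(); 'fail' injects a registration error. */
static int toy_register(struct toy_netdev *dev, bool fail)
{
  if (fail) {
    /* On failure the destructor runs, but the device is not freed. */
    if (dev->priv_destructor)
      dev->priv_destructor(dev);
    return -1;
  }
  dev->registered = true;
  return 0;
}

/* Models unregister_netdevice() together with the deferred work after it. */
static void toy_unregister(struct toy_netdev *dev)
{
  assert(dev->registered);
  dev->registered = false;
  if (dev->priv_destructor)
    dev->priv_destructor(dev);
  /* needs_free_netdev only matters on this path, after a successful register. */
  if (dev->needs_free_netdev)
    toy_free_netdev(dev);
}
```

In this model a failed toy_register() runs the destructor once and leaves the
final toy_free_netdev() to the caller, while toy_unregister() on a registered
device runs the destructor and frees the device without further help.
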
.ndo_init and .ndo_uninit
~~~~~~~~~~~~~~~~~~~~~~~~~

``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
registration and de-registration, under ``rtnl_lock``. Drivers can use
those e.g. when parts of their init process need to run under ``rtnl_lock``.

``.ndo_init`` runs before the device is visible in the system. ``.ndo_uninit``
runs during de-registration, after the device is closed, but other subsystems
may still have outstanding references to the netdevice.

MTU
===
Each network device has a Maximum Transfer Unit. The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet, if the standard MTU of 1500
bytes is used, the actual skb will contain up to 1514 bytes because of
the Ethernet header. Devices should allow for the 4 byte VLAN header as
well.

Segmentation Offload (GSO, TSO) is an exception to this rule.  The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

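As a rough illustration of the segmentation step, the following userspace
sketch counts how many packets a large TCP payload is split into for a given
MTU. It assumes IPv4 and TCP headers of 20 bytes each, with no options; the
helper name gso_segment_count() is made up for this example and is not a
kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed header sizes: IPv4 and TCP without options. */
#define IPV4_HDR_LEN 20u
#define TCP_HDR_LEN  20u

/*
 * Hypothetical helper, not a kernel API: number of packets needed to
 * carry payload_len bytes of TCP payload when each packet may hold at
 * most mtu bytes of network layer data.
 */
static size_t gso_segment_count(size_t payload_len, unsigned int mtu)
{
  size_t mss;

  /* The MTU must leave room for at least one payload byte. */
  assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
  mss = mtu - IPV4_HDR_LEN - TCP_HDR_LEN;

  /* Round up: a partial final segment still needs its own packet. */
  return (payload_len + mss - 1) / mss;
}
```

With an MTU of 1500 the per-packet payload under these assumptions is 1460
bytes, so a 65536 byte buffer is split into 45 packets.
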
MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag).  The device may drop,
truncate, or pass up oversize packets, but dropping oversize packets
is preferred.

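The Ethernet arithmetic used in this section can be written down as a short
userspace sketch. The macro and function names below are made up for the
example; the kernel has its own constants for these sizes (``ETH_HLEN`` and
``VLAN_HLEN``), which are deliberately not used here so that the snippet
builds on its own.

```c
#include <assert.h>
#include <limits.h>

#define ETH_HDR_LEN  14u  /* destination MAC + source MAC + EtherType */
#define VLAN_TAG_LEN  4u  /* a single VLAN tag */

/* Largest skb length the transmit routine sees for an untagged frame. */
static unsigned int max_tx_skb_len(unsigned int mtu)
{
  assert(mtu <= UINT_MAX - ETH_HDR_LEN);
  return mtu + ETH_HDR_LEN;
}

/* Largest frame a device should still accept on receive: header plus one tag. */
static unsigned int max_rx_frame_len(unsigned int mtu)
{
  assert(mtu <= UINT_MAX - ETH_HDR_LEN - VLAN_TAG_LEN);
  return mtu + ETH_HDR_LEN + VLAN_TAG_LEN;
}
```

For the standard MTU of 1500 these return 1514 and 1518, matching the figures
given in the text.
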
struct net_device synchronization rules
=======================================
ndo_open:
        Synchronization: rtnl_lock() semaphore.
        Context: process

ndo_stop:
        Synchronization: rtnl_lock() semaphore.
        Context: process
        Note: netif_running() is guaranteed false

ndo_do_ioctl:
        Synchronization: rtnl_lock() semaphore.
        Context: process

        This is only called by network subsystems internally,
        not by user space calling ioctl as it was before
        linux-5.14.

ndo_siocbond:
        Synchronization: rtnl_lock() semaphore.
        Context: process

        Used by the bonding driver for the SIOCBOND family of
        ioctl commands.

ndo_siocwandev:
        Synchronization: rtnl_lock() semaphore.
        Context: process

        Used by the drivers/net/wan framework to handle
        the SIOCWANDEV ioctl with the if_settings structure.

ndo_siocdevprivate:
        Synchronization: rtnl_lock() semaphore.
        Context: process

        This is used to implement SIOCDEVPRIVATE ioctl helpers.
        These should not be added to new drivers, so don't use them.

ndo_eth_ioctl:
        Synchronization: rtnl_lock() semaphore.
        Context: process

ndo_get_stats:
        Synchronization: rtnl_lock() semaphore, or RCU.
        Context: atomic (can't sleep under RCU)

ndo_start_xmit:
        Synchronization: __netif_tx_lock spinlock.

        When the driver sets NETIF_F_LLTX in dev->features this will be
        called without holding netif_tx_lock. In this case the driver
        has to lock by itself when needed.
        The locking there should also properly protect against
        set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated.
        Don't use it for new drivers.

        Context: Process with BHs disabled or BH (timer),
                 will be called with interrupts disabled by netconsole.

        Return codes:

        * NETDEV_TX_OK: everything ok.
        * NETDEV_TX_BUSY: cannot transmit packet, try later.
          Usually a bug, means queue start/stop flow control is broken in
          the driver. Note: the driver must NOT put the skb in its DMA ring.

ndo_tx_timeout:
        Synchronization: netif_tx_lock spinlock; all TX queues frozen.
        Context: BHs disabled
        Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
        Synchronization: netif_addr_lock spinlock.
        Context: BHs disabled

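The remark that NETDEV_TX_BUSY usually points at broken start/stop flow
control can be illustrated with a toy userspace model of a TX ring. All
``toy_*`` names and the ring size are made up; in a real driver the
``stopped`` flag corresponds to stopping and waking the queue with helpers
such as netif_stop_queue() and netif_wake_queue().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 4u

enum toy_tx_ret { TOY_TX_OK, TOY_TX_BUSY };

struct toy_txq {
  unsigned int in_flight;  /* descriptors currently owned by the "hardware" */
  bool stopped;            /* models the stopped state of the TX queue */
};

/* Models ndo_start_xmit; a well-behaved stack only calls it while !stopped. */
static enum toy_tx_ret toy_start_xmit(struct toy_txq *q)
{
  /* Reaching this means flow control failed; the packet is NOT queued. */
  if (q->in_flight == TOY_RING_SIZE)
    return TOY_TX_BUSY;

  q->in_flight++;
  /* Stop the queue as soon as the ring cannot take another packet. */
  if (q->in_flight == TOY_RING_SIZE)
    q->stopped = true;
  return TOY_TX_OK;
}

/* Models TX completion: reclaim one descriptor and restart the queue. */
static void toy_tx_complete(struct toy_txq *q)
{
  assert(q->in_flight > 0);
  q->in_flight--;
  q->stopped = false;
}
```

As long as the caller honours ``stopped``, toy_start_xmit() never returns
TOY_TX_BUSY, because the queue is stopped by the same call that fills the
last ring slot.
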
struct napi_struct synchronization rules
========================================
napi->poll:
        Synchronization:
                NAPI_STATE_SCHED bit in napi->state.  Device
                driver's ndo_stop method will invoke napi_disable() on
                all NAPI instances which will do a sleeping poll on the
                NAPI_STATE_SCHED napi->state bit, waiting for all pending
                NAPI activity to cease.

        Context:
                 softirq
                 will be called with interrupts disabled by netconsole.

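The role of the NAPI_STATE_SCHED bit as an ownership flag can be pictured
with C11 atomics in userspace. This is a simplified model under stated
assumptions, not the kernel implementation: the ``toy_*`` names are made up,
and the sleeping retry loop that a disable path would need is reduced to a
single attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_STATE_SCHED 0x1u

struct toy_napi {
  atomic_uint state;
};

/* Try to claim the instance; true only for the caller that set the bit. */
static bool toy_schedule_prep(struct toy_napi *n)
{
  unsigned int old = atomic_fetch_or(&n->state, TOY_STATE_SCHED);

  return !(old & TOY_STATE_SCHED);
}

/* Polling is finished: drop the bit so the instance can be claimed again. */
static void toy_complete(struct toy_napi *n)
{
  unsigned int old = atomic_fetch_and(&n->state, ~TOY_STATE_SCHED);

  assert(old & TOY_STATE_SCHED);
  (void)old;
}

/*
 * One attempt of a disable path: take the bit and keep it. A real
 * implementation would sleep and retry until this succeeds.
 */
static bool toy_try_disable(struct toy_napi *n)
{
  return toy_schedule_prep(n);
}
```

While a poll owns the bit, toy_try_disable() fails; once toy_complete() has
run, the disable attempt succeeds and every later toy_schedule_prep() call
is refused.
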