.. SPDX-License-Identifier: GPL-2.0

=============================
Kernel Connection Multiplexor
=============================

Kernel Connection Multiplexor (KCM) is a mechanism that provides a message-based
interface over TCP for generic application protocols. With KCM, an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.

KCM implements an NxM multiplexor in the kernel as diagrammed below::

    +------------+   +------------+   +------------+   +------------+
    | KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
    +------------+   +------------+   +------------+   +------------+
        |                 |               |                |
        +-----------+     |               |     +----------+
                    |     |               |     |
                +----------------------------------+
                |           Multiplexor            |
                +----------------------------------+
                    |   |           |           |  |
        +---------+   |           |           |  ------------+
        |             |           |           |              |
    +----------+  +----------+  +----------+  +----------+ +----------+
    |  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
    +----------+  +----------+  +----------+  +----------+ +----------+
        |              |           |            |             |
    +----------+  +----------+  +----------+  +----------+ +----------+
    | TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
    +----------+  +----------+  +----------+  +----------+ +----------+

KCM sockets
===========

The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
operations on different sockets may be done in parallel without the need for
synchronization between threads in userspace.

Multiplexor
===========

The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.

TCP sockets & Psocks
====================

TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket; this structure holds the state for constructing
messages on receive as well as other connection-specific information for KCM.

Connected mode semantics
========================

Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
transmitting. The normal send and recv calls (including sendmmsg and recvmmsg)
can be used to send and receive messages on the KCM socket.
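
For illustration, a minimal sketch of sending and receiving on a KCM socket
follows. It assumes a hypothetical application protocol in which every message
starts with a 4-byte big-endian length field that does not count itself; the
helper names (encode_len_header, kcm_send_framed, kcm_recv_one) are
illustrative and not part of any KCM API. KCM does not add framing of its own,
so the application header is part of the message data. The sketch passes
MSG_EOR, which, as we understand the kernel implementation, marks the end of a
message on SOCK_SEQPACKET sockets, while on SOCK_DGRAM sockets each call is a
complete message unless MSG_MORE is set::

  #include <arpa/inet.h>      /* htonl() */
  #include <stdint.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  /* Hypothetical framing: 4-byte big-endian payload length, then payload. */
  void encode_len_header(uint8_t hdr[4], uint32_t payload_len)
  {
      uint32_t be = htonl(payload_len);

      memcpy(hdr, &be, sizeof(be));
  }

  /* Send header + payload as one KCM message with a single sendmsg(). */
  ssize_t kcm_send_framed(int kcmfd, const void *payload, uint32_t len)
  {
      uint8_t hdr[4];
      struct iovec iov[2];
      struct msghdr msg;

      encode_len_header(hdr, len);
      iov[0].iov_base = hdr;
      iov[0].iov_len = sizeof(hdr);
      iov[1].iov_base = (void *)payload;
      iov[1].iov_len = len;

      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = iov;
      msg.msg_iovlen = 2;

      /* MSG_EOR ends the message on SOCK_SEQPACKET; assumed harmless on
       * SOCK_DGRAM, where each call is a complete message. */
      return sendmsg(kcmfd, &msg, MSG_EOR);
  }

  /* Receive one message (header included) into buf; buf should be large
   * enough for the largest message the protocol allows. */
  ssize_t kcm_recv_one(int kcmfd, void *buf, size_t buflen)
  {
      return recv(kcmfd, buf, buflen, 0);
  }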

Socket types
============

KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.

Message delineation
-------------------

Messages are sent over a TCP stream with some application protocol message
format that typically includes a header which frames the messages. The length
of a received message can be deduced from the application protocol header
(often just a simple length field).

A TCP stream must be parsed to determine message boundaries. Berkeley Packet
Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor, a
BPF program must be specified. The program is called at the start of receiving
a new message and is given an skbuff that contains the bytes received so far.
It parses the message header and returns the length of the message. Given this
information, KCM will construct the message of the stated length and deliver it
to a KCM socket.
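
The computation such a parser performs can be modelled in ordinary C to make
the contract concrete. The sketch below is neither a BPF program nor a kernel
interface: frame_length() is a hypothetical helper, the 4-byte big-endian
length field that does not count itself is an assumed protocol format (the
Thrift framing used by the example program later in this document), and
treating 0 as "need more data" is an assumption of the sketch::

  #include <stddef.h>
  #include <stdint.h>

  /*
   * User-space model of the length computation. avail is the number of
   * bytes received so far. Returns 0 while the header is incomplete
   * (assumed "need more data" convention), otherwise the total message
   * length, i.e. header plus payload.
   */
  int64_t frame_length(const uint8_t *data, size_t avail)
  {
      uint32_t payload_len;

      if (avail < 4)
          return 0;

      payload_len = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                    ((uint32_t)data[2] << 8) | (uint32_t)data[3];
      return (int64_t)payload_len + 4;
  }

Only the header bytes are inspected, which is why the parser can run as soon
as the start of a message arrives, before the rest of it has been received.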

TCP socket management
---------------------

When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and
write space available (POLLOUT) events are handled by the multiplexor. If there
is a state change (disconnection) or other error on a TCP socket, an error is
posted on the TCP socket so that a POLLERR event happens and KCM discontinues
using the socket. When the application gets the error notification for a
TCP socket, it should unattach the socket from KCM and then handle the error
condition (the typical response is to close the socket and create a new
connection if necessary).

KCM limits the maximum receive message size to the size of the receive
socket buffer on the attached TCP socket (the socket buffer size can be set by
SO_RCVBUF). If the length of a new message reported by the BPF program is
greater than this limit, a corresponding error (EMSGSIZE) is posted on the TCP
socket. The BPF program may also enforce a maximum message size and report an
error when it is exceeded.

A timeout may be set for assembling messages on a receive socket. The timeout
value is taken from the receive timeout of the attached TCP socket (this is set
by SO_RCVTIMEO). If the timer expires before assembly is complete, an error
(ETIMEDOUT) is posted on the socket.
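
As a sketch of how an application might prepare a TCP socket before attaching
it, the helper below sets both limits described above. configure_tcp_for_kcm()
is a hypothetical name and the values are up to the application; note that
Linux generally doubles the SO_RCVBUF value it is given (see socket(7))::

  #include <sys/socket.h>
  #include <sys/time.h>

  /*
   * Prepare a TCP socket for attachment to a KCM multiplexor (hypothetical
   * helper). rcvbuf bounds the largest message KCM will assemble;
   * timeout_ms bounds how long assembling one message may take.
   * Returns 0 on success or -1 with errno set.
   */
  int configure_tcp_for_kcm(int tcpfd, int rcvbuf, long timeout_ms)
  {
      struct timeval tv;

      if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF,
                     &rcvbuf, sizeof(rcvbuf)) < 0)
          return -1;

      tv.tv_sec = timeout_ms / 1000;
      tv.tv_usec = (timeout_ms % 1000) * 1000;
      if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
          return -1;

      return 0;
  }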

User interface
==============

Creating a multiplexor
----------------------

A new multiplexor and an initial KCM socket are created by a socket call::

  socket(AF_KCM, type, protocol)

- type is either SOCK_DGRAM or SOCK_SEQPACKET
- protocol is KCMPROTO_CONNECTED
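
A minimal sketch of this call with the headers it needs follows. kcm_create()
is a hypothetical wrapper; the fallback definition of AF_KCM uses the value
from the kernel's linux/socket.h, for C libraries that do not provide it::

  #include <linux/kcm.h>      /* KCMPROTO_CONNECTED */
  #include <sys/socket.h>

  #ifndef AF_KCM
  #define AF_KCM 41           /* value from the kernel's linux/socket.h */
  #endif

  /*
   * Create a multiplexor and its first KCM socket (hypothetical wrapper).
   * type must be SOCK_DGRAM or SOCK_SEQPACKET. Returns the new descriptor,
   * or -1 with errno set, e.g. if the kernel lacks KCM support or the type
   * is not supported.
   */
  int kcm_create(int type)
  {
      return socket(AF_KCM, type, KCMPROTO_CONNECTED);
  }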

Cloning KCM sockets
-------------------

After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning
a KCM socket. This is accomplished by an ioctl on a KCM socket::

  /* From linux/kcm.h */
  struct kcm_clone {
        int fd;
  };

  struct kcm_clone info;

  memset(&info, 0, sizeof(info));

  err = ioctl(kcmfd, SIOCKCMCLONE, &info);

  if (!err)
    newkcmfd = info.fd;

Attach transport sockets
------------------------

Transport sockets are attached to a multiplexor by calling an ioctl on a
KCM socket for the multiplexor, e.g.::

  /* From linux/kcm.h */
  struct kcm_attach {
        int fd;
        int bpf_fd;
  };

  struct kcm_attach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;
  info.bpf_fd = bpf_prog_fd;

  ioctl(kcmfd, SIOCKCMATTACH, &info);

The kcm_attach structure contains:

  - fd: file descriptor for the TCP socket being attached
  - bpf_fd: file descriptor for the compiled BPF program loaded into the kernel

Unattach transport sockets
--------------------------

Unattaching a transport socket from a multiplexor is straightforward. An
"unattach" ioctl is done with the kcm_unattach structure as the argument::

  /* From linux/kcm.h */
  struct kcm_unattach {
        int fd;
  };

  struct kcm_unattach info;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;

  ioctl(kcmfd, SIOCKCMUNATTACH, &info);

Disabling receive on KCM socket
-------------------------------

A setsockopt is used to disable or enable receiving on a KCM socket.
When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
while. A nonzero value disables receive and zero enables it again.
Example use::

  int val = 1;

  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));

  val = 0;  /* later: re-enable receive on this socket */
  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));

BPF programs for message delineation
------------------------------------

BPF programs can be compiled using the BPF LLVM backend. For example,
a BPF program for parsing Thrift framing is::

  #include "bpf.h" /* for __sk_buff */
  #include "bpf_helpers.h" /* for load_word intrinsic */

  SEC("socket_kcm")
  int bpf_prog1(struct __sk_buff *skb)
  {
       /* Thrift framing: 4-byte big-endian length that excludes itself */
       return load_word(skb, 0) + 4;
  }

  char _license[] SEC("license") = "GPL";
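
Without the LLVM toolchain, the same parser can be hand-assembled and loaded
with the bpf(2) system call. The sketch below assumes that KCM accepts a
BPF_PROG_TYPE_SOCKET_FILTER program, and it uses the BPF_LD | BPF_ABS | BPF_W
instruction, which loads a 32-bit big-endian word from the packet, converts it
to host order and expects the context pointer in r6. build_thrift_parser() and
load_thrift_parser() are hypothetical helpers; the descriptor returned by
load_thrift_parser() is the value for the bpf_fd field of struct kcm_attach::

  #include <linux/bpf.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /*
   * Equivalent of "return load_word(skb, 0) + 4;":
   *   r6 = r1                 LD_ABS reads the skb from r6
   *   r0 = ntohl(*(u32 *)(skb data + 0))
   *   r0 += 4
   *   exit                    r0 is the message length
   * Fills insns and returns the instruction count.
   */
  int build_thrift_parser(struct bpf_insn insns[4])
  {
      memset(insns, 0, 4 * sizeof(insns[0]));

      insns[0].code = BPF_ALU64 | BPF_MOV | BPF_X;
      insns[0].dst_reg = BPF_REG_6;
      insns[0].src_reg = BPF_REG_1;

      insns[1].code = BPF_LD | BPF_W | BPF_ABS;   /* offset is in imm (0) */

      insns[2].code = BPF_ALU64 | BPF_ADD | BPF_K;
      insns[2].dst_reg = BPF_REG_0;
      insns[2].imm = 4;

      insns[3].code = BPF_JMP | BPF_EXIT;
      return 4;
  }

  /* Load the program; returns a BPF program fd or -1 with errno set. */
  int load_thrift_parser(void)
  {
      struct bpf_insn insns[4];
      union bpf_attr attr;

      memset(&attr, 0, sizeof(attr));
      attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
      attr.insn_cnt = build_thrift_parser(insns);
      attr.insns = (uint64_t)(uintptr_t)insns;
      attr.license = (uint64_t)(uintptr_t)"GPL";

      return (int)syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }

Depending on the kernel.unprivileged_bpf_disabled sysctl, loading the program
may require privileges (for example CAP_BPF on recent kernels).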

Use in applications
===================

KCM accelerates application layer protocols. Specifically, it allows
applications to use a message-based interface for sending and receiving
messages. The kernel provides necessary assurances that messages are sent
and received atomically. This relieves much of the burden applications have
in mapping a message-based protocol onto the TCP stream. KCM also makes
application layer messages a unit of work in the kernel for the purposes of
steering and scheduling, which in turn allows a simpler networking model in
multithreaded applications.

Configurations
--------------

In an Nx1 configuration, KCM logically provides multiple socket handles
to the same TCP connection. This allows parallelism in I/O operations
on the TCP socket (for instance, copyin and copyout of data are
parallelized). In an application, a KCM socket can be opened for each
processing thread and added to an epoll set (similar to how SO_REUSEPORT
is used to allow multiple listener sockets on the same port).
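
A sketch of this per-thread setup follows, under the assumption that each
worker thread owns one cloned KCM socket and a private epoll instance.
open_worker_socket() is a hypothetical helper and error reporting is kept
minimal::

  #include <linux/sockios.h>  /* SIOCPROTOPRIVATE, used by SIOCKCMCLONE */
  #include <linux/kcm.h>
  #include <string.h>
  #include <sys/epoll.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  /*
   * Per-worker setup: clone a KCM socket from kcmfd and create a private
   * epoll instance that waits for messages on it. On success stores both
   * descriptors and returns 0; on failure returns -1 and leaves nothing open.
   */
  int open_worker_socket(int kcmfd, int *sock_out, int *epfd_out)
  {
      struct kcm_clone info;
      struct epoll_event ev;
      int epfd;

      memset(&info, 0, sizeof(info));
      if (ioctl(kcmfd, SIOCKCMCLONE, &info) < 0)
          return -1;

      epfd = epoll_create1(EPOLL_CLOEXEC);
      if (epfd < 0)
          goto err_sock;

      memset(&ev, 0, sizeof(ev));
      ev.events = EPOLLIN;
      ev.data.fd = info.fd;
      if (epoll_ctl(epfd, EPOLL_CTL_ADD, info.fd, &ev) < 0)
          goto err_epoll;

      *sock_out = info.fd;
      *epfd_out = epfd;
      return 0;

  err_epoll:
      close(epfd);
  err_sock:
      close(info.fd);
      return -1;
  }

Each worker would then wait in epoll_wait() on its own epoll descriptor and
call recv() on its own KCM socket, without sharing either with other workers.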

In an MxN configuration, multiple connections are established to the
same destination. These are used for simple load balancing.
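
A sketch of building such a multiplexor by attaching several connected TCP
sockets, all to the same destination, with one parser program follows.
kcm_attach_all() is a hypothetical helper, and undoing earlier attachments on
failure is a design choice of the sketch, not something KCM requires::

  #include <linux/sockios.h>  /* SIOCPROTOPRIVATE, used by SIOCKCM* */
  #include <linux/kcm.h>
  #include <string.h>
  #include <sys/ioctl.h>

  /*
   * Attach n connected TCP sockets to the multiplexor behind kcmfd using
   * one BPF parser (bpf_fd) for all of them. On failure, sockets attached
   * by this call are unattached again. Returns 0 on success, -1 on failure.
   */
  int kcm_attach_all(int kcmfd, const int *tcpfds, int n, int bpf_fd)
  {
      struct kcm_attach att;
      struct kcm_unattach un;
      int i;

      for (i = 0; i < n; i++) {
          memset(&att, 0, sizeof(att));
          att.fd = tcpfds[i];
          att.bpf_fd = bpf_fd;
          if (ioctl(kcmfd, SIOCKCMATTACH, &att) < 0)
              goto undo;
      }
      return 0;

  undo:
      while (--i >= 0) {
          memset(&un, 0, sizeof(un));
          un.fd = tcpfds[i];
          ioctl(kcmfd, SIOCKCMUNATTACH, &un);
      }
      return -1;
  }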

Message batching
----------------

The primary purpose of KCM is load balancing between KCM sockets and hence
threads in a nominal use case. Perfect load balancing, that is, steering
each received message to a different KCM socket or steering each sent
message to a different TCP socket, can negatively impact performance
since this doesn't allow for affinities to be established. Balancing
based on groups, or batches of messages, can be beneficial for performance.

On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket.

  1) Send multiple messages in a single sendmmsg.
  2) Send a group of messages each with a sendmsg call, where all messages
     except the last have MSG_BATCH in the flags of the sendmsg call.
  3) Create a "super message" composed of multiple messages and send this
     with a single sendmsg.
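
A sketch of the second option follows, assuming each element of msgs already
holds one complete, framed message. kcm_send_batch() is a hypothetical helper;
MSG_BATCH is given a fallback definition with the value from the kernel's
linux/socket.h in case the C library headers lack it, and MSG_EOR is passed so
that each message is also completed on SOCK_SEQPACKET sockets::

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  #ifndef MSG_BATCH
  #define MSG_BATCH 0x40000   /* value from the kernel's linux/socket.h */
  #endif

  /*
   * One sendmsg() per message, with MSG_BATCH set on every message except
   * the last. Returns the number of messages sent, or -1 if none could be.
   */
  int kcm_send_batch(int kcmfd, struct iovec *msgs, int n)
  {
      struct msghdr mh;
      int i, flags;

      for (i = 0; i < n; i++) {
          memset(&mh, 0, sizeof(mh));
          mh.msg_iov = &msgs[i];
          mh.msg_iovlen = 1;

          flags = MSG_EOR | (i < n - 1 ? MSG_BATCH : 0);
          if (sendmsg(kcmfd, &mh, flags) < 0)
              return i > 0 ? i : -1;
      }
      return n;
  }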

On receive, the KCM module attempts to queue messages received on the
same KCM socket during each TCP ready callback. The targeted KCM socket
changes at each receive ready callback on the KCM socket. The application
does not need to configure this.

Error handling
--------------

An application should include a thread to monitor errors raised on
the TCP connection. Normally, this will be done by placing each
TCP socket attached to a KCM multiplexor in an epoll set for the POLLERR
event. If an error occurs on an attached TCP socket, KCM sets EPIPE
on the socket, thus waking up the application thread. When the application
sees the error (which may just be a disconnect), it should unattach the
socket from KCM and then close it. It is assumed that once an error is
posted on the TCP socket the data stream is unrecoverable (i.e. an error
may have occurred in the middle of receiving a message).
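
Such a monitoring thread might look like the following sketch. The helper
names are hypothetical; the sketch reads (and thereby clears) the pending
error with SO_ERROR, unattaches the socket with SIOCKCMUNATTACH, removes it
from the epoll set and closes it. Reconnecting is left to the application::

  #include <errno.h>
  #include <linux/sockios.h>  /* SIOCPROTOPRIVATE, used by SIOCKCM* */
  #include <linux/kcm.h>
  #include <string.h>
  #include <sys/epoll.h>
  #include <sys/ioctl.h>
  #include <sys/socket.h>
  #include <unistd.h>

  /* Register an attached TCP socket with the error-monitoring epoll set.
   * epoll always reports EPOLLERR; it is requested here for clarity. */
  int watch_tcp_errors(int epfd, int tcpfd)
  {
      struct epoll_event ev;

      memset(&ev, 0, sizeof(ev));
      ev.events = EPOLLERR;
      ev.data.fd = tcpfd;
      return epoll_ctl(epfd, EPOLL_CTL_ADD, tcpfd, &ev);
  }

  /* Fetch and clear the pending error, unattach the socket from KCM,
   * stop watching it and close it. Returns the error value (e.g. EPIPE). */
  int handle_tcp_error(int epfd, int kcmfd, int tcpfd)
  {
      struct kcm_unattach info;
      int soerr = 0;
      socklen_t len = sizeof(soerr);

      getsockopt(tcpfd, SOL_SOCKET, SO_ERROR, &soerr, &len);

      memset(&info, 0, sizeof(info));
      info.fd = tcpfd;
      ioctl(kcmfd, SIOCKCMUNATTACH, &info);

      epoll_ctl(epfd, EPOLL_CTL_DEL, tcpfd, NULL);
      close(tcpfd);
      return soerr;
  }

  /* Body of the monitoring thread. */
  void monitor_tcp_errors(int epfd, int kcmfd)
  {
      struct epoll_event evs[16];
      int i, n;

      for (;;) {
          n = epoll_wait(epfd, evs, 16, -1);
          if (n < 0) {
              if (errno == EINTR)
                  continue;
              return;
          }
          for (i = 0; i < n; i++)
              if (evs[i].events & (EPOLLERR | EPOLLHUP))
                  handle_tcp_error(epfd, kcmfd, evs[i].data.fd);
      }
  }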

TCP connection monitoring
-------------------------

In KCM there is no means to correlate a message to the TCP socket that
was used to send or receive the message (except when only one TCP
socket is attached). However, the application does retain an open file
descriptor to the socket, so it will be able to get statistics from the
socket which can be used in detecting issues (such as high
retransmissions on the socket).
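
As an example of such statistics, the cumulative retransmission count of an
attached TCP socket can be read with the TCP_INFO socket option.
tcp_total_retrans() is a hypothetical helper::

  #include <netinet/in.h>
  #include <netinet/tcp.h>    /* TCP_INFO, struct tcp_info */
  #include <stdint.h>
  #include <sys/socket.h>

  /* Read the total number of retransmissions on a TCP socket.
   * Returns 0 on success or -1 with errno set. */
  int tcp_total_retrans(int tcpfd, uint32_t *out)
  {
      struct tcp_info ti;
      socklen_t len = sizeof(ti);

      if (getsockopt(tcpfd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
          return -1;
      *out = ti.tcpi_total_retrans;
      return 0;
  }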
