xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, and
thus a high degree of memory safety and sanity checking is needed to
help ensure proper levels of security. Bugs in this code can have
significant impact on security and performance.

However, it is code that is repetitive and tedious to write by hand.

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols hand-roll
their XDR implementation. There are two main reasons for this:

1. libtirpc (and its predecessors) operate only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand will make a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.

In order to accrue the benefits of machine-generated XDR code in the
kernel, a tool is needed that will output C code that works against
the kernel's SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located in tools/net/sunrpc/extract.sh.

	$ extract.sh < rfc0001.txt > new2.x


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

	$ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
	$ ./xdrgen declarations new2.x > new2xdr_gen.h

and

	$ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag will insert
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted for the Linux kernel, it
is tagged with a GPLv2-only license.
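
To give a rough feel for what an XDR encoding and decoding function
pair does, the following is a minimal user-space sketch. It is not
actual xdrgen output. The struct sketch_stream type and all sketch_*
function names are hypothetical stand-ins for the kernel's
struct xdr_stream and its helpers, only three of the nfsstat3 values
from RFC 1813 are listed, and rejecting undefined enum values in the
decoder is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct xdr_stream. */
struct sketch_stream {
	uint8_t *p;	/* next byte to read or write */
	uint8_t *end;	/* one past the last usable byte */
};

/* A subset of the nfsstat3 values defined in RFC 1813. */
enum nfsstat3 {
	NFS3_OK = 0,
	NFS3ERR_PERM = 1,
	NFS3ERR_NOENT = 2,
};

/* Point the stream at the start of a caller-provided buffer. */
static void sketch_init(struct sketch_stream *xdr, uint8_t *buf, size_t len)
{
	assert(xdr != NULL && buf != NULL);
	xdr->p = buf;
	xdr->end = buf + len;
}

/* XDR carries an enum as a 4-byte big-endian integer (RFC 4506). */
static bool sketch_encode_nfsstat3(struct sketch_stream *xdr,
				   enum nfsstat3 value)
{
	uint32_t v = (uint32_t)value;

	if ((size_t)(xdr->end - xdr->p) < 4)
		return false;	/* no room: fail rather than overrun */
	xdr->p[0] = (uint8_t)(v >> 24);
	xdr->p[1] = (uint8_t)(v >> 16);
	xdr->p[2] = (uint8_t)(v >> 8);
	xdr->p[3] = (uint8_t)v;
	xdr->p += 4;
	return true;
}

static bool sketch_decode_nfsstat3(struct sketch_stream *xdr,
				   enum nfsstat3 *ptr)
{
	uint32_t v;

	if ((size_t)(xdr->end - xdr->p) < 4)
		return false;	/* truncated input */
	v = ((uint32_t)xdr->p[0] << 24) | ((uint32_t)xdr->p[1] << 16) |
	    ((uint32_t)xdr->p[2] << 8) | (uint32_t)xdr->p[3];
	xdr->p += 4;

	/* Reject values that this sketch's enum does not define. */
	switch (v) {
	case NFS3_OK:
	case NFS3ERR_PERM:
	case NFS3ERR_NOENT:
		break;
	default:
		return false;
	}
	*ptr = (enum nfsstat3)v;
	return true;
}
```

A caller would initialize a stream over a buffer with sketch_init(),
encode a value, re-initialize the stream over the same buffer, and
decode it back. Both functions return false instead of touching
memory past the end of the buffer, which is the kind of bounds
checking that network-facing XDR code needs.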

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

	$ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code. The
translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark using the grammar
in xdr.ebnf. Lark parses the target XDR specification using this
grammar, creating a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.

The source code for each item is generated in the same order in
which the items appear in the specification to ensure the generated
code compiles. This conforms with the behavior of rpcgen.

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. Currently three directives are
implemented: "exclude", "header", and "public".

Pragma exclude
------ -------

	pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically-generated functions might conflict or interfere with
the hand-rolled function. To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

	pragma exclude NFSPROC3_READDIRPLUS;

Excludes the decoder function for the READDIRPLUS argument and the
encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically excludes data item encoder and decoder functions that
are used only by an excluded procedure.

Pragma header
------ ------

	pragma header <string> ;

Provide a name to use for the header file. For example:

	pragma header nlm4;

Adds

	#include "nlm4xdr_gen.h"

to the generated source file.

Pragma public
------ ------

	pragma public <XDR data item> ;

Normally XDR encoder and decoder functions are "static". In case an
implementer wants to call these functions from other source code,
they can add a public pragma in the input .x file to indicate a set
of functions that should get a prototype in the generated header,
and the function definitions will not be declared static.

For example:

	pragma public nfsstat3;

Adds these prototypes in the generated header:

	bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
	bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.


Future Work
-----------

Finish implementing XDR pointer and list types.

Generate client-side procedure functions

Expand the README into a user guide similar to rpcgen(1)

Add more pragma directives:

  * @pages -- use xdr_read/write_pages() for the specified opaque
              field
  * @skip -- do not decode, but rather skip, the specified argument
             field

Enable something like a #include to dynamically insert the content
of other specification files

Properly support line-by-line pass-through via the "%" decorator

Build a unit test suite for verifying translation of XDR language
into compilable code

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability

Generate kernel Rust code as well as C code