 +--------------------------------------------
 |  wm-FPU-emu an FPU emulator for 80386 and
 |
 |  Copyright (C) 1992,1993,1994,1995,1996,1997
 |                W. Metzenthen, 22 Par
 |                Australia. E-mail bi
 |
 |  This program is free software; you can r
 |  it under the terms of the GNU General Pu
 |  published by the Free Software Foundatio
 |
 |  This program is distributed in the hope
 |  but WITHOUT ANY WARRANTY; without even t
 |  MERCHANTABILITY or FITNESS FOR A PARTICU
 |  GNU General Public License for more deta
 |
 |  You should have received a copy of the G
 |  along with this program; if not, write t
 |  Foundation, Inc., 675 Mass Ave, Cambridg
 |
 +--------------------------------------------



wm-FPU-emu is an FPU emulator for Linux. It is
which was my 80387 emulator for early versions
msdos); wm-emu387 was in turn based upon emu38
DJ Delorie for djgpp. The interface to the Li
the original Linux math emulator by Linus Torv

My target FPU for wm-FPU-emu is that described
Programmer's Reference Manual (1992 edition).
facets of the functioning of the FPU are not w
Reference Manual. The information in the manua
with measurements on real 80486's. Unfortunate
possible to be sure that all of the peculiarit
been discovered, so there is always likely to
in the detailed behaviour of the emulator and

wm-FPU-emu does not implement all of the behav
but is very close. See "Limitations" later in
some differences.

Please report bugs, etc to me at:
       billm@melbpc.org.au
or     b.metzenthen@medoto.unimelb.edu.au

For more information on the emulator and on fl
my web pages, currently at http://www.suburbi


--Bill Metzenthen
  December 1999


----------------------- Internals of wm-FPU-em

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remar
(2) Divide has been tuned to get reasonable pe
    is not the obvious one which most people s
    to take advantage of the characteristics o
    it has been invented many times before I d
    seen it. It is based upon one of those ide
    for years without ever bothering to check
(3) The sqrt function has been tuned to get go
    upon Newton's classic method. Performance
    upon the properties of Newton's method, an
    structured taking account of the 80386 cha
(4) The trig, log, and exp functions are based
    "optimal" polynomial approximations. My de
    based upon getting good accuracy with reas
(5) The argument reducing code for the trig fu
    a value of pi which is accurate to more th
    the reduced argument is accurate to more t
    to a few pi, and accurate to more than 64
    even for arguments approaching 2^63. This
    80486, which uses a value of pi which is a

The code of the emulator is complicated slight
account for a limited form of re-entrancy. Nor
emulate each FPU instruction to completion wit
However, it may happen that when the emulator
memory space, swapping may be needed. In this
temporarily suspended while disk i/o takes pla
another process may use the emulator, thereby
variables. The code which accesses user memory
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As from version 1.12 of the emulator, no stati
(apart from those in the kernel's per-process
therefore now fully re-entrant, rather than ha
form of re-entrancy which is required by the L

----------------------- Limitations of wm-FPU-

There are a number of differences between the
(version 2.01) and the 80486 FPU (apart from b
are fewer than those which applied to the 1.xx
Some of the more important differences are lis

The Roundup flag does not have much meaning fo
functions and its 80486 value with these funct
from its emulator value.

In a few rare cases the Underflow flag obtaine
be different from that obtained with an 80486.
following conditions apply simultaneously:
(a) the operands have a higher precision than
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before
(d) the magnitude of the final result (after r
(e) the magnitude of the exact result would be
    operands were rounded to the current preci
    operation was performed.
If all of these apply, the emulator will set t
80486 will not.

NOTE: Certain formats of Extended Real are UNS
unsupported by the 80486. They are the Pseudo-
and Unnormals. None of these will be generated
emulator. Do not use them. The emulator treats
detail from the way an 80486 does.

Self modifying code can cause the emulator to
code is:
          movl %esp,[%ebx]
          fld1
The FPU instruction may be (usually will be) l
queue of the CPU before the mov instruction is
destination of the 'movl' overlaps the FPU ins
in the prefetch queue and memory will be incon
instruction is executed. The emulator will be
able to find the instruction which caused the
exception. For this case, the emulator cannot
an 80486DX.

Handling of the address size override prefix b
extensively tested yet. A major problem exists
vm86 mode can cause a general protection fault
greater than 0xffff appear to be illegal in vm
acceptable (and work) in real mode. A small te
check the addressing, and which runs successfu
crashes dosemu under Linux and also brings Win
protection fault message when run under the MS
3.1. (The program simply reads data from a val

The emulator supports 16-bit protected mode, w
an 80486DX. A 80486DX will allow some floatin
write a few bytes below the lowest address of
will not allow this in 16-bit protected mode:
allowed to write outside the bounds set by the

----------------------- Performance of wm-FPU-

Speed.
-----

The speed of floating point computation with t
upon instruction mix. Relative performance is
which require most computation. The simple ins
affected by the FPU instruction trap overhead.


Timing: Some simple timing tests have been mad
The times include load/store instructions. All
measured on a 33MHz 386 with 64k cache. The Tu
ms-dos, the next two columns are for emulators
ms-dos extender. The final column is for wm-FP
using libm4.0 (hard).

function      Turbo C        djgpp 1.06

   +          60.5           154.8
   -          61.1-65.5      157.3-160.8
   *          71.0           190.8
   /          61.2-75.0      261.4-266.9

 sin()        310.8          4692.0
 cos()        284.4          4855.2
 tan()        495.0          8807.1
 atan()       328.9          4866.4

 sqrt()       128.7          crashed
 log()        413.1-419.1    5103.4-5354.21
 exp()        479.1          6619.2


The performance under Linux is improved by the
The following results show the improvement whi
Linux due to the look-ahead code. Also given a
original Linux emulator with the 4.1 'soft' li

 [ Linus' note: I changed look-ahead to be the
   there was no reason not to use it after I h
   disabled during tracing ]

              wm-FPU-emu w   original w
              look-ahead     'soft' lib
   +          106.4          190.2
   -          108.6-111.6    192.4-216.2
   *          113.4          193.1
   /          108.8-124.4    700.1-706.2

 sin()        390.5          2642.0
 cos()        381.5          2767.4
 tan()        496.5          3153.3
 atan()       367.2-435.5    2439.4-3396.8

 sqrt()       195.1          4732.5
 log()        358.0-387.5    3359.2-3390.3
 exp()        619.3          4046.4


These figures are now somewhat out-of-date. Th
progressively slower for most functions as mor
have been implemented.


----------------------- Accuracy of wm-FPU-emu


The accuracy of the emulator is in almost all
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions
match those of an 80486 FPU. They are the best
these never exceeds 1/2 an lsb. The fprem and
return exact results; they have no error.


The following table compares the emulator accu
trig and log functions against the Turbo C "em
each function was tested at about 400 points.
would be 64 bits. The reduced Turbo C accuracy
arguments greater than pi/4 can be thought of
precision of the argument x; e.g. an argument
accurate to 64 bits can result in a relative a
about 64 + log2(cos(x)) = 31 bits.


Function      Tested x range            Worst
                                        (relat

sqrt(x)       1 .. 2                    64.1
atan(x)       1e-10 .. 200              64.2
cos(x)        0 .. pi/2-(1e-10)         64.4 (
                                        64.1 (
sin(x)        1e-10 .. pi/2             64.0
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (
                                        64.1 (
exp(x)        0 .. 1                    63.1 *
log(x)        1+1e-6 .. 2               63.8 *

** The accuracy for exp() and log() is low bec
   does not compute them directly; two operations


The emulator passes the "paranoia" tests (comp
later) for 'float' variables (24 bit precision
control is set to 24, 53 or 64 bits, and for '
bit precision numbers) when precision control
properly performing FPU cannot pass the 'paran
variables when precision control is set to 64

The code for reducing the argument for the tri
fptan and fsincos) has been improved and now e
for pi which is accurate to more than 128 bits
consequence, the accuracy of these functions f
been dramatically improved (and is now very mu
FPU). There is also now no degradation of accu
for operands close to pi/2. Measured results a
definition of accuracy has changed slightly fr
above table):

Function      Tested x range            Worst re
                                        (absolute

cos(x)        0 .. 9.22e+18             62.0
sin(x)        1e-16 .. 9.22e+18         62.1
tan(x)        1e-16 .. 9.22e+18         61.8

It is possible with some effort to find very l
give much degraded precision. For example, the
        8227740058411162616.0
is within about 10e-7 of a multiple of pi. To
example) of this number to 64 bits precision i
have a value of pi which had about 150 bits pr
emulator computes the result to about 42.6 bit
result is about -9.739715e-8). On the other ha
0.01059, which in relative terms is hopelessly

For arguments close to critical angles (which
pi/2) the emulator is more accurate than an 80
arguments, the emulator is far more accurate.


Prior to version 1.20 of the emulator, the acc
the transcendental functions (in their princip
good as the results from an 80486 FPU. From ve
has been considerably improved and these funct
worst-case results which are better than the w
by an 80486 FPU.

The following table gives the measured results
number of randomly selected arguments in each
million. The group of three columns gives the
accuracy in number of times per million, thus
columns shows that an accuracy of between 63.8
found at a rate of 133 times per one million m
The results show that the fsin, fcos and fptan
results which are in error (i.e. less accurate
result (which is 64 bits)) for about one per c
between -pi/2 and +pi/2. The other instructio
frequency of results which are in error. The
the worst accuracy which was found (in bits) a
of the argument which produced it.

                                    frequency (per
                                    ---------------
instr     arg range      # tests    63.7   63.8
                                    bits   bits
-----     ------------   -------    ----   ----   -
fsin      (0,pi/2)       547756        0    133   1
fcos      (0,pi/2)       547563        0    126   1
fptan     (0,pi/2)       536274       11    267   1
fpatan    4 quadrants    517087        0      8
fyl2x     (0,20)         541861        0      0
fyl2xp1   (-.293,.414)   520256        0      0
f2xm1     (-1,1)         538847        4    481


Tests performed on an 80486 FPU showed results
following table gives the results which were o
486DX2/66 (other tests indicate that an Intel
identical results). The tests were basically
to measure the emulator (the values, being ran
the same). The total number of tests for each
at the end of the table, in case each about 10
Another line of figures at the end of the tabl
instructions return results which are in error
percent of the arguments tested.

The numbers in the body of the table give the
result of the given accuracy in bits (given in
was obtained per one million arguments. For th
two columns of results are given: * The second
the number cases where the results of the firs
positive argument, this shows that this instru
results for positive arguments than it does fo
cases of fcos and fptan, the first column give
cases where arguments greater than 1.5 were re
given in the second column. Unlike the emulato
results of relatively poor accuracy for these
argument approaches pi/2. The table does not s
accuracy of the results were less than 62 bits
often for fsin and fptan when the argument app
accuracy is discussed above in relation to the
the accuracy of the value of pi.


bits    f2xm1   f2xm1  fpatan    fcos    fcos   fyl2
62.0        0       0       0       0     437
62.1        0       0      10       0     894
62.2       14       0       0       0    1033
62.3       57       0       0       0    1202
62.4      385       0       0      10    1292
62.5     1140       0       0     119    1649
62.6     2037       0       0     189    1620
62.7     5086      14       0     646    2315   1
62.8     8818      86       0     984    3050   5
62.9    11340    1355       0    2126    4153   7
63.0    15557    4750       0    3319    5376   24
63.1    20016    8288       0    4620    6628   51
63.2    24945   11127      10    6588    8098   112
63.3    25686   12382      69    8774   10682   190
63.4    29219   14722      79   11109   12311   309
63.5    30458   14936     393   13802   15014   587
63.6    32439   16448    1277   17945   19028   1022
63.7    35031   16805    4067   23003   23947   1891
63.8    33251   15820    7673   24781   25675   2461
63.9    33293   16833   18529   28318   29233   3126

Per cent with error:
        30.9 3.2 18.5 9.
Total arguments tested:
        70194   70099  101784  100641  100641   10179


------------------------- Contributors -------

A number of people have contributed to the dev
emulator, often by just reporting bugs, someti
fixes, and a few kind people have provided me
or another to an 80486 machine. Contributors i
who I may have forgotten, please forgive me):

        Linus Torvalds
        Tommy.Thorn@daimi.aau.dk
        Andrew.Tridgell@anu.edu.au
        Nick Holloway, alfie@dcs.warwick.ac.uk
        Hermano Moura, moura@dcs.gla.ac.uk
        Jon Jagger, J.Jagger@scp.ac.uk
        Lennart Benschop
        Brian Gallew, geek+@CMU.EDU
        Thomas Staniszewski, ts3v+@andrew.cmu.edu
        Martin Howell, mph@plasma.apana.org.au
        M Saggaf, alsaggaf@athena.mit.edu
        Peter Barker, PETER@socpsy.sci.fau.edu
        tom@vlsivie.tuwien.ac.at
        Dan Russel, russed@rpi.edu
        Daniel Carosone, danielce@ee.mu.oz.au
        cae@jpmorgan.com
        Hamish Coleman, t933093@minyos.xx.rmit.oz.au
        Bruce Evans, bde@kralizec.zeta.org.au
        Timo Korvola, Timo.Korvola@hut.fi
        Rick Lyons, rick@razorback.brisnet.org.au
        Rick, jrs@world.std.com

...and numerous others who responded to my req
a real 80486.