1 | 2 | setox.sa 3.1 12/10/90 3 | 4 | The entry point setox computes the exp 5 | setoxd does the same except the input 6 | number. setoxm1 computes exp(X)-1, and 7 | exp(X)-1 for denormalized X. 8 | 9 | INPUT 10 | ----- 11 | Double-extended value in memory locati 12 | register a0. 13 | 14 | OUTPUT 15 | ------ 16 | exp(X) or exp(X)-1 returned in floatin 17 | 18 | ACCURACY and MONOTONICITY 19 | ------------------------- 20 | The returned result is within 0.85 ulp 21 | within 0.5001 ulp to 53 bits if the re 22 | to double precision. The result is pro 23 | precision. 24 | 25 | SPEED 26 | ----- 27 | Two timings are measured, both in the 28 | first one is measured when the functio 29 | (so the instructions and data are not 30 | second one is measured when the functi 31 | input argument. 32 | 33 | The program setox takes approximately 34 | argument X whose magnitude is less tha 35 | is the usual situation. For the less c 36 | depending on their values, the program 37 | but no worse than 10% slower even in t 38 | 39 | The program setoxm1 takes approximatel 40 | argument X, 0.25 <= |X| < 70log2. For 41 | approximately ??? / ??? cycles. For th 42 | depending on their values, the program 43 | but no worse than 10% slower even in t 44 | 45 | ALGORITHM and IMPLEMENTATION NOTES 46 | ---------------------------------- 47 | 48 | setoxd 49 | ------ 50 | Step 1. Set ans := 1.0 51 | 52 | Step 2. Return ans := ans + sign(X)*2 53 | Notes: This will always generate one 54 | 55 | 56 | setox 57 | ----- 58 | 59 | Step 1. Filter out extreme cases of in 60 | 1.1 If |X| >= 2^(-65), go 61 | 1.2 Go to Step 7. 62 | 1.3 If |X| < 16380 log(2), 63 | 1.4 Go to Step 8. 64 | Notes: The usual case should take the 65 | To avoid the use of floating- 66 | compact representation of |X| 67 | 32-bit integer, the upper (mo 68 | the sign and biased exponent 69 | bits are the 16 most signific 70 | explicit bit) bits of |X|. Co 71 | in Steps 1.1 and 1.3 can be p 72 | Note also that the constant 1 73 | is also in the compact form. 74 | to Step 2 guarantees |X| < 16 75 | to have a small number of cas 76 | but close to, 16380 log(2) an 77 | taken. 78 | 79 | Step 2. Calculate N = round-to-nearest 80 | 2.1 Set AdjFlag := 0 (indi 81 | 2.2 N := round-to-nearest- 82 | 2.3 Calculate J = N 83 | 2.4 Calculate M = (N 84 | 2.5 Calculate the address 85 | 2.6 Create the value Scale 86 | Notes: The calculation in 2.2 is real 87 | 88 | Z := X * constant 89 | N := round-to-nearest- 90 | 91 | where 92 | 93 | constant := single-pre 94 | 95 | Using a single-precision cons 96 | Another effect of using a sin 97 | that the calculated value Z i 98 | 99 | Z = X*(64/log2)*(1+eps 100 | 101 | This error has to be consider 102 | 103 | Step 3. Calculate X - N*log2/64. 104 | 3.1 R := X + N*L1, where L 105 | 3.2 R := R + N*L2, L2 := e 106 | Notes: a) The way L1 and L2 are chose 107 | the value -log2/64 108 | b) N*L1 is exact because N is 109 | L1 is no longer than 24 bits. 110 | c) The calculation X+N*L1 is 111 | Thus, R is practically X+N(L1 112 | d) It is important to estimat 113 | Step 3.2. 114 | 115 | N = rnd-to-int( X*64/l 116 | X*64/log2 (1+eps) 117 | X*64/log2 - N = 118 | X - N*log2/64 = 119 | 120 | 121 | Now |X| <= 16446 log2, thus 122 | 123 | |X - N*log2/64| <= (0. 124 | <= 0.5 125 | This bound will be used in St 126 | 127 | Step 4. Approximate exp(R)-1 by a poly 128 | p = R + R*R*(A1 + R*(A 129 | Notes: a) In order to reduce memory a 130 | made as "short" as possible: 131 | are single precision; A2 and 132 | b) Even with the restrictions 133 | |p - (exp(R)-1)| < 2^( 134 | Note that 0.0062 is slightly 135 | c) To fully utilize the pipel 136 | two independent pieces of rou 137 | p = [ R + R*S*(A2 + S* 138 | [ S*(A1 + S*(A 139 | where S = R*R. 140 | 141 | Step 5. Compute 2^(J/64)*exp(R) = 2^(J 142 | ans := T + ( T 143 | where T and t are the stored 144 | Notes: 2^(J/64) is stored as T and t 145 | 2^(J/64) to roughly 85 bits; 146 | and t is in single precision. 147 | to 62 bits so that the last t 148 | reason for such a special for 149 | will all be exact --- a prope 150 | more accurate computation of 151 | 152 | Step 6. Reconstruction of exp(X) 153 | exp(X) = 2^M * 2^(J/64 154 | 6.1 If AdjFlag = 0, go to 155 | 6.2 ans := ans * AdjScale 156 | 6.3 Restore the user FPCR 157 | 6.4 Return ans := ans * Sc 158 | Notes: If AdjFlag = 0, we have X = Ml 159 | |M| <= 16380, and Scale = 2^M 160 | neither overflow nor underflo 161 | means that 162 | X = (M1+M)log2 + Jlog2 163 | Hence, exp(X) may overflow or 164 | When that is the case, AdjSca 165 | approximately M. Thus 6.2 wil 166 | Possible exception in 6.4 is 167 | The inexact exception is not 168 | one can argue that the inexac 169 | raised, to simulate that exce 170 | flag is worth in practical us 171 | 172 | Step 7. Return 1 + X. 173 | 7.1 ans := X 174 | 7.2 Restore user FPCR. 175 | 7.3 Return ans := 1 + ans. 176 | Notes: For non-zero X, the inexact ex 177 | raised by 7.3. That is the on 178 | Note also that we use the FMO 179 | in Step 7.1 to avoid unnecess 180 | the FMOVEM may not seem relev 181 | the precaution will be useful 182 | this code where the separate 183 | will be done away with.) 184 | 185 | Step 8. Handle exp(X) where |X| >= 163 186 | 8.1 If |X| > 16480 log2, g 187 | (mimic 2.2 - 2.6) 188 | 8.2 N := round-to-integer( 189 | 8.3 Calculate J = N mod 64 190 | 8.4 K := (N-J)/64, M1 := t 191 | 8.5 Calculate the address 192 | 8.6 Create the values Scal 193 | 8.7 Go to Step 3. 194 | Notes: Refer to notes for 2.2 - 2.6. 195 | 196 | Step 9. Handle exp(X), |X| > 16480 log 197 | 9.1 If X < 0, go to 9.3 198 | 9.2 ans := Huge, go to 9.4 199 | 9.3 ans := Tiny. 200 | 9.4 Restore user FPCR. 201 | 9.5 Return ans := ans * an 202 | Notes: Exp(X) will surely overflow or 203 | X's sign. "Huge" and "Tiny" a 204 | extended-precision numbers wh 205 | with an inexact result. Thus, 206 | inexact together with either 207 | 208 | 209 | setoxm1d 210 | -------- 211 | 212 | Step 1. Set ans := 0 213 | 214 | Step 2. Return ans := X + ans. Exit. 215 | Notes: This will return X with the ap 216 | precision prescribed by the u 217 | 218 | setoxm1 219 | ------- 220 | 221 | Step 1. Check |X| 222 | 1.1 If |X| >= 1/4, go to S 223 | 1.2 Go to Step 7. 224 | 1.3 If |X| < 70 log(2), go 225 | 1.4 Go to Step 10. 226 | Notes: The usual case should take the 227 | However, it is conceivable |X 228 | because EXPM1 is intended to 229 | when |X| is small. For furthe 230 | see the notes on Step 1 of se 231 | 232 | Step 2. Calculate N = round-to-nearest 233 | 2.1 N := round-to-nearest- 234 | 2.2 Calculate J = N 235 | 2.3 Calculate M = (N 236 | 2.4 Calculate the address 237 | 2.5 Create the values Sc = 238 | Notes: See the notes on Step 2 of set 239 | 240 | Step 3. Calculate X - N*log2/64. 241 | 3.1 R := X + N*L1, where L 242 | 3.2 R := R + N*L2, L2 := e 243 | Notes: Applying the analysis of Step 244 | shows that |R| <= 0.0055 (not 245 | this case). 246 | 247 | Step 4. Approximate exp(R)-1 by a poly 248 | p = R+R*R*(A1+R*(A2+R* 249 | Notes: a) In order to reduce memory a 250 | made as "short" as possible: 251 | are single precision; A2, A3 252 | b) Even with the restriction 253 | |p - (exp(R)-1)| < 254 | for all |R| <= 0.0055. 255 | c) To fully utilize the pipel 256 | two independent pieces of rou 257 | p = [ R*S*(A2 + S*(A4 258 | [ R + S*(A1 + 259 | where S = R*R. 260 | 261 | Step 5. Compute 2^(J/64)*p by 262 | p := T*p 263 | where T and t are the stored 264 | Notes: 2^(J/64) is stored as T and t 265 | 2^(J/64) to roughly 85 bits; 266 | and t is in single precision. 267 | to 62 bits so that the last t 268 | reason for such a special for 269 | will all be exact --- a prope 270 | in Step 6 below. The total re 271 | bigger than 2^(-67.7) compare 272 | 273 | Step 6. Reconstruction of exp(X)-1 274 | exp(X)-1 = 2^M * ( 2^( 275 | 6.1 If M <= 63, go to Step 276 | 6.2 ans := T + (p + (t + O 277 | 6.3 If M >= -3, go to 6.5. 278 | 6.4 ans := (T + (p + t)) + 279 | 6.5 ans := (T + OnebySc) + 280 | 6.6 Restore user FPCR. 281 | 6.7 Return ans := Sc * ans 282 | Notes: The various arrangements of th 283 | evaluations. 284 | 285 | Step 7. exp(X)-1 for |X| < 1/4. 286 | 7.1 If |X| >= 2^(-65), go 287 | 7.2 Go to Step 8. 288 | 289 | Step 8. Calculate exp(X)-1, |X| < 2^(- 290 | 8.1 If |X| < 2^(-16312), g 291 | 8.2 Restore FPCR; return a 292 | 8.3 X := X * 2^(140). 293 | 8.4 Restore FPCR; ans := a 294 | Return ans := ans*2^(140). Ex 295 | Notes: The idea is to return "X - tin 296 | precision and rounding modes. 297 | inefficiency, we stay away fr 298 | best we can. For |X| >= 2^(-1 299 | 8.2 generates the inexact exc 300 | 301 | Step 9. Calculate exp(X)-1, |X| < 1/4, 302 | p = X + X*X*(B1 + X*(B 303 | Notes: a) In order to reduce memory a 304 | made as "short" as possible: 305 | are single precision; B3 to B 306 | B2 is double extended. 307 | b) Even with the restriction 308 | |p - (exp(X)-1)| < |X| 309 | for all |X| <= 0.251. 310 | Note that 0.251 is slightly b 311 | c) To fully preserve accuracy 312 | as X + ( S*B1 + Q ) wh 313 | Q = X*S*(B 314 | d) To fully utilize the pipel 315 | two independent pieces of rou 316 | Q = [ X*S*(B2 + S*(B4 317 | [ S*S*(B3 + S* 318 | 319 | Step 10. Calculate exp(X)-1 for 320 | 10.1 If X >= 70log2 , exp(X) - 321 | purposes. Therefore, go to St 322 | 10.2 If X <= -70log2, exp(X) - 323 | ans := -1 324 | Restore user FPCR 325 | Return ans := ans + 2^(-126). 326 | Notes: 10.2 will always create an ine 327 | in the user rounding precisio 328 | 329 | 330 331 | Copyright (C) Motorola, Inc. 1 332 | All Rights Reserved 333 | 334 | For details on the license for this fi 335 | file, README, in this same directory. 336 337 |setox idnt 2,1 | Motorola 040 Floating Po 338 339 |section 8 340 341 #include "fpsp.h" 342 343 L2: .long 0x3FDC0000,0x82E30865,0x4361C4 344 345 EXPA3: .long 0x3FA55555,0x55554431 346 EXPA2: .long 0x3FC55555,0x55554018 347 348 HUGE: .long 0x7FFE0000,0xFFFFFFFF,0xFFFFFF 349 TINY: .long 0x00010000,0xFFFFFFFF,0xFFFFFF 350 351 EM1A4: .long 0x3F811111,0x11174385 352 EM1A3: .long 0x3FA55555,0x55554F5A 353 354 EM1A2: .long 0x3FC55555,0x55555555,0x000000 355 356 EM1B8: .long 0x3EC71DE3,0xA5774682 357 EM1B7: .long 0x3EFA01A0,0x19D7CB68 358 359 EM1B6: .long 0x3F2A01A0,0x1A019DF3 360 EM1B5: .long 0x3F56C16C,0x16C170E2 361 362 EM1B4: .long 0x3F811111,0x11111111 363 EM1B3: .long 0x3FA55555,0x55555555 364 365 EM1B2: .long 0x3FFC0000,0xAAAAAAAA,0xAAAAAA 366 .long 0x00000000 367 368 TWO140: .long 0x48B00000,0x00000000 369 TWON140: .long 0x37300000,0x00000000 370 371 EXPTBL: 372 .long 0x3FFF0000,0x80000000,0x000000 373 .long 0x3FFF0000,0x8164D1F3,0xBC0307 374 .long 0x3FFF0000,0x82CD8698,0xAC2BA1 375 .long 0x3FFF0000,0x843A28C3,0xACDE40 376 .long 0x3FFF0000,0x85AAC367,0xCC487B 377 .long 0x3FFF0000,0x871F6196,0x9E8D10 378 .long 0x3FFF0000,0x88980E80,0x92DA85 379 .long 0x3FFF0000,0x8A14D575,0x496EFD 380 .long 0x3FFF0000,0x8B95C1E3,0xEA8BD6 381 .long 0x3FFF0000,0x8D1ADF5B,0x7E5BA9 382 .long 0x3FFF0000,0x8EA4398B,0x45CD53 383 .long 0x3FFF0000,0x9031DC43,0x1466B1 384 .long 0x3FFF0000,0x91C3D373,0xAB11C3 385 .long 0x3FFF0000,0x935A2B2F,0x13E6E9 386 .long 0x3FFF0000,0x94F4EFA8,0xFEF709 387 .long 0x3FFF0000,0x96942D37,0x20185A 388 .long 0x3FFF0000,0x9837F051,0x8DB8A9 389 .long 0x3FFF0000,0x99E04593,0x20B7FA 390 .long 0x3FFF0000,0x9B8D39B9,0xD54E55 391 .long 0x3FFF0000,0x9D3ED9A7,0x2CFFB7 392 .long 0x3FFF0000,0x9EF53260,0x91A111 393 .long 0x3FFF0000,0xA0B0510F,0xB9714F 394 .long 0x3FFF0000,0xA2704303,0x0C4968 395 .long 0x3FFF0000,0xA43515AE,0x09E680 396 .long 0x3FFF0000,0xA5FED6A9,0xB15138 397 .long 0x3FFF0000,0xA7CD93B4,0xE96535 398 .long 0x3FFF0000,0xA9A15AB4,0xEA7C0E 399 .long 0x3FFF0000,0xAB7A39B5,0xA93ED3 400 .long 0x3FFF0000,0xAD583EEA,0x42A14A 401 .long 0x3FFF0000,0xAF3B78AD,0x690A43 402 .long 0x3FFF0000,0xB123F581,0xD2AC25 403 .long 0x3FFF0000,0xB311C412,0xA91124 404 .long 0x3FFF0000,0xB504F333,0xF9DE64 405 .long 0x3FFF0000,0xB6FD91E3,0x28D177 406 .long 0x3FFF0000,0xB8FBAF47,0x62FB9E 407 .long 0x3FFF0000,0xBAFF5AB2,0x133E45 408 .long 0x3FFF0000,0xBD08A39F,0x580C36 409 .long 0x3FFF0000,0xBF1799B6,0x7A7310 410 .long 0x3FFF0000,0xC12C4CCA,0x667094 411 .long 0x3FFF0000,0xC346CCDA,0x249764 412 .long 0x3FFF0000,0xC5672A11,0x5506DA 413 .long 0x3FFF0000,0xC78D74C8,0xABB9B1 414 .long 0x3FFF0000,0xC9B9BD86,0x6E2F27 415 .long 0x3FFF0000,0xCBEC14FE,0xF2727C 416 .long 0x3FFF0000,0xCE248C15,0x1F8480 417 .long 0x3FFF0000,0xD06333DA,0xEF2B25 418 .long 0x3FFF0000,0xD2A81D91,0xF12AE4 419 .long 0x3FFF0000,0xD4F35AAB,0xCFEDFA 420 .long 0x3FFF0000,0xD744FCCA,0xD69D6A 421 .long 0x3FFF0000,0xD99D15C2,0x78AFD7 422 .long 0x3FFF0000,0xDBFBB797,0xDAF237 423 .long 0x3FFF0000,0xDE60F482,0x5E0E91 424 .long 0x3FFF0000,0xE0CCDEEC,0x2A94E1 425 .long 0x3FFF0000,0xE33F8972,0xBE8A5A 426 .long 0x3FFF0000,0xE5B906E7,0x7C8348 427 .long 0x3FFF0000,0xE8396A50,0x3C4BDC 428 .long 0x3FFF0000,0xEAC0C6E7,0xDD2439 429 .long 0x3FFF0000,0xED4F301E,0xD9942B 430 .long 0x3FFF0000,0xEFE4B99B,0xDCDAF5 431 .long 0x3FFF0000,0xF281773C,0x59FFB1 432 .long 0x3FFF0000,0xF5257D15,0x2486CC 433 .long 0x3FFF0000,0xF7D0DF73,0x0AD13B 434 .long 0x3FFF0000,0xFA83B2DB,0x722A03 435 .long 0x3FFF0000,0xFD3E0C0C,0xF486C1 436 437 .set ADJFLAG,L_SCR2 438 .set SCALE,FP_SCR1 439 .set ADJSCALE,FP_SCR2 440 .set SC,FP_SCR3 441 .set ONEBYSC,FP_SCR4 442 443 | xref t_frcinx 444 |xref t_extdnrm 445 |xref t_unfl 446 |xref t_ovfl 447 448 .global setoxd 449 setoxd: 450 |--entry point for EXP(X), X is denormalized 451 movel (%a0),%d0 452 andil #0x80000000,%d0 453 oril #0x00800000,%d0 454 movel %d0,-(%sp) 455 fmoves #0x3F800000,%fp0 456 fmovel %d1,%fpcr 457 fadds (%sp)+,%fp0 458 bra t_frcinx 459 460 .global setox 461 setox: 462 |--entry point for EXP(X), here X is finite, n 463 464 |--Step 1. 465 movel (%a0),%d0 | ... 466 andil #0x7FFF0000,%d0 | ...b 467 cmpil #0x3FBE0000,%d0 | ...2 468 bges EXPC1 | ...n 469 bra EXPSM 470 471 EXPC1: 472 |--The case |X| >= 2^(-65) 473 movew 4(%a0),%d0 | ...e 474 cmpil #0x400CB167,%d0 | ...1 475 blts EXPMAIN | ...normal c 476 bra EXPBIG 477 478 EXPMAIN: 479 |--Step 2. 480 |--This is the normal branch: 2^(-65) <= |X| 481 fmovex (%a0),%fp0 | ...l 482 483 fmovex %fp0,%fp1 484 fmuls #0x42B8AA3B,%fp0 485 fmovemx %fp2-%fp2/%fp3,-(%a7) 486 movel #0,ADJFLAG(%a6) 487 fmovel %fp0,%d0 488 lea EXPTBL,%a1 489 fmovel %d0,%fp0 490 491 movel %d0,L_SCR1(%a6) | ...s 492 andil #0x3F,%d0 493 lsll #4,%d0 494 addal %d0,%a1 | ...a 495 movel L_SCR1(%a6),%d0 496 asrl #6,%d0 | ...D 497 addiw #0x3FFF,%d0 | ...b 498 movew L2,L_SCR1(%a6) | ...p 499 500 EXPCONT1: 501 |--Step 3. 502 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i 503 |--a0 points to 2^(J/64), D0 is biased expo. o 504 fmovex %fp0,%fp2 505 fmuls #0xBC317218,%fp0 506 fmulx L2,%fp2 | ...N 507 faddx %fp1,%fp0 508 faddx %fp2,%fp0 509 | MOVE.W #$3FA5,EXPA3 ...loa 510 511 |--Step 4. 512 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL 513 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*A5 514 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S 515 |--[R+R*S*(A2+S*A4)] + [S*(A1+S*(A3+S*A5))] 516 517 fmovex %fp0,%fp1 518 fmulx %fp1,%fp1 519 520 fmoves #0x3AB60B70,%fp2 521 | MOVE.W #0,2(%a1) ...loa 522 523 fmulx %fp1,%fp2 524 fmovex %fp1,%fp3 525 fmuls #0x3C088895,%fp3 526 527 faddd EXPA3,%fp2 | ...f 528 faddd EXPA2,%fp3 | ...f 529 530 fmulx %fp1,%fp2 531 movew %d0,SCALE(%a6) | ...S 532 clrw SCALE+2(%a6) 533 movel #0x80000000,SCALE+4(%a 534 clrl SCALE+8(%a6) 535 536 fmulx %fp1,%fp3 537 538 fadds #0x3F000000,%fp2 539 fmulx %fp0,%fp3 540 541 fmulx %fp1,%fp2 542 faddx %fp3,%fp0 543 | ...fp3 544 545 fmovex (%a1)+,%fp1 | ...f 546 faddx %fp2,%fp0 547 | ...fp2 548 549 |--Step 5 550 |--final reconstruction process 551 |--EXP(X) = 2^M * ( 2^(J/64) + 2^(J/64)*(EXP(R 552 553 fmulx %fp1,%fp0 554 fmovemx (%a7)+,%fp2-%fp2/%fp3 | ...f 555 fadds (%a1),%fp0 | ...a 556 557 faddx %fp1,%fp0 558 movel ADJFLAG(%a6),%d0 559 560 |--Step 6 561 tstl %d0 562 beqs NORMAL 563 ADJUST: 564 fmulx ADJSCALE(%a6),%fp0 565 NORMAL: 566 fmovel %d1,%FPCR 567 fmulx SCALE(%a6),%fp0 | ...m 568 bra t_frcinx 569 570 EXPSM: 571 |--Step 7 572 fmovemx (%a0),%fp0-%fp0 | ...in case X 573 fmovel %d1,%FPCR 574 fadds #0x3F800000,%fp0 575 bra t_frcinx 576 577 EXPBIG: 578 |--Step 8 579 cmpil #0x400CB27C,%d0 | ...1 580 bgts EXP2BIG 581 |--Steps 8.2 -- 8.6 582 fmovex (%a0),%fp0 | ...l 583 584 fmovex %fp0,%fp1 585 fmuls #0x42B8AA3B,%fp0 586 fmovemx %fp2-%fp2/%fp3,-(%a7) 587 movel #1,ADJFLAG(%a6) 588 fmovel %fp0,%d0 589 lea EXPTBL,%a1 590 fmovel %d0,%fp0 591 movel %d0,L_SCR1(%a6) 592 andil #0x3F,%d0 593 lsll #4,%d0 594 addal %d0,%a1 595 movel L_SCR1(%a6),%d0 596 asrl #6,%d0 597 movel %d0,L_SCR1(%a6) 598 asrl #1,%d0 599 subl %d0,L_SCR1(%a6) 600 addiw #0x3FFF,%d0 601 movew %d0,ADJSCALE(%a6) 602 clrw ADJSCALE+2(%a6) 603 movel #0x80000000,ADJSCALE+4 604 clrl ADJSCALE+8(%a6) 605 movel L_SCR1(%a6),%d0 606 addiw #0x3FFF,%d0 607 bra EXPCONT1 608 609 EXP2BIG: 610 |--Step 9 611 fmovel %d1,%FPCR 612 movel (%a0),%d0 613 bclrb #sign_bit,(%a0) 614 cmpil #0,%d0 615 blt t_unfl 616 bra t_ovfl 617 618 .global setoxm1d 619 setoxm1d: 620 |--entry point for EXPM1(X), here X is denorma 621 |--Step 0. 622 bra t_extdnrm 623 624 625 .global setoxm1 626 setoxm1: 627 |--entry point for EXPM1(X), here X is finite, 628 629 |--Step 1. 630 |--Step 1.1 631 movel (%a0),%d0 | ... 632 andil #0x7FFF0000,%d0 | ...b 633 cmpil #0x3FFD0000,%d0 | ...1 634 bges EM1CON1 | ...|X| >= 1 635 bra EM1SM 636 637 EM1CON1: 638 |--Step 1.3 639 |--The case |X| >= 1/4 640 movew 4(%a0),%d0 | ...e 641 cmpil #0x4004C215,%d0 | ...7 642 bles EM1MAIN | ...1/4 <= | 643 bra EM1BIG 644 645 EM1MAIN: 646 |--Step 2. 647 |--This is the case: 1/4 <= |X| <= 70 log2. 648 fmovex (%a0),%fp0 | ...l 649 650 fmovex %fp0,%fp1 651 fmuls #0x42B8AA3B,%fp0 652 fmovemx %fp2-%fp2/%fp3,-(%a7) 653 | MOVE.W #$3F81,EM1A4 654 fmovel %fp0,%d0 655 lea EXPTBL,%a1 656 fmovel %d0,%fp0 657 658 movel %d0,L_SCR1(%a6) 659 andil #0x3F,%d0 660 lsll #4,%d0 661 addal %d0,%a1 662 movel L_SCR1(%a6),%d0 663 asrl #6,%d0 664 movel %d0,L_SCR1(%a6) 665 | MOVE.W #$3FDC,L2 666 667 |--Step 3. 668 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i 669 |--a0 points to 2^(J/64), D0 and a1 both conta 670 fmovex %fp0,%fp2 671 fmuls #0xBC317218,%fp0 672 fmulx L2,%fp2 | ...N 673 faddx %fp1,%fp0 | ... 674 faddx %fp2,%fp0 | ... 675 | MOVE.W #$3FC5,EM1A2 676 addiw #0x3FFF,%d0 677 678 |--Step 4. 679 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL 680 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*(A 681 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S 682 |--[R*S*(A2+S*(A4+S*A6))] + [R+S*(A1+S*(A3+S*A 683 684 fmovex %fp0,%fp1 685 fmulx %fp1,%fp1 686 687 fmoves #0x3950097B,%fp2 688 | MOVE.W #0,2(%a1) ...loa 689 690 fmulx %fp1,%fp2 691 fmovex %fp1,%fp3 692 fmuls #0x3AB60B6A,%fp3 693 694 faddd EM1A4,%fp2 | ...f 695 faddd EM1A3,%fp3 | ...f 696 movew %d0,SC(%a6) 697 clrw SC+2(%a6) 698 movel #0x80000000,SC+4(%a6) 699 clrl SC+8(%a6) 700 701 fmulx %fp1,%fp2 702 movel L_SCR1(%a6),%d0 703 negw %d0 | ...D 704 fmulx %fp1,%fp3 705 addiw #0x3FFF,%d0 | ...b 706 faddd EM1A2,%fp2 | ...f 707 fadds #0x3F000000,%fp3 708 709 fmulx %fp1,%fp2 710 oriw #0x8000,%d0 | ...s 711 movew %d0,ONEBYSC(%a6) 712 clrw ONEBYSC+2(%a6) 713 movel #0x80000000,ONEBYSC+4( 714 clrl ONEBYSC+8(%a6) 715 fmulx %fp3,%fp1 716 | ...fp3 717 718 fmulx %fp0,%fp2 719 faddx %fp1,%fp0 720 | ...fp1 721 722 faddx %fp2,%fp0 723 | ...fp2 724 fmovemx (%a7)+,%fp2-%fp2/%fp3 | ...f 725 726 |--Step 5 727 |--Compute 2^(J/64)*p 728 729 fmulx (%a1),%fp0 | ...2 730 731 |--Step 6 732 |--Step 6.1 733 movel L_SCR1(%a6),%d0 734 cmpil #63,%d0 735 bles MLE63 736 |--Step 6.2 M >= 64 737 fmoves 12(%a1),%fp1 | ...f 738 faddx ONEBYSC(%a6),%fp1 739 faddx %fp1,%fp0 740 faddx (%a1),%fp0 | ...T 741 bras EM1SCALE 742 MLE63: 743 |--Step 6.3 M <= 63 744 cmpil #-3,%d0 745 bges MGEN3 746 MLTN3: 747 |--Step 6.4 M <= -4 748 fadds 12(%a1),%fp0 | ...p 749 faddx (%a1),%fp0 | ...T 750 faddx ONEBYSC(%a6),%fp0 751 bras EM1SCALE 752 MGEN3: 753 |--Step 6.5 -3 <= M <= 63 754 fmovex (%a1)+,%fp1 | ...f 755 fadds (%a1),%fp0 | ...f 756 faddx ONEBYSC(%a6),%fp1 757 faddx %fp1,%fp0 758 759 EM1SCALE: 760 |--Step 6.6 761 fmovel %d1,%FPCR 762 fmulx SC(%a6),%fp0 763 764 bra t_frcinx 765 766 EM1SM: 767 |--Step 7 |X| < 1/4. 768 cmpil #0x3FBE0000,%d0 | ...2 769 bges EM1POLY 770 771 EM1TINY: 772 |--Step 8 |X| < 2^(-65) 773 cmpil #0x00330000,%d0 | ...2 774 blts EM12TINY 775 |--Step 8.2 776 movel #0x80010000,SC(%a6) 777 movel #0x80000000,SC+4(%a6) 778 clrl SC+8(%a6) 779 fmovex (%a0),%fp0 780 fmovel %d1,%FPCR 781 faddx SC(%a6),%fp0 782 783 bra t_frcinx 784 785 EM12TINY: 786 |--Step 8.3 787 fmovex (%a0),%fp0 788 fmuld TWO140,%fp0 789 movel #0x80010000,SC(%a6) 790 movel #0x80000000,SC+4(%a6) 791 clrl SC+8(%a6) 792 faddx SC(%a6),%fp0 793 fmovel %d1,%FPCR 794 fmuld TWON140,%fp0 795 796 bra t_frcinx 797 798 EM1POLY: 799 |--Step 9 exp(X)-1 by a simple polynomia 800 fmovex (%a0),%fp0 | ...f 801 fmulx %fp0,%fp0 802 fmovemx %fp2-%fp2/%fp3,-(%a7) | ...s 803 fmoves #0x2F30CAA8,%fp1 804 fmulx %fp0,%fp1 805 fmoves #0x310F8290,%fp2 806 fadds #0x32D73220,%fp1 807 808 fmulx %fp0,%fp2 809 fmulx %fp0,%fp1 810 811 fadds #0x3493F281,%fp2 812 faddd EM1B8,%fp1 | ...f 813 814 fmulx %fp0,%fp2 815 fmulx %fp0,%fp1 816 817 faddd EM1B7,%fp2 | ...f 818 faddd EM1B6,%fp1 | ...f 819 820 fmulx %fp0,%fp2 821 fmulx %fp0,%fp1 822 823 faddd EM1B5,%fp2 | ...f 824 faddd EM1B4,%fp1 | ...f 825 826 fmulx %fp0,%fp2 827 fmulx %fp0,%fp1 828 829 faddd EM1B3,%fp2 | ...f 830 faddx EM1B2,%fp1 | ...f 831 832 fmulx %fp0,%fp2 833 fmulx %fp0,%fp1 834 835 fmulx %fp0,%fp2 836 fmulx (%a0),%fp1 | ...f 837 838 fmuls #0x3F000000,%fp0 839 faddx %fp2,%fp1 840 | ...fp2 841 842 fmovemx (%a7)+,%fp2-%fp2/%fp3 | ...f 843 844 faddx %fp1,%fp0 845 | ...fp1 846 847 fmovel %d1,%FPCR 848 faddx (%a0),%fp0 849 850 bra t_frcinx 851 852 EM1BIG: 853 |--Step 10 |X| > 70 log2 854 movel (%a0),%d0 855 cmpil #0,%d0 856 bgt EXPC1 857 |--Step 10.2 858 fmoves #0xBF800000,%fp0 859 fmovel %d1,%FPCR 860 fadds #0x00800000,%fp0 861 862 bra t_frcinx 863 864 |end
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.