~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/arch/m68k/fpsp040/setox.S

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /arch/m68k/fpsp040/setox.S (Version linux-6.12-rc7) and /arch/m68k/fpsp040/setox.S (Version linux-2.4.37.11)


  1 |                                                   1 |
  2 |       setox.sa 3.1 12/10/90                       2 |       setox.sa 3.1 12/10/90
  3 |                                                   3 |
  4 |       The entry point setox computes the exp      4 |       The entry point setox computes the exponential of a value.
  5 |       setoxd does the same except the input       5 |       setoxd does the same except the input value is a denormalized
  6 |       number. setoxm1 computes exp(X)-1, and      6 |       number. setoxm1 computes exp(X)-1, and setoxm1d computes
  7 |       exp(X)-1 for denormalized X.                7 |       exp(X)-1 for denormalized X.
  8 |                                                   8 |
  9 |       INPUT                                       9 |       INPUT
 10 |       -----                                      10 |       -----
 11 |       Double-extended value in memory locati     11 |       Double-extended value in memory location pointed to by address
 12 |       register a0.                               12 |       register a0.
 13 |                                                  13 |
 14 |       OUTPUT                                     14 |       OUTPUT
 15 |       ------                                     15 |       ------
 16 |       exp(X) or exp(X)-1 returned in floatin     16 |       exp(X) or exp(X)-1 returned in floating-point register fp0.
 17 |                                                  17 |
 18 |       ACCURACY and MONOTONICITY                  18 |       ACCURACY and MONOTONICITY
 19 |       -------------------------                  19 |       -------------------------
 20 |       The returned result is within 0.85 ulp     20 |       The returned result is within 0.85 ulps in 64 significant bit, i.e.
 21 |       within 0.5001 ulp to 53 bits if the re     21 |       within 0.5001 ulp to 53 bits if the result is subsequently rounded
 22 |       to double precision. The result is pro     22 |       to double precision. The result is provably monotonic in double
 23 |       precision.                                 23 |       precision.
 24 |                                                  24 |
 25 |       SPEED                                      25 |       SPEED
 26 |       -----                                      26 |       -----
 27 |       Two timings are measured, both in the      27 |       Two timings are measured, both in the copy-back mode. The
 28 |       first one is measured when the functio     28 |       first one is measured when the function is invoked the first time
 29 |       (so the instructions and data are not      29 |       (so the instructions and data are not in cache), and the
 30 |       second one is measured when the functi     30 |       second one is measured when the function is reinvoked at the same
 31 |       input argument.                            31 |       input argument.
 32 |                                                  32 |
 33 |       The program setox takes approximately      33 |       The program setox takes approximately 210/190 cycles for input
 34 |       argument X whose magnitude is less tha     34 |       argument X whose magnitude is less than 16380 log2, which
 35 |       is the usual situation. For the less c     35 |       is the usual situation. For the less common arguments,
 36 |       depending on their values, the program     36 |       depending on their values, the program may run faster or slower --
 37 |       but no worse than 10% slower even in t     37 |       but no worse than 10% slower even in the extreme cases.
 38 |                                                  38 |
 39 |       The program setoxm1 takes approximatel !!  39 |       The program setoxm1 takes approximately ???/??? cycles for input
 40 |       argument X, 0.25 <= |X| < 70log2. For      40 |       argument X, 0.25 <= |X| < 70log2. For |X| < 0.25, it takes
 41 |       approximately ??? / ??? cycles. For th !!  41 |       approximately ???/??? cycles. For the less common arguments,
 42 |       depending on their values, the program     42 |       depending on their values, the program may run faster or slower --
 43 |       but no worse than 10% slower even in t     43 |       but no worse than 10% slower even in the extreme cases.
 44 |                                                  44 |
 45 |       ALGORITHM and IMPLEMENTATION NOTES         45 |       ALGORITHM and IMPLEMENTATION NOTES
 46 |       ----------------------------------         46 |       ----------------------------------
 47 |                                                  47 |
 48 |       setoxd                                     48 |       setoxd
 49 |       ------                                     49 |       ------
 50 |       Step 1. Set ans := 1.0                     50 |       Step 1. Set ans := 1.0
 51 |                                                  51 |
 52 |       Step 2. Return  ans := ans + sign(X)*2     52 |       Step 2. Return  ans := ans + sign(X)*2^(-126). Exit.
 53 |       Notes:  This will always generate one      53 |       Notes:  This will always generate one exception -- inexact.
 54 |                                                  54 |
 55 |                                                  55 |
 56 |       setox                                      56 |       setox
 57 |       -----                                      57 |       -----
 58 |                                                  58 |
 59 |       Step 1. Filter out extreme cases of in     59 |       Step 1. Filter out extreme cases of input argument.
 60 |               1.1     If |X| >= 2^(-65), go      60 |               1.1     If |X| >= 2^(-65), go to Step 1.3.
 61 |               1.2     Go to Step 7.              61 |               1.2     Go to Step 7.
 62 |               1.3     If |X| < 16380 log(2),     62 |               1.3     If |X| < 16380 log(2), go to Step 2.
 63 |               1.4     Go to Step 8.              63 |               1.4     Go to Step 8.
 64 |       Notes:  The usual case should take the     64 |       Notes:  The usual case should take the branches 1.1 -> 1.3 -> 2.
 65 |                To avoid the use of floating-     65 |                To avoid the use of floating-point comparisons, a
 66 |                compact representation of |X|     66 |                compact representation of |X| is used. This format is a
 67 |                32-bit integer, the upper (mo     67 |                32-bit integer, the upper (more significant) 16 bits are
 68 |                the sign and biased exponent      68 |                the sign and biased exponent field of |X|; the lower 16
 69 |                bits are the 16 most signific     69 |                bits are the 16 most significant fraction (including the
 70 |                explicit bit) bits of |X|. Co     70 |                explicit bit) bits of |X|. Consequently, the comparisons
 71 |                in Steps 1.1 and 1.3 can be p     71 |                in Steps 1.1 and 1.3 can be performed by integer comparison.
 72 |                Note also that the constant 1     72 |                Note also that the constant 16380 log(2) used in Step 1.3
 73 |                is also in the compact form.      73 |                is also in the compact form. Thus taking the branch
 74 |                to Step 2 guarantees |X| < 16     74 |                to Step 2 guarantees |X| < 16380 log(2). There is no harm
 75 |                to have a small number of cas     75 |                to have a small number of cases where |X| is less than,
 76 |                but close to, 16380 log(2) an     76 |                but close to, 16380 log(2) and the branch to Step 9 is
 77 |                taken.                            77 |                taken.
 78 |                                                  78 |
 79 |       Step 2. Calculate N = round-to-nearest     79 |       Step 2. Calculate N = round-to-nearest-int( X * 64/log2 ).
 80 |               2.1     Set AdjFlag := 0 (indi     80 |               2.1     Set AdjFlag := 0 (indicates the branch 1.3 -> 2 was taken)
 81 |               2.2     N := round-to-nearest-     81 |               2.2     N := round-to-nearest-integer( X * 64/log2 ).
 82 |               2.3     Calculate       J = N      82 |               2.3     Calculate       J = N mod 64; so J = 0,1,2,..., or 63.
 83 |               2.4     Calculate       M = (N     83 |               2.4     Calculate       M = (N - J)/64; so N = 64M + J.
 84 |               2.5     Calculate the address      84 |               2.5     Calculate the address of the stored value of 2^(J/64).
 85 |               2.6     Create the value Scale     85 |               2.6     Create the value Scale = 2^M.
 86 |       Notes:  The calculation in 2.2 is real     86 |       Notes:  The calculation in 2.2 is really performed by
 87 |                                                  87 |
 88 |                       Z := X * constant          88 |                       Z := X * constant
 89 |                       N := round-to-nearest-     89 |                       N := round-to-nearest-integer(Z)
 90 |                                                  90 |
 91 |                where                             91 |                where
 92 |                                                  92 |
 93 |                       constant := single-pre     93 |                       constant := single-precision( 64/log 2 ).
 94 |                                                  94 |
 95 |                Using a single-precision cons     95 |                Using a single-precision constant avoids memory access.
 96 |                Another effect of using a sin     96 |                Another effect of using a single-precision "constant" is
 97 |                that the calculated value Z i     97 |                that the calculated value Z is
 98 |                                                  98 |
 99 |                       Z = X*(64/log2)*(1+eps     99 |                       Z = X*(64/log2)*(1+eps), |eps| <= 2^(-24).
100 |                                                 100 |
101 |                This error has to be consider    101 |                This error has to be considered later in Steps 3 and 4.
102 |                                                 102 |
103 |       Step 3. Calculate X - N*log2/64.          103 |       Step 3. Calculate X - N*log2/64.
104 |               3.1     R := X + N*L1, where L    104 |               3.1     R := X + N*L1, where L1 := single-precision(-log2/64).
105 |               3.2     R := R + N*L2, L2 := e    105 |               3.2     R := R + N*L2, L2 := extended-precision(-log2/64 - L1).
106 |       Notes:  a) The way L1 and L2 are chose    106 |       Notes:  a) The way L1 and L2 are chosen ensures L1+L2 approximate
107 |                the value      -log2/64          107 |                the value      -log2/64        to 88 bits of accuracy.
108 |                b) N*L1 is exact because N is    108 |                b) N*L1 is exact because N is no longer than 22 bits and
109 |                L1 is no longer than 24 bits.    109 |                L1 is no longer than 24 bits.
110 |                c) The calculation X+N*L1 is     110 |                c) The calculation X+N*L1 is also exact due to cancellation.
111 |                Thus, R is practically X+N(L1    111 |                Thus, R is practically X+N(L1+L2) to full 64 bits.
112 |                d) It is important to estimat    112 |                d) It is important to estimate how large can |R| be after
113 |                Step 3.2.                        113 |                Step 3.2.
114 |                                                 114 |
115 |                       N = rnd-to-int( X*64/l    115 |                       N = rnd-to-int( X*64/log2 (1+eps) ), |eps|<=2^(-24)
116 |                       X*64/log2 (1+eps)         116 |                       X*64/log2 (1+eps)       =       N + f,  |f| <= 0.5
117 |                       X*64/log2 - N   =         117 |                       X*64/log2 - N   =       f - eps*X 64/log2
118 |                       X - N*log2/64   =         118 |                       X - N*log2/64   =       f*log2/64 - eps*X
119 |                                                 119 |
120 |                                                 120 |
121 |                Now |X| <= 16446 log2, thus      121 |                Now |X| <= 16446 log2, thus
122 |                                                 122 |
123 |                       |X - N*log2/64| <= (0.    123 |                       |X - N*log2/64| <= (0.5 + 16446/2^(18))*log2/64
124 |                                       <= 0.5    124 |                                       <= 0.57 log2/64.
125 |                This bound will be used in St    125 |                This bound will be used in Step 4.
126 |                                                 126 |
127 |       Step 4. Approximate exp(R)-1 by a poly    127 |       Step 4. Approximate exp(R)-1 by a polynomial
128 |                       p = R + R*R*(A1 + R*(A    128 |                       p = R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*A5))))
129 |       Notes:  a) In order to reduce memory a    129 |       Notes:  a) In order to reduce memory access, the coefficients are
130 |                made as "short" as possible:     130 |                made as "short" as possible: A1 (which is 1/2), A4 and A5
131 |                are single precision; A2 and     131 |                are single precision; A2 and A3 are double precision.
132 |                b) Even with the restrictions    132 |                b) Even with the restrictions above,
133 |                       |p - (exp(R)-1)| < 2^(    133 |                       |p - (exp(R)-1)| < 2^(-68.8) for all |R| <= 0.0062.
134 |                Note that 0.0062 is slightly     134 |                Note that 0.0062 is slightly bigger than 0.57 log2/64.
135 |                c) To fully utilize the pipel    135 |                c) To fully utilize the pipeline, p is separated into
136 |                two independent pieces of rou    136 |                two independent pieces of roughly equal complexities
137 |                       p = [ R + R*S*(A2 + S*    137 |                       p = [ R + R*S*(A2 + S*A4) ]     +
138 |                               [ S*(A1 + S*(A    138 |                               [ S*(A1 + S*(A3 + S*A5)) ]
139 |                where S = R*R.                   139 |                where S = R*R.
140 |                                                 140 |
141 |       Step 5. Compute 2^(J/64)*exp(R) = 2^(J    141 |       Step 5. Compute 2^(J/64)*exp(R) = 2^(J/64)*(1+p) by
142 |                               ans := T + ( T    142 |                               ans := T + ( T*p + t)
143 |                where T and t are the stored     143 |                where T and t are the stored values for 2^(J/64).
144 |       Notes:  2^(J/64) is stored as T and t     144 |       Notes:  2^(J/64) is stored as T and t where T+t approximates
145 |                2^(J/64) to roughly 85 bits;     145 |                2^(J/64) to roughly 85 bits; T is in extended precision
146 |                and t is in single precision.    146 |                and t is in single precision. Note also that T is rounded
147 |                to 62 bits so that the last t    147 |                to 62 bits so that the last two bits of T are zero. The
148 |                reason for such a special for    148 |                reason for such a special form is that T-1, T-2, and T-8
149 |                will all be exact --- a prope    149 |                will all be exact --- a property that will give much
150 |                more accurate computation of     150 |                more accurate computation of the function EXPM1.
151 |                                                 151 |
152 |       Step 6. Reconstruction of exp(X)          152 |       Step 6. Reconstruction of exp(X)
153 |                       exp(X) = 2^M * 2^(J/64    153 |                       exp(X) = 2^M * 2^(J/64) * exp(R).
154 |               6.1     If AdjFlag = 0, go to     154 |               6.1     If AdjFlag = 0, go to 6.3
155 |               6.2     ans := ans * AdjScale     155 |               6.2     ans := ans * AdjScale
156 |               6.3     Restore the user FPCR     156 |               6.3     Restore the user FPCR
157 |               6.4     Return ans := ans * Sc    157 |               6.4     Return ans := ans * Scale. Exit.
158 |       Notes:  If AdjFlag = 0, we have X = Ml    158 |       Notes:  If AdjFlag = 0, we have X = Mlog2 + Jlog2/64 + R,
159 |                |M| <= 16380, and Scale = 2^M    159 |                |M| <= 16380, and Scale = 2^M. Moreover, exp(X) will
160 |                neither overflow nor underflo    160 |                neither overflow nor underflow. If AdjFlag = 1, that
161 |                means that                       161 |                means that
162 |                       X = (M1+M)log2 + Jlog2    162 |                       X = (M1+M)log2 + Jlog2/64 + R, |M1+M| >= 16380.
163 |                Hence, exp(X) may overflow or    163 |                Hence, exp(X) may overflow or underflow or neither.
164 |                When that is the case, AdjSca    164 |                When that is the case, AdjScale = 2^(M1) where M1 is
165 |                approximately M. Thus 6.2 wil    165 |                approximately M. Thus 6.2 will never cause over/underflow.
166 |                Possible exception in 6.4 is     166 |                Possible exception in 6.4 is overflow or underflow.
167 |                The inexact exception is not     167 |                The inexact exception is not generated in 6.4. Although
168 |                one can argue that the inexac    168 |                one can argue that the inexact flag should always be
169 |                raised, to simulate that exce    169 |                raised, to simulate that exception cost to much than the
170 |                flag is worth in practical us    170 |                flag is worth in practical uses.
171 |                                                 171 |
172 |       Step 7. Return 1 + X.                     172 |       Step 7. Return 1 + X.
173 |               7.1     ans := X                  173 |               7.1     ans := X
174 |               7.2     Restore user FPCR.        174 |               7.2     Restore user FPCR.
175 |               7.3     Return ans := 1 + ans.    175 |               7.3     Return ans := 1 + ans. Exit
176 |       Notes:  For non-zero X, the inexact ex    176 |       Notes:  For non-zero X, the inexact exception will always be
177 |                raised by 7.3. That is the on    177 |                raised by 7.3. That is the only exception raised by 7.3.
178 |                Note also that we use the FMO    178 |                Note also that we use the FMOVEM instruction to move X
179 |                in Step 7.1 to avoid unnecess    179 |                in Step 7.1 to avoid unnecessary trapping. (Although
180 |                the FMOVEM may not seem relev    180 |                the FMOVEM may not seem relevant since X is normalized,
181 |                the precaution will be useful    181 |                the precaution will be useful in the library version of
182 |                this code where the separate     182 |                this code where the separate entry for denormalized inputs
183 |                will be done away with.)         183 |                will be done away with.)
184 |                                                 184 |
185 |       Step 8. Handle exp(X) where |X| >= 163    185 |       Step 8. Handle exp(X) where |X| >= 16380log2.
186 |               8.1     If |X| > 16480 log2, g    186 |               8.1     If |X| > 16480 log2, go to Step 9.
187 |               (mimic 2.2 - 2.6)                 187 |               (mimic 2.2 - 2.6)
188 |               8.2     N := round-to-integer(    188 |               8.2     N := round-to-integer( X * 64/log2 )
189 |               8.3     Calculate J = N mod 64    189 |               8.3     Calculate J = N mod 64, J = 0,1,...,63
190 |               8.4     K := (N-J)/64, M1 := t    190 |               8.4     K := (N-J)/64, M1 := truncate(K/2), M = K-M1, AdjFlag := 1.
191 |               8.5     Calculate the address     191 |               8.5     Calculate the address of the stored value 2^(J/64).
192 |               8.6     Create the values Scal    192 |               8.6     Create the values Scale = 2^M, AdjScale = 2^M1.
193 |               8.7     Go to Step 3.             193 |               8.7     Go to Step 3.
194 |       Notes:  Refer to notes for 2.2 - 2.6.     194 |       Notes:  Refer to notes for 2.2 - 2.6.
195 |                                                 195 |
196 |       Step 9. Handle exp(X), |X| > 16480 log    196 |       Step 9. Handle exp(X), |X| > 16480 log2.
197 |               9.1     If X < 0, go to 9.3       197 |               9.1     If X < 0, go to 9.3
198 |               9.2     ans := Huge, go to 9.4    198 |               9.2     ans := Huge, go to 9.4
199 |               9.3     ans := Tiny.              199 |               9.3     ans := Tiny.
200 |               9.4     Restore user FPCR.        200 |               9.4     Restore user FPCR.
201 |               9.5     Return ans := ans * an    201 |               9.5     Return ans := ans * ans. Exit.
202 |       Notes:  Exp(X) will surely overflow or    202 |       Notes:  Exp(X) will surely overflow or underflow, depending on
203 |                X's sign. "Huge" and "Tiny" a    203 |                X's sign. "Huge" and "Tiny" are respectively large/tiny
204 |                extended-precision numbers wh    204 |                extended-precision numbers whose square over/underflow
205 |                with an inexact result. Thus,    205 |                with an inexact result. Thus, 9.5 always raises the
206 |                inexact together with either     206 |                inexact together with either overflow or underflow.
207 |                                                 207 |
208 |                                                 208 |
209 |       setoxm1d                                  209 |       setoxm1d
210 |       --------                                  210 |       --------
211 |                                                 211 |
212 |       Step 1. Set ans := 0                      212 |       Step 1. Set ans := 0
213 |                                                 213 |
214 |       Step 2. Return  ans := X + ans. Exit.     214 |       Step 2. Return  ans := X + ans. Exit.
215 |       Notes:  This will return X with the ap    215 |       Notes:  This will return X with the appropriate rounding
216 |                precision prescribed by the u    216 |                precision prescribed by the user FPCR.
217 |                                                 217 |
218 |       setoxm1                                   218 |       setoxm1
219 |       -------                                   219 |       -------
220 |                                                 220 |
221 |       Step 1. Check |X|                         221 |       Step 1. Check |X|
222 |               1.1     If |X| >= 1/4, go to S    222 |               1.1     If |X| >= 1/4, go to Step 1.3.
223 |               1.2     Go to Step 7.             223 |               1.2     Go to Step 7.
224 |               1.3     If |X| < 70 log(2), go    224 |               1.3     If |X| < 70 log(2), go to Step 2.
225 |               1.4     Go to Step 10.            225 |               1.4     Go to Step 10.
226 |       Notes:  The usual case should take the    226 |       Notes:  The usual case should take the branches 1.1 -> 1.3 -> 2.
227 |                However, it is conceivable |X    227 |                However, it is conceivable |X| can be small very often
228 |                because EXPM1 is intended to     228 |                because EXPM1 is intended to evaluate exp(X)-1 accurately
229 |                when |X| is small. For furthe    229 |                when |X| is small. For further details on the comparisons,
230 |                see the notes on Step 1 of se    230 |                see the notes on Step 1 of setox.
231 |                                                 231 |
232 |       Step 2. Calculate N = round-to-nearest    232 |       Step 2. Calculate N = round-to-nearest-int( X * 64/log2 ).
233 |               2.1     N := round-to-nearest-    233 |               2.1     N := round-to-nearest-integer( X * 64/log2 ).
234 |               2.2     Calculate       J = N     234 |               2.2     Calculate       J = N mod 64; so J = 0,1,2,..., or 63.
235 |               2.3     Calculate       M = (N    235 |               2.3     Calculate       M = (N - J)/64; so N = 64M + J.
236 |               2.4     Calculate the address     236 |               2.4     Calculate the address of the stored value of 2^(J/64).
237 |               2.5     Create the values Sc =    237 |               2.5     Create the values Sc = 2^M and OnebySc := -2^(-M).
238 |       Notes:  See the notes on Step 2 of set    238 |       Notes:  See the notes on Step 2 of setox.
239 |                                                 239 |
240 |       Step 3. Calculate X - N*log2/64.          240 |       Step 3. Calculate X - N*log2/64.
241 |               3.1     R := X + N*L1, where L    241 |               3.1     R := X + N*L1, where L1 := single-precision(-log2/64).
242 |               3.2     R := R + N*L2, L2 := e    242 |               3.2     R := R + N*L2, L2 := extended-precision(-log2/64 - L1).
243 |       Notes:  Applying the analysis of Step     243 |       Notes:  Applying the analysis of Step 3 of setox in this case
244 |                shows that |R| <= 0.0055 (not    244 |                shows that |R| <= 0.0055 (note that |X| <= 70 log2 in
245 |                this case).                      245 |                this case).
246 |                                                 246 |
247 |       Step 4. Approximate exp(R)-1 by a poly    247 |       Step 4. Approximate exp(R)-1 by a polynomial
248 |                       p = R+R*R*(A1+R*(A2+R*    248 |                       p = R+R*R*(A1+R*(A2+R*(A3+R*(A4+R*(A5+R*A6)))))
249 |       Notes:  a) In order to reduce memory a    249 |       Notes:  a) In order to reduce memory access, the coefficients are
250 |                made as "short" as possible:     250 |                made as "short" as possible: A1 (which is 1/2), A5 and A6
251 |                are single precision; A2, A3     251 |                are single precision; A2, A3 and A4 are double precision.
252 |                b) Even with the restriction     252 |                b) Even with the restriction above,
253 |                       |p - (exp(R)-1)| <        253 |                       |p - (exp(R)-1)| <      |R| * 2^(-72.7)
254 |                for all |R| <= 0.0055.           254 |                for all |R| <= 0.0055.
255 |                c) To fully utilize the pipel    255 |                c) To fully utilize the pipeline, p is separated into
256 |                two independent pieces of rou    256 |                two independent pieces of roughly equal complexity
257 |                       p = [ R*S*(A2 + S*(A4     257 |                       p = [ R*S*(A2 + S*(A4 + S*A6)) ]        +
258 |                               [ R + S*(A1 +     258 |                               [ R + S*(A1 + S*(A3 + S*A5)) ]
259 |                where S = R*R.                   259 |                where S = R*R.
260 |                                                 260 |
261 |       Step 5. Compute 2^(J/64)*p by             261 |       Step 5. Compute 2^(J/64)*p by
262 |                               p := T*p          262 |                               p := T*p
263 |                where T and t are the stored     263 |                where T and t are the stored values for 2^(J/64).
264 |       Notes:  2^(J/64) is stored as T and t     264 |       Notes:  2^(J/64) is stored as T and t where T+t approximates
265 |                2^(J/64) to roughly 85 bits;     265 |                2^(J/64) to roughly 85 bits; T is in extended precision
266 |                and t is in single precision.    266 |                and t is in single precision. Note also that T is rounded
267 |                to 62 bits so that the last t    267 |                to 62 bits so that the last two bits of T are zero. The
268 |                reason for such a special for    268 |                reason for such a special form is that T-1, T-2, and T-8
269 |                will all be exact --- a prope    269 |                will all be exact --- a property that will be exploited
270 |                in Step 6 below. The total re    270 |                in Step 6 below. The total relative error in p is no
271 |                bigger than 2^(-67.7) compare    271 |                bigger than 2^(-67.7) compared to the final result.
272 |                                                 272 |
273 |       Step 6. Reconstruction of exp(X)-1        273 |       Step 6. Reconstruction of exp(X)-1
274 |                       exp(X)-1 = 2^M * ( 2^(    274 |                       exp(X)-1 = 2^M * ( 2^(J/64) + p - 2^(-M) ).
275 |               6.1     If M <= 63, go to Step    275 |               6.1     If M <= 63, go to Step 6.3.
276 |               6.2     ans := T + (p + (t + O    276 |               6.2     ans := T + (p + (t + OnebySc)). Go to 6.6
277 |               6.3     If M >= -3, go to 6.5.    277 |               6.3     If M >= -3, go to 6.5.
278 |               6.4     ans := (T + (p + t)) +    278 |               6.4     ans := (T + (p + t)) + OnebySc. Go to 6.6
279 |               6.5     ans := (T + OnebySc) +    279 |               6.5     ans := (T + OnebySc) + (p + t).
280 |               6.6     Restore user FPCR.        280 |               6.6     Restore user FPCR.
281 |               6.7     Return ans := Sc * ans    281 |               6.7     Return ans := Sc * ans. Exit.
282 |       Notes:  The various arrangements of th    282 |       Notes:  The various arrangements of the expressions give accurate
283 |                evaluations.                     283 |                evaluations.
284 |                                                 284 |
285 |       Step 7. exp(X)-1 for |X| < 1/4.           285 |       Step 7. exp(X)-1 for |X| < 1/4.
286 |               7.1     If |X| >= 2^(-65), go     286 |               7.1     If |X| >= 2^(-65), go to Step 9.
287 |               7.2     Go to Step 8.             287 |               7.2     Go to Step 8.
288 |                                                 288 |
289 |       Step 8. Calculate exp(X)-1, |X| < 2^(-    289 |       Step 8. Calculate exp(X)-1, |X| < 2^(-65).
290 |               8.1     If |X| < 2^(-16312), g    290 |               8.1     If |X| < 2^(-16312), goto 8.3
291 |               8.2     Restore FPCR; return a    291 |               8.2     Restore FPCR; return ans := X - 2^(-16382). Exit.
292 |               8.3     X := X * 2^(140).         292 |               8.3     X := X * 2^(140).
293 |               8.4     Restore FPCR; ans := a    293 |               8.4     Restore FPCR; ans := ans - 2^(-16382).
294 |                Return ans := ans*2^(140). Ex    294 |                Return ans := ans*2^(140). Exit
295 |       Notes:  The idea is to return "X - tin    295 |       Notes:  The idea is to return "X - tiny" under the user
296 |                precision and rounding modes.    296 |                precision and rounding modes. To avoid unnecessary
297 |                inefficiency, we stay away fr    297 |                inefficiency, we stay away from denormalized numbers the
298 |                best we can. For |X| >= 2^(-1    298 |                best we can. For |X| >= 2^(-16312), the straightforward
299 |                8.2 generates the inexact exc    299 |                8.2 generates the inexact exception as the case warrants.
300 |                                                 300 |
301 |       Step 9. Calculate exp(X)-1, |X| < 1/4,    301 |       Step 9. Calculate exp(X)-1, |X| < 1/4, by a polynomial
302 |                       p = X + X*X*(B1 + X*(B    302 |                       p = X + X*X*(B1 + X*(B2 + ... + X*B12))
303 |       Notes:  a) In order to reduce memory a    303 |       Notes:  a) In order to reduce memory access, the coefficients are
304 |                made as "short" as possible:     304 |                made as "short" as possible: B1 (which is 1/2), B9 to B12
305 |                are single precision; B3 to B    305 |                are single precision; B3 to B8 are double precision; and
306 |                B2 is double extended.           306 |                B2 is double extended.
307 |                b) Even with the restriction     307 |                b) Even with the restriction above,
308 |                       |p - (exp(X)-1)| < |X|    308 |                       |p - (exp(X)-1)| < |X| 2^(-70.6)
309 |                for all |X| <= 0.251.            309 |                for all |X| <= 0.251.
310 |                Note that 0.251 is slightly b    310 |                Note that 0.251 is slightly bigger than 1/4.
311 |                c) To fully preserve accuracy    311 |                c) To fully preserve accuracy, the polynomial is computed
312 |                as     X + ( S*B1 +    Q ) wh    312 |                as     X + ( S*B1 +    Q ) where S = X*X and
313 |                       Q       =       X*S*(B    313 |                       Q       =       X*S*(B2 + X*(B3 + ... + X*B12))
314 |                d) To fully utilize the pipel    314 |                d) To fully utilize the pipeline, Q is separated into
315 |                two independent pieces of rou    315 |                two independent pieces of roughly equal complexity
316 |                       Q = [ X*S*(B2 + S*(B4     316 |                       Q = [ X*S*(B2 + S*(B4 + ... + S*B12)) ] +
317 |                               [ S*S*(B3 + S*    317 |                               [ S*S*(B3 + S*(B5 + ... + S*B11)) ]
318 |                                                 318 |
319 |       Step 10.        Calculate exp(X)-1 for    319 |       Step 10.        Calculate exp(X)-1 for |X| >= 70 log 2.
320 |               10.1 If X >= 70log2 , exp(X) -    320 |               10.1 If X >= 70log2 , exp(X) - 1 = exp(X) for all practical
321 |                purposes. Therefore, go to St    321 |                purposes. Therefore, go to Step 1 of setox.
322 |               10.2 If X <= -70log2, exp(X) -    322 |               10.2 If X <= -70log2, exp(X) - 1 = -1 for all practical purposes.
323 |                ans := -1                        323 |                ans := -1
324 |                Restore user FPCR                324 |                Restore user FPCR
325 |                Return ans := ans + 2^(-126).    325 |                Return ans := ans + 2^(-126). Exit.
326 |       Notes:  10.2 will always create an ine    326 |       Notes:  10.2 will always create an inexact and return -1 + tiny
327 |                in the user rounding precisio    327 |                in the user rounding precision and mode.
328 |                                                 328 |
329 |                                                 329 |
330                                                   330 
331 |               Copyright (C) Motorola, Inc. 1    331 |               Copyright (C) Motorola, Inc. 1990
332 |                       All Rights Reserved       332 |                       All Rights Reserved
333 |                                                 333 |
334 |       For details on the license for this fi !! 334 |       THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF MOTOROLA 
335 |       file, README, in this same directory.  !! 335 |       The copyright notice above does not evidence any  
                                                   >> 336 |       actual or intended publication of such source code.
336                                                   337 
337 |setox  idnt    2,1 | Motorola 040 Floating Po    338 |setox  idnt    2,1 | Motorola 040 Floating Point Software Package
338                                                   339 
339         |section        8                         340         |section        8
340                                                   341 
341 #include "fpsp.h"                              !! 342         .include "fpsp.h"
342                                                   343 
343 L2:     .long   0x3FDC0000,0x82E30865,0x4361C4    344 L2:     .long   0x3FDC0000,0x82E30865,0x4361C4C6,0x00000000
344                                                   345 
345 EXPA3:  .long   0x3FA55555,0x55554431             346 EXPA3:  .long   0x3FA55555,0x55554431
346 EXPA2:  .long   0x3FC55555,0x55554018             347 EXPA2:  .long   0x3FC55555,0x55554018
347                                                   348 
348 HUGE:   .long   0x7FFE0000,0xFFFFFFFF,0xFFFFFF    349 HUGE:   .long   0x7FFE0000,0xFFFFFFFF,0xFFFFFFFF,0x00000000
349 TINY:   .long   0x00010000,0xFFFFFFFF,0xFFFFFF    350 TINY:   .long   0x00010000,0xFFFFFFFF,0xFFFFFFFF,0x00000000
350                                                   351 
351 EM1A4:  .long   0x3F811111,0x11174385             352 EM1A4:  .long   0x3F811111,0x11174385
352 EM1A3:  .long   0x3FA55555,0x55554F5A             353 EM1A3:  .long   0x3FA55555,0x55554F5A
353                                                   354 
354 EM1A2:  .long   0x3FC55555,0x55555555,0x000000    355 EM1A2:  .long   0x3FC55555,0x55555555,0x00000000,0x00000000
355                                                   356 
356 EM1B8:  .long   0x3EC71DE3,0xA5774682             357 EM1B8:  .long   0x3EC71DE3,0xA5774682
357 EM1B7:  .long   0x3EFA01A0,0x19D7CB68             358 EM1B7:  .long   0x3EFA01A0,0x19D7CB68
358                                                   359 
359 EM1B6:  .long   0x3F2A01A0,0x1A019DF3             360 EM1B6:  .long   0x3F2A01A0,0x1A019DF3
360 EM1B5:  .long   0x3F56C16C,0x16C170E2             361 EM1B5:  .long   0x3F56C16C,0x16C170E2
361                                                   362 
362 EM1B4:  .long   0x3F811111,0x11111111             363 EM1B4:  .long   0x3F811111,0x11111111
363 EM1B3:  .long   0x3FA55555,0x55555555             364 EM1B3:  .long   0x3FA55555,0x55555555
364                                                   365 
365 EM1B2:  .long   0x3FFC0000,0xAAAAAAAA,0xAAAAAA    366 EM1B2:  .long   0x3FFC0000,0xAAAAAAAA,0xAAAAAAAB
366         .long   0x00000000                        367         .long   0x00000000
367                                                   368 
368 TWO140: .long   0x48B00000,0x00000000             369 TWO140: .long   0x48B00000,0x00000000
369 TWON140:        .long   0x37300000,0x00000000     370 TWON140:        .long   0x37300000,0x00000000
370                                                   371 
371 EXPTBL:                                           372 EXPTBL:
372         .long   0x3FFF0000,0x80000000,0x000000    373         .long   0x3FFF0000,0x80000000,0x00000000,0x00000000
373         .long   0x3FFF0000,0x8164D1F3,0xBC0307    374         .long   0x3FFF0000,0x8164D1F3,0xBC030774,0x9F841A9B
374         .long   0x3FFF0000,0x82CD8698,0xAC2BA1    375         .long   0x3FFF0000,0x82CD8698,0xAC2BA1D8,0x9FC1D5B9
375         .long   0x3FFF0000,0x843A28C3,0xACDE40    376         .long   0x3FFF0000,0x843A28C3,0xACDE4048,0xA0728369
376         .long   0x3FFF0000,0x85AAC367,0xCC487B    377         .long   0x3FFF0000,0x85AAC367,0xCC487B14,0x1FC5C95C
377         .long   0x3FFF0000,0x871F6196,0x9E8D10    378         .long   0x3FFF0000,0x871F6196,0x9E8D1010,0x1EE85C9F
378         .long   0x3FFF0000,0x88980E80,0x92DA85    379         .long   0x3FFF0000,0x88980E80,0x92DA8528,0x9FA20729
379         .long   0x3FFF0000,0x8A14D575,0x496EFD    380         .long   0x3FFF0000,0x8A14D575,0x496EFD9C,0xA07BF9AF
380         .long   0x3FFF0000,0x8B95C1E3,0xEA8BD6    381         .long   0x3FFF0000,0x8B95C1E3,0xEA8BD6E8,0xA0020DCF
381         .long   0x3FFF0000,0x8D1ADF5B,0x7E5BA9    382         .long   0x3FFF0000,0x8D1ADF5B,0x7E5BA9E4,0x205A63DA
382         .long   0x3FFF0000,0x8EA4398B,0x45CD53    383         .long   0x3FFF0000,0x8EA4398B,0x45CD53C0,0x1EB70051
383         .long   0x3FFF0000,0x9031DC43,0x1466B1    384         .long   0x3FFF0000,0x9031DC43,0x1466B1DC,0x1F6EB029
384         .long   0x3FFF0000,0x91C3D373,0xAB11C3    385         .long   0x3FFF0000,0x91C3D373,0xAB11C338,0xA0781494
385         .long   0x3FFF0000,0x935A2B2F,0x13E6E9    386         .long   0x3FFF0000,0x935A2B2F,0x13E6E92C,0x9EB319B0
386         .long   0x3FFF0000,0x94F4EFA8,0xFEF709    387         .long   0x3FFF0000,0x94F4EFA8,0xFEF70960,0x2017457D
387         .long   0x3FFF0000,0x96942D37,0x20185A    388         .long   0x3FFF0000,0x96942D37,0x20185A00,0x1F11D537
388         .long   0x3FFF0000,0x9837F051,0x8DB8A9    389         .long   0x3FFF0000,0x9837F051,0x8DB8A970,0x9FB952DD
389         .long   0x3FFF0000,0x99E04593,0x20B7FA    390         .long   0x3FFF0000,0x99E04593,0x20B7FA64,0x1FE43087
390         .long   0x3FFF0000,0x9B8D39B9,0xD54E55    391         .long   0x3FFF0000,0x9B8D39B9,0xD54E5538,0x1FA2A818
391         .long   0x3FFF0000,0x9D3ED9A7,0x2CFFB7    392         .long   0x3FFF0000,0x9D3ED9A7,0x2CFFB750,0x1FDE494D
392         .long   0x3FFF0000,0x9EF53260,0x91A111    393         .long   0x3FFF0000,0x9EF53260,0x91A111AC,0x20504890
393         .long   0x3FFF0000,0xA0B0510F,0xB9714F    394         .long   0x3FFF0000,0xA0B0510F,0xB9714FC4,0xA073691C
394         .long   0x3FFF0000,0xA2704303,0x0C4968    395         .long   0x3FFF0000,0xA2704303,0x0C496818,0x1F9B7A05
395         .long   0x3FFF0000,0xA43515AE,0x09E680    396         .long   0x3FFF0000,0xA43515AE,0x09E680A0,0xA0797126
396         .long   0x3FFF0000,0xA5FED6A9,0xB15138    397         .long   0x3FFF0000,0xA5FED6A9,0xB15138EC,0xA071A140
397         .long   0x3FFF0000,0xA7CD93B4,0xE96535    398         .long   0x3FFF0000,0xA7CD93B4,0xE9653568,0x204F62DA
398         .long   0x3FFF0000,0xA9A15AB4,0xEA7C0E    399         .long   0x3FFF0000,0xA9A15AB4,0xEA7C0EF8,0x1F283C4A
399         .long   0x3FFF0000,0xAB7A39B5,0xA93ED3    400         .long   0x3FFF0000,0xAB7A39B5,0xA93ED338,0x9F9A7FDC
400         .long   0x3FFF0000,0xAD583EEA,0x42A14A    401         .long   0x3FFF0000,0xAD583EEA,0x42A14AC8,0xA05B3FAC
401         .long   0x3FFF0000,0xAF3B78AD,0x690A43    402         .long   0x3FFF0000,0xAF3B78AD,0x690A4374,0x1FDF2610
402         .long   0x3FFF0000,0xB123F581,0xD2AC25    403         .long   0x3FFF0000,0xB123F581,0xD2AC2590,0x9F705F90
403         .long   0x3FFF0000,0xB311C412,0xA91124    404         .long   0x3FFF0000,0xB311C412,0xA9112488,0x201F678A
404         .long   0x3FFF0000,0xB504F333,0xF9DE64    405         .long   0x3FFF0000,0xB504F333,0xF9DE6484,0x1F32FB13
405         .long   0x3FFF0000,0xB6FD91E3,0x28D177    406         .long   0x3FFF0000,0xB6FD91E3,0x28D17790,0x20038B30
406         .long   0x3FFF0000,0xB8FBAF47,0x62FB9E    407         .long   0x3FFF0000,0xB8FBAF47,0x62FB9EE8,0x200DC3CC
407         .long   0x3FFF0000,0xBAFF5AB2,0x133E45    408         .long   0x3FFF0000,0xBAFF5AB2,0x133E45FC,0x9F8B2AE6
408         .long   0x3FFF0000,0xBD08A39F,0x580C36    409         .long   0x3FFF0000,0xBD08A39F,0x580C36C0,0xA02BBF70
409         .long   0x3FFF0000,0xBF1799B6,0x7A7310    410         .long   0x3FFF0000,0xBF1799B6,0x7A731084,0xA00BF518
410         .long   0x3FFF0000,0xC12C4CCA,0x667094    411         .long   0x3FFF0000,0xC12C4CCA,0x66709458,0xA041DD41
411         .long   0x3FFF0000,0xC346CCDA,0x249764    412         .long   0x3FFF0000,0xC346CCDA,0x24976408,0x9FDF137B
412         .long   0x3FFF0000,0xC5672A11,0x5506DA    413         .long   0x3FFF0000,0xC5672A11,0x5506DADC,0x201F1568
413         .long   0x3FFF0000,0xC78D74C8,0xABB9B1    414         .long   0x3FFF0000,0xC78D74C8,0xABB9B15C,0x1FC13A2E
414         .long   0x3FFF0000,0xC9B9BD86,0x6E2F27    415         .long   0x3FFF0000,0xC9B9BD86,0x6E2F27A4,0xA03F8F03
415         .long   0x3FFF0000,0xCBEC14FE,0xF2727C    416         .long   0x3FFF0000,0xCBEC14FE,0xF2727C5C,0x1FF4907D
416         .long   0x3FFF0000,0xCE248C15,0x1F8480    417         .long   0x3FFF0000,0xCE248C15,0x1F8480E4,0x9E6E53E4
417         .long   0x3FFF0000,0xD06333DA,0xEF2B25    418         .long   0x3FFF0000,0xD06333DA,0xEF2B2594,0x1FD6D45C
418         .long   0x3FFF0000,0xD2A81D91,0xF12AE4    419         .long   0x3FFF0000,0xD2A81D91,0xF12AE45C,0xA076EDB9
419         .long   0x3FFF0000,0xD4F35AAB,0xCFEDFA    420         .long   0x3FFF0000,0xD4F35AAB,0xCFEDFA20,0x9FA6DE21
420         .long   0x3FFF0000,0xD744FCCA,0xD69D6A    421         .long   0x3FFF0000,0xD744FCCA,0xD69D6AF4,0x1EE69A2F
421         .long   0x3FFF0000,0xD99D15C2,0x78AFD7    422         .long   0x3FFF0000,0xD99D15C2,0x78AFD7B4,0x207F439F
422         .long   0x3FFF0000,0xDBFBB797,0xDAF237    423         .long   0x3FFF0000,0xDBFBB797,0xDAF23754,0x201EC207
423         .long   0x3FFF0000,0xDE60F482,0x5E0E91    424         .long   0x3FFF0000,0xDE60F482,0x5E0E9124,0x9E8BE175
424         .long   0x3FFF0000,0xE0CCDEEC,0x2A94E1    425         .long   0x3FFF0000,0xE0CCDEEC,0x2A94E110,0x20032C4B
425         .long   0x3FFF0000,0xE33F8972,0xBE8A5A    426         .long   0x3FFF0000,0xE33F8972,0xBE8A5A50,0x2004DFF5
426         .long   0x3FFF0000,0xE5B906E7,0x7C8348    427         .long   0x3FFF0000,0xE5B906E7,0x7C8348A8,0x1E72F47A
427         .long   0x3FFF0000,0xE8396A50,0x3C4BDC    428         .long   0x3FFF0000,0xE8396A50,0x3C4BDC68,0x1F722F22
428         .long   0x3FFF0000,0xEAC0C6E7,0xDD2439    429         .long   0x3FFF0000,0xEAC0C6E7,0xDD243930,0xA017E945
429         .long   0x3FFF0000,0xED4F301E,0xD9942B    430         .long   0x3FFF0000,0xED4F301E,0xD9942B84,0x1F401A5B
430         .long   0x3FFF0000,0xEFE4B99B,0xDCDAF5    431         .long   0x3FFF0000,0xEFE4B99B,0xDCDAF5CC,0x9FB9A9E3
431         .long   0x3FFF0000,0xF281773C,0x59FFB1    432         .long   0x3FFF0000,0xF281773C,0x59FFB138,0x20744C05
432         .long   0x3FFF0000,0xF5257D15,0x2486CC    433         .long   0x3FFF0000,0xF5257D15,0x2486CC2C,0x1F773A19
433         .long   0x3FFF0000,0xF7D0DF73,0x0AD13B    434         .long   0x3FFF0000,0xF7D0DF73,0x0AD13BB8,0x1FFE90D5
434         .long   0x3FFF0000,0xFA83B2DB,0x722A03    435         .long   0x3FFF0000,0xFA83B2DB,0x722A033C,0xA041ED22
435         .long   0x3FFF0000,0xFD3E0C0C,0xF486C1    436         .long   0x3FFF0000,0xFD3E0C0C,0xF486C174,0x1F853F3A
436                                                   437 
437         .set    ADJFLAG,L_SCR2                    438         .set    ADJFLAG,L_SCR2
438         .set    SCALE,FP_SCR1                     439         .set    SCALE,FP_SCR1
439         .set    ADJSCALE,FP_SCR2                  440         .set    ADJSCALE,FP_SCR2
440         .set    SC,FP_SCR3                        441         .set    SC,FP_SCR3
441         .set    ONEBYSC,FP_SCR4                   442         .set    ONEBYSC,FP_SCR4
442                                                   443 
443         | xref  t_frcinx                          444         | xref  t_frcinx
444         |xref   t_extdnrm                         445         |xref   t_extdnrm
445         |xref   t_unfl                            446         |xref   t_unfl
446         |xref   t_ovfl                            447         |xref   t_ovfl
447                                                   448 
448         .global setoxd                            449         .global setoxd
449 setoxd:                                           450 setoxd:
450 |--entry point for EXP(X), X is denormalized      451 |--entry point for EXP(X), X is denormalized
451         movel           (%a0),%d0                 452         movel           (%a0),%d0
452         andil           #0x80000000,%d0           453         andil           #0x80000000,%d0
453         oril            #0x00800000,%d0           454         oril            #0x00800000,%d0         | ...sign(X)*2^(-126)
454         movel           %d0,-(%sp)                455         movel           %d0,-(%sp)
455         fmoves          #0x3F800000,%fp0          456         fmoves          #0x3F800000,%fp0
456         fmovel          %d1,%fpcr                 457         fmovel          %d1,%fpcr
457         fadds           (%sp)+,%fp0               458         fadds           (%sp)+,%fp0
458         bra             t_frcinx                  459         bra             t_frcinx
459                                                   460 
460         .global setox                             461         .global setox
461 setox:                                            462 setox:
462 |--entry point for EXP(X), here X is finite, n    463 |--entry point for EXP(X), here X is finite, non-zero, and not NaN's
463                                                   464 
464 |--Step 1.                                        465 |--Step 1.
465         movel           (%a0),%d0        | ...    466         movel           (%a0),%d0        | ...load part of input X
466         andil           #0x7FFF0000,%d0 | ...b    467         andil           #0x7FFF0000,%d0 | ...biased expo. of X
467         cmpil           #0x3FBE0000,%d0 | ...2    468         cmpil           #0x3FBE0000,%d0 | ...2^(-65)
468         bges            EXPC1           | ...n    469         bges            EXPC1           | ...normal case
469         bra             EXPSM                     470         bra             EXPSM
470                                                   471 
471 EXPC1:                                            472 EXPC1:
472 |--The case |X| >= 2^(-65)                        473 |--The case |X| >= 2^(-65)
473         movew           4(%a0),%d0      | ...e    474         movew           4(%a0),%d0      | ...expo. and partial sig. of |X|
474         cmpil           #0x400CB167,%d0 | ...1    475         cmpil           #0x400CB167,%d0 | ...16380 log2 trunc. 16 bits
475         blts            EXPMAIN  | ...normal c    476         blts            EXPMAIN  | ...normal case
476         bra             EXPBIG                    477         bra             EXPBIG
477                                                   478 
478 EXPMAIN:                                          479 EXPMAIN:
479 |--Step 2.                                        480 |--Step 2.
480 |--This is the normal branch:   2^(-65) <= |X|    481 |--This is the normal branch:   2^(-65) <= |X| < 16380 log2.
481         fmovex          (%a0),%fp0      | ...l    482         fmovex          (%a0),%fp0      | ...load input from (a0)
482                                                   483 
483         fmovex          %fp0,%fp1                 484         fmovex          %fp0,%fp1
484         fmuls           #0x42B8AA3B,%fp0          485         fmuls           #0x42B8AA3B,%fp0        | ...64/log2 * X
485         fmovemx %fp2-%fp2/%fp3,-(%a7)             486         fmovemx %fp2-%fp2/%fp3,-(%a7)           | ...save fp2
486         movel           #0,ADJFLAG(%a6)           487         movel           #0,ADJFLAG(%a6)
487         fmovel          %fp0,%d0                  488         fmovel          %fp0,%d0                | ...N = int( X * 64/log2 )
488         lea             EXPTBL,%a1                489         lea             EXPTBL,%a1
489         fmovel          %d0,%fp0                  490         fmovel          %d0,%fp0                | ...convert to floating-format
490                                                   491 
491         movel           %d0,L_SCR1(%a6) | ...s    492         movel           %d0,L_SCR1(%a6) | ...save N temporarily
492         andil           #0x3F,%d0                 493         andil           #0x3F,%d0               | ...D0 is J = N mod 64
493         lsll            #4,%d0                    494         lsll            #4,%d0
494         addal           %d0,%a1         | ...a    495         addal           %d0,%a1         | ...address of 2^(J/64)
495         movel           L_SCR1(%a6),%d0           496         movel           L_SCR1(%a6),%d0
496         asrl            #6,%d0          | ...D    497         asrl            #6,%d0          | ...D0 is M
497         addiw           #0x3FFF,%d0     | ...b    498         addiw           #0x3FFF,%d0     | ...biased expo. of 2^(M)
498         movew           L2,L_SCR1(%a6)  | ...p    499         movew           L2,L_SCR1(%a6)  | ...prefetch L2, no need in CB
499                                                   500 
500 EXPCONT1:                                         501 EXPCONT1:
501 |--Step 3.                                        502 |--Step 3.
502 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i    503 |--fp1,fp2 saved on the stack. fp0 is N, fp1 is X,
503 |--a0 points to 2^(J/64), D0 is biased expo. o    504 |--a0 points to 2^(J/64), D0 is biased expo. of 2^(M)
504         fmovex          %fp0,%fp2                 505         fmovex          %fp0,%fp2
505         fmuls           #0xBC317218,%fp0          506         fmuls           #0xBC317218,%fp0        | ...N * L1, L1 = lead(-log2/64)
506         fmulx           L2,%fp2         | ...N    507         fmulx           L2,%fp2         | ...N * L2, L1+L2 = -log2/64
507         faddx           %fp1,%fp0              !! 508         faddx           %fp1,%fp0               | ...X + N*L1
508         faddx           %fp2,%fp0                 509         faddx           %fp2,%fp0               | ...fp0 is R, reduced arg.
509 |       MOVE.W          #$3FA5,EXPA3    ...loa    510 |       MOVE.W          #$3FA5,EXPA3    ...load EXPA3 in cache
510                                                   511 
511 |--Step 4.                                        512 |--Step 4.
512 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL        513 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL
513 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*A5    514 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*A5))))
514 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S    515 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S = R*R
515 |--[R+R*S*(A2+S*A4)] + [S*(A1+S*(A3+S*A5))]       516 |--[R+R*S*(A2+S*A4)] + [S*(A1+S*(A3+S*A5))]
516                                                   517 
517         fmovex          %fp0,%fp1                 518         fmovex          %fp0,%fp1
518         fmulx           %fp1,%fp1              !! 519         fmulx           %fp1,%fp1               | ...fp1 IS S = R*R
519                                                   520 
520         fmoves          #0x3AB60B70,%fp2          521         fmoves          #0x3AB60B70,%fp2        | ...fp2 IS A5
521 |       MOVE.W          #0,2(%a1)       ...loa    522 |       MOVE.W          #0,2(%a1)       ...load 2^(J/64) in cache
522                                                   523 
523         fmulx           %fp1,%fp2              !! 524         fmulx           %fp1,%fp2               | ...fp2 IS S*A5
524         fmovex          %fp1,%fp3                 525         fmovex          %fp1,%fp3
525         fmuls           #0x3C088895,%fp3          526         fmuls           #0x3C088895,%fp3        | ...fp3 IS S*A4
526                                                   527 
527         faddd           EXPA3,%fp2      | ...f    528         faddd           EXPA3,%fp2      | ...fp2 IS A3+S*A5
528         faddd           EXPA2,%fp3      | ...f    529         faddd           EXPA2,%fp3      | ...fp3 IS A2+S*A4
529                                                   530 
530         fmulx           %fp1,%fp2              !! 531         fmulx           %fp1,%fp2               | ...fp2 IS S*(A3+S*A5)
531         movew           %d0,SCALE(%a6)  | ...S    532         movew           %d0,SCALE(%a6)  | ...SCALE is 2^(M) in extended
532         clrw            SCALE+2(%a6)              533         clrw            SCALE+2(%a6)
533         movel           #0x80000000,SCALE+4(%a    534         movel           #0x80000000,SCALE+4(%a6)
534         clrl            SCALE+8(%a6)              535         clrl            SCALE+8(%a6)
535                                                   536 
536         fmulx           %fp1,%fp3              !! 537         fmulx           %fp1,%fp3               | ...fp3 IS S*(A2+S*A4)
537                                                   538 
538         fadds           #0x3F000000,%fp2          539         fadds           #0x3F000000,%fp2        | ...fp2 IS A1+S*(A3+S*A5)
539         fmulx           %fp0,%fp3              !! 540         fmulx           %fp0,%fp3               | ...fp3 IS R*S*(A2+S*A4)
540                                                   541 
541         fmulx           %fp1,%fp2              !! 542         fmulx           %fp1,%fp2               | ...fp2 IS S*(A1+S*(A3+S*A5))
542         faddx           %fp3,%fp0              !! 543         faddx           %fp3,%fp0               | ...fp0 IS R+R*S*(A2+S*A4),
543 |                                       ...fp3    544 |                                       ...fp3 released
544                                                   545 
545         fmovex          (%a1)+,%fp1     | ...f    546         fmovex          (%a1)+,%fp1     | ...fp1 is lead. pt. of 2^(J/64)
546         faddx           %fp2,%fp0              !! 547         faddx           %fp2,%fp0               | ...fp0 is EXP(R) - 1
547 |                                       ...fp2    548 |                                       ...fp2 released
548                                                   549 
549 |--Step 5                                         550 |--Step 5
550 |--final reconstruction process                   551 |--final reconstruction process
551 |--EXP(X) = 2^M * ( 2^(J/64) + 2^(J/64)*(EXP(R    552 |--EXP(X) = 2^M * ( 2^(J/64) + 2^(J/64)*(EXP(R)-1) )
552                                                   553 
553         fmulx           %fp1,%fp0              !! 554         fmulx           %fp1,%fp0               | ...2^(J/64)*(Exp(R)-1)
554         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    555         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...fp2 restored
555         fadds           (%a1),%fp0      | ...a    556         fadds           (%a1),%fp0      | ...accurate 2^(J/64)
556                                                   557 
557         faddx           %fp1,%fp0              !! 558         faddx           %fp1,%fp0               | ...2^(J/64) + 2^(J/64)*...
558         movel           ADJFLAG(%a6),%d0          559         movel           ADJFLAG(%a6),%d0
559                                                   560 
560 |--Step 6                                         561 |--Step 6
561         tstl            %d0                       562         tstl            %d0
562         beqs            NORMAL                    563         beqs            NORMAL
563 ADJUST:                                           564 ADJUST:
564         fmulx           ADJSCALE(%a6),%fp0        565         fmulx           ADJSCALE(%a6),%fp0
565 NORMAL:                                           566 NORMAL:
566         fmovel          %d1,%FPCR              !! 567         fmovel          %d1,%FPCR               | ...restore user FPCR
567         fmulx           SCALE(%a6),%fp0 | ...m    568         fmulx           SCALE(%a6),%fp0 | ...multiply 2^(M)
568         bra             t_frcinx                  569         bra             t_frcinx
569                                                   570 
570 EXPSM:                                            571 EXPSM:
571 |--Step 7                                         572 |--Step 7
572         fmovemx (%a0),%fp0-%fp0 | ...in case X    573         fmovemx (%a0),%fp0-%fp0 | ...in case X is denormalized
573         fmovel          %d1,%FPCR                 574         fmovel          %d1,%FPCR
574         fadds           #0x3F800000,%fp0          575         fadds           #0x3F800000,%fp0        | ...1+X in user mode
575         bra             t_frcinx                  576         bra             t_frcinx
576                                                   577 
577 EXPBIG:                                           578 EXPBIG:
578 |--Step 8                                         579 |--Step 8
579         cmpil           #0x400CB27C,%d0 | ...1    580         cmpil           #0x400CB27C,%d0 | ...16480 log2
580         bgts            EXP2BIG                   581         bgts            EXP2BIG
581 |--Steps 8.2 -- 8.6                               582 |--Steps 8.2 -- 8.6
582         fmovex          (%a0),%fp0      | ...l    583         fmovex          (%a0),%fp0      | ...load input from (a0)
583                                                   584 
584         fmovex          %fp0,%fp1                 585         fmovex          %fp0,%fp1
585         fmuls           #0x42B8AA3B,%fp0          586         fmuls           #0x42B8AA3B,%fp0        | ...64/log2 * X
586         fmovemx  %fp2-%fp2/%fp3,-(%a7)            587         fmovemx  %fp2-%fp2/%fp3,-(%a7)          | ...save fp2
587         movel           #1,ADJFLAG(%a6)           588         movel           #1,ADJFLAG(%a6)
588         fmovel          %fp0,%d0                  589         fmovel          %fp0,%d0                | ...N = int( X * 64/log2 )
589         lea             EXPTBL,%a1                590         lea             EXPTBL,%a1
590         fmovel          %d0,%fp0                  591         fmovel          %d0,%fp0                | ...convert to floating-format
591         movel           %d0,L_SCR1(%a6)           592         movel           %d0,L_SCR1(%a6)                 | ...save N temporarily
592         andil           #0x3F,%d0                 593         andil           #0x3F,%d0                | ...D0 is J = N mod 64
593         lsll            #4,%d0                    594         lsll            #4,%d0
594         addal           %d0,%a1                   595         addal           %d0,%a1                 | ...address of 2^(J/64)
595         movel           L_SCR1(%a6),%d0           596         movel           L_SCR1(%a6),%d0
596         asrl            #6,%d0                    597         asrl            #6,%d0                  | ...D0 is K
597         movel           %d0,L_SCR1(%a6)           598         movel           %d0,L_SCR1(%a6)                 | ...save K temporarily
598         asrl            #1,%d0                    599         asrl            #1,%d0                  | ...D0 is M1
599         subl            %d0,L_SCR1(%a6)           600         subl            %d0,L_SCR1(%a6)                 | ...a1 is M
600         addiw           #0x3FFF,%d0               601         addiw           #0x3FFF,%d0             | ...biased expo. of 2^(M1)
601         movew           %d0,ADJSCALE(%a6)         602         movew           %d0,ADJSCALE(%a6)               | ...ADJSCALE := 2^(M1)
602         clrw            ADJSCALE+2(%a6)           603         clrw            ADJSCALE+2(%a6)
603         movel           #0x80000000,ADJSCALE+4    604         movel           #0x80000000,ADJSCALE+4(%a6)
604         clrl            ADJSCALE+8(%a6)           605         clrl            ADJSCALE+8(%a6)
605         movel           L_SCR1(%a6),%d0           606         movel           L_SCR1(%a6),%d0                 | ...D0 is M
606         addiw           #0x3FFF,%d0               607         addiw           #0x3FFF,%d0             | ...biased expo. of 2^(M)
607         bra             EXPCONT1                  608         bra             EXPCONT1                | ...go back to Step 3
608                                                   609 
609 EXP2BIG:                                          610 EXP2BIG:
610 |--Step 9                                         611 |--Step 9
611         fmovel          %d1,%FPCR                 612         fmovel          %d1,%FPCR
612         movel           (%a0),%d0                 613         movel           (%a0),%d0
613         bclrb           #sign_bit,(%a0)           614         bclrb           #sign_bit,(%a0)         | ...setox always returns positive
614         cmpil           #0,%d0                    615         cmpil           #0,%d0
615         blt             t_unfl                    616         blt             t_unfl
616         bra             t_ovfl                    617         bra             t_ovfl
617                                                   618 
618         .global setoxm1d                          619         .global setoxm1d
619 setoxm1d:                                         620 setoxm1d:
620 |--entry point for EXPM1(X), here X is denorma    621 |--entry point for EXPM1(X), here X is denormalized
621 |--Step 0.                                        622 |--Step 0.
622         bra             t_extdnrm                 623         bra             t_extdnrm
623                                                   624 
624                                                   625 
625         .global setoxm1                           626         .global setoxm1
626 setoxm1:                                          627 setoxm1:
627 |--entry point for EXPM1(X), here X is finite,    628 |--entry point for EXPM1(X), here X is finite, non-zero, non-NaN
628                                                   629 
629 |--Step 1.                                        630 |--Step 1.
630 |--Step 1.1                                       631 |--Step 1.1
631         movel           (%a0),%d0        | ...    632         movel           (%a0),%d0        | ...load part of input X
632         andil           #0x7FFF0000,%d0 | ...b    633         andil           #0x7FFF0000,%d0 | ...biased expo. of X
633         cmpil           #0x3FFD0000,%d0 | ...1    634         cmpil           #0x3FFD0000,%d0 | ...1/4
634         bges            EM1CON1  | ...|X| >= 1    635         bges            EM1CON1  | ...|X| >= 1/4
635         bra             EM1SM                     636         bra             EM1SM
636                                                   637 
637 EM1CON1:                                          638 EM1CON1:
638 |--Step 1.3                                       639 |--Step 1.3
639 |--The case |X| >= 1/4                            640 |--The case |X| >= 1/4
640         movew           4(%a0),%d0      | ...e    641         movew           4(%a0),%d0      | ...expo. and partial sig. of |X|
641         cmpil           #0x4004C215,%d0 | ...7    642         cmpil           #0x4004C215,%d0 | ...70log2 rounded up to 16 bits
642         bles            EM1MAIN  | ...1/4 <= |    643         bles            EM1MAIN  | ...1/4 <= |X| <= 70log2
643         bra             EM1BIG                    644         bra             EM1BIG
644                                                   645 
645 EM1MAIN:                                          646 EM1MAIN:
646 |--Step 2.                                        647 |--Step 2.
647 |--This is the case:    1/4 <= |X| <= 70 log2.    648 |--This is the case:    1/4 <= |X| <= 70 log2.
648         fmovex          (%a0),%fp0      | ...l    649         fmovex          (%a0),%fp0      | ...load input from (a0)
649                                                   650 
650         fmovex          %fp0,%fp1                 651         fmovex          %fp0,%fp1
651         fmuls           #0x42B8AA3B,%fp0          652         fmuls           #0x42B8AA3B,%fp0        | ...64/log2 * X
652         fmovemx %fp2-%fp2/%fp3,-(%a7)             653         fmovemx %fp2-%fp2/%fp3,-(%a7)           | ...save fp2
653 |       MOVE.W          #$3F81,EM1A4              654 |       MOVE.W          #$3F81,EM1A4            ...prefetch in CB mode
654         fmovel          %fp0,%d0                  655         fmovel          %fp0,%d0                | ...N = int( X * 64/log2 )
655         lea             EXPTBL,%a1                656         lea             EXPTBL,%a1
656         fmovel          %d0,%fp0                  657         fmovel          %d0,%fp0                | ...convert to floating-format
657                                                   658 
658         movel           %d0,L_SCR1(%a6)           659         movel           %d0,L_SCR1(%a6)                 | ...save N temporarily
659         andil           #0x3F,%d0                 660         andil           #0x3F,%d0                | ...D0 is J = N mod 64
660         lsll            #4,%d0                    661         lsll            #4,%d0
661         addal           %d0,%a1                   662         addal           %d0,%a1                 | ...address of 2^(J/64)
662         movel           L_SCR1(%a6),%d0           663         movel           L_SCR1(%a6),%d0
663         asrl            #6,%d0                    664         asrl            #6,%d0                  | ...D0 is M
664         movel           %d0,L_SCR1(%a6)           665         movel           %d0,L_SCR1(%a6)                 | ...save a copy of M
665 |       MOVE.W          #$3FDC,L2                 666 |       MOVE.W          #$3FDC,L2               ...prefetch L2 in CB mode
666                                                   667 
667 |--Step 3.                                        668 |--Step 3.
668 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i    669 |--fp1,fp2 saved on the stack. fp0 is N, fp1 is X,
669 |--a0 points to 2^(J/64), D0 and a1 both conta    670 |--a0 points to 2^(J/64), D0 and a1 both contain M
670         fmovex          %fp0,%fp2                 671         fmovex          %fp0,%fp2
671         fmuls           #0xBC317218,%fp0          672         fmuls           #0xBC317218,%fp0        | ...N * L1, L1 = lead(-log2/64)
672         fmulx           L2,%fp2         | ...N    673         fmulx           L2,%fp2         | ...N * L2, L1+L2 = -log2/64
673         faddx           %fp1,%fp0        | ...    674         faddx           %fp1,%fp0        | ...X + N*L1
674         faddx           %fp2,%fp0        | ...    675         faddx           %fp2,%fp0        | ...fp0 is R, reduced arg.
675 |       MOVE.W          #$3FC5,EM1A2              676 |       MOVE.W          #$3FC5,EM1A2            ...load EM1A2 in cache
676         addiw           #0x3FFF,%d0               677         addiw           #0x3FFF,%d0             | ...D0 is biased expo. of 2^M
677                                                   678 
678 |--Step 4.                                        679 |--Step 4.
679 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL        680 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL
680 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*(A    681 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*(A5 + R*A6)))))
681 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S    682 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S = R*R
682 |--[R*S*(A2+S*(A4+S*A6))] + [R+S*(A1+S*(A3+S*A    683 |--[R*S*(A2+S*(A4+S*A6))] + [R+S*(A1+S*(A3+S*A5))]
683                                                   684 
684         fmovex          %fp0,%fp1                 685         fmovex          %fp0,%fp1
685         fmulx           %fp1,%fp1                 686         fmulx           %fp1,%fp1               | ...fp1 IS S = R*R
686                                                   687 
687         fmoves          #0x3950097B,%fp2          688         fmoves          #0x3950097B,%fp2        | ...fp2 IS a6
688 |       MOVE.W          #0,2(%a1)       ...loa    689 |       MOVE.W          #0,2(%a1)       ...load 2^(J/64) in cache
689                                                   690 
690         fmulx           %fp1,%fp2                 691         fmulx           %fp1,%fp2               | ...fp2 IS S*A6
691         fmovex          %fp1,%fp3                 692         fmovex          %fp1,%fp3
692         fmuls           #0x3AB60B6A,%fp3          693         fmuls           #0x3AB60B6A,%fp3        | ...fp3 IS S*A5
693                                                   694 
694         faddd           EM1A4,%fp2      | ...f    695         faddd           EM1A4,%fp2      | ...fp2 IS A4+S*A6
695         faddd           EM1A3,%fp3      | ...f    696         faddd           EM1A3,%fp3      | ...fp3 IS A3+S*A5
696         movew           %d0,SC(%a6)               697         movew           %d0,SC(%a6)             | ...SC is 2^(M) in extended
697         clrw            SC+2(%a6)                 698         clrw            SC+2(%a6)
698         movel           #0x80000000,SC+4(%a6)     699         movel           #0x80000000,SC+4(%a6)
699         clrl            SC+8(%a6)                 700         clrl            SC+8(%a6)
700                                                   701 
701         fmulx           %fp1,%fp2                 702         fmulx           %fp1,%fp2               | ...fp2 IS S*(A4+S*A6)
702         movel           L_SCR1(%a6),%d0           703         movel           L_SCR1(%a6),%d0         | ...D0 is      M
703         negw            %d0             | ...D    704         negw            %d0             | ...D0 is -M
704         fmulx           %fp1,%fp3                 705         fmulx           %fp1,%fp3               | ...fp3 IS S*(A3+S*A5)
705         addiw           #0x3FFF,%d0     | ...b    706         addiw           #0x3FFF,%d0     | ...biased expo. of 2^(-M)
706         faddd           EM1A2,%fp2      | ...f    707         faddd           EM1A2,%fp2      | ...fp2 IS A2+S*(A4+S*A6)
707         fadds           #0x3F000000,%fp3          708         fadds           #0x3F000000,%fp3        | ...fp3 IS A1+S*(A3+S*A5)
708                                                   709 
709         fmulx           %fp1,%fp2                 710         fmulx           %fp1,%fp2               | ...fp2 IS S*(A2+S*(A4+S*A6))
710         oriw            #0x8000,%d0     | ...s    711         oriw            #0x8000,%d0     | ...signed/expo. of -2^(-M)
711         movew           %d0,ONEBYSC(%a6)          712         movew           %d0,ONEBYSC(%a6)        | ...OnebySc is -2^(-M)
712         clrw            ONEBYSC+2(%a6)            713         clrw            ONEBYSC+2(%a6)
713         movel           #0x80000000,ONEBYSC+4(    714         movel           #0x80000000,ONEBYSC+4(%a6)
714         clrl            ONEBYSC+8(%a6)            715         clrl            ONEBYSC+8(%a6)
715         fmulx           %fp3,%fp1                 716         fmulx           %fp3,%fp1               | ...fp1 IS S*(A1+S*(A3+S*A5))
716 |                                       ...fp3    717 |                                       ...fp3 released
717                                                   718 
718         fmulx           %fp0,%fp2                 719         fmulx           %fp0,%fp2               | ...fp2 IS R*S*(A2+S*(A4+S*A6))
719         faddx           %fp1,%fp0                 720         faddx           %fp1,%fp0               | ...fp0 IS R+S*(A1+S*(A3+S*A5))
720 |                                       ...fp1    721 |                                       ...fp1 released
721                                                   722 
722         faddx           %fp2,%fp0                 723         faddx           %fp2,%fp0               | ...fp0 IS EXP(R)-1
723 |                                       ...fp2    724 |                                       ...fp2 released
724         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    725         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...fp2 restored
725                                                   726 
726 |--Step 5                                         727 |--Step 5
727 |--Compute 2^(J/64)*p                             728 |--Compute 2^(J/64)*p
728                                                   729 
729         fmulx           (%a1),%fp0      | ...2    730         fmulx           (%a1),%fp0      | ...2^(J/64)*(Exp(R)-1)
730                                                   731 
731 |--Step 6                                         732 |--Step 6
732 |--Step 6.1                                       733 |--Step 6.1
733         movel           L_SCR1(%a6),%d0           734         movel           L_SCR1(%a6),%d0         | ...retrieve M
734         cmpil           #63,%d0                   735         cmpil           #63,%d0
735         bles            MLE63                     736         bles            MLE63
736 |--Step 6.2     M >= 64                           737 |--Step 6.2     M >= 64
737         fmoves          12(%a1),%fp1    | ...f    738         fmoves          12(%a1),%fp1    | ...fp1 is t
738         faddx           ONEBYSC(%a6),%fp1         739         faddx           ONEBYSC(%a6),%fp1       | ...fp1 is t+OnebySc
739         faddx           %fp1,%fp0                 740         faddx           %fp1,%fp0               | ...p+(t+OnebySc), fp1 released
740         faddx           (%a1),%fp0      | ...T    741         faddx           (%a1),%fp0      | ...T+(p+(t+OnebySc))
741         bras            EM1SCALE                  742         bras            EM1SCALE
742 MLE63:                                            743 MLE63:
743 |--Step 6.3     M <= 63                           744 |--Step 6.3     M <= 63
744         cmpil           #-3,%d0                   745         cmpil           #-3,%d0
745         bges            MGEN3                     746         bges            MGEN3
746 MLTN3:                                            747 MLTN3:
747 |--Step 6.4     M <= -4                           748 |--Step 6.4     M <= -4
748         fadds           12(%a1),%fp0    | ...p    749         fadds           12(%a1),%fp0    | ...p+t
749         faddx           (%a1),%fp0      | ...T    750         faddx           (%a1),%fp0      | ...T+(p+t)
750         faddx           ONEBYSC(%a6),%fp0         751         faddx           ONEBYSC(%a6),%fp0       | ...OnebySc + (T+(p+t))
751         bras            EM1SCALE                  752         bras            EM1SCALE
752 MGEN3:                                            753 MGEN3:
753 |--Step 6.5     -3 <= M <= 63                     754 |--Step 6.5     -3 <= M <= 63
754         fmovex          (%a1)+,%fp1     | ...f    755         fmovex          (%a1)+,%fp1     | ...fp1 is T
755         fadds           (%a1),%fp0      | ...f    756         fadds           (%a1),%fp0      | ...fp0 is p+t
756         faddx           ONEBYSC(%a6),%fp1         757         faddx           ONEBYSC(%a6),%fp1       | ...fp1 is T+OnebySc
757         faddx           %fp1,%fp0                 758         faddx           %fp1,%fp0               | ...(T+OnebySc)+(p+t)
758                                                   759 
759 EM1SCALE:                                         760 EM1SCALE:
760 |--Step 6.6                                       761 |--Step 6.6
761         fmovel          %d1,%FPCR                 762         fmovel          %d1,%FPCR
762         fmulx           SC(%a6),%fp0              763         fmulx           SC(%a6),%fp0
763                                                   764 
764         bra             t_frcinx                  765         bra             t_frcinx
765                                                   766 
766 EM1SM:                                            767 EM1SM:
767 |--Step 7       |X| < 1/4.                        768 |--Step 7       |X| < 1/4.
768         cmpil           #0x3FBE0000,%d0 | ...2    769         cmpil           #0x3FBE0000,%d0 | ...2^(-65)
769         bges            EM1POLY                   770         bges            EM1POLY
770                                                   771 
771 EM1TINY:                                          772 EM1TINY:
772 |--Step 8       |X| < 2^(-65)                     773 |--Step 8       |X| < 2^(-65)
773         cmpil           #0x00330000,%d0 | ...2    774         cmpil           #0x00330000,%d0 | ...2^(-16312)
774         blts            EM12TINY                  775         blts            EM12TINY
775 |--Step 8.2                                       776 |--Step 8.2
776         movel           #0x80010000,SC(%a6)       777         movel           #0x80010000,SC(%a6)     | ...SC is -2^(-16382)
777         movel           #0x80000000,SC+4(%a6)     778         movel           #0x80000000,SC+4(%a6)
778         clrl            SC+8(%a6)                 779         clrl            SC+8(%a6)
779         fmovex          (%a0),%fp0                780         fmovex          (%a0),%fp0
780         fmovel          %d1,%FPCR                 781         fmovel          %d1,%FPCR
781         faddx           SC(%a6),%fp0              782         faddx           SC(%a6),%fp0
782                                                   783 
783         bra             t_frcinx                  784         bra             t_frcinx
784                                                   785 
785 EM12TINY:                                         786 EM12TINY:
786 |--Step 8.3                                       787 |--Step 8.3
787         fmovex          (%a0),%fp0                788         fmovex          (%a0),%fp0
788         fmuld           TWO140,%fp0               789         fmuld           TWO140,%fp0
789         movel           #0x80010000,SC(%a6)       790         movel           #0x80010000,SC(%a6)
790         movel           #0x80000000,SC+4(%a6)     791         movel           #0x80000000,SC+4(%a6)
791         clrl            SC+8(%a6)                 792         clrl            SC+8(%a6)
792         faddx           SC(%a6),%fp0              793         faddx           SC(%a6),%fp0
793         fmovel          %d1,%FPCR                 794         fmovel          %d1,%FPCR
794         fmuld           TWON140,%fp0              795         fmuld           TWON140,%fp0
795                                                   796 
796         bra             t_frcinx                  797         bra             t_frcinx
797                                                   798 
798 EM1POLY:                                          799 EM1POLY:
799 |--Step 9       exp(X)-1 by a simple polynomia    800 |--Step 9       exp(X)-1 by a simple polynomial
800         fmovex          (%a0),%fp0      | ...f    801         fmovex          (%a0),%fp0      | ...fp0 is X
801         fmulx           %fp0,%fp0                 802         fmulx           %fp0,%fp0               | ...fp0 is S := X*X
802         fmovemx %fp2-%fp2/%fp3,-(%a7)   | ...s    803         fmovemx %fp2-%fp2/%fp3,-(%a7)   | ...save fp2
803         fmoves          #0x2F30CAA8,%fp1          804         fmoves          #0x2F30CAA8,%fp1        | ...fp1 is B12
804         fmulx           %fp0,%fp1                 805         fmulx           %fp0,%fp1               | ...fp1 is S*B12
805         fmoves          #0x310F8290,%fp2          806         fmoves          #0x310F8290,%fp2        | ...fp2 is B11
806         fadds           #0x32D73220,%fp1          807         fadds           #0x32D73220,%fp1        | ...fp1 is B10+S*B12
807                                                   808 
808         fmulx           %fp0,%fp2                 809         fmulx           %fp0,%fp2               | ...fp2 is S*B11
809         fmulx           %fp0,%fp1                 810         fmulx           %fp0,%fp1               | ...fp1 is S*(B10 + ...
810                                                   811 
811         fadds           #0x3493F281,%fp2          812         fadds           #0x3493F281,%fp2        | ...fp2 is B9+S*...
812         faddd           EM1B8,%fp1      | ...f    813         faddd           EM1B8,%fp1      | ...fp1 is B8+S*...
813                                                   814 
814         fmulx           %fp0,%fp2                 815         fmulx           %fp0,%fp2               | ...fp2 is S*(B9+...
815         fmulx           %fp0,%fp1                 816         fmulx           %fp0,%fp1               | ...fp1 is S*(B8+...
816                                                   817 
817         faddd           EM1B7,%fp2      | ...f    818         faddd           EM1B7,%fp2      | ...fp2 is B7+S*...
818         faddd           EM1B6,%fp1      | ...f    819         faddd           EM1B6,%fp1      | ...fp1 is B6+S*...
819                                                   820 
820         fmulx           %fp0,%fp2                 821         fmulx           %fp0,%fp2               | ...fp2 is S*(B7+...
821         fmulx           %fp0,%fp1                 822         fmulx           %fp0,%fp1               | ...fp1 is S*(B6+...
822                                                   823 
823         faddd           EM1B5,%fp2      | ...f    824         faddd           EM1B5,%fp2      | ...fp2 is B5+S*...
824         faddd           EM1B4,%fp1      | ...f    825         faddd           EM1B4,%fp1      | ...fp1 is B4+S*...
825                                                   826 
826         fmulx           %fp0,%fp2                 827         fmulx           %fp0,%fp2               | ...fp2 is S*(B5+...
827         fmulx           %fp0,%fp1                 828         fmulx           %fp0,%fp1               | ...fp1 is S*(B4+...
828                                                   829 
829         faddd           EM1B3,%fp2      | ...f    830         faddd           EM1B3,%fp2      | ...fp2 is B3+S*...
830         faddx           EM1B2,%fp1      | ...f    831         faddx           EM1B2,%fp1      | ...fp1 is B2+S*...
831                                                   832 
832         fmulx           %fp0,%fp2                 833         fmulx           %fp0,%fp2               | ...fp2 is S*(B3+...
833         fmulx           %fp0,%fp1                 834         fmulx           %fp0,%fp1               | ...fp1 is S*(B2+...
834                                                   835 
835         fmulx           %fp0,%fp2                 836         fmulx           %fp0,%fp2               | ...fp2 is S*S*(B3+...)
836         fmulx           (%a0),%fp1      | ...f    837         fmulx           (%a0),%fp1      | ...fp1 is X*S*(B2...
837                                                   838 
838         fmuls           #0x3F000000,%fp0          839         fmuls           #0x3F000000,%fp0        | ...fp0 is S*B1
839         faddx           %fp2,%fp1                 840         faddx           %fp2,%fp1               | ...fp1 is Q
840 |                                       ...fp2    841 |                                       ...fp2 released
841                                                   842 
842         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    843         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...fp2 restored
843                                                   844 
844         faddx           %fp1,%fp0                 845         faddx           %fp1,%fp0               | ...fp0 is S*B1+Q
845 |                                       ...fp1    846 |                                       ...fp1 released
846                                                   847 
847         fmovel          %d1,%FPCR                 848         fmovel          %d1,%FPCR
848         faddx           (%a0),%fp0                849         faddx           (%a0),%fp0
849                                                   850 
850         bra             t_frcinx                  851         bra             t_frcinx
851                                                   852 
852 EM1BIG:                                           853 EM1BIG:
853 |--Step 10      |X| > 70 log2                     854 |--Step 10      |X| > 70 log2
854         movel           (%a0),%d0                 855         movel           (%a0),%d0
855         cmpil           #0,%d0                    856         cmpil           #0,%d0
856         bgt             EXPC1                     857         bgt             EXPC1
857 |--Step 10.2                                      858 |--Step 10.2
858         fmoves          #0xBF800000,%fp0          859         fmoves          #0xBF800000,%fp0        | ...fp0 is -1
859         fmovel          %d1,%FPCR                 860         fmovel          %d1,%FPCR
860         fadds           #0x00800000,%fp0          861         fadds           #0x00800000,%fp0        | ...-1 + 2^(-126)
861                                                   862 
862         bra             t_frcinx                  863         bra             t_frcinx
863                                                   864 
864         |end                                      865         |end
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php