~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

TOMOYO Linux Cross Reference
Linux/arch/m68k/fpsp040/setox.S

Version: ~ [ linux-6.12-rc7 ] ~ [ linux-6.11.7 ] ~ [ linux-6.10.14 ] ~ [ linux-6.9.12 ] ~ [ linux-6.8.12 ] ~ [ linux-6.7.12 ] ~ [ linux-6.6.60 ] ~ [ linux-6.5.13 ] ~ [ linux-6.4.16 ] ~ [ linux-6.3.13 ] ~ [ linux-6.2.16 ] ~ [ linux-6.1.116 ] ~ [ linux-6.0.19 ] ~ [ linux-5.19.17 ] ~ [ linux-5.18.19 ] ~ [ linux-5.17.15 ] ~ [ linux-5.16.20 ] ~ [ linux-5.15.171 ] ~ [ linux-5.14.21 ] ~ [ linux-5.13.19 ] ~ [ linux-5.12.19 ] ~ [ linux-5.11.22 ] ~ [ linux-5.10.229 ] ~ [ linux-5.9.16 ] ~ [ linux-5.8.18 ] ~ [ linux-5.7.19 ] ~ [ linux-5.6.19 ] ~ [ linux-5.5.19 ] ~ [ linux-5.4.285 ] ~ [ linux-5.3.18 ] ~ [ linux-5.2.21 ] ~ [ linux-5.1.21 ] ~ [ linux-5.0.21 ] ~ [ linux-4.20.17 ] ~ [ linux-4.19.323 ] ~ [ linux-4.18.20 ] ~ [ linux-4.17.19 ] ~ [ linux-4.16.18 ] ~ [ linux-4.15.18 ] ~ [ linux-4.14.336 ] ~ [ linux-4.13.16 ] ~ [ linux-4.12.14 ] ~ [ linux-4.11.12 ] ~ [ linux-4.10.17 ] ~ [ linux-4.9.337 ] ~ [ linux-4.4.302 ] ~ [ linux-3.10.108 ] ~ [ linux-2.6.32.71 ] ~ [ linux-2.6.0 ] ~ [ linux-2.4.37.11 ] ~ [ unix-v6-master ] ~ [ ccs-tools-1.8.12 ] ~ [ policy-sample ] ~
Architecture: ~ [ i386 ] ~ [ alpha ] ~ [ m68k ] ~ [ mips ] ~ [ ppc ] ~ [ sparc ] ~ [ sparc64 ] ~

Diff markup

Differences between /arch/m68k/fpsp040/setox.S (Architecture m68k) and /arch/mips/fpsp040/setox.S (Architecture mips)


  1 |                                                 
  2 |       setox.sa 3.1 12/10/90                     
  3 |                                                 
  4 |       The entry point setox computes the exp    
  5 |       setoxd does the same except the input     
  6 |       number. setoxm1 computes exp(X)-1, and    
  7 |       exp(X)-1 for denormalized X.              
  8 |                                                 
  9 |       INPUT                                     
 10 |       -----                                     
 11 |       Double-extended value in memory locati    
 12 |       register a0.                              
 13 |                                                 
 14 |       OUTPUT                                    
 15 |       ------                                    
 16 |       exp(X) or exp(X)-1 returned in floatin    
 17 |                                                 
 18 |       ACCURACY and MONOTONICITY                 
 19 |       -------------------------                 
 20 |       The returned result is within 0.85 ulp    
 21 |       within 0.5001 ulp to 53 bits if the re    
 22 |       to double precision. The result is pro    
 23 |       precision.                                
 24 |                                                 
 25 |       SPEED                                     
 26 |       -----                                     
 27 |       Two timings are measured, both in the     
 28 |       first one is measured when the functio    
 29 |       (so the instructions and data are not     
 30 |       second one is measured when the functi    
 31 |       input argument.                           
 32 |                                                 
 33 |       The program setox takes approximately     
 34 |       argument X whose magnitude is less tha    
 35 |       is the usual situation. For the less c    
 36 |       depending on their values, the program    
 37 |       but no worse than 10% slower even in t    
 38 |                                                 
 39 |       The program setoxm1 takes approximatel    
 40 |       argument X, 0.25 <= |X| < 70log2. For     
 41 |       approximately ??? / ??? cycles. For th    
 42 |       depending on their values, the program    
 43 |       but no worse than 10% slower even in t    
 44 |                                                 
 45 |       ALGORITHM and IMPLEMENTATION NOTES        
 46 |       ----------------------------------        
 47 |                                                 
 48 |       setoxd                                    
 49 |       ------                                    
 50 |       Step 1. Set ans := 1.0                    
 51 |                                                 
 52 |       Step 2. Return  ans := ans + sign(X)*2    
 53 |       Notes:  This will always generate one     
 54 |                                                 
 55 |                                                 
 56 |       setox                                     
 57 |       -----                                     
 58 |                                                 
 59 |       Step 1. Filter out extreme cases of in    
 60 |               1.1     If |X| >= 2^(-65), go     
 61 |               1.2     Go to Step 7.             
 62 |               1.3     If |X| < 16380 log(2),    
 63 |               1.4     Go to Step 8.             
 64 |       Notes:  The usual case should take the    
 65 |                To avoid the use of floating-    
 66 |                compact representation of |X|    
 67 |                32-bit integer, the upper (mo    
 68 |                the sign and biased exponent     
 69 |                bits are the 16 most signific    
 70 |                explicit bit) bits of |X|. Co    
 71 |                in Steps 1.1 and 1.3 can be p    
 72 |                Note also that the constant 1    
 73 |                is also in the compact form.     
 74 |                to Step 2 guarantees |X| < 16    
 75 |                to have a small number of cas    
 76 |                but close to, 16380 log(2) an    
 77 |                taken.                           
 78 |                                                 
 79 |       Step 2. Calculate N = round-to-nearest    
 80 |               2.1     Set AdjFlag := 0 (indi    
 81 |               2.2     N := round-to-nearest-    
 82 |               2.3     Calculate       J = N     
 83 |               2.4     Calculate       M = (N    
 84 |               2.5     Calculate the address     
 85 |               2.6     Create the value Scale    
 86 |       Notes:  The calculation in 2.2 is real    
 87 |                                                 
 88 |                       Z := X * constant         
 89 |                       N := round-to-nearest-    
 90 |                                                 
 91 |                where                            
 92 |                                                 
 93 |                       constant := single-pre    
 94 |                                                 
 95 |                Using a single-precision cons    
 96 |                Another effect of using a sin    
 97 |                that the calculated value Z i    
 98 |                                                 
 99 |                       Z = X*(64/log2)*(1+eps    
100 |                                                 
101 |                This error has to be consider    
102 |                                                 
103 |       Step 3. Calculate X - N*log2/64.          
104 |               3.1     R := X + N*L1, where L    
105 |               3.2     R := R + N*L2, L2 := e    
106 |       Notes:  a) The way L1 and L2 are chose    
107 |                the value      -log2/64          
108 |                b) N*L1 is exact because N is    
109 |                L1 is no longer than 24 bits.    
110 |                c) The calculation X+N*L1 is     
111 |                Thus, R is practically X+N(L1    
112 |                d) It is important to estimat    
113 |                Step 3.2.                        
114 |                                                 
115 |                       N = rnd-to-int( X*64/l    
116 |                       X*64/log2 (1+eps)         
117 |                       X*64/log2 - N   =         
118 |                       X - N*log2/64   =         
119 |                                                 
120 |                                                 
121 |                Now |X| <= 16446 log2, thus      
122 |                                                 
123 |                       |X - N*log2/64| <= (0.    
124 |                                       <= 0.5    
125 |                This bound will be used in St    
126 |                                                 
127 |       Step 4. Approximate exp(R)-1 by a poly    
128 |                       p = R + R*R*(A1 + R*(A    
129 |       Notes:  a) In order to reduce memory a    
130 |                made as "short" as possible:     
131 |                are single precision; A2 and     
132 |                b) Even with the restrictions    
133 |                       |p - (exp(R)-1)| < 2^(    
134 |                Note that 0.0062 is slightly     
135 |                c) To fully utilize the pipel    
136 |                two independent pieces of rou    
137 |                       p = [ R + R*S*(A2 + S*    
138 |                               [ S*(A1 + S*(A    
139 |                where S = R*R.                   
140 |                                                 
141 |       Step 5. Compute 2^(J/64)*exp(R) = 2^(J    
142 |                               ans := T + ( T    
143 |                where T and t are the stored     
144 |       Notes:  2^(J/64) is stored as T and t     
145 |                2^(J/64) to roughly 85 bits;     
146 |                and t is in single precision.    
147 |                to 62 bits so that the last t    
148 |                reason for such a special for    
149 |                will all be exact --- a prope    
150 |                more accurate computation of     
151 |                                                 
152 |       Step 6. Reconstruction of exp(X)          
153 |                       exp(X) = 2^M * 2^(J/64    
154 |               6.1     If AdjFlag = 0, go to     
155 |               6.2     ans := ans * AdjScale     
156 |               6.3     Restore the user FPCR     
157 |               6.4     Return ans := ans * Sc    
158 |       Notes:  If AdjFlag = 0, we have X = Ml    
159 |                |M| <= 16380, and Scale = 2^M    
160 |                neither overflow nor underflo    
161 |                means that                       
162 |                       X = (M1+M)log2 + Jlog2    
163 |                Hence, exp(X) may overflow or    
164 |                When that is the case, AdjSca    
165 |                approximately M. Thus 6.2 wil    
166 |                Possible exception in 6.4 is     
167 |                The inexact exception is not     
168 |                one can argue that the inexac    
169 |                raised, to simulate that exce    
170 |                flag is worth in practical us    
171 |                                                 
172 |       Step 7. Return 1 + X.                     
173 |               7.1     ans := X                  
174 |               7.2     Restore user FPCR.        
175 |               7.3     Return ans := 1 + ans.    
176 |       Notes:  For non-zero X, the inexact ex    
177 |                raised by 7.3. That is the on    
178 |                Note also that we use the FMO    
179 |                in Step 7.1 to avoid unnecess    
180 |                the FMOVEM may not seem relev    
181 |                the precaution will be useful    
182 |                this code where the separate     
183 |                will be done away with.)         
184 |                                                 
185 |       Step 8. Handle exp(X) where |X| >= 163    
186 |               8.1     If |X| > 16480 log2, g    
187 |               (mimic 2.2 - 2.6)                 
188 |               8.2     N := round-to-integer(    
189 |               8.3     Calculate J = N mod 64    
190 |               8.4     K := (N-J)/64, M1 := t    
191 |               8.5     Calculate the address     
192 |               8.6     Create the values Scal    
193 |               8.7     Go to Step 3.             
194 |       Notes:  Refer to notes for 2.2 - 2.6.     
195 |                                                 
196 |       Step 9. Handle exp(X), |X| > 16480 log    
197 |               9.1     If X < 0, go to 9.3       
198 |               9.2     ans := Huge, go to 9.4    
199 |               9.3     ans := Tiny.              
200 |               9.4     Restore user FPCR.        
201 |               9.5     Return ans := ans * an    
202 |       Notes:  Exp(X) will surely overflow or    
203 |                X's sign. "Huge" and "Tiny" a    
204 |                extended-precision numbers wh    
205 |                with an inexact result. Thus,    
206 |                inexact together with either     
207 |                                                 
208 |                                                 
209 |       setoxm1d                                  
210 |       --------                                  
211 |                                                 
212 |       Step 1. Set ans := 0                      
213 |                                                 
214 |       Step 2. Return  ans := X + ans. Exit.     
215 |       Notes:  This will return X with the ap    
216 |                precision prescribed by the u    
217 |                                                 
218 |       setoxm1                                   
219 |       -------                                   
220 |                                                 
221 |       Step 1. Check |X|                         
222 |               1.1     If |X| >= 1/4, go to S    
223 |               1.2     Go to Step 7.             
224 |               1.3     If |X| < 70 log(2), go    
225 |               1.4     Go to Step 10.            
226 |       Notes:  The usual case should take the    
227 |                However, it is conceivable |X    
228 |                because EXPM1 is intended to     
229 |                when |X| is small. For furthe    
230 |                see the notes on Step 1 of se    
231 |                                                 
232 |       Step 2. Calculate N = round-to-nearest    
233 |               2.1     N := round-to-nearest-    
234 |               2.2     Calculate       J = N     
235 |               2.3     Calculate       M = (N    
236 |               2.4     Calculate the address     
237 |               2.5     Create the values Sc =    
238 |       Notes:  See the notes on Step 2 of set    
239 |                                                 
240 |       Step 3. Calculate X - N*log2/64.          
241 |               3.1     R := X + N*L1, where L    
242 |               3.2     R := R + N*L2, L2 := e    
243 |       Notes:  Applying the analysis of Step     
244 |                shows that |R| <= 0.0055 (not    
245 |                this case).                      
246 |                                                 
247 |       Step 4. Approximate exp(R)-1 by a poly    
248 |                       p = R+R*R*(A1+R*(A2+R*    
249 |       Notes:  a) In order to reduce memory a    
250 |                made as "short" as possible:     
251 |                are single precision; A2, A3     
252 |                b) Even with the restriction     
253 |                       |p - (exp(R)-1)| <        
254 |                for all |R| <= 0.0055.           
255 |                c) To fully utilize the pipel    
256 |                two independent pieces of rou    
257 |                       p = [ R*S*(A2 + S*(A4     
258 |                               [ R + S*(A1 +     
259 |                where S = R*R.                   
260 |                                                 
261 |       Step 5. Compute 2^(J/64)*p by             
262 |                               p := T*p          
263 |                where T and t are the stored     
264 |       Notes:  2^(J/64) is stored as T and t     
265 |                2^(J/64) to roughly 85 bits;     
266 |                and t is in single precision.    
267 |                to 62 bits so that the last t    
268 |                reason for such a special for    
269 |                will all be exact --- a prope    
270 |                in Step 6 below. The total re    
271 |                bigger than 2^(-67.7) compare    
272 |                                                 
273 |       Step 6. Reconstruction of exp(X)-1        
274 |                       exp(X)-1 = 2^M * ( 2^(    
275 |               6.1     If M <= 63, go to Step    
276 |               6.2     ans := T + (p + (t + O    
277 |               6.3     If M >= -3, go to 6.5.    
278 |               6.4     ans := (T + (p + t)) +    
279 |               6.5     ans := (T + OnebySc) +    
280 |               6.6     Restore user FPCR.        
281 |               6.7     Return ans := Sc * ans    
282 |       Notes:  The various arrangements of th    
283 |                evaluations.                     
284 |                                                 
285 |       Step 7. exp(X)-1 for |X| < 1/4.           
286 |               7.1     If |X| >= 2^(-65), go     
287 |               7.2     Go to Step 8.             
288 |                                                 
289 |       Step 8. Calculate exp(X)-1, |X| < 2^(-    
290 |               8.1     If |X| < 2^(-16312), g    
291 |               8.2     Restore FPCR; return a    
292 |               8.3     X := X * 2^(140).         
293 |               8.4     Restore FPCR; ans := a    
294 |                Return ans := ans*2^(140). Ex    
295 |       Notes:  The idea is to return "X - tin    
296 |                precision and rounding modes.    
297 |                inefficiency, we stay away fr    
298 |                best we can. For |X| >= 2^(-1    
299 |                8.2 generates the inexact exc    
300 |                                                 
301 |       Step 9. Calculate exp(X)-1, |X| < 1/4,    
302 |                       p = X + X*X*(B1 + X*(B    
303 |       Notes:  a) In order to reduce memory a    
304 |                made as "short" as possible:     
305 |                are single precision; B3 to B    
306 |                B2 is double extended.           
307 |                b) Even with the restriction     
308 |                       |p - (exp(X)-1)| < |X|    
309 |                for all |X| <= 0.251.            
310 |                Note that 0.251 is slightly b    
311 |                c) To fully preserve accuracy    
312 |                as     X + ( S*B1 +    Q ) wh    
313 |                       Q       =       X*S*(B    
314 |                d) To fully utilize the pipel    
315 |                two independent pieces of rou    
316 |                       Q = [ X*S*(B2 + S*(B4     
317 |                               [ S*S*(B3 + S*    
318 |                                                 
319 |       Step 10.        Calculate exp(X)-1 for    
320 |               10.1 If X >= 70log2 , exp(X) -    
321 |                purposes. Therefore, go to St    
322 |               10.2 If X <= -70log2, exp(X) -    
323 |                ans := -1                        
324 |                Restore user FPCR                
325 |                Return ans := ans + 2^(-126).    
326 |       Notes:  10.2 will always create an ine    
327 |                in the user rounding precisio    
328 |                                                 
329 |                                                 
330                                                   
331 |               Copyright (C) Motorola, Inc. 1    
332 |                       All Rights Reserved       
333 |                                                 
334 |       For details on the license for this fi    
335 |       file, README, in this same directory.     
336                                                   
337 |setox  idnt    2,1 | Motorola 040 Floating Po    
338                                                   
339         |section        8                         
340                                                   
341 #include "fpsp.h"                                 
342                                                   
343 L2:     .long   0x3FDC0000,0x82E30865,0x4361C4    
344                                                   
345 EXPA3:  .long   0x3FA55555,0x55554431             
346 EXPA2:  .long   0x3FC55555,0x55554018             
347                                                   
348 HUGE:   .long   0x7FFE0000,0xFFFFFFFF,0xFFFFFF    
349 TINY:   .long   0x00010000,0xFFFFFFFF,0xFFFFFF    
350                                                   
351 EM1A4:  .long   0x3F811111,0x11174385             
352 EM1A3:  .long   0x3FA55555,0x55554F5A             
353                                                   
354 EM1A2:  .long   0x3FC55555,0x55555555,0x000000    
355                                                   
356 EM1B8:  .long   0x3EC71DE3,0xA5774682             
357 EM1B7:  .long   0x3EFA01A0,0x19D7CB68             
358                                                   
359 EM1B6:  .long   0x3F2A01A0,0x1A019DF3             
360 EM1B5:  .long   0x3F56C16C,0x16C170E2             
361                                                   
362 EM1B4:  .long   0x3F811111,0x11111111             
363 EM1B3:  .long   0x3FA55555,0x55555555             
364                                                   
365 EM1B2:  .long   0x3FFC0000,0xAAAAAAAA,0xAAAAAA    
366         .long   0x00000000                        
367                                                   
368 TWO140: .long   0x48B00000,0x00000000             
369 TWON140:        .long   0x37300000,0x00000000     
370                                                   
371 EXPTBL:                                           
372         .long   0x3FFF0000,0x80000000,0x000000    
373         .long   0x3FFF0000,0x8164D1F3,0xBC0307    
374         .long   0x3FFF0000,0x82CD8698,0xAC2BA1    
375         .long   0x3FFF0000,0x843A28C3,0xACDE40    
376         .long   0x3FFF0000,0x85AAC367,0xCC487B    
377         .long   0x3FFF0000,0x871F6196,0x9E8D10    
378         .long   0x3FFF0000,0x88980E80,0x92DA85    
379         .long   0x3FFF0000,0x8A14D575,0x496EFD    
380         .long   0x3FFF0000,0x8B95C1E3,0xEA8BD6    
381         .long   0x3FFF0000,0x8D1ADF5B,0x7E5BA9    
382         .long   0x3FFF0000,0x8EA4398B,0x45CD53    
383         .long   0x3FFF0000,0x9031DC43,0x1466B1    
384         .long   0x3FFF0000,0x91C3D373,0xAB11C3    
385         .long   0x3FFF0000,0x935A2B2F,0x13E6E9    
386         .long   0x3FFF0000,0x94F4EFA8,0xFEF709    
387         .long   0x3FFF0000,0x96942D37,0x20185A    
388         .long   0x3FFF0000,0x9837F051,0x8DB8A9    
389         .long   0x3FFF0000,0x99E04593,0x20B7FA    
390         .long   0x3FFF0000,0x9B8D39B9,0xD54E55    
391         .long   0x3FFF0000,0x9D3ED9A7,0x2CFFB7    
392         .long   0x3FFF0000,0x9EF53260,0x91A111    
393         .long   0x3FFF0000,0xA0B0510F,0xB9714F    
394         .long   0x3FFF0000,0xA2704303,0x0C4968    
395         .long   0x3FFF0000,0xA43515AE,0x09E680    
396         .long   0x3FFF0000,0xA5FED6A9,0xB15138    
397         .long   0x3FFF0000,0xA7CD93B4,0xE96535    
398         .long   0x3FFF0000,0xA9A15AB4,0xEA7C0E    
399         .long   0x3FFF0000,0xAB7A39B5,0xA93ED3    
400         .long   0x3FFF0000,0xAD583EEA,0x42A14A    
401         .long   0x3FFF0000,0xAF3B78AD,0x690A43    
402         .long   0x3FFF0000,0xB123F581,0xD2AC25    
403         .long   0x3FFF0000,0xB311C412,0xA91124    
404         .long   0x3FFF0000,0xB504F333,0xF9DE64    
405         .long   0x3FFF0000,0xB6FD91E3,0x28D177    
406         .long   0x3FFF0000,0xB8FBAF47,0x62FB9E    
407         .long   0x3FFF0000,0xBAFF5AB2,0x133E45    
408         .long   0x3FFF0000,0xBD08A39F,0x580C36    
409         .long   0x3FFF0000,0xBF1799B6,0x7A7310    
410         .long   0x3FFF0000,0xC12C4CCA,0x667094    
411         .long   0x3FFF0000,0xC346CCDA,0x249764    
412         .long   0x3FFF0000,0xC5672A11,0x5506DA    
413         .long   0x3FFF0000,0xC78D74C8,0xABB9B1    
414         .long   0x3FFF0000,0xC9B9BD86,0x6E2F27    
415         .long   0x3FFF0000,0xCBEC14FE,0xF2727C    
416         .long   0x3FFF0000,0xCE248C15,0x1F8480    
417         .long   0x3FFF0000,0xD06333DA,0xEF2B25    
418         .long   0x3FFF0000,0xD2A81D91,0xF12AE4    
419         .long   0x3FFF0000,0xD4F35AAB,0xCFEDFA    
420         .long   0x3FFF0000,0xD744FCCA,0xD69D6A    
421         .long   0x3FFF0000,0xD99D15C2,0x78AFD7    
422         .long   0x3FFF0000,0xDBFBB797,0xDAF237    
423         .long   0x3FFF0000,0xDE60F482,0x5E0E91    
424         .long   0x3FFF0000,0xE0CCDEEC,0x2A94E1    
425         .long   0x3FFF0000,0xE33F8972,0xBE8A5A    
426         .long   0x3FFF0000,0xE5B906E7,0x7C8348    
427         .long   0x3FFF0000,0xE8396A50,0x3C4BDC    
428         .long   0x3FFF0000,0xEAC0C6E7,0xDD2439    
429         .long   0x3FFF0000,0xED4F301E,0xD9942B    
430         .long   0x3FFF0000,0xEFE4B99B,0xDCDAF5    
431         .long   0x3FFF0000,0xF281773C,0x59FFB1    
432         .long   0x3FFF0000,0xF5257D15,0x2486CC    
433         .long   0x3FFF0000,0xF7D0DF73,0x0AD13B    
434         .long   0x3FFF0000,0xFA83B2DB,0x722A03    
435         .long   0x3FFF0000,0xFD3E0C0C,0xF486C1    
436                                                   
437         .set    ADJFLAG,L_SCR2                    
438         .set    SCALE,FP_SCR1                     
439         .set    ADJSCALE,FP_SCR2                  
440         .set    SC,FP_SCR3                        
441         .set    ONEBYSC,FP_SCR4                   
442                                                   
443         | xref  t_frcinx                          
444         |xref   t_extdnrm                         
445         |xref   t_unfl                            
446         |xref   t_ovfl                            
447                                                   
448         .global setoxd                            
449 setoxd:                                           
450 |--entry point for EXP(X), X is denormalized      
451         movel           (%a0),%d0                 
452         andil           #0x80000000,%d0           
453         oril            #0x00800000,%d0           
454         movel           %d0,-(%sp)                
455         fmoves          #0x3F800000,%fp0          
456         fmovel          %d1,%fpcr                 
457         fadds           (%sp)+,%fp0               
458         bra             t_frcinx                  
459                                                   
460         .global setox                             
461 setox:                                            
462 |--entry point for EXP(X), here X is finite, n    
463                                                   
464 |--Step 1.                                        
465         movel           (%a0),%d0        | ...    
466         andil           #0x7FFF0000,%d0 | ...b    
467         cmpil           #0x3FBE0000,%d0 | ...2    
468         bges            EXPC1           | ...n    
469         bra             EXPSM                     
470                                                   
471 EXPC1:                                            
472 |--The case |X| >= 2^(-65)                        
473         movew           4(%a0),%d0      | ...e    
474         cmpil           #0x400CB167,%d0 | ...1    
475         blts            EXPMAIN  | ...normal c    
476         bra             EXPBIG                    
477                                                   
478 EXPMAIN:                                          
479 |--Step 2.                                        
480 |--This is the normal branch:   2^(-65) <= |X|    
481         fmovex          (%a0),%fp0      | ...l    
482                                                   
483         fmovex          %fp0,%fp1                 
484         fmuls           #0x42B8AA3B,%fp0          
485         fmovemx %fp2-%fp2/%fp3,-(%a7)             
486         movel           #0,ADJFLAG(%a6)           
487         fmovel          %fp0,%d0                  
488         lea             EXPTBL,%a1                
489         fmovel          %d0,%fp0                  
490                                                   
491         movel           %d0,L_SCR1(%a6) | ...s    
492         andil           #0x3F,%d0                 
493         lsll            #4,%d0                    
494         addal           %d0,%a1         | ...a    
495         movel           L_SCR1(%a6),%d0           
496         asrl            #6,%d0          | ...D    
497         addiw           #0x3FFF,%d0     | ...b    
498         movew           L2,L_SCR1(%a6)  | ...p    
499                                                   
500 EXPCONT1:                                         
501 |--Step 3.                                        
502 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i    
503 |--a0 points to 2^(J/64), D0 is biased expo. o    
504         fmovex          %fp0,%fp2                 
505         fmuls           #0xBC317218,%fp0          
506         fmulx           L2,%fp2         | ...N    
507         faddx           %fp1,%fp0                 
508         faddx           %fp2,%fp0                 
509 |       MOVE.W          #$3FA5,EXPA3    ...loa    
510                                                   
511 |--Step 4.                                        
512 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL        
513 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*A5    
514 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S    
515 |--[R+R*S*(A2+S*A4)] + [S*(A1+S*(A3+S*A5))]       
516                                                   
517         fmovex          %fp0,%fp1                 
518         fmulx           %fp1,%fp1                 
519                                                   
520         fmoves          #0x3AB60B70,%fp2          
521 |       MOVE.W          #0,2(%a1)       ...loa    
522                                                   
523         fmulx           %fp1,%fp2                 
524         fmovex          %fp1,%fp3                 
525         fmuls           #0x3C088895,%fp3          
526                                                   
527         faddd           EXPA3,%fp2      | ...f    
528         faddd           EXPA2,%fp3      | ...f    
529                                                   
530         fmulx           %fp1,%fp2                 
531         movew           %d0,SCALE(%a6)  | ...S    
532         clrw            SCALE+2(%a6)              
533         movel           #0x80000000,SCALE+4(%a    
534         clrl            SCALE+8(%a6)              
535                                                   
536         fmulx           %fp1,%fp3                 
537                                                   
538         fadds           #0x3F000000,%fp2          
539         fmulx           %fp0,%fp3                 
540                                                   
541         fmulx           %fp1,%fp2                 
542         faddx           %fp3,%fp0                 
543 |                                       ...fp3    
544                                                   
545         fmovex          (%a1)+,%fp1     | ...f    
546         faddx           %fp2,%fp0                 
547 |                                       ...fp2    
548                                                   
549 |--Step 5                                         
550 |--final reconstruction process                   
551 |--EXP(X) = 2^M * ( 2^(J/64) + 2^(J/64)*(EXP(R    
552                                                   
553         fmulx           %fp1,%fp0                 
554         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    
555         fadds           (%a1),%fp0      | ...a    
556                                                   
557         faddx           %fp1,%fp0                 
558         movel           ADJFLAG(%a6),%d0          
559                                                   
560 |--Step 6                                         
561         tstl            %d0                       
562         beqs            NORMAL                    
563 ADJUST:                                           
564         fmulx           ADJSCALE(%a6),%fp0        
565 NORMAL:                                           
566         fmovel          %d1,%FPCR                 
567         fmulx           SCALE(%a6),%fp0 | ...m    
568         bra             t_frcinx                  
569                                                   
570 EXPSM:                                            
571 |--Step 7                                         
572         fmovemx (%a0),%fp0-%fp0 | ...in case X    
573         fmovel          %d1,%FPCR                 
574         fadds           #0x3F800000,%fp0          
575         bra             t_frcinx                  
576                                                   
577 EXPBIG:                                           
578 |--Step 8                                         
579         cmpil           #0x400CB27C,%d0 | ...1    
580         bgts            EXP2BIG                   
581 |--Steps 8.2 -- 8.6                               
582         fmovex          (%a0),%fp0      | ...l    
583                                                   
584         fmovex          %fp0,%fp1                 
585         fmuls           #0x42B8AA3B,%fp0          
586         fmovemx  %fp2-%fp2/%fp3,-(%a7)            
587         movel           #1,ADJFLAG(%a6)           
588         fmovel          %fp0,%d0                  
589         lea             EXPTBL,%a1                
590         fmovel          %d0,%fp0                  
591         movel           %d0,L_SCR1(%a6)           
592         andil           #0x3F,%d0                 
593         lsll            #4,%d0                    
594         addal           %d0,%a1                   
595         movel           L_SCR1(%a6),%d0           
596         asrl            #6,%d0                    
597         movel           %d0,L_SCR1(%a6)           
598         asrl            #1,%d0                    
599         subl            %d0,L_SCR1(%a6)           
600         addiw           #0x3FFF,%d0               
601         movew           %d0,ADJSCALE(%a6)         
602         clrw            ADJSCALE+2(%a6)           
603         movel           #0x80000000,ADJSCALE+4    
604         clrl            ADJSCALE+8(%a6)           
605         movel           L_SCR1(%a6),%d0           
606         addiw           #0x3FFF,%d0               
607         bra             EXPCONT1                  
608                                                   
609 EXP2BIG:                                          
610 |--Step 9                                         
611         fmovel          %d1,%FPCR                 
612         movel           (%a0),%d0                 
613         bclrb           #sign_bit,(%a0)           
614         cmpil           #0,%d0                    
615         blt             t_unfl                    
616         bra             t_ovfl                    
617                                                   
618         .global setoxm1d                          
619 setoxm1d:                                         
620 |--entry point for EXPM1(X), here X is denorma    
621 |--Step 0.                                        
622         bra             t_extdnrm                 
623                                                   
624                                                   
625         .global setoxm1                           
626 setoxm1:                                          
627 |--entry point for EXPM1(X), here X is finite,    
628                                                   
629 |--Step 1.                                        
630 |--Step 1.1                                       
631         movel           (%a0),%d0        | ...    
632         andil           #0x7FFF0000,%d0 | ...b    
633         cmpil           #0x3FFD0000,%d0 | ...1    
634         bges            EM1CON1  | ...|X| >= 1    
635         bra             EM1SM                     
636                                                   
637 EM1CON1:                                          
638 |--Step 1.3                                       
639 |--The case |X| >= 1/4                            
640         movew           4(%a0),%d0      | ...e    
641         cmpil           #0x4004C215,%d0 | ...7    
642         bles            EM1MAIN  | ...1/4 <= |    
643         bra             EM1BIG                    
644                                                   
645 EM1MAIN:                                          
646 |--Step 2.                                        
647 |--This is the case:    1/4 <= |X| <= 70 log2.    
648         fmovex          (%a0),%fp0      | ...l    
649                                                   
650         fmovex          %fp0,%fp1                 
651         fmuls           #0x42B8AA3B,%fp0          
652         fmovemx %fp2-%fp2/%fp3,-(%a7)             
653 |       MOVE.W          #$3F81,EM1A4              
654         fmovel          %fp0,%d0                  
655         lea             EXPTBL,%a1                
656         fmovel          %d0,%fp0                  
657                                                   
658         movel           %d0,L_SCR1(%a6)           
659         andil           #0x3F,%d0                 
660         lsll            #4,%d0                    
661         addal           %d0,%a1                   
662         movel           L_SCR1(%a6),%d0           
663         asrl            #6,%d0                    
664         movel           %d0,L_SCR1(%a6)           
665 |       MOVE.W          #$3FDC,L2                 
666                                                   
667 |--Step 3.                                        
668 |--fp1,fp2 saved on the stack. fp0 is N, fp1 i    
669 |--a0 points to 2^(J/64), D0 and a1 both conta    
670         fmovex          %fp0,%fp2                 
671         fmuls           #0xBC317218,%fp0          
672         fmulx           L2,%fp2         | ...N    
673         faddx           %fp1,%fp0        | ...    
674         faddx           %fp2,%fp0        | ...    
675 |       MOVE.W          #$3FC5,EM1A2              
676         addiw           #0x3FFF,%d0               
677                                                   
678 |--Step 4.                                        
679 |--WE NOW COMPUTE EXP(R)-1 BY A POLYNOMIAL        
680 |-- R + R*R*(A1 + R*(A2 + R*(A3 + R*(A4 + R*(A    
681 |--TO FULLY UTILIZE THE PIPELINE, WE COMPUTE S    
682 |--[R*S*(A2+S*(A4+S*A6))] + [R+S*(A1+S*(A3+S*A    
683                                                   
684         fmovex          %fp0,%fp1                 
685         fmulx           %fp1,%fp1                 
686                                                   
687         fmoves          #0x3950097B,%fp2          
688 |       MOVE.W          #0,2(%a1)       ...loa    
689                                                   
690         fmulx           %fp1,%fp2                 
691         fmovex          %fp1,%fp3                 
692         fmuls           #0x3AB60B6A,%fp3          
693                                                   
694         faddd           EM1A4,%fp2      | ...f    
695         faddd           EM1A3,%fp3      | ...f    
696         movew           %d0,SC(%a6)               
697         clrw            SC+2(%a6)                 
698         movel           #0x80000000,SC+4(%a6)     
699         clrl            SC+8(%a6)                 
700                                                   
701         fmulx           %fp1,%fp2                 
702         movel           L_SCR1(%a6),%d0           
703         negw            %d0             | ...D    
704         fmulx           %fp1,%fp3                 
705         addiw           #0x3FFF,%d0     | ...b    
706         faddd           EM1A2,%fp2      | ...f    
707         fadds           #0x3F000000,%fp3          
708                                                   
709         fmulx           %fp1,%fp2                 
710         oriw            #0x8000,%d0     | ...s    
711         movew           %d0,ONEBYSC(%a6)          
712         clrw            ONEBYSC+2(%a6)            
713         movel           #0x80000000,ONEBYSC+4(    
714         clrl            ONEBYSC+8(%a6)            
715         fmulx           %fp3,%fp1                 
716 |                                       ...fp3    
717                                                   
718         fmulx           %fp0,%fp2                 
719         faddx           %fp1,%fp0                 
720 |                                       ...fp1    
721                                                   
722         faddx           %fp2,%fp0                 
723 |                                       ...fp2    
724         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    
725                                                   
726 |--Step 5                                         
727 |--Compute 2^(J/64)*p                             
728                                                   
729         fmulx           (%a1),%fp0      | ...2    
730                                                   
731 |--Step 6                                         
732 |--Step 6.1                                       
733         movel           L_SCR1(%a6),%d0           
734         cmpil           #63,%d0                   
735         bles            MLE63                     
736 |--Step 6.2     M >= 64                           
737         fmoves          12(%a1),%fp1    | ...f    
738         faddx           ONEBYSC(%a6),%fp1         
739         faddx           %fp1,%fp0                 
740         faddx           (%a1),%fp0      | ...T    
741         bras            EM1SCALE                  
742 MLE63:                                            
743 |--Step 6.3     M <= 63                           
744         cmpil           #-3,%d0                   
745         bges            MGEN3                     
746 MLTN3:                                            
747 |--Step 6.4     M <= -4                           
748         fadds           12(%a1),%fp0    | ...p    
749         faddx           (%a1),%fp0      | ...T    
750         faddx           ONEBYSC(%a6),%fp0         
751         bras            EM1SCALE                  
752 MGEN3:                                            
753 |--Step 6.5     -3 <= M <= 63                     
754         fmovex          (%a1)+,%fp1     | ...f    
755         fadds           (%a1),%fp0      | ...f    
756         faddx           ONEBYSC(%a6),%fp1         
757         faddx           %fp1,%fp0                 
758                                                   
759 EM1SCALE:                                         
760 |--Step 6.6                                       
761         fmovel          %d1,%FPCR                 
762         fmulx           SC(%a6),%fp0              
763                                                   
764         bra             t_frcinx                  
765                                                   
766 EM1SM:                                            
767 |--Step 7       |X| < 1/4.                        
768         cmpil           #0x3FBE0000,%d0 | ...2    
769         bges            EM1POLY                   
770                                                   
771 EM1TINY:                                          
772 |--Step 8       |X| < 2^(-65)                     
773         cmpil           #0x00330000,%d0 | ...2    
774         blts            EM12TINY                  
775 |--Step 8.2                                       
776         movel           #0x80010000,SC(%a6)       
777         movel           #0x80000000,SC+4(%a6)     
778         clrl            SC+8(%a6)                 
779         fmovex          (%a0),%fp0                
780         fmovel          %d1,%FPCR                 
781         faddx           SC(%a6),%fp0              
782                                                   
783         bra             t_frcinx                  
784                                                   
785 EM12TINY:                                         
786 |--Step 8.3                                       
787         fmovex          (%a0),%fp0                
788         fmuld           TWO140,%fp0               
789         movel           #0x80010000,SC(%a6)       
790         movel           #0x80000000,SC+4(%a6)     
791         clrl            SC+8(%a6)                 
792         faddx           SC(%a6),%fp0              
793         fmovel          %d1,%FPCR                 
794         fmuld           TWON140,%fp0              
795                                                   
796         bra             t_frcinx                  
797                                                   
798 EM1POLY:                                          
799 |--Step 9       exp(X)-1 by a simple polynomia    
800         fmovex          (%a0),%fp0      | ...f    
801         fmulx           %fp0,%fp0                 
802         fmovemx %fp2-%fp2/%fp3,-(%a7)   | ...s    
803         fmoves          #0x2F30CAA8,%fp1          
804         fmulx           %fp0,%fp1                 
805         fmoves          #0x310F8290,%fp2          
806         fadds           #0x32D73220,%fp1          
807                                                   
808         fmulx           %fp0,%fp2                 
809         fmulx           %fp0,%fp1                 
810                                                   
811         fadds           #0x3493F281,%fp2          
812         faddd           EM1B8,%fp1      | ...f    
813                                                   
814         fmulx           %fp0,%fp2                 
815         fmulx           %fp0,%fp1                 
816                                                   
817         faddd           EM1B7,%fp2      | ...f    
818         faddd           EM1B6,%fp1      | ...f    
819                                                   
820         fmulx           %fp0,%fp2                 
821         fmulx           %fp0,%fp1                 
822                                                   
823         faddd           EM1B5,%fp2      | ...f    
824         faddd           EM1B4,%fp1      | ...f    
825                                                   
826         fmulx           %fp0,%fp2                 
827         fmulx           %fp0,%fp1                 
828                                                   
829         faddd           EM1B3,%fp2      | ...f    
830         faddx           EM1B2,%fp1      | ...f    
831                                                   
832         fmulx           %fp0,%fp2                 
833         fmulx           %fp0,%fp1                 
834                                                   
835         fmulx           %fp0,%fp2                 
836         fmulx           (%a0),%fp1      | ...f    
837                                                   
838         fmuls           #0x3F000000,%fp0          
839         faddx           %fp2,%fp1                 
840 |                                       ...fp2    
841                                                   
842         fmovemx (%a7)+,%fp2-%fp2/%fp3   | ...f    
843                                                   
844         faddx           %fp1,%fp0                 
845 |                                       ...fp1    
846                                                   
847         fmovel          %d1,%FPCR                 
848         faddx           (%a0),%fp0                
849                                                   
850         bra             t_frcinx                  
851                                                   
852 EM1BIG:                                           
853 |--Step 10      |X| > 70 log2                     
854         movel           (%a0),%d0                 
855         cmpil           #0,%d0                    
856         bgt             EXPC1                     
857 |--Step 10.2                                      
858         fmoves          #0xBF800000,%fp0          
859         fmovel          %d1,%FPCR                 
860         fadds           #0x00800000,%fp0          
861                                                   
862         bra             t_frcinx                  
863                                                   
864         |end                                      
                                                      

~ [ source navigation ] ~ [ diff markup ] ~ [ identifier search ] ~

kernel.org | git.kernel.org | LWN.net | Project Home | SVN repository | Mail admin

Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
TOMOYO® is a registered trademark of NTT DATA CORPORATION.

sflogo.php