TOMOYO Linux Cross Reference
Linux/arch/sparc/lib/M7memset.S

Differences between /arch/sparc/lib/M7memset.S (Version linux-6.12-rc7) and /arch/sparc/lib/M7memset.S (Version linux-4.10.17)


/*
 * M7memset.S: SPARC M7 optimized memset.
 *
 * Copyright (c) 2016, Oracle and/or its affil
 */

/*
 * M7memset.S: M7 optimized memset.
 *
 * char *memset(sp, c, n)
 *
 * Set an array of n chars starting at sp to t
 * Return sp.
 *
 * Fast assembler language version of the foll
 * which represents the `standard' for the C-l
 *
 *      void *
 *      memset(void *sp1, int c, size_t n)
 *      {
 *          if (n != 0) {
 *              char *sp = sp1;
 *              do {
 *                  *sp++ = (char)c;
 *              } while (--n != 0);
 *          }
 *          return (sp1);
 *      }
 *
 * The algorithm is as follows :
 *
 *      For small 6 or fewer bytes stores, byt
 *
 *      For less than 32 bytes stores, align t
 *      Then store as many 4-byte chunks, foll
 *
 *      For sizes greater than 32 bytes, align
 *      if (count >= 64) {
 *              store 8-bytes chunks to align
 *              if (value to be set is zero &&
 *                      Using BIS stores, set
 *                      64-byte cache line to
 *                      other seven long words
 *              }
 *              else if (count >= MIN_LOOP) {
 *                      Using BIS stores, set
 *                      ST_CHUNK cache lines (
 *                      loop is entered.
 *                      In the main loop, cont
 *                      word of each cache lin
 *                      setting the other seve
 *                      cache line until fewer
 *                      Then set the remaining
 *                      line that has already
 *              }
 *              store remaining data in 64-byt
 *              64 bytes remain.
 *       }
 *       Store as many 8-byte chunks, followed
 *
 * BIS = Block Init Store
 *   Doing the advance store of the first elem
 *   initiates the displacement of a cache lin
 *   instruction in the pipeline. That avoids
 *   such as filling the miss buffer. The perf
 *   similar to prefetching for normal stores.
 *   The special case for zero fills runs fast
 *   cycles than the normal memset loop.
 *
 * We only use BIS for memset of greater than
 * BIS stores must be followed by a membar #St
 * the BIS store must be balanced against the
 */

/*
 * ASI_STBI_P marks the cache line as "least r
 * which means if many threads are active, it
 * of being pushed out of the cache between th
 * store and the final stores.
 * Thus, we use ASI_STBIMRU_P which marks the
 * "most recently used" for all but the last s
 */

#include <asm/asi.h>
#include <asm/page.h>

#define ASI_STBI_P      ASI_BLK_INIT_QUAD_LDD_
#define ASI_STBIMRU_P   ASI_ST_BLKINIT_MRU_P


#define ST_CHUNK        24   /* multiple of 4
#define MIN_LOOP        16320
#define MIN_ZERO        512

        .section        ".text"
        .align          32

/*
 * Define clear_page(dest) as memset(dest, 0,
 * (can create a more optimized version later.
 */
        .globl          M7clear_page
        .globl          M7clear_user_page
M7clear_page:           /* clear_page(dest) */
M7clear_user_page:
        set     PAGE_SIZE, %o1
        /* fall through into bzero code */

        .size           M7clear_page,.-M7clear
        .size           M7clear_user_page,.-M7

/*
 * Define bzero(dest, n) as memset(dest, 0, n)
 * (can create a more optimized version later.
 */
        .globl          M7bzero
M7bzero:                /* bzero(dest, size) *
        mov     %o1, %o2
        mov     0, %o1
        /* fall through into memset code */

        .size           M7bzero,.-M7bzero

        .global         M7memset
        .type           M7memset, #function
        .register       %g3, #scratch
M7memset:
        mov     %o0, %o5                ! copy
        cmp     %o2, 7                  ! if s
        bleu,pn %xcc, .wrchar
         and     %o1, 0xff, %o1          ! o1

        sll     %o1, 8, %o3
        or      %o1, %o3, %o1           ! now
        sll     %o1, 16, %o3
        cmp     %o2, 32
        blu,pn  %xcc, .wdalign
         or      %o1, %o3, %o1           ! now

        sllx    %o1, 32, %o3
        or      %o1, %o3, %o1           ! now

.dbalign:
        andcc   %o5, 7, %o3             ! is s
        bz,pt   %xcc, .blkalign         ! alre
         sub     %o3, 8, %o3             ! -(b

        add     %o2, %o3, %o2           ! upda
        ! Set -(%o3) bytes till sp1 long word
1:      stb     %o1, [%o5]              ! ther
        inccc   %o3                     ! byte
        bl,pt   %xcc, 1b
         inc     %o5

        ! Now sp1 is long word aligned (sp1 is
.blkalign:
        cmp     %o2, 64                 ! chec
        blu,pn  %xcc, .wrshort
         mov     %o2, %o3

        andcc   %o5, 63, %o3            ! is s
        bz,pt   %xcc, .blkwr            ! now
         sub     %o3, 64, %o3            ! o3
        add     %o2, %o3, %o2           ! o2 i

        ! Store -(%o3) bytes till dst is block
        ! Use long word stores.
        ! Recall that dst is already long word
1:
        addcc   %o3, 8, %o3
        stx     %o1, [%o5]
        bl,pt   %xcc, 1b
         add     %o5, 8, %o5

        ! Now sp1 is block aligned
.blkwr:
        andn    %o2, 63, %o4            ! calc
        brz,pn  %o1, .wrzero            ! spec
         and     %o2, 63, %o3            ! %o3

        set     MIN_LOOP, %g1
        cmp     %o4, %g1                ! chec
        blu,pn  %xcc, .short_set        ! to j
                                        ! must
         nop

        ! initial cache-clearing stores
        ! get store pipeline moving
        rd      %asi, %g3               ! save
        wr     %g0, ASI_STBIMRU_P, %asi

        ! Primary memset loop for large memset
.wr_loop:
        sub     %o5, 8, %o5             ! adju
        mov     ST_CHUNK, %g1
.wr_loop_start:
        stxa    %o1, [%o5+8]%asi
        subcc   %g1, 4, %g1
        stxa    %o1, [%o5+8+64]%asi
        add     %o5, 256, %o5
        stxa    %o1, [%o5+8-128]%asi
        bgu     %xcc, .wr_loop_start
         stxa    %o1, [%o5+8-64]%asi

        sub     %o5, ST_CHUNK*64, %o5   ! rese
        mov     ST_CHUNK, %g1

.wr_loop_rest:
        stxa    %o1, [%o5+8+8]%asi
        sub     %o4, 64, %o4
        stxa    %o1, [%o5+16+8]%asi
        subcc   %g1, 1, %g1
        stxa    %o1, [%o5+24+8]%asi
        stxa    %o1, [%o5+32+8]%asi
        stxa    %o1, [%o5+40+8]%asi
        add     %o5, 64, %o5
        stxa    %o1, [%o5-8]%asi
        bgu     %xcc, .wr_loop_rest
         stxa    %o1, [%o5]ASI_STBI_P

        ! If more than ST_CHUNK*64 bytes remai
        ! setting the first long word of each
        ! to keep the store pipeline moving.

        cmp     %o4, ST_CHUNK*64
        bge,pt  %xcc, .wr_loop_start
         mov     ST_CHUNK, %g1

        brz,a,pn %o4, .asi_done
         add     %o5, 8, %o5             ! res

.wr_loop_small:
        stxa    %o1, [%o5+8]%asi
        stxa    %o1, [%o5+8+8]%asi
        stxa    %o1, [%o5+16+8]%asi
        stxa    %o1, [%o5+24+8]%asi
        stxa    %o1, [%o5+32+8]%asi
        subcc   %o4, 64, %o4
        stxa    %o1, [%o5+40+8]%asi
        add     %o5, 64, %o5
        stxa    %o1, [%o5-8]%asi
        bgu,pt  %xcc, .wr_loop_small
         stxa    %o1, [%o5]ASI_STBI_P

        ba      .asi_done
         add     %o5, 8, %o5             ! res

        ! Special case loop for zero fill mems
        ! For each 64 byte cache line, single
        ! clears line
.wrzero:
        cmp     %o4, MIN_ZERO           ! chec
                                        ! to p
        blu     %xcc, .short_set
         nop
        sub     %o4, 256, %o4

.wrzero_loop:
        mov     64, %g3
        stxa    %o1, [%o5]ASI_STBI_P
        subcc   %o4, 256, %o4
        stxa    %o1, [%o5+%g3]ASI_STBI_P
        add     %o5, 256, %o5
        sub     %g3, 192, %g3
        stxa    %o1, [%o5+%g3]ASI_STBI_P
        add     %g3, 64, %g3
        bge,pt  %xcc, .wrzero_loop
         stxa    %o1, [%o5+%g3]ASI_STBI_P
        add     %o4, 256, %o4

        brz,pn  %o4, .bsi_done
         nop

.wrzero_small:
        stxa    %o1, [%o5]ASI_STBI_P
        subcc   %o4, 64, %o4
        bgu,pt  %xcc, .wrzero_small
         add     %o5, 64, %o5
        ba,a    .bsi_done

.asi_done:
        wr      %g3, 0x0, %asi          ! rest
.bsi_done:
        membar  #StoreStore             ! requ

.short_set:
        cmp     %o4, 64                 ! chec
        blu     %xcc, 5f
         nop
4:                                      ! set
        stx     %o1, [%o5]
        stx     %o1, [%o5+8]
        stx     %o1, [%o5+16]
        stx     %o1, [%o5+24]
        subcc   %o4, 64, %o4
        stx     %o1, [%o5+32]
        stx     %o1, [%o5+40]
        add     %o5, 64, %o5
        stx     %o1, [%o5-16]
        bgu,pt  %xcc, 4b
         stx     %o1, [%o5-8]

5:
        ! Set the remaining long words
.wrshort:
        subcc   %o3, 8, %o3             ! Can
        blu,pn  %xcc, .wrchars
         and     %o2, 7, %o2             ! cal
6:
        subcc   %o3, 8, %o3
        stx     %o1, [%o5]              ! stor
        bgeu,pt %xcc, 6b
         add     %o5, 8, %o5

.wrchars:                               ! chec
        brnz    %o2, .wrfin
         nop
        retl
         nop

.wdalign:
        andcc   %o5, 3, %o3             ! is s
        bz,pn   %xcc, .wrword
         andn    %o2, 3, %o3             ! cre

        dec     %o2                     ! decr
        stb     %o1, [%o5]              ! clea
        b       .wdalign
         inc     %o5                     ! nex

.wrword:
        subcc   %o3, 4, %o3
        st      %o1, [%o5]              ! 4-by
        bnz,pt  %xcc, .wrword
         add     %o5, 4, %o5

        and     %o2, 3, %o2             ! left

.wrchar:
        ! Set the remaining bytes, if any
        brz     %o2, .exit
         nop
.wrfin:
        deccc   %o2
        stb     %o1, [%o5]
        bgu,pt  %xcc, .wrfin
         inc     %o5
.exit:
        retl                            ! %o0
         nop

        .size           M7memset,.-M7memset
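
The control flow of the listing can be easier to follow in C. The sketch below is an illustrative model only, not part of the kernel source. It mirrors two things visible in the assembly: the `and`/`sll`/`or`/`sllx`/`or` sequence that replicates the fill byte across a 64-bit register, and the branch decisions (`cmp %o2, 7`, `cmp %o2, 32`, `cmp %o2, 64`, `brz %o1`, and the `MIN_ZERO` and `MIN_LOOP` compares) that choose a store strategy. The names `replicate_byte`, `classify_fill`, and the `fill_path` enumerators are hypothetical. The sketch issues no block-init stores; it only reports which path would be taken. The byte-only cutoff follows the code's compare against 7.

```c
#include <stddef.h>
#include <stdint.h>

/* Thresholds copied from the #defines in the listing. */
#define MIN_LOOP 16320
#define MIN_ZERO 512

/* Hypothetical names for the store strategies the assembly branches to. */
enum fill_path {
    PATH_BYTES,        /* .wrchar: byte stores only                      */
    PATH_WORDS,        /* .wdalign/.wrword: 4-byte stores, then bytes    */
    PATH_LONGWORDS,    /* .wrshort: 8-byte stores, then bytes            */
    PATH_PLAIN_BLOCKS, /* .short_set: ordinary 64-byte stx loop          */
    PATH_BIS_ZERO,     /* .wrzero_loop: one BIS store per cache line     */
    PATH_BIS_LOOP      /* .wr_loop: BIS stores with MRU/LRU ASIs         */
};

/* Copy the low byte of c into all eight bytes of a 64-bit value,
 * as the and/sll/or/sllx/or sequence at the top of M7memset does. */
static uint64_t replicate_byte(int c)
{
    uint64_t v = (uint64_t)c & 0xff;
    v |= v << 8;   /* low 16 bits hold the pattern */
    v |= v << 16;  /* low 32 bits hold the pattern */
    v |= v << 32;  /* all 64 bits hold the pattern */
    return v;
}

/* Report which path a fill of n bytes at address addr would take. */
static enum fill_path classify_fill(uintptr_t addr, int c, size_t n)
{
    if (n <= 7)                 /* cmp %o2, 7 ; bleu .wrchar   */
        return PATH_BYTES;
    if (n < 32)                 /* cmp %o2, 32 ; blu .wdalign  */
        return PATH_WORDS;

    /* .dbalign: byte stores until the address is 8-byte aligned. */
    size_t lead = (size_t)((0 - addr) & 7);
    n -= lead;
    addr += lead;

    if (n < 64)                 /* cmp %o2, 64 ; blu .wrshort  */
        return PATH_LONGWORDS;

    /* .blkalign: 8-byte stores until the address is 64-byte aligned. */
    n -= (size_t)((0 - addr) & 63);

    /* .blkwr: whole 64-byte blocks left (%o4 in the listing). */
    size_t block_bytes = n & ~(size_t)63;

    if ((c & 0xff) == 0)        /* brz %o1, .wrzero            */
        return block_bytes >= MIN_ZERO ? PATH_BIS_ZERO : PATH_PLAIN_BLOCKS;
    return block_bytes >= MIN_LOOP ? PATH_BIS_LOOP : PATH_PLAIN_BLOCKS;
}
```

For example, `classify_fill(0, 0, 4096)` models a page clear from an aligned address and reports `PATH_BIS_ZERO`, while the same call with a nonzero fill byte reports `PATH_PLAIN_BLOCKS`, because 4096 is below `MIN_LOOP`.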
                                                      
