TOMOYO Linux Cross Reference
Linux/Documentation/admin-guide/mm/numaperf.rst

Diff markup

Differences between /Documentation/admin-guide/mm/numaperf.rst (Version linux-6.12-rc7) and /Documentation/admin-guide/mm/numaperf.rst (Version linux-5.4.285)


  1 =======================                        !!   1 .. _numaperf:
  2 NUMA Memory Performance                        << 
  3 =======================                        << 
  4                                                     2 
                                                   >>   3 =============
  5 NUMA Locality                                       4 NUMA Locality
  6 =============                                       5 =============
  7                                                     6 
  8 Some platforms may have multiple types of memo      7 Some platforms may have multiple types of memory attached to a compute
  9 node. These disparate memory ranges may share       8 node. These disparate memory ranges may share some characteristics, such
 10 as CPU cache coherence, but may have different      9 as CPU cache coherence, but may have different performance. For example,
 11 different media types and buses affect bandwid     10 different media types and buses affect bandwidth and latency.
 12                                                    11 
 13 A system supports such heterogeneous memory by     12 A system supports such heterogeneous memory by grouping each memory type
 14 under different domains, or "nodes", based on      13 under different domains, or "nodes", based on locality and performance
 15 characteristics.  Some memory may share the sa     14 characteristics.  Some memory may share the same node as a CPU, and others
 16 are provided as memory only nodes. While memor     15 are provided as memory only nodes. While memory only nodes do not provide
 17 CPUs, they may still be local to one or more c     16 CPUs, they may still be local to one or more compute nodes relative to
 18 other nodes. The following diagram shows one s     17 other nodes. The following diagram shows one such example of two compute
 19 nodes with local memory and a memory only node     18 nodes with local memory and a memory only node for each of compute node::
 20                                                    19 
 21  +------------------+     +------------------+     20  +------------------+     +------------------+
 22  | Compute Node 0   +-----+ Compute Node 1   |     21  | Compute Node 0   +-----+ Compute Node 1   |
 23  | Local Node0 Mem  |     | Local Node1 Mem  |     22  | Local Node0 Mem  |     | Local Node1 Mem  |
 24  +--------+---------+     +--------+---------+     23  +--------+---------+     +--------+---------+
 25           |                        |               24           |                        |
 26  +--------+---------+     +--------+---------+     25  +--------+---------+     +--------+---------+
 27  | Slower Node2 Mem |     | Slower Node3 Mem |     26  | Slower Node2 Mem |     | Slower Node3 Mem |
 28  +------------------+     +--------+---------+     27  +------------------+     +--------+---------+
 29                                                    28 
 30 A "memory initiator" is a node containing one      29 A "memory initiator" is a node containing one or more devices such as
 31 CPUs or separate memory I/O devices that can i     30 CPUs or separate memory I/O devices that can initiate memory requests.
 32 A "memory target" is a node containing one or      31 A "memory target" is a node containing one or more physical address
 33 ranges accessible from one or more memory init     32 ranges accessible from one or more memory initiators.
 34                                                    33 
 35 When multiple memory initiators exist, they ma     34 When multiple memory initiators exist, they may not all have the same
 36 performance when accessing a given memory targ     35 performance when accessing a given memory target. Each initiator-target
 37 pair may be organized into different ranked ac     36 pair may be organized into different ranked access classes to represent
 38 this relationship. The highest performing init     37 this relationship. The highest performing initiator to a given target
 39 is considered to be one of that target's local     38 is considered to be one of that target's local initiators, and given
 40 the highest access class, 0. Any given target      39 the highest access class, 0. Any given target may have one or more
 41 local initiators, and any given initiator may      40 local initiators, and any given initiator may have multiple local
 42 memory targets.                                    41 memory targets.
 43                                                    42 
 44 To aid applications matching memory targets wi     43 To aid applications matching memory targets with their initiators, the
 45 kernel provides symlinks to each other. The fo     44 kernel provides symlinks to each other. The following example lists the
 46 relationship for the access class "0" memory i     45 relationship for the access class "0" memory initiators and targets::
 47                                                    46 
 48         # symlinks -v /sys/devices/system/node     47         # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
 49         relative: /sys/devices/system/node/nod     48         relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
 50                                                    49 
 51         # symlinks -v /sys/devices/system/node     50         # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
 52         relative: /sys/devices/system/node/nod     51         relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
 53                                                    52 
 54 A memory initiator may have multiple memory ta     53 A memory initiator may have multiple memory targets in the same access
 55 class. The target memory's initiators in a giv     54 class. The target memory's initiators in a given class indicate the
 56 nodes' access characteristics share the same p     55 nodes' access characteristics share the same performance relative to other
 57 linked initiator nodes. Each target within an      56 linked initiator nodes. Each target within an initiator's access class,
 58 though, do not necessarily perform the same as     57 though, do not necessarily perform the same as each other.
 59                                                    58 
 60 The access class "1" is used to allow differen !!  59 ================
 61 that are CPUs and hence suitable for generic t << 
 62 IO initiators such as GPUs and NICs.  Unlike a << 
 63 nodes containing CPUs are considered.          << 
 64                                                << 
 65 NUMA Performance                                   60 NUMA Performance
 66 ================                                   61 ================
 67                                                    62 
 68 Applications may wish to consider which node t     63 Applications may wish to consider which node they want their memory to
 69 be allocated from based on the node's performa     64 be allocated from based on the node's performance characteristics. If
 70 the system provides these attributes, the kern     65 the system provides these attributes, the kernel exports them under the
 71 node sysfs hierarchy by appending the attribut     66 node sysfs hierarchy by appending the attributes directory under the
 72 memory node's access class 0 initiators as fol     67 memory node's access class 0 initiators as follows::
 73                                                    68 
 74         /sys/devices/system/node/nodeY/access0     69         /sys/devices/system/node/nodeY/access0/initiators/
 75                                                    70 
 76 These attributes apply only when accessed from     71 These attributes apply only when accessed from nodes that have the
 77 are linked under the this access's initiators. !!  72 are linked under the this access's inititiators.
 78                                                    73 
 79 The performance characteristics the kernel pro     74 The performance characteristics the kernel provides for the local initiators
 80 are exported are as follows::                      75 are exported are as follows::
 81                                                    76 
 82         # tree -P "read*|write*" /sys/devices/     77         # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
 83         /sys/devices/system/node/nodeY/access0     78         /sys/devices/system/node/nodeY/access0/initiators/
 84         |-- read_bandwidth                         79         |-- read_bandwidth
 85         |-- read_latency                           80         |-- read_latency
 86         |-- write_bandwidth                        81         |-- write_bandwidth
 87         `-- write_latency                          82         `-- write_latency
 88                                                    83 
 89 The bandwidth attributes are provided in MiB/s     84 The bandwidth attributes are provided in MiB/second.
 90                                                    85 
 91 The latency attributes are provided in nanosec     86 The latency attributes are provided in nanoseconds.
 92                                                    87 
 93 The values reported here correspond to the rat     88 The values reported here correspond to the rated latency and bandwidth
 94 for the platform.                                  89 for the platform.
 95                                                    90 
 96 Access class 1 takes the same form but only in !!  91 ==========
 97 memory activity.                               << 
 98                                                << 
 99 NUMA Cache                                         92 NUMA Cache
100 ==========                                         93 ==========
101                                                    94 
102 System memory may be constructed in a hierarch     95 System memory may be constructed in a hierarchy of elements with various
103 performance characteristics in order to provid     96 performance characteristics in order to provide large address space of
104 slower performing memory cached by a smaller h     97 slower performing memory cached by a smaller higher performing memory. The
105 system physical addresses memory  initiators a     98 system physical addresses memory  initiators are aware of are provided
106 by the last memory level in the hierarchy. The     99 by the last memory level in the hierarchy. The system meanwhile uses
107 higher performing memory to transparently cach    100 higher performing memory to transparently cache access to progressively
108 slower levels.                                    101 slower levels.
109                                                   102 
110 The term "far memory" is used to denote the la    103 The term "far memory" is used to denote the last level memory in the
111 hierarchy. Each increasing cache level provide    104 hierarchy. Each increasing cache level provides higher performing
112 initiator access, and the term "near memory" r    105 initiator access, and the term "near memory" represents the fastest
113 cache provided by the system.                     106 cache provided by the system.
114                                                   107 
115 This numbering is different than CPU caches wh    108 This numbering is different than CPU caches where the cache level (ex:
116 L1, L2, L3) uses the CPU-side view where each     109 L1, L2, L3) uses the CPU-side view where each increased level is lower
117 performing. In contrast, the memory cache leve    110 performing. In contrast, the memory cache level is centric to the last
118 level memory, so the higher numbered cache lev    111 level memory, so the higher numbered cache level corresponds to  memory
119 nearer to the CPU, and further from far memory    112 nearer to the CPU, and further from far memory.
120                                                   113 
121 The memory-side caches are not directly addres    114 The memory-side caches are not directly addressable by software. When
122 software accesses a system address, the system    115 software accesses a system address, the system will return it from the
123 near memory cache if it is present. If it is n    116 near memory cache if it is present. If it is not present, the system
124 accesses the next level of memory until there     117 accesses the next level of memory until there is either a hit in that
125 cache level, or it reaches far memory.            118 cache level, or it reaches far memory.
126                                                   119 
127 An application does not need to know about cac    120 An application does not need to know about caching attributes in order
128 to use the system. Software may optionally que    121 to use the system. Software may optionally query the memory cache
129 attributes in order to maximize the performanc    122 attributes in order to maximize the performance out of such a setup.
130 If the system provides a way for the kernel to    123 If the system provides a way for the kernel to discover this information,
131 for example with ACPI HMAT (Heterogeneous Memo    124 for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
132 the kernel will append these attributes to the    125 the kernel will append these attributes to the NUMA node memory target.
133                                                   126 
134 When the kernel first registers a memory cache    127 When the kernel first registers a memory cache with a node, the kernel
135 will create the following directory::             128 will create the following directory::
136                                                   129 
137         /sys/devices/system/node/nodeX/memory_    130         /sys/devices/system/node/nodeX/memory_side_cache/
138                                                   131 
139 If that directory is not present, the system e !! 132 If that directory is not present, the system either does not not provide
140 a memory-side cache, or that information is no    133 a memory-side cache, or that information is not accessible to the kernel.
141                                                   134 
142 The attributes for each level of cache is prov    135 The attributes for each level of cache is provided under its cache
143 level index::                                     136 level index::
144                                                   137 
145         /sys/devices/system/node/nodeX/memory_    138         /sys/devices/system/node/nodeX/memory_side_cache/indexA/
146         /sys/devices/system/node/nodeX/memory_    139         /sys/devices/system/node/nodeX/memory_side_cache/indexB/
147         /sys/devices/system/node/nodeX/memory_    140         /sys/devices/system/node/nodeX/memory_side_cache/indexC/
148                                                   141 
149 Each cache level's directory provides its attr    142 Each cache level's directory provides its attributes. For example, the
150 following shows a single cache level and the a    143 following shows a single cache level and the attributes available for
151 software to query::                               144 software to query::
152                                                   145 
153         # tree /sys/devices/system/node/node0/ !! 146         # tree sys/devices/system/node/node0/memory_side_cache/
154         /sys/devices/system/node/node0/memory_    147         /sys/devices/system/node/node0/memory_side_cache/
155         |-- index1                                148         |-- index1
156         |   |-- indexing                          149         |   |-- indexing
157         |   |-- line_size                         150         |   |-- line_size
158         |   |-- size                              151         |   |-- size
159         |   `-- write_policy                      152         |   `-- write_policy
160                                                   153 
161 The "indexing" will be 0 if it is a direct-map    154 The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
162 for any other indexed based, multi-way associa    155 for any other indexed based, multi-way associativity.
163                                                   156 
164 The "line_size" is the number of bytes accesse    157 The "line_size" is the number of bytes accessed from the next cache
165 level on a miss.                                  158 level on a miss.
166                                                   159 
167 The "size" is the number of bytes provided by     160 The "size" is the number of bytes provided by this cache level.
168                                                   161 
169 The "write_policy" will be 0 for write-back, a    162 The "write_policy" will be 0 for write-back, and non-zero for
170 write-through caching.                            163 write-through caching.
171                                                   164 
                                                   >> 165 ========
172 See Also                                          166 See Also
173 ========                                          167 ========
174                                                   168 
175 [1] https://www.uefi.org/sites/default/files/r    169 [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
176 - Section 5.2.27                                  170 - Section 5.2.27
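
The sysfs layout described in the document above can be read from user space with ordinary file operations. The following is a minimal sketch, not part of the kernel documentation: the helper names `read_int_attrs` and `node_report` are hypothetical, and it assumes each attribute file holds a single decimal integer. Attributes that the platform does not export are skipped.

```python
from pathlib import Path

# Attribute file names as listed in the document's tree output.
PERF_ATTRS = ("read_bandwidth", "read_latency", "write_bandwidth", "write_latency")
CACHE_ATTRS = ("indexing", "line_size", "size", "write_policy")


def read_int_attrs(directory, names):
    """Hypothetical helper: return {name: int} for each readable attribute.

    Assumes each file holds one decimal integer; missing or unparsable
    files are skipped rather than reported as errors.
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(directory) / name).read_text().strip())
        except (OSError, ValueError):
            continue
    return values


def node_report(node_dir, access_class=0):
    """Hypothetical helper: summarize one memory target node directory.

    node_dir is a path such as /sys/devices/system/node/nodeY.
    """
    node_dir = Path(node_dir)
    initiators_dir = node_dir / f"access{access_class}" / "initiators"

    # Linked initiator nodes appear as nodeX entries next to the attributes.
    initiators = []
    if initiators_dir.is_dir():
        initiators = sorted(p.name for p in initiators_dir.glob("node[0-9]*"))

    # Each memory-side cache level has its own indexN directory, if any exist.
    caches = {}
    cache_root = node_dir / "memory_side_cache"
    if cache_root.is_dir():
        for level in sorted(cache_root.glob("index[0-9]*")):
            caches[level.name] = read_int_attrs(level, CACHE_ATTRS)

    return {
        "performance": read_int_attrs(initiators_dir, PERF_ATTRS),
        "initiators": initiators,
        "caches": caches,
    }
```

Calling `node_report("/sys/devices/system/node/node0")` on a live system returns empty collections when the platform exports none of these attributes, which matches the document's note that a missing directory means the information is absent or not accessible to the kernel.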
                                                      


