Linux/Documentation/admin-guide/mm/numaperf.rst


=======================
NUMA Memory Performance
=======================

NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory only nodes. While memory only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes.
The following diagram shows one such example of two compute
nodes with local memory and a memory only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and given
the highest access class, 0.
Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.

To aid applications matching memory targets with their initiators, the
kernel provides symlinks to each other. The following example lists the
relationship for the access class "0" memory initiators and targets::

    # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
    relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

    # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
    relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes' access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, does not necessarily perform the same as each other.
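
The symlink relationship above can also be walked programmatically. The
following is a minimal sketch, not part of the kernel documentation: the
helper name ``linked_nodes`` and its ``root`` parameter are hypothetical,
and it assumes only the ``nodeN/accessC/targets`` and
``nodeN/accessC/initiators`` directory layout shown in the example.

```python
import os
import re

# Assumed default location of the node hierarchy described in the text.
NODE_ROOT = "/sys/devices/system/node"


def linked_nodes(node, kind, access_class=0, root=NODE_ROOT):
    """Return sorted node IDs linked under nodeN/accessC/<kind>/.

    kind is "targets" or "initiators". An empty list is returned when the
    directory does not exist, for example when the platform does not
    describe this relationship to the kernel.
    """
    path = os.path.join(root, f"node{node}", f"access{access_class}", kind)
    try:
        entries = os.listdir(path)
    except (FileNotFoundError, NotADirectoryError):
        return []
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:  # ignore anything that is not a nodeN link
            ids.append(int(match.group(1)))
    return sorted(ids)
```

For instance, ``linked_nodes(0, "targets")`` would list the access class 0
targets of node 0, and ``linked_nodes(2, "initiators", access_class=1)``
the CPU-only initiators of node 2, on a system that exposes them.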

The access class "1" is used to allow differentiation between initiators
that are CPUs and hence suitable for generic task scheduling, and
IO initiators such as GPUs and NICs. Unlike access class 0, only
nodes containing CPUs are considered.

NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

    /sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access's initiators.
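
As a rough illustration of how software might consume this directory, the
sketch below reads the four attribute files this document names for it
(``read_bandwidth``, ``read_latency``, ``write_bandwidth`` and
``write_latency``). The helper name ``read_perf_attrs`` and its ``root``
parameter are hypothetical, and the code assumes each file holds a single
decimal integer.

```python
import os

# Assumed default location of the node hierarchy described in the text.
NODE_ROOT = "/sys/devices/system/node"

# Attribute file names listed in this document for the initiators directory.
PERF_ATTRS = ("read_bandwidth", "read_latency",
              "write_bandwidth", "write_latency")


def read_perf_attrs(node, access_class=0, root=NODE_ROOT):
    """Return {attribute: int} for nodeN/accessC/initiators/.

    Attributes that are missing, unreadable, or not a plain integer are
    omitted, so an empty dict means the platform exported nothing usable.
    """
    base = os.path.join(root, f"node{node}", f"access{access_class}",
                        "initiators")
    values = {}
    for attr in PERF_ATTRS:
        try:
            with open(os.path.join(base, attr)) as f:
                values[attr] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # attribute not provided on this system
    return values
```

Per this document, the bandwidth values are in MiB/second and the latency
values in nanoseconds, so a caller comparing nodes should keep the two
kinds of attribute separate.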

The performance characteristics the kernel exports for the local
initiators are as follows::

    # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
    /sys/devices/system/node/nodeY/access0/initiators/
    |-- read_bandwidth
    |-- read_latency
    |-- write_bandwidth
    `-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

The values reported here correspond to the rated latency and bandwidth
for the platform.

Access class 1 takes the same form but only includes values for CPU to
memory activity.

NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller higher performing memory. The
system physical addresses that memory initiators are aware of are provided
by the last memory level in the hierarchy.
The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering is different than CPU caches where the cache level (ex:
L1, L2, L3) uses the CPU-side view where each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

An application does not need to know about caching attributes in order
to use the system.
Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

    /sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

    /sys/devices/system/node/nodeX/memory_side_cache/indexA/
    /sys/devices/system/node/nodeX/memory_side_cache/indexB/
    /sys/devices/system/node/nodeX/memory_side_cache/indexC/

Each cache level's directory provides its attributes.
For example, the
following shows a single cache level and the attributes available for
software to query::

    # tree /sys/devices/system/node/node0/memory_side_cache/
    /sys/devices/system/node/node0/memory_side_cache/
    |-- index1
    |   |-- indexing
    |   |-- line_size
    |   |-- size
    |   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.

See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
- Section 5.2.27



Differences between /Documentation/admin-guide/mm/numaperf.rst (Version linux-6.12-rc7) and /Documentation/admin-guide/mm/numaperf.rst (Version linux-5.10.229)
