===========================================
Automatically bind swap device to numa node
===========================================

If the system has more than one swap device and the swap devices carry NUMA
node information, we can make use of this information to decide which swap
device to use in get_swap_pages() to get better performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which it is used.
To make use of automatic binding, there is no need to manipulate priority
settings for swap devices. For example, on a 2 node machine, assume two swap
devices, swapA and swapB, with swapA attached to node 0 and swapB attached to
node 1, are going to be swapped on. Simply swap them on by doing::

	# swapon /dev/swapA
	# swapon /dev/swapB

Then node 0 will use the two swap devices in the order of swapA then swapB, and
node 1 will use the two swap devices in the order of swapB then swapA. Note
that the order in which they are swapped on doesn't matter.

A more complex example on a 4 node machine. Assume 6 swap devices are going to
be swapped on: swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2, and swapF is attached to node 3.
The way to swap them on is the same as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Then node 0 will use them in the order of::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used in round-robin mode before any other swap device.

Node 1 will use them in the order of::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order of::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used in round-robin mode before any
other swap devices.

Node 3 will use them in the order of::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE


Implementation details
======================

The current code uses a priority-based list, swap_avail_list, to decide
which swap device to use; if multiple swap devices share the same
priority, they are used round robin. This change replaces the single
global swap_avail_list with a per-NUMA-node list, i.e. each NUMA node
sees its own priority-based list of available swap devices. A swap
device's priority can be promoted on its matching node's swap_avail_list.

The current swap device's priority is set as follows: the user can set a
value >= 0, or the system will pick one starting from -1 and going downwards.
The priority value in the swap_avail_list is the negated value of the swap
device's priority, because plist is sorted from low to high. The new policy
doesn't change the semantics for the priority >= 0 cases; the system-assigned
priorities, which previously started from -1 and went downwards, now start
from -2 and go downwards, and -1 is reserved as the promoted value. So if
multiple swap devices are attached to the same node, they will all be
promoted to priority -1 on that node's plist and will be used round robin
before any other swap devices.