============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking, there are five major subheadings.

  - Slab allocation of small objects of unknown type (kmalloc)
  - Slab allocation of small objects of known type
  - Page allocation
  - Per-CPU Allocator Activity
  - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree         call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
allocation sites responsible for them.


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc       call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free        call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.

3. Page allocation
==================
::

  mm_page_alloc              page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free               page=%p pfn=%lu order=%d
  mm_page_free_batched       page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as a
high rate of it implies heavy activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the lruvec->lru_lock.

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain         page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The event
triggered is mm_page_alloc_zone_locked for each page allocated, with the
event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers an mm_page_pcpu_drain event.

The events are emitted per page so that pages can be tracked between
allocation and freeing. A run of consecutive drain or refill events implies
that the zone->lock was taken once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be a larger size. Finally, large amounts of refills on one CPU
and drains on another could be a factor in causing large amounts of cache
line bounces due to writes between CPUs, and it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm change.

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag  page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important, although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
pageblock_size is usually the default hugepage size.
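The increment described above can be worked through with a small sketch.
This is only an illustration of the arithmetic, not a tuning tool: the
function names are hypothetical, the pageblock size is assumed to be given
in bytes and converted to KiB to match the unit of min_free_kbytes, and the
2 MiB pageblock and two online nodes used in the usage note are example
values rather than anything read from a running system.

```python
def min_free_kbytes_step(pageblock_bytes: int, nr_online_nodes: int) -> int:
    """Return one increment, in KiB, of 3 * pageblock_size * nr_online_nodes.

    Assumption: pageblock_size is supplied in bytes and converted to KiB,
    since min_free_kbytes is expressed in KiB.
    """
    if pageblock_bytes <= 0 or nr_online_nodes <= 0:
        raise ValueError("pageblock_bytes and nr_online_nodes must be positive")
    pageblock_kib = pageblock_bytes // 1024
    return 3 * pageblock_kib * nr_online_nodes


def candidate_values(current_kib: int, pageblock_bytes: int,
                     nr_online_nodes: int, steps: int = 3) -> list[int]:
    """List the next few min_free_kbytes values to try, one increment apart."""
    step = min_free_kbytes_step(pageblock_bytes, nr_online_nodes)
    return [current_kib + i * step for i in range(1, steps + 1)]
```

For example, with an assumed 2 MiB pageblock and two online nodes,
``min_free_kbytes_step(2 * 1024 * 1024, 2)`` gives 12288, so starting from a
current setting of 67584 the next candidates would be 79872, 92160 and
104448. Each candidate would be applied in turn while watching whether the
rate of mm_page_alloc_extfrag events drops.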