TOMOYO Linux Cross Reference
Linux/Documentation/driver-api/nvdimm/nvdimm.rst

Diff markup

Differences between /Documentation/driver-api/nvdimm/nvdimm.rst (Version linux-6.12-rc7) and /Documentation/driver-api/nvdimm/nvdimm.rst (Version linux-5.3.18)


  1 ===============================                     1 ===============================
  2 LIBNVDIMM: Non-Volatile Devices                     2 LIBNVDIMM: Non-Volatile Devices
  3 ===============================                     3 ===============================
  4                                                     4 
  5 libnvdimm - kernel / libndctl - userspace help      5 libnvdimm - kernel / libndctl - userspace helper library
  6                                                     6 
  7 nvdimm@lists.linux.dev                         !!   7 linux-nvdimm@lists.01.org
  8                                                     8 
  9 Version 13                                          9 Version 13
 10                                                    10 
 11 .. contents:                                       11 .. contents:
 12                                                    12 
 13         Glossary                                   13         Glossary
 14         Overview                                   14         Overview
 15             Supporting Documents                   15             Supporting Documents
 16             Git Trees                              16             Git Trees
 17         LIBNVDIMM PMEM                         !!  17         LIBNVDIMM PMEM and BLK
 18             PMEM-REGIONs, Atomic Sectors, and  !!  18         Why BLK?
                                                   >>  19             PMEM vs BLK
                                                   >>  20                 BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
 19         Example NVDIMM Platform                    21         Example NVDIMM Platform
 20         LIBNVDIMM Kernel Device Model and LIBN     22         LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
 21             LIBNDCTL: Context                      23             LIBNDCTL: Context
 22                 libndctl: instantiate a new li     24                 libndctl: instantiate a new library context example
 23             LIBNVDIMM/LIBNDCTL: Bus                25             LIBNVDIMM/LIBNDCTL: Bus
 24                 libnvdimm: control class devic     26                 libnvdimm: control class device in /sys/class
 25                 libnvdimm: bus                     27                 libnvdimm: bus
 26                 libndctl: bus enumeration exam     28                 libndctl: bus enumeration example
 27             LIBNVDIMM/LIBNDCTL: DIMM (NMEM)        29             LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
 28                 libnvdimm: DIMM (NMEM)             30                 libnvdimm: DIMM (NMEM)
 29                 libndctl: DIMM enumeration exa     31                 libndctl: DIMM enumeration example
 30             LIBNVDIMM/LIBNDCTL: Region             32             LIBNVDIMM/LIBNDCTL: Region
 31                 libnvdimm: region                  33                 libnvdimm: region
 32                 libndctl: region enumeration e     34                 libndctl: region enumeration example
 33                 Why Not Encode the Region Type     35                 Why Not Encode the Region Type into the Region Name?
 34                 How Do I Determine the Major T     36                 How Do I Determine the Major Type of a Region?
 35             LIBNVDIMM/LIBNDCTL: Namespace          37             LIBNVDIMM/LIBNDCTL: Namespace
 36                 libnvdimm: namespace               38                 libnvdimm: namespace
 37                 libndctl: namespace enumeratio     39                 libndctl: namespace enumeration example
 38                 libndctl: namespace creation e     40                 libndctl: namespace creation example
 39                 Why the Term "namespace"?          41                 Why the Term "namespace"?
 40             LIBNVDIMM/LIBNDCTL: Block Translat     42             LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
 41                 libnvdimm: btt layout              43                 libnvdimm: btt layout
 42                 libndctl: btt creation example     44                 libndctl: btt creation example
 43         Summary LIBNDCTL Diagram                   45         Summary LIBNDCTL Diagram
 44                                                    46 
 45                                                    47 
 46 Glossary                                           48 Glossary
 47 ========                                           49 ========
 48                                                    50 
 49 PMEM:                                              51 PMEM:
 50   A system-physical-address range where writes     52   A system-physical-address range where writes are persistent.  A
 51   block device composed of PMEM is capable of      53   block device composed of PMEM is capable of DAX.  A PMEM address range
 52   may span an interleave of several DIMMs.         54   may span an interleave of several DIMMs.
 53                                                    55 
                                                   >>  56 BLK:
                                                   >>  57   A set of one or more programmable memory mapped apertures provided
                                                   >>  58   by a DIMM to access its media.  This indirection precludes the
                                                   >>  59   performance benefit of interleaving, but enables DIMM-bounded failure
                                                   >>  60   modes.
                                                   >>  61 
 54 DPA:                                               62 DPA:
 55   DIMM Physical Address, is a DIMM-relative of     63   DIMM Physical Address, is a DIMM-relative offset.  With one DIMM in
 56   the system there would be a 1:1 system-physi     64   the system there would be a 1:1 system-physical-address:DPA association.
 57   Once more DIMMs are added a memory controlle     65   Once more DIMMs are added a memory controller interleave must be
 58   decoded to determine the DPA associated with     66   decoded to determine the DPA associated with a given
 59   system-physical-address.                     !!  67   system-physical-address.  BLK capacity always has a 1:1 relationship
                                                   >>  68   with a single-DIMM's DPA range.
 60                                                    69 
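The DPA entry notes that once more than one DIMM is present, a memory controller interleave has to be decoded to find the DPA behind a system-physical-address. Real decode rules are platform specific (on ACPI systems they are described by NFIT structures). The following is therefore only a sketch. It assumes a simple round-robin interleave with a fixed granularity, in which every DIMM contributes DPA starting at 0. `struct interleave_set` and `spa_to_dpa()` are hypothetical names, not LIBNVDIMM interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical round-robin interleave description (not an NFIT structure). */
struct interleave_set {
	uint64_t spa_base;    /* first system-physical-address of the set */
	uint64_t granularity; /* bytes placed on one DIMM before moving on */
	unsigned int ways;    /* number of DIMMs in the interleave */
};

/*
 * Translate a system-physical-address into (DIMM index, DPA).
 * Assumes every DIMM contributes DPA starting at 0 to this set and does
 * not check the end of the set. Returns 0 on success, -1 if spa lies
 * below the start of the set.
 */
int spa_to_dpa(const struct interleave_set *set, uint64_t spa,
	       unsigned int *dimm, uint64_t *dpa)
{
	assert(set->granularity > 0 && set->ways > 0);
	if (spa < set->spa_base)
		return -1;

	uint64_t off = spa - set->spa_base;
	uint64_t chunk = off / set->granularity;  /* stripe unit index */
	uint64_t within = off % set->granularity; /* offset inside the unit */

	/* Units rotate across DIMMs: 0, 1, ..., ways-1, 0, 1, ... */
	*dimm = (unsigned int)(chunk % set->ways);
	/* Each DIMM stores every ways-th unit back to back in its DPA space. */
	*dpa = (chunk / set->ways) * set->granularity + within;
	return 0;
}
```

For example, take `ways = 2` and a 4096-byte granularity. An address 0x2010 bytes past `spa_base` falls in the third unit and decodes to DIMM 0 at DPA 0x1010. With `ways = 1` the function returns the offset unchanged, which is the 1:1 single-DIMM association described in the entry.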
 61 DAX:                                               70 DAX:
 62   File system extensions to bypass the page ca     71   File system extensions to bypass the page cache and block layer to
 63   mmap persistent memory, from a PMEM block de     72   mmap persistent memory, from a PMEM block device, directly into a
 64   process address space.                           73   process address space.
 65                                                    74 
 66 DSM:                                               75 DSM:
 67   Device Specific Method: ACPI method to contr !!  76   Device Specific Method: ACPI method to to control specific
 68   device - in this case the firmware.              77   device - in this case the firmware.
 69                                                    78 
 70 DCR:                                               79 DCR:
 71   NVDIMM Control Region Structure defined in A     80   NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
 72   It defines a vendor-id, device-id, and inter     81   It defines a vendor-id, device-id, and interface format for a given DIMM.
 73                                                    82 
 74 BTT:                                               83 BTT:
 75   Block Translation Table: Persistent memory i     84   Block Translation Table: Persistent memory is byte addressable.
 76   Existing software may have an expectation th     85   Existing software may have an expectation that the power-fail-atomicity
 77   of writes is at least one sector, 512 bytes.     86   of writes is at least one sector, 512 bytes.  The BTT is an indirection
 78   table with atomic update semantics to front  !!  87   table with atomic update semantics to front a PMEM/BLK block device
 79   driver and present arbitrary atomic sector s     88   driver and present arbitrary atomic sector sizes.
 80                                                    89 
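The BTT entry describes an indirection table whose atomic update semantics let a byte-addressable device present atomic sectors. The sketch below shows only that core idea under simplifying assumptions: a single writer, one spare block, and a plain in-memory map. `struct toy_btt` and its functions are hypothetical. They do not follow the real on-media BTT layout (arenas, per-lane free blocks, and the log described in the kernel's BTT documentation).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NR_LBAS     4             /* sectors exported to the user */
#define NR_BLOCKS   (NR_LBAS + 1) /* one extra internal block kept free */

/* Hypothetical single-lane translation table, not the on-media BTT format. */
struct toy_btt {
	uint8_t media[NR_BLOCKS][SECTOR_SIZE]; /* stands in for PMEM */
	uint32_t map[NR_LBAS];                 /* external LBA -> internal block */
	uint32_t free_block;                   /* block not referenced by map */
};

void toy_btt_init(struct toy_btt *b)
{
	memset(b, 0, sizeof(*b));
	for (uint32_t lba = 0; lba < NR_LBAS; lba++)
		b->map[lba] = lba;
	b->free_block = NR_LBAS;
}

void toy_btt_read(const struct toy_btt *b, uint32_t lba, uint8_t *buf)
{
	assert(lba < NR_LBAS);
	memcpy(buf, b->media[b->map[lba]], SECTOR_SIZE);
}

void toy_btt_write(struct toy_btt *b, uint32_t lba, const uint8_t *buf)
{
	assert(lba < NR_LBAS);
	uint32_t old = b->map[lba];

	/* 1. Fill the free block; the live copy of this LBA is untouched. */
	memcpy(b->media[b->free_block], buf, SECTOR_SIZE);
	/* 2. Publish it with a single map update. On real media this store
	 *    must be power-fail atomic; a crash before it leaves the old data. */
	b->map[lba] = b->free_block;
	/* 3. The previously mapped block becomes the new free block. */
	b->free_block = old;
}
```

A reader therefore sees either the complete old sector or the complete new one, never a torn mix. The price is that the block device no longer exposes stable physical addresses, which is why the document describes the BTT as incompatible with DAX.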
 81 LABEL:                                             90 LABEL:
 82   Metadata stored on a DIMM device that partit     91   Metadata stored on a DIMM device that partitions and identifies
 83   (persistently names) capacity allocated to d !!  92   (persistently names) storage between PMEM and BLK.  It also partitions
 84   also indicates whether an address abstractio !!  93   BLK storage to host BTTs with different parameters per BLK-partition.
 85   the namespace.  Note that traditional partit !!  94   Note that traditional partition tables, GPT/MBR, are layered on top of a
 86   layered on top of a PMEM namespace, or an ad !!  95   BLK or PMEM device.
 87   if present, but partition support is depreca << 
 88                                                    96 
 89                                                    97 
 90 Overview                                           98 Overview
 91 ========                                           99 ========
 92                                                   100 
 93 The LIBNVDIMM subsystem provides support for P !! 101 The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
 94 firmware or a device driver. On ACPI based sys !! 102 PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
 95 conveys persistent memory resource via the ACP !! 103 and BLK mode access.  These three modes of operation are described by
 96 Interface Table" in ACPI 6. While the LIBNVDIM !! 104 the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6.  While the LIBNVDIMM
 97 is generic and supports pre-NFIT platforms, it !! 105 implementation is generic and supports pre-NFIT platforms, it was guided
 98 superset of capabilities need to support this  !! 106 by the superset of capabilities need to support this ACPI 6 definition
 99 NVDIMM resources. The original implementation  !! 107 for NVDIMM resources.  The bulk of the kernel implementation is in place
100 block-window-aperture capability described in  !! 108 to handle the case where DPA accessible via PMEM is aliased with DPA
101 has since been abandoned and never shipped in  !! 109 accessible via BLK.  When that occurs a LABEL is needed to reserve DPA
                                                   >> 110 for exclusive access via one mode a time.
102                                                   111 
103 Supporting Documents                              112 Supporting Documents
104 --------------------                              113 --------------------
105                                                   114 
106 ACPI 6:                                           115 ACPI 6:
107         https://www.uefi.org/sites/default/fil !! 116         http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
108 NVDIMM Namespace:                                 117 NVDIMM Namespace:
109         https://pmem.io/documents/NVDIMM_Names !! 118         http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
110 DSM Interface Example:                            119 DSM Interface Example:
111         https://pmem.io/documents/NVDIMM_DSM_I !! 120         http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
112 Driver Writer's Guide:                            121 Driver Writer's Guide:
113         https://pmem.io/documents/NVDIMM_Drive !! 122         http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
114                                                   123 
115 Git Trees                                         124 Git Trees
116 ---------                                         125 ---------
117                                                   126 
118 LIBNVDIMM:                                        127 LIBNVDIMM:
119         https://git.kernel.org/cgit/linux/kern !! 128         https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
120 LIBNDCTL:                                         129 LIBNDCTL:
121         https://github.com/pmem/ndctl.git         130         https://github.com/pmem/ndctl.git
                                                   >> 131 PMEM:
                                                   >> 132         https://github.com/01org/prd
122                                                   133 
123                                                   134 
124 LIBNVDIMM PMEM                                 !! 135 LIBNVDIMM PMEM and BLK
125 ==============                                 !! 136 ======================
126                                                   137 
127 Prior to the arrival of the NFIT, non-volatile    138 Prior to the arrival of the NFIT, non-volatile memory was described to a
128 system in various ad-hoc ways.  Usually only t    139 system in various ad-hoc ways.  Usually only the bare minimum was
129 provided, namely, a single system-physical-add    140 provided, namely, a single system-physical-address range where writes
130 are expected to be durable after a system powe    141 are expected to be durable after a system power loss.  Now, the NFIT
131 specification standardizes not only the descri    142 specification standardizes not only the description of PMEM, but also
132 platform message-passing entry points for cont !! 143 BLK and platform message-passing entry points for control and
                                                   >> 144 configuration.
                                                   >> 145 
                                                   >> 146 For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
                                                   >> 147 device driver:
133                                                   148 
134 PMEM (nd_pmem.ko): Drives a system-physical-ad !! 149     1. PMEM (nd_pmem.ko): Drives a system-physical-address range.  This
135 contiguous in system memory and may be interle !! 150        range is contiguous in system memory and may be interleaved (hardware
136 striped) across multiple DIMMs.  When interlea !! 151        memory controller striped) across multiple DIMMs.  When interleaved the
137 provide details of which DIMMs are participati !! 152        platform may optionally provide details of which DIMMs are participating
138                                                !! 153        in the interleave.
139 It is worth noting that when the labeling capa !! 154 
140 namespace label index block is found), then no !! 155        Note that while LIBNVDIMM describes system-physical-address ranges that may
141 by default as userspace needs to do at least o !! 156        alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
142 the PMEM range.  In contrast ND_NAMESPACE_IO r !! 157        alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
143 can be immediately attached to nd_pmem. This l !! 158        distinction.  The different device-types are an implementation detail
144 label-less or "legacy".                        !! 159        that userspace can exploit to implement policies like "only interface
                                                   >> 160        with address ranges from certain DIMMs".  It is worth noting that when
                                                   >> 161        aliasing is present and a DIMM lacks a label, then no block device can
                                                   >> 162        be created by default as userspace needs to do at least one allocation
                                                   >> 163        of DPA to the PMEM range.  In contrast ND_NAMESPACE_IO ranges, once
                                                   >> 164        registered, can be immediately attached to nd_pmem.
                                                   >> 165 
                                                   >> 166     2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
                                                   >> 167        defined apertures.  A set of apertures will access just one DIMM.
                                                   >> 168        Multiple windows (apertures) allow multiple concurrent accesses, much like
                                                   >> 169        tagged-command-queuing, and would likely be used by different threads or
                                                   >> 170        different CPUs.
                                                   >> 171 
                                                   >> 172        The NFIT specification defines a standard format for a BLK-aperture, but
                                                   >> 173        the spec also allows for vendor specific layouts, and non-NFIT BLK
                                                   >> 174        implementations may have other designs for BLK I/O.  For this reason
                                                   >> 175        "nd_blk" calls back into platform-specific code to perform the I/O.
145                                                   176 
146 PMEM-REGIONs, Atomic Sectors, and DAX          !! 177        One such implementation is defined in the "Driver Writer's Guide" and "DSM
147 -------------------------------------          !! 178        Interface Example".
                                                   >> 179 
                                                   >> 180 
                                                   >> 181 Why BLK?
                                                   >> 182 ========
148                                                   183 
149 For the cases where an application or filesyst !! 184 While PMEM provides direct byte-addressable CPU-load/store access to
150 update guarantees it can register a BTT on a P !! 185 NVDIMM storage, it does not provide the best system RAS (recovery,
                                                   >> 186 availability, and serviceability) model.  An access to a corrupted
                                                   >> 187 system-physical-address address causes a CPU exception while an access
                                                   >> 188 to a corrupted address through an BLK-aperture causes that block window
                                                   >> 189 to raise an error status in a register.  The latter is more aligned with
                                                   >> 190 the standard error model that host-bus-adapter attached disks present.
                                                   >> 191 
                                                   >> 192 Also, if an administrator ever wants to replace a memory it is easier to
                                                   >> 193 service a system at DIMM module boundaries.  Compare this to PMEM where
                                                   >> 194 data could be interleaved in an opaque hardware specific manner across
                                                   >> 195 several DIMMs.
                                                   >> 196 
                                                   >> 197 PMEM vs BLK
                                                   >> 198 -----------
                                                   >> 199 
                                                   >> 200 BLK-apertures solve these RAS problems, but their presence is also the
                                                   >> 201 major contributing factor to the complexity of the ND subsystem.  They
                                                   >> 202 complicate the implementation because PMEM and BLK alias in DPA space.
                                                   >> 203 Any given DIMM's DPA-range may contribute to one or more
                                                   >> 204 system-physical-address sets of interleaved DIMMs, *and* may also be
                                                   >> 205 accessed in its entirety through its BLK-aperture.  Accessing a DPA
                                                   >> 206 through a system-physical-address while simultaneously accessing the
                                                   >> 207 same DPA through a BLK-aperture has undefined results.  For this reason,
                                                   >> 208 DIMMs with this dual interface configuration include a DSM function to
                                                   >> 209 store/retrieve a LABEL.  The LABEL effectively partitions the DPA-space
                                                   >> 210 into exclusive system-physical-address and BLK-aperture accessible
                                                   >> 211 regions.  For simplicity a DIMM is allowed a PMEM "region" per each
                                                   >> 212 interleave set in which it is a member.  The remaining DPA space can be
                                                   >> 213 carved into an arbitrary number of BLK devices with discontiguous
                                                   >> 214 extents.
                                                   >> 215 
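The paragraph above explains that the LABEL partitions each DIMM's DPA space so that no DPA is reachable through both a system-physical-address and a BLK-aperture. The sketch below shows the overlap check such a partitioning implies. It assumes DPA reservations are held as a simple array of extents. `struct dpa_extent`, `enum dpa_mode`, and `dpa_extent_available()` are hypothetical and reflect neither the on-media label format nor the kernel's allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access modes a DPA extent can be reserved for. */
enum dpa_mode { DPA_PMEM, DPA_BLK };

/* Hypothetical in-memory view of one reserved DPA extent. */
struct dpa_extent {
	uint64_t start;  /* first DPA of the extent */
	uint64_t size;   /* length in bytes; start + size must not overflow */
	enum dpa_mode mode;
};

/* Do half-open ranges [a, a + alen) and [b, b + blen) intersect? */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/*
 * Return true if 'want' can be reserved without aliasing any extent in
 * 'have'. Any overlap is rejected regardless of mode, so each DPA ends up
 * reachable through exactly one interface.
 */
bool dpa_extent_available(const struct dpa_extent *have, size_t n,
			  const struct dpa_extent *want)
{
	assert(want->size > 0);
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(have[i].start, have[i].size,
				   want->start, want->size))
			return false;
	return true;
}
```

Rejecting an overlap even when both extents use the same mode keeps the check simple. It also matches the idea that each reserved extent belongs to exactly one namespace.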
                                                   >> 216 BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
                                                   >> 217 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                   >> 218 
                                                   >> 219 One of the few
                                                   >> 220 reasons to allow multiple BLK namespaces per REGION is so that each
                                                   >> 221 BLK-namespace can be configured with a BTT with unique atomic sector
                                                   >> 222 sizes.  While a PMEM device can host a BTT the LABEL specification does
                                                   >> 223 not provide for a sector size to be specified for a PMEM namespace.
                                                   >> 224 
                                                   >> 225 This is due to the expectation that the primary usage model for PMEM is
                                                   >> 226 via DAX, and the BTT is incompatible with DAX.  However, for the cases
                                                   >> 227 where an application or filesystem still needs atomic sector update
                                                   >> 228 guarantees it can register a BTT on a PMEM device or partition.  See
151 LIBNVDIMM/NDCTL: Block Translation Table "btt"    229 LIBNVDIMM/NDCTL: Block Translation Table "btt"
152                                                   230 
153                                                   231 
154 Example NVDIMM Platform                           232 Example NVDIMM Platform
155 =======================                           233 =======================
156                                                   234 
157 For the remainder of this document the followi    235 For the remainder of this document the following diagram will be
158 referenced for any example sysfs layouts::        236 referenced for any example sysfs layouts::
159                                                   237 
160                                                   238 
161                                (a)             !! 239                                (a)               (b)           DIMM   BLK-REGION
162             +-------------------+--------+----    240             +-------------------+--------+--------+--------+
163   +------+  |       pm0.0       |  free  | pm1 !! 241   +------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
164   | imc0 +--+- - - region0- - - +--------+        242   | imc0 +--+- - - region0- - - +--------+        +--------+
165   +--+---+  |       pm0.0       |  free  | pm1 !! 243   +--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
166      |      +-------------------+--------v        244      |      +-------------------+--------v        v--------+
167   +--+---+                               |        245   +--+---+                               |                 |
168   | cpu0 |                                        246   | cpu0 |                                     region1
169   +--+---+                               |        247   +--+---+                               |                 |
170      |      +----------------------------^        248      |      +----------------------------^        ^--------+
171   +--+---+  |           free             | pm1 !! 249   +--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
172   | imc1 +--+----------------------------|        250   | imc1 +--+----------------------------|        +--------+
173   +------+  |           free             | pm1 !! 251   +------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
174             +----------------------------+----    252             +----------------------------+--------+--------+
175                                                   253 
176 In this platform we have four DIMMs and two me    254 In this platform we have four DIMMs and two memory controllers in one
177 socket.  Each PMEM interleave set is identifie !! 255 socket.  Each unique interface (BLK or PMEM) to DPA space is identified
178 a dynamically assigned id.                     !! 256 by a region device with a dynamically assigned id (REGION0 - REGION5).
179                                                   257 
180     1. The first portion of DIMM0 and DIMM1 ar    258     1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
181        single PMEM namespace is created in the    259        single PMEM namespace is created in the REGION0-SPA-range that spans most
182        of DIMM0 and DIMM1 with a user-specifie    260        of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
183        interleaved system-physical-address ran !! 261        interleaved system-physical-address range is reclaimed as BLK-aperture
184        another PMEM namespace to be defined.   !! 262        accessed space starting at DPA-offset (a) into each DIMM.  In that
                                                   >> 263        reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
                                                   >> 264        REGION3 where "blk2.0" and "blk3.0" are just human readable names that
                                                   >> 265        could be set to any user-desired name in the LABEL.
185                                                   266 
186     2. In the last portion of DIMM0 and DIMM1     267     2. In the last portion of DIMM0 and DIMM1 we have an interleaved
187        system-physical-address range, REGION1,    268        system-physical-address range, REGION1, that spans those two DIMMs as
188        well as DIMM2 and DIMM3.  Some of REGIO    269        well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
189        named "pm1.0".                          !! 270        named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
                                                   >> 271        each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
                                                   >> 272        "blk5.0".
                                                   >> 273 
                                                   >> 274     3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
                                                   >> 275        interleaved system-physical-address range (i.e. the DPA address past
                                                   >> 276        offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
                                                   >> 277        Note, that this example shows that BLK-aperture namespaces don't need to
                                                   >> 278        be contiguous in DPA-space.
190                                                   279 
191     This bus is provided by the kernel under t    280     This bus is provided by the kernel under the device
192     /sys/devices/platform/nfit_test.0 when the !! 281     /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
193     tools/testing/nvdimm is loaded. This modul !! 282     the nfit_test.ko module is loaded.  This not only test LIBNVDIMM but the
194     LIBNVDIMM and the  acpi_nfit.ko driver.    !! 283     acpi_nfit.ko driver as well.
195                                                   284 
196                                                   285 
197 LIBNVDIMM Kernel Device Model and LIBNDCTL Use    286 LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
198 ==============================================    287 ========================================================
199                                                   288 
200 What follows is a description of the LIBNVDIMM    289 What follows is a description of the LIBNVDIMM sysfs layout and a
201 corresponding object hierarchy diagram as view    290 corresponding object hierarchy diagram as viewed through the LIBNDCTL
202 API.  The example sysfs paths and diagrams are    291 API.  The example sysfs paths and diagrams are relative to the Example
203 NVDIMM Platform which is also the LIBNVDIMM bu    292 NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
204 test.                                             293 test.
205                                                   294 
206 LIBNDCTL: Context                                 295 LIBNDCTL: Context
207 -----------------                                 296 -----------------
208                                                   297 
209 Every API call in the LIBNDCTL library require    298 Every API call in the LIBNDCTL library requires a context that holds the
210 logging parameters and other library instance     299 logging parameters and other library instance state.  The library is
211 based on the libabc template:                     300 based on the libabc template:
212                                                   301 
213         https://git.kernel.org/cgit/linux/kern    302         https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
214                                                   303 
215 LIBNDCTL: instantiate a new library context ex    304 LIBNDCTL: instantiate a new library context example
216 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    305 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
217                                                   306 
218 ::                                                307 ::
219                                                   308 
220         struct ndctl_ctx *ctx;                    309         struct ndctl_ctx *ctx;
221                                                   310 
222         if (ndctl_new(&ctx) == 0)                 311         if (ndctl_new(&ctx) == 0)
223                 return ctx;                       312                 return ctx;
224         else                                      313         else
225                 return NULL;                      314                 return NULL;
226                                                   315 
227 LIBNVDIMM/LIBNDCTL: Bus                           316 LIBNVDIMM/LIBNDCTL: Bus
228 -----------------------                           317 -----------------------
229                                                   318 
230 A bus has a 1:1 relationship with an NFIT.  Th    319 A bus has a 1:1 relationship with an NFIT.  The current expectation for
231 ACPI based systems is that there is only ever     320 ACPI based systems is that there is only ever one platform-global NFIT.
232 That said, it is trivial to register multiple     321 That said, it is trivial to register multiple NFITs; the specification
233 does not preclude it.  The infrastructure supp    322 does not preclude it.  The infrastructure supports multiple busses and
234 we use this capability to test multiple NFIT c    323 we use this capability to test multiple NFIT configurations in the unit
235 test.                                             324 test.
236                                                   325 
237 LIBNVDIMM: control class device in /sys/class     326 LIBNVDIMM: control class device in /sys/class
238 ---------------------------------------------     327 ---------------------------------------------
239                                                   328 
240 This character device accepts DSM messages to     329 This character device accepts DSM messages to be passed to the DIMM
241 identified by its NFIT handle::                   330 identified by its NFIT handle::
242                                                   331 
243         /sys/class/nd/ndctl0                      332         /sys/class/nd/ndctl0
244         |-- dev                                   333         |-- dev
245         |-- device -> ../../../ndbus0             334         |-- device -> ../../../ndbus0
246         |-- subsystem -> ../../../../../../../    335         |-- subsystem -> ../../../../../../../class/nd
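The `dev` attribute in this listing holds the character device number. The driver core formats such attributes as `<major>:<minor>` followed by a newline. The following parsing sketch relies on that assumption. `parse_dev_attr()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse the text of a sysfs "dev" attribute
 * ("<major>:<minor>", optionally followed by a newline).
 * Returns true on success.
 */
static bool parse_dev_attr(const char *buf, unsigned int *major,
			   unsigned int *minor)
{
	char trailing;
	int n = sscanf(buf, "%u:%u%c", major, minor, &trailing);

	if (n == 2)		/* no newline at all */
		return true;
	return n == 3 && trailing == '\n';
}
```

The resulting numbers identify the character-device node that an application opens to send commands. udev normally creates that node under /dev.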
247                                                   336 
248                                                   337 
249                                                   338 
250 LIBNVDIMM: bus                                    339 LIBNVDIMM: bus
251 --------------                                    340 --------------
252                                                   341 
253 ::                                                342 ::
254                                                   343 
255         struct nvdimm_bus *nvdimm_bus_register    344         struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
256                struct nvdimm_bus_descriptor *n    345                struct nvdimm_bus_descriptor *nfit_desc);
257                                                   346 
258 ::                                                347 ::
259                                                   348 
260         /sys/devices/platform/nfit_test.0/ndbu    349         /sys/devices/platform/nfit_test.0/ndbus0
261         |-- commands                              350         |-- commands
262         |-- nd                                    351         |-- nd
263         |-- nfit                                  352         |-- nfit
264         |-- nmem0                                 353         |-- nmem0
265         |-- nmem1                                 354         |-- nmem1
266         |-- nmem2                                 355         |-- nmem2
267         |-- nmem3                                 356         |-- nmem3
268         |-- power                                 357         |-- power
269         |-- provider                              358         |-- provider
270         |-- region0                               359         |-- region0
271         |-- region1                               360         |-- region1
272         |-- region2                               361         |-- region2
273         |-- region3                               362         |-- region3
274         |-- region4                               363         |-- region4
275         |-- region5                               364         |-- region5
276         |-- uevent                                365         |-- uevent
277         `-- wait_probe                            366         `-- wait_probe
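Userspace that walks this directory without LIBNDCTL has to pick the `nmemX` and `regionX` children out from among the other attributes. The following sketch performs that name matching. It assumes each instance name is a fixed prefix followed by a decimal number. `parse_instance_name()` is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: match a child name such as "region3" or "nmem0"
 * against 'prefix' and store its decimal instance number in *id.
 * Names without digits ("region") or with trailing text ("region3a")
 * are rejected.
 */
static bool parse_instance_name(const char *name, const char *prefix,
				unsigned long *id)
{
	size_t plen = strlen(prefix);
	char *end;

	if (strncmp(name, prefix, plen) != 0)
		return false;
	name += plen;
	if (*name < '0' || *name > '9')
		return false;
	*id = strtoul(name, &end, 10);
	return *end == '\0';
}
```

A caller would typically pass it the `d_name` of each entry that readdir(3) returns for the ndbus0 directory, and ignore entries that do not match.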
278                                                   367 
279 LIBNDCTL: bus enumeration example                 368 LIBNDCTL: bus enumeration example
280 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                 369 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
281                                                   370 
282 Find the bus handle that describes the bus fro    371 Find the bus handle that describes the bus from the Example NVDIMM Platform::
283                                                   372 
284         static struct ndctl_bus *get_bus_by_pr    373         static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
285                         const char *provider)     374                         const char *provider)
286         {                                         375         {
287                 struct ndctl_bus *bus;            376                 struct ndctl_bus *bus;
288                                                   377 
289                 ndctl_bus_foreach(ctx, bus)       378                 ndctl_bus_foreach(ctx, bus)
290                         if (strcmp(provider, n    379                         if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
291                                 return bus;       380                                 return bus;
292                                                   381 
293                 return NULL;                      382                 return NULL;
294         }                                         383         }
295                                                   384 
296         bus = get_bus_by_provider(ctx, "nfit_t    385         bus = get_bus_by_provider(ctx, "nfit_test.0");
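Tooling that does not link against LIBNDCTL can approximate `get_bus_by_provider()` by reading a bus directory's `provider` attribute and comparing it to the expected name. Here is a minimal sketch, assuming the attribute is a single newline-terminated line. `read_sysfs_attr()` and `bus_has_provider()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a one-line sysfs text attribute into buf and
 * strip the trailing newline. Returns 0 on success, -1 on error.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	char *ok;

	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f);
	fclose(f);
	if (!ok)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Hypothetical helper: true if <bus_dir>/provider equals 'provider'. */
static bool bus_has_provider(const char *bus_dir, const char *provider)
{
	char path[4096];
	char buf[256];
	int n = snprintf(path, sizeof(path), "%s/provider", bus_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return false;
	if (read_sysfs_attr(path, buf, sizeof(buf)) != 0)
		return false;
	return strcmp(buf, provider) == 0;
}
```

Calling `bus_has_provider("/sys/devices/platform/nfit_test.0/ndbus0", "nfit_test.0")` checks a single bus. Finding the right bus among several would also require a directory walk, which `ndctl_bus_foreach()` otherwise handles.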
297                                                   386 
298                                                   387 
299 LIBNVDIMM/LIBNDCTL: DIMM (NMEM)                   388 LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
300 -------------------------------                   389 -------------------------------
301                                                   390 
302 The DIMM device provides a character device fo    391 The DIMM device provides a character device for sending commands to
303 hardware, and it is a container for LABELs.  I    392 hardware, and it is a container for LABELs.  If the DIMM is defined by
304 NFIT then an optional 'nfit' attribute sub-dir    393 NFIT then an optional 'nfit' attribute sub-directory is available to add
305 NFIT-specifics.                                   394 NFIT-specifics.
306                                                   395 
307 Note that the kernel device name for "DIMMs" i    396 Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
308 describes these devices via "Memory Device to     397 describes these devices via "Memory Device to System Physical Address
309 Range Mapping Structure", and there is no requ    398 Range Mapping Structure", and there is no requirement that they actually
310 be physical DIMMs, so we use a more generic na    399 be physical DIMMs, so we use a more generic name.
311                                                   400 
312 LIBNVDIMM: DIMM (NMEM)                            401 LIBNVDIMM: DIMM (NMEM)
313 ^^^^^^^^^^^^^^^^^^^^^^                            402 ^^^^^^^^^^^^^^^^^^^^^^
314                                                   403 
315 ::                                                404 ::
316                                                   405 
317         struct nvdimm *nvdimm_create(struct nv    406         struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
318                         const struct attribute    407                         const struct attribute_group **groups, unsigned long flags,
319                         unsigned long *dsm_mas    408                         unsigned long *dsm_mask);
320                                                   409 
321 ::                                                410 ::
322                                                   411 
323         /sys/devices/platform/nfit_test.0/ndbu    412         /sys/devices/platform/nfit_test.0/ndbus0
324         |-- nmem0                                 413         |-- nmem0
325         |   |-- available_slots                   414         |   |-- available_slots
326         |   |-- commands                          415         |   |-- commands
327         |   |-- dev                               416         |   |-- dev
328         |   |-- devtype                           417         |   |-- devtype
329         |   |-- driver -> ../../../../../bus/n    418         |   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
330         |   |-- modalias                          419         |   |-- modalias
331         |   |-- nfit                              420         |   |-- nfit
332         |   |   |-- device                        421         |   |   |-- device
333         |   |   |-- format                        422         |   |   |-- format
334         |   |   |-- handle                        423         |   |   |-- handle
335         |   |   |-- phys_id                       424         |   |   |-- phys_id
336         |   |   |-- rev_id                        425         |   |   |-- rev_id
337         |   |   |-- serial                        426         |   |   |-- serial
338         |   |   `-- vendor                        427         |   |   `-- vendor
339         |   |-- state                             428         |   |-- state
340         |   |-- subsystem -> ../../../../../bu    429         |   |-- subsystem -> ../../../../../bus/nd
341         |   `-- uevent                            430         |   `-- uevent
342         |-- nmem1                                 431         |-- nmem1
343         [..]                                      432         [..]
344                                                   433 
345                                                   434 
346 LIBNDCTL: DIMM enumeration example                435 LIBNDCTL: DIMM enumeration example
347 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                436 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
348                                                   437 
349 Note, in this example we are assuming NFIT-def    438 Note, in this example we are assuming NFIT-defined DIMMs which are
350 identified by an "nfit_handle", a 32-bit value    439 identified by an "nfit_handle", a 32-bit value where:
351                                                   440 
352    - Bit 3:0 DIMM number within the memory cha    441    - Bit 3:0 DIMM number within the memory channel
353    - Bit 7:4 memory channel number                442    - Bit 7:4 memory channel number
354    - Bit 11:8 memory controller ID                443    - Bit 11:8 memory controller ID
355    - Bit 15:12 socket ID (within scope of a No    444    - Bit 15:12 socket ID (within scope of a Node controller if node
356      controller is present)                       445      controller is present)
357    - Bit 27:16 Node Controller ID                 446    - Bit 27:16 Node Controller ID
358    - Bit 31:28 Reserved                           447    - Bit 31:28 Reserved
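The field layout above can be expressed as a decoder, which is handy when printing a handle in human-readable form. The sketch assumes exactly the bit positions listed. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical struct: fields of an NFIT device handle, per the layout above. */
struct nfit_handle_fields {
	unsigned int dimm;       /* bits 3:0   DIMM number within the channel */
	unsigned int channel;    /* bits 7:4   memory channel number */
	unsigned int controller; /* bits 11:8  memory controller ID */
	unsigned int socket;     /* bits 15:12 socket ID */
	unsigned int node;       /* bits 27:16 node controller ID */
};

/* Hypothetical helper: split a handle; bits 31:28 are reserved and ignored. */
static struct nfit_handle_fields nfit_handle_decode(uint32_t handle)
{
	struct nfit_handle_fields f = {
		.dimm       = handle & 0xf,
		.channel    = (handle >> 4) & 0xf,
		.controller = (handle >> 8) & 0xf,
		.socket     = (handle >> 12) & 0xf,
		.node       = (handle >> 16) & 0xfff,
	};

	return f;
}
```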
359                                                   448 
360 ::                                                449 ::
361                                                   450 
362         static struct ndctl_dimm *get_dimm_by_    451         static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
363                unsigned int handle)               452                unsigned int handle)
364         {                                         453         {
365                 struct ndctl_dimm *dimm;          454                 struct ndctl_dimm *dimm;
366                                                   455 
367                 ndctl_dimm_foreach(bus, dimm)     456                 ndctl_dimm_foreach(bus, dimm)
368                         if (ndctl_dimm_get_han    457                         if (ndctl_dimm_get_handle(dimm) == handle)
369                                 return dimm;      458                                 return dimm;
370                                                   459 
371                 return NULL;                      460                 return NULL;
372         }                                         461         }
373                                                   462 
374         #define DIMM_HANDLE(n, s, i, c, d) \      463         #define DIMM_HANDLE(n, s, i, c, d) \
375                 (((n & 0xfff) << 16) | ((s & 0    464                 (((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
376                  | ((c & 0xf) << 4) | (d & 0xf    465                  | ((c & 0xf) << 4) | (d & 0xf))
377                                                   466 
378         dimm = get_dimm_by_handle(bus, DIMM_HA    467         dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
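The `DIMM_HANDLE()` macro above does not parenthesize its arguments. Because `&` binds more tightly than `|`, an argument such as `x | y` is only partially masked. The following function form avoids that problem and assumes the same bit layout. `nfit_handle_encode()` is a hypothetical name. The macro is copied unchanged only for comparison.

```c
#include <stdint.h>

/*
 * Hypothetical helper: function form of DIMM_HANDLE(). Each argument is
 * masked as a whole value before it is shifted into place.
 */
static inline uint32_t nfit_handle_encode(unsigned int node, unsigned int socket,
					  unsigned int imc, unsigned int chan,
					  unsigned int dimm)
{
	return ((uint32_t)(node & 0xfff) << 16) | ((socket & 0xf) << 12) |
	       ((imc & 0xf) << 8) | ((chan & 0xf) << 4) | (dimm & 0xf);
}

/* The macro exactly as written in the documentation, for comparison. */
#define DIMM_HANDLE(n, s, i, c, d) \
	(((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
	 | ((c & 0xf) << 4) | (d & 0xf))
```

If `dimm` is passed as `0x10 | 0x1`, the function yields 0x1 while the macro yields 0x11, because the macro masks only the `0x1` operand.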
379                                                   468 
380 LIBNVDIMM/LIBNDCTL: Region                        469 LIBNVDIMM/LIBNDCTL: Region
381 --------------------------                        470 --------------------------
382                                                   471 
383 A generic REGION device is registered for each !! 472 A generic REGION device is registered for each PMEM range or BLK-aperture
384 range. Per the example there are 2 PMEM region !! 473 set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
385 bus. The primary role of regions is to be a co !! 474 sets on the "nfit_test.0" bus.  The primary role of regions is to be a
386 mapping is a tuple of <DIMM, DPA-start-offset, !! 475 container of "mappings".  A mapping is a tuple of <DIMM,
387                                                !! 476 DPA-start-offset, length>.
388 LIBNVDIMM provides a built-in driver for REGIO !! 477 
389 is responsible for parsing all LABELs, if pres !! 478 LIBNVDIMM provides a built-in driver for these REGION devices.  This driver
390 devices for the nd_pmem driver to consume.     !! 479 is responsible for reconciling the aliased DPA mappings across all
                                                   >> 480 regions, parsing the LABEL, if present, and then emitting NAMESPACE
                                                   >> 481 devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
                                                   >> 482 nd_blk device driver to consume.
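Each region's `mappingN` attributes in its sysfs listing expose the <DIMM, DPA-start-offset, length> tuples as text. This parsing sketch assumes the comma-separated form `<nmem>,<start>,<length>,<position>` that recent kernels print. Older kernels may omit the position field, so it is treated as optional here. Check `drivers/nvdimm/region_devs.c` for the running kernel. `struct region_mapping` and `parse_region_mapping()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical struct: one <DIMM, DPA-start-offset, length> tuple. */
struct region_mapping {
	char dimm[32];             /* DIMM device name, e.g. "nmem0" */
	unsigned long long start;  /* DPA start offset in bytes */
	unsigned long long length; /* length in bytes */
	int position;              /* interleave position, -1 if absent */
};

/*
 * Hypothetical helper: parse one mappingN attribute (assumed format
 * "<nmem>,<start>,<length>[,<position>]"). Returns true on success.
 */
static bool parse_region_mapping(const char *buf, struct region_mapping *m)
{
	int n = sscanf(buf, "%31[^,],%llu,%llu,%d",
		       m->dimm, &m->start, &m->length, &m->position);

	if (n < 3)
		return false;
	if (n == 3)
		m->position = -1;
	return true;
}
```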
391                                                   483 
392 In addition to the generic attributes of "mapp    484 In addition to the generic attributes of "mapping"s, "interleave_ways"
393 and "size" the REGION device also exports some    485 and "size" the REGION device also exports some convenience attributes.
394 "nstype" indicates the integer type of namespa    486 "nstype" indicates the integer type of namespace-device this region
395 emits, "devtype" duplicates the DEVTYPE variab    487 emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
396 'add' event, "modalias" duplicates the MODALIA    488 'add' event, "modalias" duplicates the MODALIAS variable stored by udev
397 at the 'add' event, and finally, the optional     489 at the 'add' event, and finally, the optional "spa_index" is provided in
398 the case where the region is defined by a SPA.    490 the case where the region is defined by a SPA.
399                                                   491 
400 LIBNVDIMM: region::                               492 LIBNVDIMM: region::
401                                                   493 
402         struct nd_region *nvdimm_pmem_region_c    494         struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
403                         struct nd_region_desc     495                         struct nd_region_desc *ndr_desc);
                                                   >> 496         struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
                                                   >> 497                         struct nd_region_desc *ndr_desc);
404                                                   498 
405 ::                                                499 ::
406                                                   500 
407         /sys/devices/platform/nfit_test.0/ndbu    501         /sys/devices/platform/nfit_test.0/ndbus0
408         |-- region0                               502         |-- region0
409         |   |-- available_size                    503         |   |-- available_size
410         |   |-- btt0                              504         |   |-- btt0
411         |   |-- btt_seed                          505         |   |-- btt_seed
412         |   |-- devtype                           506         |   |-- devtype
413         |   |-- driver -> ../../../../../bus/n    507         |   |-- driver -> ../../../../../bus/nd/drivers/nd_region
414         |   |-- init_namespaces                   508         |   |-- init_namespaces
415         |   |-- mapping0                          509         |   |-- mapping0
416         |   |-- mapping1                          510         |   |-- mapping1
417         |   |-- mappings                          511         |   |-- mappings
418         |   |-- modalias                          512         |   |-- modalias
419         |   |-- namespace0.0                      513         |   |-- namespace0.0
420         |   |-- namespace_seed                    514         |   |-- namespace_seed
421         |   |-- numa_node                         515         |   |-- numa_node
422         |   |-- nfit                              516         |   |-- nfit
423         |   |   `-- spa_index                     517         |   |   `-- spa_index
424         |   |-- nstype                            518         |   |-- nstype
425         |   |-- set_cookie                        519         |   |-- set_cookie
426         |   |-- size                              520         |   |-- size
427         |   |-- subsystem -> ../../../../../bu    521         |   |-- subsystem -> ../../../../../bus/nd
428         |   `-- uevent                            522         |   `-- uevent
429         |-- region1                               523         |-- region1
430         [..]                                      524         [..]
431                                                   525 
432 LIBNDCTL: region enumeration example              526 LIBNDCTL: region enumeration example
433 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^              527 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
434                                                   528 
435 Sample region retrieval routines based on NFIT    529 Sample region retrieval routines based on NFIT-unique data like
436 "spa_index" (interleave set id).               !! 530 "spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
437                                                !! 531 BLK::
438 ::                                             << 
439                                                   532 
440         static struct ndctl_region *get_pmem_r    533         static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
441                         unsigned int spa_index    534                         unsigned int spa_index)
442         {                                         535         {
443                 struct ndctl_region *region;      536                 struct ndctl_region *region;
444                                                   537 
445                 ndctl_region_foreach(bus, regi    538                 ndctl_region_foreach(bus, region) {
446                         if (ndctl_region_get_t    539                         if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
447                                 continue;         540                                 continue;
448                         if (ndctl_region_get_s    541                         if (ndctl_region_get_spa_index(region) == spa_index)
449                                 return region;    542                                 return region;
450                 }                                 543                 }
451                 return NULL;                      544                 return NULL;
452         }                                         545         }
453                                                   546 
                                                   >> 547         static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
                                                   >> 548                         unsigned int handle)
                                                   >> 549         {
                                                   >> 550                 struct ndctl_region *region;
                                                   >> 551 
                                                   >> 552                 ndctl_region_foreach(bus, region) {
                                                   >> 553                         struct ndctl_mapping *map;
                                                   >> 554 
                                                   >> 555                         if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
                                                   >> 556                                 continue;
                                                   >> 557                         ndctl_mapping_foreach(region, map) {
                                                   >> 558                                 struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
                                                   >> 559 
                                                   >> 560                                 if (ndctl_dimm_get_handle(dimm) == handle)
                                                   >> 561                                         return region;
                                                   >> 562                         }
                                                   >> 563                 }
                                                   >> 564                 return NULL;
                                                   >> 565         }
                                                   >> 566 
                                                   >> 567 
                                                   >> 568 Why Not Encode the Region Type into the Region Name?
                                                   >> 569 ----------------------------------------------------
                                                   >> 570 
                                                   >> 571 At first glance it seems since NFIT defines just PMEM and BLK interface
                                                   >> 572 types that we should simply name REGION devices with something derived
                                                   >> 573 from those type names.  However, the ND subsystem explicitly keeps the
                                                   >> 574 REGION name generic and expects userspace to always consider the
                                                   >> 575 region-attributes for four reasons:
                                                   >> 576 
                                                   >> 577     1. There are already more than two REGION and "namespace" types.  For
                                                   >> 578        PMEM there are two subtypes.  As mentioned previously we have PMEM where
                                                   >> 579        the constituent DIMM devices are known and anonymous PMEM.  For BLK
                                                   >> 580        regions the NFIT specification already anticipates vendor specific
                                                   >> 581        implementations.  The exact distinction of what a region contains is in
                                                   >> 582        the region-attributes not the region-name or the region-devtype.
                                                   >> 583 
                                                   >> 584     2. A region with zero child-namespaces is a possible configuration.  For
                                                   >> 585        example, the NFIT allows for a DCR to be published without a
                                                   >> 586        corresponding BLK-aperture.  This equates to a DIMM that can only accept
                                                   >> 587        control/configuration messages, but no i/o through a descendant block
                                                   >> 588        device.  Again, this "type" is advertised in the attributes ('mappings'
                                                   >> 589        == 0) and the name does not tell you much.
                                                   >> 590 
                                                   >> 591     3. What if a third major interface type arises in the future?  Outside
                                                   >> 592        of vendor specific implementations, it's not difficult to envision a
                                                   >> 593        third class of interface type beyond BLK and PMEM.  With a generic name
                                                   >> 594        for the REGION level of the device-hierarchy old userspace
                                                   >> 595        implementations can still make sense of new kernel advertised
                                                   >> 596        region-types.  Userspace can always rely on the generic region
                                                   >> 597        attributes like "mappings", "size", etc., and the expected child devices
                                                   >> 598        named "namespace".  This generic format of the device-model hierarchy
                                                   >> 599        allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
                                                   >> 600        future-proof.
                                                   >> 601 
                                                   >> 602     4. There are more robust mechanisms for determining the major type of a
                                                   >> 603        region than a device name.  See the next section, How Do I Determine the
                                                   >> 604        Major Type of a Region?
                                                   >> 605 
                                                   >> 606 How Do I Determine the Major Type of a Region?
                                                   >> 607 ----------------------------------------------
                                                   >> 608 
                                                   >> 609 Outside of the blanket recommendation of "use libndctl", or simply
                                                   >> 610 looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
                                                   >> 611 "nstype" integer attribute, here are some other options.
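Decoding `nstype` by hand amounts to mapping a small integer to a namespace kind. In this sketch the numeric values are an assumption. They are meant to mirror the `ND_DEVICE_NAMESPACE_IO`, `ND_DEVICE_NAMESPACE_PMEM` and `ND_DEVICE_NAMESPACE_BLK` definitions in `<linux/ndctl.h>`. Where that header is installed, prefer its names over these local constants. The helper names are hypothetical.

```c
#include <stdlib.h>

/*
 * Assumed to mirror ND_DEVICE_NAMESPACE_{IO,PMEM,BLK} in <linux/ndctl.h>;
 * use the header's definitions where available.
 */
enum {
	NSTYPE_IO = 4,   /* legacy PMEM namespace (no labels) */
	NSTYPE_PMEM = 5, /* PMEM namespace */
	NSTYPE_BLK = 6,  /* BLK namespace */
};

/* Hypothetical helper: name for an nstype value. */
static const char *nstype_name(int nstype)
{
	switch (nstype) {
	case NSTYPE_IO:
		return "io";
	case NSTYPE_PMEM:
		return "pmem";
	case NSTYPE_BLK:
		return "blk";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: decode the text of an "nstype" attribute, e.g. "5\n". */
static const char *nstype_attr_name(const char *buf)
{
	char *end;
	long v = strtol(buf, &end, 10);

	if (end == buf || (*end != '\0' && *end != '\n'))
		return "unknown";
	return nstype_name((int)v);
}
```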
                                                   >> 612 
                                                   >> 613 1. module alias lookup
                                                   >> 614 ^^^^^^^^^^^^^^^^^^^^^^
                                                   >> 615 
                                                   >> 616     The whole point of region/namespace device type differentiation is to
                                                   >> 617     decide which block-device driver will attach to a given LIBNVDIMM namespace.
                                                   >> 618     One can simply use the modalias to look up the resulting module.  It's
                                                   >> 619     important to note that this method is robust in the presence of a
                                                   >> 620     vendor-specific driver down the road.  If a vendor-specific
                                                   >> 621     implementation wants to supplant the standard nd_blk driver it can with
                                                   >> 622     minimal impact to the rest of LIBNVDIMM.
                                                   >> 623 
                                                   >> 624     In fact, a vendor may also want to have a vendor-specific region-driver
                                                   >> 625     (outside of nd_region).  For example, if a vendor defined its own LABEL
                                                   >> 626     format it would need its own region driver to parse that LABEL and emit
                                                   >> 627     the resulting namespaces.  The output from module resolution is more
                                                   >> 628     accurate than a region-name or region-devtype.
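Resolving a modalias to a module is the job of modprobe or libkmod, but a script may first want the type number itself. This sketch extracts that number from a region's `modalias` attribute. It assumes the `nd:t<N>` form shown in the udev example (`nd:t2` for the PMEM region, `nd:t3` for the BLK region). `parse_nd_modalias()` and the `REGION_TYPE_*` constants are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical constants, taken from the udev example in this document. */
#define REGION_TYPE_PMEM 2 /* DEVTYPE=nd_pmem, MODALIAS=nd:t2 */
#define REGION_TYPE_BLK  3 /* DEVTYPE=nd_blk,  MODALIAS=nd:t3 */

/*
 * Hypothetical helper: extract N from "nd:t<N>", optionally followed by
 * a newline. Returns true on success.
 */
static bool parse_nd_modalias(const char *buf, int *type)
{
	int consumed = 0;

	if (sscanf(buf, "nd:t%d%n", type, &consumed) != 1)
		return false;
	return buf[consumed] == '\0' || buf[consumed] == '\n';
}
```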
                                                   >> 629 
2. udev
^^^^^^^

    The kernel "devtype" is registered in the udev database::

        # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
        P: /devices/platform/nfit_test.0/ndbus0/region0
        E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
        E: DEVTYPE=nd_pmem
        E: MODALIAS=nd:t2
        E: SUBSYSTEM=nd

        # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
        P: /devices/platform/nfit_test.0/ndbus0/region4
        E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
        E: DEVTYPE=nd_blk
        E: MODALIAS=nd:t3
        E: SUBSYSTEM=nd

    ...and is available as a region attribute, but keep in mind that the
    "devtype" does not indicate sub-type variations, and scripts should
    really be inspecting the other attributes.

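The DEVTYPE and MODALIAS pairs can also be read without udevadm. A device's sysfs "uevent" attribute holds one KEY=VALUE pair per line, so we would expect a region's file (for example /sys/bus/nd/devices/region0/uevent) to contain DEVTYPE and MODALIAS lines like the ones above. What follows is a minimal sketch that extracts one value from such a buffer. It assumes the file has already been read into memory, and uevent_get() is a hypothetical helper rather than a library call:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of @key from a uevent-style buffer
 * ("KEY=VALUE\n" lines) into @out.  Returns 0 on success, or -1 if the
 * key is absent or the value does not fit in @outlen bytes.
 */
static int uevent_get(const char *buf, const char *key,
                      char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = buf;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "KEY=" at the start of this line only. */
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            size_t vlen = len - klen - 1;

            if (vlen >= outlen)
                return -1;  /* value plus NUL would not fit */
            memcpy(out, line + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

Matching the full "KEY=" prefix matters. A lookup for "DEV" must not stop at the "DEVTYPE=" line.
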
3. type-specific attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^

    As it currently stands, a BLK-aperture region will never have an
    "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region.  A
    BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
    that does not allow I/O.  A PMEM region with a "mappings" value of 0
    is a simple system-physical-address range.


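The rules in the paragraph above can be folded into a single classification step. What follows is a minimal sketch. It assumes a script has already read each region's "devtype" and "mappings" attributes and has checked whether "nfit/spa_index" exists. The struct, enum and function names are hypothetical, and REGION_PMEM_OTHER is a catch-all for a combination the text does not describe:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what a script read from one region's sysfs. */
struct region_info {
    const char *devtype;    /* "nd_pmem" or "nd_blk" */
    unsigned int mappings;  /* value of the "mappings" attribute */
    bool has_spa_index;     /* does "nfit/spa_index" exist? */
};

enum region_kind {
    REGION_UNKNOWN,
    REGION_BLK,             /* BLK-aperture region with at least one mapping */
    REGION_BLK_NO_IO,       /* BLK, mappings == 0: DIMM does not allow I/O */
    REGION_PMEM_SPA_RANGE,  /* PMEM, mappings == 0: plain physical range */
    REGION_PMEM_NFIT,       /* PMEM with mappings and an NFIT spa_index */
    REGION_PMEM_OTHER,      /* PMEM with mappings but no spa_index */
};

static enum region_kind classify_region(const struct region_info *r)
{
    /* devtype separates BLK from PMEM; a missing spa_index alone cannot */
    if (strcmp(r->devtype, "nd_blk") == 0)
        return r->mappings == 0 ? REGION_BLK_NO_IO : REGION_BLK;
    if (strcmp(r->devtype, "nd_pmem") == 0) {
        if (r->mappings == 0)
            return REGION_PMEM_SPA_RANGE;
        return r->has_spa_index ? REGION_PMEM_NFIT : REGION_PMEM_OTHER;
    }
    return REGION_UNKNOWN;
}
```
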
LIBNVDIMM/LIBNDCTL: Namespace
-----------------------------

A REGION, after resolving DPA aliasing and LABEL-specified boundaries,
surfaces one or more "namespace" devices.  The arrival of a "namespace"
device currently triggers either the nd_blk or nd_pmem driver to load
and register a disk/block device.

LIBNVDIMM: namespace
^^^^^^^^^^^^^^^^^^^^

Here is a sample layout from the three major types of NAMESPACE, where
namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
attribute), namespace2.0 represents a BLK namespace (note that it has a
'sector_size' attribute), and namespace6.0 represents an anonymous PMEM
namespace (note that it has no 'uuid' attribute because it does not
support a LABEL)::

        /sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
        |-- alt_name
        |-- devtype
        |-- dpa_extents
        |-- force_raw
        |-- modalias
        |-- numa_node
        |-- resource
        |-- size
        |-- subsystem -> ../../../../../../bus/nd
        |-- type
        |-- uevent
        `-- uuid
        /sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
        |-- alt_name
        |-- devtype
        |-- dpa_extents
        |-- force_raw
        |-- modalias
        |-- numa_node
        |-- sector_size
        |-- size
        |-- subsystem -> ../../../../../../bus/nd
        |-- type
        |-- uevent
        `-- uuid
        /sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
        |-- block
        |   `-- pmem0
        |-- devtype
        |-- driver -> ../../../../../../bus/nd/drivers/pmem
        |-- force_raw
        |-- modalias
        |-- numa_node
        |-- resource
        |-- size
        |-- subsystem -> ../../../../../../bus/nd
        |-- type
        `-- uevent

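The attribute differences in the sample layout are enough for a rough classification. Only the BLK namespace has 'sector_size', and the anonymous PMEM namespace lacks 'uuid'. What follows is a minimal sketch of that heuristic. It assumes a namespace directory path such as /sys/bus/nd/devices/namespace0.0. The enum and function names are hypothetical, and real tooling should also consult 'devtype' and 'type':

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

enum ns_kind {
    NS_PMEM_LABELED,    /* like namespace0.0: has 'uuid' */
    NS_BLK,             /* like namespace2.0: has 'sector_size' */
    NS_PMEM_ANONYMOUS,  /* like namespace6.0: no 'uuid', no LABEL support */
};

static enum ns_kind ns_kind_from_attrs(bool has_uuid, bool has_sector_size)
{
    if (has_sector_size)
        return NS_BLK;  /* BLK namespaces carry a 'uuid' as well */
    return has_uuid ? NS_PMEM_LABELED : NS_PMEM_ANONYMOUS;
}

/* True if the attribute file @attr exists under directory @ns_dir. */
static bool has_attr(const char *ns_dir, const char *attr)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", ns_dir, attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return false;
    return access(path, F_OK) == 0;
}

static enum ns_kind classify_namespace(const char *ns_dir)
{
    return ns_kind_from_attrs(has_attr(ns_dir, "uuid"),
                              has_attr(ns_dir, "sector_size"));
}
```
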
LIBNDCTL: namespace enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Namespaces are indexed relative to their parent region, as in the example
below.  These indexes are mostly static from boot to boot, but the
subsystem makes no guarantees in this regard.  For a static namespace
identifier, use its 'uuid' attribute.

::

  #include <ndctl/libndctl.h>

  /* Return the namespace with per-region index @id, or NULL if absent. */
  static struct ndctl_namespace
  *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
  {
          struct ndctl_namespace *ndns;

          ndctl_namespace_foreach(region, ndns)
                  if (ndctl_namespace_get_id(ndns) == id)
                          return ndns;

          return NULL;
  }

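As the sample layout shows, a namespace device name combines the parent region id with the per-region index, so namespace2.0 sits under region2. What follows is a minimal sketch that splits such a name. The helper is hypothetical. Because these indexes can change across boots, it is no substitute for the 'uuid' attribute:

```c
#include <stdio.h>

/*
 * Hypothetical helper: split a name such as "namespace2.1" into its
 * region id (2) and per-region index (1).  Returns 0 on success, or -1
 * if the name does not have exactly that form.
 */
static int parse_namespace_name(const char *name, unsigned int *region,
                                unsigned int *index)
{
    char tail;

    /* a third conversion succeeding means trailing junk after the index */
    if (sscanf(name, "namespace%u.%u%c", region, index, &tail) != 2)
        return -1;
    return 0;
}
```
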
LIBNDCTL: namespace creation example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Idle namespaces are automatically created by the kernel if a given
region has enough available capacity to create a new namespace.
Namespace instantiation involves finding an idle namespace and
configuring it.  For the most part, namespace attributes can be set in
any order; the only constraint is that 'uuid' must be set before
'size'.  This enables the kernel to track DPA allocations internally
with a static identifier::

  #include <stdio.h>
  #include <ndctl/libndctl.h>

  /* struct namespace_parameters (id, uuid, size, lbasize) is caller-defined */
  static int configure_namespace(struct ndctl_region *region,
                  struct ndctl_namespace *ndns,
                  struct namespace_parameters *parameters)
  {
          char devname[50];
          int rc;

          snprintf(devname, sizeof(devname), "namespace%d.%d",
                          ndctl_region_get_id(region), parameters->id);

          rc = ndctl_namespace_set_alt_name(ndns, devname);
          /* 'uuid' must be set prior to setting size! */
          if (rc == 0)
                  rc = ndctl_namespace_set_uuid(ndns, parameters->uuid);
          if (rc == 0)
                  rc = ndctl_namespace_set_size(ndns, parameters->size);
          /* unlike pmem namespaces, blk namespaces have a sector size */
          if (rc == 0 && parameters->lbasize)
                  rc = ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
          if (rc == 0)
                  rc = ndctl_namespace_enable(ndns);
          return rc;
  }


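Each ndctl_namespace_set_*() call in the creation example corresponds to a like-named attribute in the namespace layout (alt_name, uuid, size, sector_size). Assume those attributes accept plain-text writes; this is an assumption, since libndctl's internals are not shown here. On that basis, the same ordering can be sketched directly against sysfs. The helper names and the ns_dir parameter are hypothetical. Enabling the namespace, which ndctl_namespace_enable() takes care of, is deliberately left out:

```c
#include <stdio.h>

/* Write @val to <dir>/<attr>.  Returns 0 on success, -1 on failure. */
static int write_attr(const char *dir, const char *attr, const char *val)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
    FILE *f;
    int rc = 0;

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(val, f) == EOF)
        rc = -1;
    /* stdio buffers the text, so a rejected sysfs write surfaces here */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/*
 * Hypothetical sysfs-level version of configure_namespace().  The write
 * order encodes the one hard rule: 'uuid' before 'size'.
 */
static int configure_namespace_sysfs(const char *ns_dir, const char *alt_name,
                                     const char *uuid, unsigned long long size,
                                     unsigned int lbasize)
{
    char num[32];

    if (write_attr(ns_dir, "alt_name", alt_name) != 0)
        return -1;
    if (write_attr(ns_dir, "uuid", uuid) != 0)  /* must precede "size" */
        return -1;
    snprintf(num, sizeof(num), "%llu", size);
    if (write_attr(ns_dir, "size", num) != 0)
        return -1;
    if (lbasize != 0) {                         /* BLK namespaces only */
        snprintf(num, sizeof(num), "%u", lbasize);
        if (write_attr(ns_dir, "sector_size", num) != 0)
            return -1;
    }
    return 0;
}
```

The sketch stops at the first failed write. Because 'size' is never written before 'uuid', a failure partway through never leaves a size set without an identifier.
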
Why the Term "namespace"?
^^^^^^^^^^^^^^^^^^^^^^^^^

    1. Why not "volume" for instance?  "volume" ran the risk of confusing
       ND (libnvdimm subsystem) with a volume manager like device-mapper.

    2. The term originated to describe the sub-devices that can be created
       within an NVME controller (see the nvme specification:
       https://www.nvmexpress.org/specifications/), and NFIT namespaces are
       meant to parallel the capabilities and configurability of
       NVME-namespaces.


LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-------------------------------------------------

A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a stacked
block device driver that fronts either the whole block device or a
partition of a block device emitted by either a PMEM or BLK NAMESPACE.

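The central idea in the BTT design is that a write never overwrites a live block in place. Data first lands in a free internal block, and only then is the external-to-internal map entry switched, so a torn write cannot leave a half-updated sector visible. The toy below illustrates only that map-and-free-block step. It is a single-threaded, in-memory sketch with hypothetical names and sizes. The real BTT also keeps per-lane free blocks and a log (the "flog") for crash recovery, and none of that is modelled here:

```c
#include <stdint.h>
#include <string.h>

#define NLBA   4          /* external (exported) blocks */
#define NBLK   (NLBA + 1) /* internal blocks: one spare for out-of-place writes */
#define BSIZE  8          /* toy block size in bytes */

struct toy_btt {
    uint8_t  media[NBLK][BSIZE]; /* stand-in for the persistent media */
    uint32_t map[NLBA];          /* external LBA -> internal block */
    uint32_t free_blk;           /* the one internal block not in map[] */
};

static void toy_btt_init(struct toy_btt *b)
{
    memset(b, 0, sizeof(*b));
    for (uint32_t i = 0; i < NLBA; i++)
        b->map[i] = i;
    b->free_blk = NLBA;
}

/* Fill the free block first, then flip the map entry to point at it. */
static void toy_btt_write(struct toy_btt *b, uint32_t lba, const uint8_t *data)
{
    uint32_t old = b->map[lba];

    memcpy(b->media[b->free_blk], data, BSIZE);
    b->map[lba] = b->free_blk; /* the single switch-over step in this toy */
    b->free_blk = old;         /* the previous block becomes the spare */
}

static const uint8_t *toy_btt_read(const struct toy_btt *b, uint32_t lba)
{
    return b->media[b->map[lba]];
}
```
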
LIBNVDIMM: btt layout
^^^^^^^^^^^^^^^^^^^^^

Every region will start out with at least one BTT device, which is the
seed device.  To activate it, set the "namespace", "uuid", and
"sector_size" attributes and then bind the device to the nd_pmem or
nd_blk driver depending on the region type::

        /sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
        |-- namespace
        |-- delete
        |-- devtype
        |-- modalias
        |-- numa_node
        |-- sector_size
        |-- subsystem -> ../../../../../bus/nd
        |-- uevent
        `-- uuid

LIBNDCTL: btt creation example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similar to namespaces, an idle BTT device is automatically created per
region.  Each time this "seed" btt device is configured and enabled, a new
seed is created.  Creating a BTT configuration involves two steps:
finding an idle BTT and assigning it to consume a PMEM or BLK namespace::

        #include <errno.h>
        #include <ndctl/libndctl.h>

        static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
        {
                struct ndctl_btt *btt;

                ndctl_btt_foreach(region, btt)
                        if (!ndctl_btt_is_enabled(btt)
                                        && !ndctl_btt_is_configured(btt))
                                return btt;

                return NULL;
        }

        /* struct btt_parameters (uuid, sector_size, ndns) is caller-defined */
        static int configure_btt(struct ndctl_region *region,
                        struct btt_parameters *parameters)
        {
                struct ndctl_btt *btt = get_idle_btt(region);

                if (!btt)
                        return -ENODEV; /* no idle seed BTT in this region */

                ndctl_btt_set_uuid(btt, parameters->uuid);
                ndctl_btt_set_sector_size(btt, parameters->sector_size);
                ndctl_btt_set_namespace(btt, parameters->ndns);
                /* turn off raw mode device */
                ndctl_namespace_disable(parameters->ndns);
                /* turn on btt access */
                return ndctl_btt_enable(btt);
        }

Once instantiated, a new inactive btt seed device will appear underneath
the region.

Once a "namespace" is removed from a BTT, that instance of the BTT device
will be deleted or otherwise reset to default values.  This deletion is
only at the device model level.  In order to destroy a BTT, the "info
block" needs to be destroyed.  Note that to destroy a BTT, the media
needs to be written in raw mode.  By default, the kernel will autodetect
the presence of a BTT and disable raw mode.  This autodetect behavior
can be suppressed by enabling raw mode for the namespace via the
ndctl_namespace_set_raw_mode() API.


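Each namespace directory in the layout above carries a 'force_raw' attribute. We assume, without having shown libndctl's internals, that this is the knob ndctl_namespace_set_raw_mode() controls. Under that assumption, raw mode could be toggled by writing "1" or "0" to that file. set_force_raw() and its ns_dir parameter are hypothetical. The sketch does not check privileges, and it does not check whether the namespace must first be disabled for the change to take effect:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch: write "1" (raw) or "0" (autodetect) to
 * <ns_dir>/force_raw.  Returns 0 on success, -1 on failure.
 */
static int set_force_raw(const char *ns_dir, bool raw)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/force_raw", ns_dir);
    FILE *f;
    int rc = 0;

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(raw ? "1" : "0", f) == EOF)
        rc = -1;
    /* a rejected sysfs write is reported when the buffer is flushed */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}
```
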
Summary LIBNDCTL Diagram
------------------------

For the given example above, here is the view of the objects as seen by the
LIBNDCTL API::

              +---+
              |CTX|    +---------+   +--------------+  +---------------+
              +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
                |    | +---------+   +--------------+  +---------------+
  +-------+     |    | +---------+   +--------------+  +---------------+
  | DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
  +-------+ |   |    | +---------+   +--------------+  +---------------+
  | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
  +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
  | DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
  +-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
  | DIMM3 <-+        |               +--------------+  +----------------------+
  +-------+          | +---------+   +--------------+  +---------------+
                     +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
                     | +---------+ | +--------------+  +----------------------+
                     |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
                     |               +--------------+  +----------------------+
                     | +---------+   +--------------+  +---------------+
                     +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
                     | +---------+   +--------------+  +---------------+
                     | +---------+   +--------------+  +----------------------+
                     +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
                       +---------+   +--------------+  +---------------+------+
