  1 ====================
  2 Credentials in Linux
  3 ====================
  4 
  5 By: David Howells <dhowells@redhat.com>
  6 
  7 .. contents:: :local:
  8 
  9 Overview
 10 ========
 11 
 12 There are several parts to the security check performed by Linux when one
 13 object acts upon another:
 14 
 15  1. Objects.
 16 
 17      Objects are things in the system that may be acted upon directly by
 18      userspace programs.  Linux has a variety of actionable objects, including:
 19 
 20         - Tasks
 21         - Files/inodes
 22         - Sockets
 23         - Message queues
 24         - Shared memory segments
 25         - Semaphores
 26         - Keys
 27 
 28      As a part of the description of all these objects there is a set of
 29      credentials.  What's in the set depends on the type of object.
 30 
 31  2. Object ownership.
 32 
 33      Amongst the credentials of most objects, there will be a subset that
 34      indicates the ownership of that object.  This is used for resource
 35      accounting and limitation (disk quotas and task rlimits for example).
 36 
 37      In a standard UNIX filesystem, for instance, this will be defined by the
 38      UID marked on the inode.
 39 
 40  3. The objective context.
 41 
 42      Also amongst the credentials of those objects, there will be a subset that
 43      indicates the 'objective context' of that object.  This may or may not be
 44      the same set as in (2) - in standard UNIX files, for instance, this is
 45      defined by the UID and the GID marked on the inode.
 46 
 47      The objective context is used as part of the security calculation that is
 48      carried out when an object is acted upon.
 49 
 50  4. Subjects.
 51 
 52      A subject is an object that is acting upon another object.
 53 
 54      Most of the objects in the system are inactive: they don't act on other
 55      objects within the system.  Processes/tasks are the obvious exception:
 56      they do stuff; they access and manipulate things.
 57 
 58      Objects other than tasks may under some circumstances also be subjects.
 59      For instance an open file may send SIGIO to a task using the UID and EUID
 60      given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
 61      the file struct will have a subjective context too.
 62 
 63  5. The subjective context.
 64 
 65      A subject has an additional interpretation of its credentials.  A subset
 66      of its credentials forms the 'subjective context'.  The subjective context
 67      is used as part of the security calculation that is carried out when a
 68      subject acts.
 69 
 70      A Linux task, for example, has the FSUID, FSGID and the supplementary
 71      group list for when it is acting upon a file - which are quite separate
 72      from the real UID and GID that normally form the objective context of the
 73      task.
 74 
 75  6. Actions.
 76 
 77      Linux has a number of actions available that a subject may perform upon an
 78      object.  The set of actions available depends on the nature of the subject
 79      and the object.
 80 
 81      Actions include reading, writing, creating and deleting files; and forking,
 82      signalling and tracing tasks.
 83 
 84  7. Rules, access control lists and security calculations.
 85 
 86      When a subject acts upon an object, a security calculation is made.  This
 87      involves taking the subjective context, the objective context and the
 88      action, and searching one or more sets of rules to see whether the subject
 89      is granted or denied permission to act in the desired manner on the
 90      object, given those contexts.
 91 
 92      There are two main sources of rules:
 93 
 94      a. Discretionary access control (DAC):
 95 
 96          Sometimes the object will include sets of rules as part of its
 97          description.  This is an 'Access Control List' or 'ACL'.  A Linux
 98          file may supply more than one ACL.
 99 
100          A traditional UNIX file, for example, includes a permissions mask that
101          is an abbreviated ACL with three fixed classes of subject ('user',
102          'group' and 'other'), each of which may be granted certain privileges
103          ('read', 'write' and 'execute' - whatever those map to for the object
104          in question).  UNIX file permissions do not allow the arbitrary
105          specification of subjects, however, and so are of limited use.
106 
107          A Linux file might also sport a POSIX ACL.  This is a list of rules
108          that grants various permissions to arbitrary subjects.
109 
110      b. Mandatory access control (MAC):
111 
112          The system as a whole may have one or more sets of rules that get
113          applied to all subjects and objects, regardless of their source.
114          SELinux and Smack are examples of this.
115 
116          In the case of SELinux and Smack, each object is given a label as part
117          of its credentials.  When an action is requested, they take the
118          subject label, the object label and the action and look for a rule
119          that says that this action is either granted or denied.
120 
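The DAC case in (7a) can be illustrated with a small userspace model.  This is
a sketch, not kernel code: ``struct subj_ctx``, ``struct obj_ctx``,
``dac_mode_permits()`` and the ``WANT_*`` values are hypothetical names chosen
for illustration, and the model deliberately ignores capabilities, POSIX ACLs
and LSM hooks.  It only shows how a subjective context (FSUID, FSGID,
supplementary groups), an objective context (UID, GID, mode) and an action
combine into a grant or deny decision:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Requested actions, one bit per permission (hypothetical names). */
#define WANT_EXEC  1u
#define WANT_WRITE 2u
#define WANT_READ  4u

/* Hypothetical subjective context of a task acting on a file. */
struct subj_ctx {
        uid_t fsuid;
        gid_t fsgid;
        const gid_t *groups;    /* supplementary groups */
        size_t ngroups;
};

/* Hypothetical objective context of a file. */
struct obj_ctx {
        uid_t uid;
        gid_t gid;
        mode_t mode;            /* low nine bits: rwxrwxrwx */
};

/* Does the subject belong to the given group, via FSGID or the group list? */
static bool subj_in_group(const struct subj_ctx *s, gid_t gid)
{
        size_t i;

        if (s->fsgid == gid)
                return true;
        for (i = 0; i < s->ngroups; i++)
                if (s->groups[i] == gid)
                        return true;
        return false;
}

/*
 * Select exactly one class ('user', 'group' or 'other') and test the
 * requested actions against that class's three permission bits.
 */
bool dac_mode_permits(const struct subj_ctx *s, const struct obj_ctx *o,
                      unsigned int want)
{
        unsigned int bits;

        assert(s && o);
        if (s->fsuid == o->uid)
                bits = (o->mode >> 6) & 7u;
        else if (subj_in_group(s, o->gid))
                bits = (o->mode >> 3) & 7u;
        else
                bits = o->mode & 7u;

        return (bits & want) == want;
}
```

Note that only one class is consulted: in this model an owner with mode 0460
is denied write access even if one of its groups would have been granted it.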
121 
122 Types of Credentials
123 ====================
124 
125 The Linux kernel supports the following types of credentials:
126 
127  1. Traditional UNIX credentials.
128 
129         - Real User ID
130         - Real Group ID
131 
132      The UID and GID are carried by most, if not all, Linux objects, even if in
133      some cases they have to be invented (FAT or CIFS files for example, which are
134      derived from Windows).  These (mostly) define the objective context of
135      that object, with tasks being slightly different in some cases.
136 
137         - Effective, Saved and FS User ID
138         - Effective, Saved and FS Group ID
139         - Supplementary groups
140 
141      These are additional credentials used by tasks only.  Usually, the
142      EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
143      will be used as the objective.  For tasks, it should be noted that this is
144      not always true.
145 
146  2. Capabilities.
147 
148         - Set of permitted capabilities
149         - Set of inheritable capabilities
150         - Set of effective capabilities
151         - Capability bounding set
152 
153      These are only carried by tasks.  They indicate superior capabilities
154      granted piecemeal to a task that an ordinary task wouldn't otherwise have.
155      These are manipulated implicitly by changes to the traditional UNIX
156      credentials, but can also be manipulated directly by the ``capset()``
157      system call.
158 
159      The permitted capabilities are those caps that the process might grant
160      itself to its effective or permitted sets through ``capset()``.  The
161      inheritable set might also be so constrained.
162 
163      The effective capabilities are the ones that a task is actually allowed to
164      make use of itself.
165 
166      The inheritable capabilities are the ones that may get passed across
167      ``execve()``.
168 
169      The bounding set limits the capabilities that may be inherited across
170      ``execve()``, especially when a binary is executed that will execute as
171      UID 0.
172 
173  3. Secure management flags (securebits).
174 
175      These are only carried by tasks.  These govern the way the above
176      credentials are manipulated and inherited over certain operations such as
177      ``execve()``.  They aren't used directly as objective or subjective
178      credentials.
179 
180  4. Keys and keyrings.
181 
182      These are only carried by tasks.  They carry and cache security tokens
183      that don't fit into the other standard UNIX credentials.  They are for
184      making such things as network filesystem keys available to the file
185      accesses performed by processes, without the necessity of ordinary
186      programs having to know about security details involved.
187 
188      Keyrings are a special type of key.  They carry sets of other keys and can
189      be searched for the desired key.  Each process may subscribe to a number
190      of keyrings:
191 
192         Per-thread keyring
193         Per-process keyring
194         Per-session keyring
195 
196      When a process accesses a key, if not already present, it will normally be
197      cached on one of these keyrings for future accesses to find.
198 
199      For more information on using keys, see ``Documentation/security/keys/*``.
200 
201  5. LSM
202 
203      The Linux Security Module (LSM) framework allows extra controls to be
204      placed over the operations that a task may do.  Currently Linux supports
205      several LSM options.
206 
207      Some work by labelling the objects in a system and then applying sets of
208      rules (policies) that say what operations a task with one label may do to
209      an object with another label.
210 
211  6. AF_KEY
212 
213      This is a socket-based approach to credential management for networking
214      stacks [RFC 2367].  It isn't discussed by this document as it doesn't
215      interact directly with task and file credentials; rather it keeps system
216      level credentials.
217 
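The relationship between the capability sets in (2) can be sketched as plain
bitmask arithmetic.  The model below is an assumption-laden simplification:
``struct cap_sets``, ``cap_subset()``, ``capset_request_ok()`` and
``bounded_file_caps()`` are hypothetical names, each capability is reduced to
one bit, and the rules for the inheritable set are left out entirely.  It
shows only that a ``capset()``-style request may not raise the permitted set,
that the effective set must stay within the permitted set, and that the
bounding set masks what an executed file could otherwise add:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified capability sets: one bit per capability. */
struct cap_sets {
        uint64_t permitted;
        uint64_t inheritable;
        uint64_t effective;
        uint64_t bounding;
};

/* True if every bit set in 'a' is also set in 'b'. */
bool cap_subset(uint64_t a, uint64_t b)
{
        return (a & ~b) == 0;
}

/*
 * Simplified check of a request to change the permitted and effective
 * sets: permitted may only shrink, and effective must fit inside the
 * newly requested permitted set.
 */
bool capset_request_ok(const struct cap_sets *old,
                       uint64_t new_permitted, uint64_t new_effective)
{
        assert(old);
        return cap_subset(new_permitted, old->permitted) &&
               cap_subset(new_effective, new_permitted);
}

/* Capabilities a file could contribute across exec, after bounding. */
uint64_t bounded_file_caps(const struct cap_sets *old, uint64_t file_permitted)
{
        assert(old);
        return file_permitted & old->bounding;
}
```

The authoritative rules, including those for the inheritable set, are the ones
implemented by the kernel and described in capabilities(7).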
218 
219 When a file is opened, part of the opening task's subjective context is
220 recorded in the file struct created.  This allows operations using that file
221 struct to use those credentials instead of the subjective context of the task
222 that issued the operation.  An example of this would be a file opened on a
223 network filesystem where the credentials of the opened file should be presented
224 to the server, regardless of who is actually doing a read or a write upon it.
225 
226 
227 File Markings
228 =============
229 
230 Files on disk or obtained over the network may have annotations that form the
231 objective security context of that file.  Depending on the type of filesystem,
232 this may include one or more of the following:
233 
234  * UNIX UID, GID, mode;
235  * Windows user ID;
236  * Access control list;
237  * LSM security label;
238  * UNIX exec privilege escalation bits (SUID/SGID);
239  * File capabilities exec privilege escalation bits.
240 
241 These are compared to the task's subjective security context, and certain
242 operations are allowed or disallowed as a result.  In the case of ``execve()``, the
243 privilege escalation bits come into play, and may allow the resulting process
244 extra privileges, based on the annotations on the executable file.
245 
246 
247 Task Credentials
248 ================
249 
250 In Linux, all of a task's credentials are held in a refcounted structure of
251 type 'struct cred', either directly (uid, gid) or by reference (groups, keys,
252 LSM security).  Each task points to its credentials by a pointer called
253 'cred' in its task_struct.
254 
255 Once a set of credentials has been prepared and committed, it may not be
256 changed, barring the following exceptions:
257 
258  1. its reference count may be changed;
259 
260  2. the reference count on the group_info struct it points to may be changed;
261 
262  3. the reference count on the security data it points to may be changed;
263 
264  4. the reference count on any keyrings it points to may be changed;
265 
266  5. any keyrings it points to may be revoked, expired or have their security
267     attributes changed; and
268 
269  6. the contents of any keyrings to which it points may be changed (the whole
270     point of keyrings being a shared set of credentials, modifiable by anyone
271     with appropriate access).
272 
273 To alter anything in the cred struct, the copy-and-replace principle must be
274 adhered to.  First take a copy, then alter the copy and then use RCU to change
275 the task pointer to make it point to the new copy.  There are wrappers to aid
276 with this (see below).
277 
278 A task may only alter its _own_ credentials; it is no longer permitted for a
279 task to alter another's credentials.  This means the ``capset()`` system call
280 is no longer permitted to take any PID other than the one of the current
281 process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
282 no longer permit attachment to process-specific keyrings in the requesting
283 process as the instantiating process may need to create them.
284 
285 
286 Immutable Credentials
287 ---------------------
288 
289 Once a set of credentials has been made public (by calling ``commit_creds()``
290 for example), it must be considered immutable, barring two exceptions:
291 
292  1. The reference count may be altered.
293 
294  2. While the keyring subscriptions of a set of credentials may not be
295     changed, the keyrings subscribed to may have their contents altered.
296 
297 To catch accidental credential alteration at compile time, struct task_struct
298 has _const_ pointers to its credential sets, as does struct file.  Furthermore,
299 certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
300 pointers, thus rendering casts unnecessary, but they must temporarily cast away
301 the const qualification internally to be able to alter the reference count.
302 
303 
304 Accessing Task Credentials
305 --------------------------
306 
307 A task being able to alter only its own credentials permits the current process
308 to read or replace its own credentials without the need for any form of locking
309 -- which simplifies things greatly.  It can just call::
310 
311         const struct cred *current_cred()
312 
313 to get a pointer to its credentials structure, and it doesn't have to release
314 it afterwards.
315 
316 There are convenience wrappers for retrieving specific aspects of a task's
317 credentials (the value is simply returned in each case)::
318 
319         uid_t current_uid(void)         Current's real UID
320         gid_t current_gid(void)         Current's real GID
321         uid_t current_euid(void)        Current's effective UID
322         gid_t current_egid(void)        Current's effective GID
323         uid_t current_fsuid(void)       Current's file access UID
324         gid_t current_fsgid(void)       Current's file access GID
325         kernel_cap_t current_cap(void)  Current's effective capabilities
326         struct user_struct *current_user(void)  Current's user account
327 
328 There are also convenience wrappers for retrieving specific associated pairs of
329 a task's credentials::
330 
331         void current_uid_gid(uid_t *, gid_t *);
332         void current_euid_egid(uid_t *, gid_t *);
333         void current_fsuid_fsgid(uid_t *, gid_t *);
334 
335 which return these pairs of values through their arguments after retrieving
336 them from the current task's credentials.
337 
338 
339 In addition, there is a function for obtaining a reference on the current
340 process's current set of credentials::
341 
342         const struct cred *get_current_cred(void);
343 
344 and functions for getting references to one of the credentials that don't
345 actually live in struct cred::
346 
347         struct user_struct *get_current_user(void);
348         struct group_info *get_current_groups(void);
349 
350 which get references to the current process's user accounting structure and
351 supplementary groups list respectively.
352 
353 Once a reference has been obtained, it must be released with ``put_cred()``,
354 ``free_uid()`` or ``put_group_info()`` as appropriate.
355 
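The wrappers above are kernel-internal, but the same values are visible to the
task itself from userspace.  As a rough userspace counterpart, and only as a
sketch, the following reads the calling process's real, effective and
filesystem IDs.  ``struct id_snapshot`` and ``snapshot_current_ids()`` are
hypothetical names; the filesystem IDs are read with the documented trick of
passing -1 to ``setfsuid()`` and ``setfsgid()``, which always fails and
returns the unchanged current value:

```c
#include <assert.h>
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical snapshot of a few of the caller's own credentials. */
struct id_snapshot {
        uid_t uid, euid, fsuid;
        gid_t gid, egid, fsgid;
};

/* Fill 'out' with the calling process's current IDs. */
void snapshot_current_ids(struct id_snapshot *out)
{
        assert(out);
        out->uid = getuid();
        out->euid = geteuid();
        out->gid = getgid();
        out->egid = getegid();
        /* -1 is never a valid ID, so these calls only report the old value. */
        out->fsuid = (uid_t)setfsuid((uid_t)-1);
        out->fsgid = (gid_t)setfsgid((gid_t)-1);
}
```

Unless a process has changed them explicitly, its filesystem IDs follow its
effective IDs, so the snapshot normally shows ``fsuid == euid`` and
``fsgid == egid``.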
356 
357 Accessing Another Task's Credentials
358 ------------------------------------
359 
360 While a task may access its own credentials without the need for locking, the
361 same is not true of a task wanting to access another task's credentials.  It
362 must use the RCU read lock and ``rcu_dereference()``.
363 
364 The ``rcu_dereference()`` is wrapped by::
365 
366         const struct cred *__task_cred(struct task_struct *task);
367 
368 This should be used inside the RCU read lock, as in the following example::
369 
370         void foo(struct task_struct *t, struct foo_data *f)
371         {
372                 const struct cred *tcred;
373                 ...
374                 rcu_read_lock();
375                 tcred = __task_cred(t);
376                 f->uid = tcred->uid;
377                 f->gid = tcred->gid;
378                 f->groups = get_group_info(tcred->groups);
379                 rcu_read_unlock();
380                 ...
381         }
382 
383 Should it be necessary to hold another task's credentials for a long period of
384 time, and possibly to sleep while doing so, then the caller should get a
385 reference on them using::
386 
387         const struct cred *get_task_cred(struct task_struct *task);
388 
389 This does all the RCU magic inside of it.  The caller must call ``put_cred()`` on
390 the credentials so obtained when it has finished with them.
391 
392 .. note::
393    The result of ``__task_cred()`` should not be passed directly to
394    ``get_cred()`` as this may race with ``commit_creds()``.
395 
396 There are a couple of convenience functions to access bits of another task's
397 credentials, hiding the RCU magic from the caller::
398 
399         uid_t task_uid(task)            Task's real UID
400         uid_t task_euid(task)           Task's effective UID
401 
402 If the caller is holding the RCU read lock at the time anyway, then::
403 
404         __task_cred(task)->uid
405         __task_cred(task)->euid
406 
407 should be used instead.  Similarly, if multiple aspects of a task's credentials
408 need to be accessed, the RCU read lock should be taken, ``__task_cred()`` called,
409 the result stored in a temporary pointer and then the credential aspects read
410 from that before dropping the lock.  This prevents the potentially expensive
411 RCU magic from being invoked multiple times.
412 
413 Should some other single aspect of another task's credentials need to be
414 accessed, then this can be used::
415 
416         task_cred_xxx(task, member)
417 
418 where 'member' is a non-pointer member of the cred struct.  For instance::
419 
420         uid_t task_cred_xxx(task, suid);
421 
422 will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
423 magic.  This may not be used for pointer members as what they point to may
424 disappear the moment the RCU read lock is dropped.
425 
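From userspace, another task's UIDs can be observed through procfs, with the
kernel doing the protected access on the reader's behalf.  The sketch below
is a userspace illustration only; ``struct task_uids``, ``read_task_uids()``
and ``read_own_uids()`` are hypothetical names, and it assumes that procfs is
mounted at ``/proc`` and that the ``Uid:`` line of ``/proc/<pid>/status``
lists the real, effective, saved and filesystem UIDs in that order, as
described in proc(5):

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* The four UIDs that procfs reports for a task. */
struct task_uids {
        unsigned int ruid, euid, suid, fsuid;
};

/* Parse the "Uid:" line of /proc/<pid>/status; 0 on success, -1 on failure. */
int read_task_uids(pid_t pid, struct task_uids *out)
{
        char path[64], line[256];
        FILE *f;
        int ret = -1;

        assert(out);
        snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
        f = fopen(path, "r");
        if (!f)
                return -1;

        while (fgets(line, sizeof(line), f)) {
                /* Whitespace in the format also matches the tab separators. */
                if (sscanf(line, "Uid: %u %u %u %u", &out->ruid, &out->euid,
                           &out->suid, &out->fsuid) == 4) {
                        ret = 0;
                        break;
                }
        }
        fclose(f);
        return ret;
}

/* Convenience wrapper for the calling process itself. */
int read_own_uids(struct task_uids *out)
{
        return read_task_uids(getpid(), out);
}
```

As with ``task_cred_xxx()``, the values are only a snapshot: the target task
may commit new credentials immediately after they are read.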
426 
427 Altering Credentials
428 --------------------
429 
430 As previously mentioned, a task may only alter its own credentials, and may not
431 alter those of another task.  This means that it doesn't need to use any
432 locking to alter its own credentials.
433 
434 To alter the current process's credentials, a function should first prepare a
435 new set of credentials by calling::
436 
437         struct cred *prepare_creds(void);
438 
439 This locks ``current->cred_replace_mutex`` and then allocates and constructs a
440 duplicate of the current process's credentials, returning with the mutex still
441 held if successful.  It returns NULL if not successful (out of memory).
442 
443 The mutex prevents ``ptrace()`` from altering the ptrace state of a process
444 while security checks on credentials construction and changing are taking place
445 as the ptrace state may alter the outcome, particularly in the case of
446 ``execve()``.
447 
448 The new credentials set should be altered appropriately, and any security
449 checks and hooks done.  Both the current and the proposed sets of credentials
450 are available for this purpose as current_cred() will return the current set
451 still at this point.
452 
453 When replacing the group list, the new list must be sorted before it
454 is added to the credential, as a binary search is used to test for
455 membership.  In practice, this means ``groups_sort()`` should be
456 called before ``set_groups()`` or ``set_current_groups()``.
457 ``groups_sort()`` must not be called on a ``struct group_info`` which
458 is shared as it may permute elements as part of the sorting process
459 even if the array is already sorted.
460 
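The reason for the sort-before-publish rule can be seen in a small userspace
model that uses the C library's ``qsort()`` and ``bsearch()`` in place of the
kernel's own routines.  ``sort_group_list()`` and ``group_list_contains()``
are hypothetical names; the sketch assumes the list is a private, unshared
array at the time it is sorted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison of two group IDs, for qsort() and bsearch(). */
static int gid_cmp(const void *a, const void *b)
{
        gid_t x = *(const gid_t *)a;
        gid_t y = *(const gid_t *)b;

        return (x > y) - (x < y);
}

/* Sort a private (not yet shared) group list in place. */
void sort_group_list(gid_t *gids, size_t n)
{
        assert(gids || n == 0);
        if (n == 0)
                return;
        qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Membership test by binary search; only valid on a sorted list. */
bool group_list_contains(const gid_t *gids, size_t n, gid_t gid)
{
        assert(gids || n == 0);
        if (n == 0)
                return false;
        return bsearch(&gid, gids, n, sizeof(*gids), gid_cmp) != NULL;
}
```

A binary search over an unsorted array may miss a group that is present,
which in the kernel would turn into a wrong permission decision.
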
461 When the credential set is ready, it should be committed to the current process
462 by calling::
463 
464         int commit_creds(struct cred *new);
465 
466 This will alter various aspects of the credentials and the process, giving the
467 LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
468 actually commit the new credentials to ``current->cred``, release
469 ``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
470 notify the scheduler and others of the changes.
471 
472 This function is guaranteed to return 0, so that it can be tail-called at the
473 end of such functions as ``sys_setresuid()``.
474 
475 Note that this function consumes the caller's reference to the new credentials.
476 The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
477 
478 Furthermore, once this function has been called on a new set of credentials,
479 those credentials may _not_ be changed further.
480 
481 
482 Should the security checks fail or some other error occur after
483 ``prepare_creds()`` has been called, then the following function should be
484 invoked::
485 
486         void abort_creds(struct cred *new);
487 
488 This releases the lock on ``current->cred_replace_mutex`` that
489 ``prepare_creds()`` got and then releases the new credentials.
490 
491 
492 A typical credentials alteration function would look something like this::
493 
494         int alter_suid(uid_t suid)
495         {
496                 struct cred *new;
497                 int ret;
498 
499                 new = prepare_creds();
500                 if (!new)
501                         return -ENOMEM;
502 
503                 new->suid = suid;
504                 ret = security_alter_suid(new);
505                 if (ret < 0) {
506                         abort_creds(new);
507                         return ret;
508                 }
509 
510                 return commit_creds(new);
511         }
512 
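The copy-and-replace flow used by the example above can also be modelled in
ordinary userspace C, which may help when reasoning about who owns which
reference.  Everything named ``model_*`` below is hypothetical and greatly
reduced: the model is single-threaded, so the mutex and the RCU publication
step are represented only by comments, and the security hook is replaced by
an ``allow`` flag supplied by the caller:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, much-reduced stand-in for 'struct cred'. */
struct model_cred {
        int usage;              /* reference count */
        uid_t uid, euid, suid;
};

/* Hypothetical stand-in for a task: it only holds a const pointer. */
struct model_task {
        const struct model_cred *cred;
};

/* Drop one reference; const is cast away for the refcount only. */
void model_put_cred(const struct model_cred *cred)
{
        struct model_cred *c = (struct model_cred *)cred;

        if (c && --c->usage == 0)
                free(c);
}

/* Step 1: take a private, still mutable copy of the current set. */
struct model_cred *model_prepare_creds(const struct model_task *t)
{
        struct model_cred *new;

        assert(t && t->cred);
        new = malloc(sizeof(*new));
        if (!new)
                return NULL;
        *new = *t->cred;
        new->usage = 1;         /* the caller's reference */
        return new;
}

/* Step 3a: publish the copy; the caller's reference is consumed. */
int model_commit_creds(struct model_task *t, struct model_cred *new)
{
        const struct model_cred *old = t->cred;

        t->cred = new;          /* the kernel uses rcu_assign_pointer() here */
        model_put_cred(old);
        return 0;
}

/* Step 3b: discard the copy without publishing it. */
void model_abort_creds(struct model_cred *new)
{
        model_put_cred(new);
}

/* Step 2 in context: alter the copy, check, then commit or abort. */
int model_alter_suid(struct model_task *t, uid_t suid, int allow)
{
        struct model_cred *new = model_prepare_creds(t);

        if (!new)
                return -ENOMEM;
        new->suid = suid;
        if (!allow) {
                model_abort_creds(new);
                return -EPERM;
        }
        return model_commit_creds(t, new);
}
```

In this model, anyone still holding a reference to the old set keeps seeing
the old, unmodified values after the commit, which is the property the
immutability rules are meant to guarantee.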
513 
514 Managing Credentials
515 --------------------
516 
517 There are some functions to help manage credentials:
518 
519  - ``void put_cred(const struct cred *cred);``
520 
521      This releases a reference to the given set of credentials.  If the
522      reference count reaches zero, the credentials will be scheduled for
523      destruction by the RCU system.
524 
525  - ``const struct cred *get_cred(const struct cred *cred);``
526 
527      This gets a reference on a live set of credentials, returning a pointer to
528      that set of credentials.
529 
530  - ``struct cred *get_new_cred(struct cred *cred);``
531 
532      This gets a reference on a set of credentials that is under construction
533      and is thus still mutable, returning a pointer to that set of credentials.
534 
535 
536 Open File Credentials
537 =====================
538 
539 When a new file is opened, a reference is obtained on the opening task's
540 credentials and this is attached to the file struct as ``f_cred`` in place of
541 ``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
542 ``file->f_gid`` should now access ``file->f_cred->fsuid`` and
543 ``file->f_cred->fsgid``.
544 
545 It is safe to access ``f_cred`` without the use of RCU or locking because the
546 pointer will not change over the lifetime of the file struct, and nor will the
547 contents of the cred struct pointed to, barring the exceptions listed above
548 (see the Task Credentials section).
549 
550 To avoid "confused deputy" privilege escalation attacks, access control checks
551 during subsequent operations on an opened file should use these credentials
552 instead of "current"'s credentials, as the file may have been passed to a more
553 privileged process.
554 
555 Overriding the VFS's Use of Credentials
556 =======================================
557 
558 Under some circumstances it is desirable to override the credentials used by
559 the VFS, and that can be done by calling into functions such as ``vfs_mkdir()`` with a
560 different set of credentials.  This is done in the following places:
561 
562  * ``sys_faccessat()``.
563  * ``do_coredump()``.
564  * nfs4recover.c.
