A Universally Unique IDentifier (UUID) URN Namespace
RFC 4122

Abstract

This specification defines a Uniform Resource Name namespace for
UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
Unique IDentifier). A UUID is 128 bits long, and can
guarantee uniqueness across space and time. UUIDs were originally
used in the Apollo Network Computing System and later in the Open
Software Foundation's (OSF) Distributed Computing Environment (DCE),
and then in Microsoft Windows platforms.

This specification is derived from the DCE specification with the
kind permission of the OSF (now known as The Open Group). Information from earlier versions of the DCE specification have been
incorporated into this document.

Introduction

This specification defines a Uniform Resource Name namespace for
UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
Unique IDentifier). A UUID is 128 bits long, and requires no central
registration process.

The information here is meant to be a concise guide for those wishing
to implement services using UUIDs as URNs. Nothing in this document
should be construed to override the DCE standards that defined UUIDs.

Motivation

One of the main reasons for using UUIDs is that no centralized
authority is required to administer them (although one format uses
IEEE 802 node identifiers, others do not). As a result, generation
on demand can be completely automated, and used for a
variety of purposes. The UUID generation algorithm described here
supports very high allocation rates of up to 10 million per second per machine
if necessary, so that they could even be used as transaction IDs.

UUIDs are of a fixed size (128 bits) which is reasonably small
compared to other alternatives. This lends itself well to sorting,
ordering, and hashing of all sorts, storing in databases, simple
allocation, and ease of programming in general.

Since UUIDs are unique and persistent, they make excellent Uniform
Resource Names. The unique ability to generate a new UUID without a
registration process allows for UUIDs to be one of the URNs with the
lowest minting cost.

Namespace Registration Template

Namespace ID: UUID

Registration Information:

Registration date: 2003-10-01

Declared registrant of the namespace:

JTC 1/SC6 (ASN.1 Rapporteur Group)

Declaration of syntactic structure:

A UUID is an identifier that is unique across both space and time,
with respect to the space of all UUIDs. Since a UUID is a fixed
size and contains a time field, it is possible for values to
rollover (around A.D. 3400, depending on the specific algorithm
used). A UUID can be used for multiple purposes, from tagging
objects with an extremely short lifetime, to reliably identifying
very persistent objects across a network.

The internal representation of a UUID is a specific sequence of
bits in memory, as described in Specification.
To accurately represent a UUID as a URN, it is necessary
to convert the bit sequence to a string representation.

Each field is treated as an integer and has its value printed as a
zero-filled hexadecimal digit string with the most significant
digit first. The hexadecimal values "a" through "f" are
output as lower case characters and are case insensitive on
input.

The formal definition of the UUID string representation is
provided by the following ABNF RFC 2234:

This document specifies three algorithms to generate UUIDs: the
first leverages the unique values of 802 MAC addresses to
guarantee uniqueness, the second uses pseudo-random number
generators, and the third uses cryptographic hashing and
application-provided text strings. As a result, the UUIDs
generated according to the mechanisms here will be unique from all
other UUIDs that have been or will be assigned.

Identifier persistence considerations:

UUIDs are inherently very difficult to resolve in a global sense.
This, coupled with the fact that UUIDs are temporally unique
within their spatial context, ensures that UUIDs will remain as
persistent as possible.

Process of identifier assignment:

Generating a UUID does not require that a registration
authority be contacted. One algorithm requires a unique value over
space for each generator. This value is typically an IEEE 802 MAC
address, usually already available on network-connected hosts. The
address can be assigned from an address block obtained from the
IEEE registration authority. If no such address is available, or
privacy concerns make its use undesirable,
Node IDs that Do Not Identify the Host specifies two
alternatives. Another approach is to use version 3 or version 4
UUIDs as defined below.

Process for identifier resolution:

Since UUIDs are not globally resolvable, this is not
applicable.

Rules for Lexical Equivalence:

Consider each field of the UUID to be an unsigned integer as
shown in the table in section Layout and Byte Order. Then,
to compare a pair of UUIDs, arithmetically compare the
corresponding fields from each UUID in order of significance and
according to their data type. Two UUIDs are equal if and only if
all the corresponding fields are equal.

As an implementation note, equality comparison can
be performed on many systems by doing the appropriate byte-order
canonicalization, and then treating the two UUIDs as 128-bit
unsigned integers.

UUIDs, as defined in this document, can also be ordered
lexicographically. For a pair of UUIDs, the first one follows the
second if the most significant field in which the UUIDs differ is
greater for the first UUID. The second precedes the first if the
most significant field in which the UUIDs differ is greater for
the second UUID.

Conformance with URN Syntax:

The string representation of a UUID is fully compatible with the
URN syntax. When converting from a bit-oriented, in-memory
representation of a UUID into a URN, care must be taken to
strictly adhere to the byte order issues mentioned in the string
representation section.

Validation mechanism:

Apart from determining whether the timestamp portion of the UUID is in
the future and therefore not yet assignable, there is no mechanism
for determining whether a UUID is 'valid'.

Scope:

UUIDs are global in scope.

Specification

Format

The UUID format is 16 octets; some bits of the eight octet variant
field specified below determine finer structure.

Variant

The variant field determines the layout of the UUID. That is, the
interpretation of all other bits in the UUID depends on the
setting of the bits in the variant field. As such, it could more
accurately be called a type field; we retain the original term for
compatibility. The variant field consists of a variable number of
the most significant bits of octet 8 of the UUID.

The following table lists the contents of the variant field,
where the letter "x" indicates a "don't-care" value.

Interoperability, in any form, with variants other than the one
defined here is not guaranteed, and is not likely to be an issue
in practice.

Layout and Byte Order

To minimize confusion about bit assignments within octets, the
UUID record definition is defined only in terms of fields that
are integral numbers of octets. The fields are presented with the
most significant one first.

Field Data Type Octet Note
#
time_low unsigned 32 0-3 The low field of the
bit integer timestamp
time_mid unsigned 16 4-5 The middle field of the
bit integer timestamp
time_hi_and_version unsigned 16 6-7 The high field of the
bit integer timestamp multiplexed
with the version number
clock_seq_hi_and_rese unsigned 8 8 The high field of the
rved bit integer clock sequence
multiplexed with the
variant
clock_seq_low unsigned 8 9 The low field of the
bit integer clock sequence
node unsigned 48 10-15 The spatially unique
bit integer node identifier

In the absence of explicit application or presentation protocol
specification to the contrary, a UUID is encoded as a 128-bit
object, as follows:

The fields are encoded as 16 octets, with the
sizes and order of the fields defined above, and with each field
encoded with the Most Significant Byte first (known as
network byte order). Note that the field names, particularly
for multiplexed fields, follow historical
practice.

The version is more accurately a sub-type; again, we retain the
term for compatibility.

Timestamp

The timestamp is a 60-bit value. For UUID version 1, this is
represented by Coordinated Universal Time (UTC) as a count of
100-nanosecond intervals since 00:00:00.00, 15 October 1582 (the
date of Gregorian reform to the Christian calendar).

For systems that do not have UTC available, but do have the
local time, they may use that instead of UTC, as long as they do
so consistently throughout the system. However, this is not recommended
since generating the UTC from local time only needs a time zone offset.

Clock Sequence

For UUID version 1, the clock sequence is used to help avoid
duplicates that could arise when the clock is set backwards in
time or if the node ID changes.

If the clock is set backwards, or might have been set
backwards (e.g., while the system was powered off), and the UUID
generator can not be sure that no UUIDs were generated with
timestamps larger than the value to which the clock was set,
then the clock sequence has to be changed. If the previous value
of the clock sequence is known, it can just be incremented;
otherwise it should be set to a random or high-quality pseudo-random
value.

Similarly, if the node ID changes (e.g., because a network card
has been moved between machines), setting the clock sequence to
a random number minimizes the probability of a duplicate due to
slight differences in the clock settings of the machines. If
the value of clock sequence associated with the changed node ID
were known, then the clock sequence could just be incremented,
but that is unlikely.

The clock sequence MUST be originally (i.e., once in the
lifetime of a system) initialized to a random number to minimize
the correlation across systems. This provides maximum protection
against node identifiers that may move or switch from system to
system rapidly. The initial value MUST NOT be correlated to the
node identifier.

Node

For UUID version 1, the node field consists of an IEEE 802
MAC address, usually the host address. For systems with multiple
IEEE 802 addresses, any available one can be used. The lowest
addressed octet (octet number 10) contains the global/local bit
and the unicast/multicast bit, and is the first octet of the
address transmitted on an 802.3 LAN.

For systems with no IEEE address, a randomly or pseudo-randomly
generated value may be used; see Node IDs that Do Not Identify the Host.
The multicast bit must be set in such addresses, in order that
they will never conflict with addresses obtained from network
cards.

Nil UUID

The nil UUID is special form of UUID that is specified to have
all 128 bits set to zero.

Algorithms for Creating a Time-Based UUID

Various aspects of the algorithm for creating a version 1 UUID are
discussed in the following sections.

Basic Algorithm

The following algorithm is simple, correct, and inefficient:
compact="no"

Obtain a system-wide global lock

From a system-wide shared stable store (e.g., a file), read
the UUID generator state: the values of the timestamp, clock
sequence, and node ID used to generate the last UUID.

Get the current time as a 60-bit count of 100-nanosecond
intervals since 00:00:00.00, 15 October 1582.

Get the current node ID.

If the state was unavailable (e.g., non-existent or
corrupted), or the saved node ID is different than the current
node ID, generate a random clock sequence value.

If the state was available, but the saved time stamp is later
than the current timestamp, increment the clock sequence
value.

Save the state (current time stamp, clock sequence, and node
ID) back to the stable store.

Release the global lock.

Format a UUID from the current timestamp, clock sequence,
and node ID values according to the steps in
Generation Details.

compact="yes"

If UUIDs do not need to be frequently generated, the above
algorithm may be perfectly adequate. For higher performance
requirements, however, issues with the basic algorithm include:
compact="no"

Reading the state from stable storage each time is
inefficient.

The resolution of the system clock may not be
100-nanoseconds.

Writing the state to stable storage each time is
inefficient.

Sharing the state across process boundaries may be
inefficient.

compact="yes"

Each of these issues can be addressed in a modular fashion by
local improvements in the functions that read and write the state
and read the clock. We address each of them in turn in the
following sections.

Reading Stable Storage

The state only needs to be read from stable storage once at boot
time, if it is read into a system-wide shared volatile store (and
updated whenever the stable store is updated).

If an implementation does not have any stable store available,
then it can always say that the values were unavailable. This is
the least desirable implementation because it will increase the
frequency of creation of new clock sequence numbers, which
increases the probability of duplicates.

If the node ID can never change (e.g., the net card is inseparable
from the system), or if any change also reinitializes the clock
sequence to a random value, then instead of keeping it in stable
store, the current node ID may be returned.

System Clock Resolution

The timestamp is generated from the system time, whose resolution
may be less than the resolution of the UUID timestamp.

If UUIDs do not need to be frequently generated, the time stamp
can simply be the system time multiplied by the number of
100-nanosecond intervals per system time interval.

If a system overruns the generator by requesting too many UUIDs
within a single system time interval, the UUID service MUST either
return an error, or stall the UUID generator until the system clock
catches up.

A high resolution timestamp can be simulated by keeping a count of
the number of UUIDs that have been generated with the same value of the
system time, and using it to construct the low order bits of the
timestamp. The count will range between zero and the number of
100-nanosecond intervals per system time interval.

Note: If the processors overrun the UUID generation frequently,
additional node identifiers can be allocated to the system, which
will permit higher speed allocation by making multiple UUIDs
potentially available for each time stamp value.

Writing Stable Storage

The state does not always need to be written to stable store
every time a UUID is generated. The timestamp in the stable store
can be periodically set to a value larger than any yet used in a
UUID. As long as the generated UUIDs have timestamps less than
that value, and the clock sequence and node ID remain unchanged,
only the shared volatile copy of the state needs to be updated.
Furthermore, if the time stamp value in stable store is in the
future by less than the typical time it takes the system to
reboot, a crash will not cause a reinitialization of the clock
sequence.

Sharing State Across Processes

If it is too expensive to access shared state each time a UUID is
generated, then the system-wide generator can be implemented to
allocate a block of time stamps each time it is called; a
per-process generator can allocate from that block until it is
exhausted.

Generation Details

Version 1 UUIDs are generated according to the following
algorithm:
compact="no"

Determine the values for the UTC-based timestamp and clock
sequence to be used in the UUID, as described in
Basic Algorithm.

For the purposes of this algorithm, consider the timestamp
to be a 60-bit unsigned integer and the clock sequence to be
a 14-bit unsigned integer. Sequentially number the bits in a
field, starting with zero for the least significant bit.

Set the time_low field equal to the least significant 32
bits (bits zero through 31) of the timestamp in the same
order of significance.

Set the time_mid field equal to bits 32 through 47 from the
timestamp in the same order of significance.

Set the 12 least significant bits (bits zero through 11) of
the time_hi_and_version field equal to bits 48 through 59
from the timestamp in the same order of significance.

Set the four most significant bits (bits 12 through 15) of
the time_hi_and_version field to the 4-bit version number
corresponding to the UUID version being created, as shown in
the table above.

Set the clock_seq_low field to the eight least significant
bits (bits zero through 7) of the clock sequence in the
same order of significance.

Set the 6 least significant bits (bits zero through 5)
of the clock_seq_hi_and_reserved field to the 6 most
significant bits (bits 8 through 13) of the clock
sequence in the same order of significance.

Set the two most significant bits (bits 6 and 7) of
the clock_seq_hi_and_reserved to zero and one,
respectively.

Set the node field to the 48-bit IEEE address in the same
order of significance as the address.

compact="yes"

Algorithm for Creating a Name-Based UUID

The version 3 or 5 UUID is meant for generating UUIDs from "names" that
are drawn from, and unique within, some "name space". The concept
of name and name space should be broadly construed, and not limited
to textual names. For example, some name spaces are the domain
name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished Names
(DNs), and reserved words in a programming language. The mechanisms
or conventions used for allocating names and ensuring their
uniqueness within their name spaces are beyond the scope of this
specification.

The requirements for these types of UUIDs are as follows:
compact="no"

The UUIDs generated at different times from the same name in
the same namespace MUST be equal.

The UUIDs generated from two different names in the same
namespace should be different (with very high probability).

The UUIDs generated from the same name in two different
namespaces should be different with (very high probability).

If two UUIDs that were generated from names are equal, then
they were generated from the same name in the same namespace
(with very high probability).

compact="yes"

The algorithm for generating a UUID from a name and a name
space are as follows:
compact="no"

Allocate a UUID to use as a "name space ID" for all UUIDs
generated from names in that name space; see
Appendix C - Some Name Space IDs for some pre-defined values.

Choose either MD5 or SHA-1 as the hash algorithm;
If backward compatibility is not an issue, SHA-1 is
preferred.

Convert the name to a canonical sequence of octets (as defined
by the standards or conventions of its name space); put the name
space ID in network byte order.

Compute the hash of the name
space ID concatenated with the name.

Set octets zero through 3 of the time_low field to octets
zero through 3 of the hash.

Set octets zero and one of the time_mid field to octets 4
and 5 of the hash.

Set octets zero and one of the time_hi_and_version field to
octets 6 and 7 of the hash.

Set the four most significant bits (bits 12 through 15) of the
time_hi_and_version field to the appropriate 4-bit version
number from Version.

Set the clock_seq_hi_and_reserved field to octet 8 of the
hash.

Set the two most significant bits (bits 6 and 7) of the
clock_seq_hi_and_reserved to zero and one, respectively.

Set the clock_seq_low field to octet 9 of the hash.

Set octets zero through five of the node field to octets 10
through 15 of the hash.

Convert the resulting UUID to local byte order.

compact="yes"

Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers

The version 4 UUID is meant for generating UUIDs from truly-random or
pseudo-random numbers.

The algorithm is as follows:
compact="no"

Set the two most significant bits (bits 6 and 7) of the
clock_seq_hi_and_reserved to zero and one, respectively.

Set the four most significant bits (bits 12 through 15) of the
time_hi_and_version field to the 4-bit version number from
Version.

Set all the other bits to randomly (or pseudo-randomly) chosen
values.

A better solution is to obtain a 47-bit cryptographic quality
random number and use it as the low 47 bits of the node ID, with the
least significant bit of the first octet of the node ID set to one.
This bit is the unicast/multicast bit, which will never be set in
IEEE 802 addresses obtained from network cards. Hence, there can
never be a conflict between UUIDs generated by machines with and
without network cards. (Recall that the IEEE 802 spec talks
about transmission order, which is the opposite of the in-memory
representation that is discussed in this document.)

For compatibility with earlier specifications, note that this
document uses the unicast/multicast bit, instead of the arguably
more correct local/global bit.

Advice on generating cryptographic-quality random numbers can be
found in RFC1750.

In addition, items such as the computer's name and the name of the
operating system, while not strictly speaking random, will help
differentiate the results from those obtained by other systems.

The exact algorithm to generate a node ID using these data is system
specific, because both the data available and the functions to obtain
them are often very system specific. A generic approach, however, is
to accumulate as many sources as possible into a buffer, use
a message digest such as MD5 or
SHA-1, take an arbitrary 6
bytes from the hash value, and set the multicast bit as described
above.

Community Considerations

The use of UUIDs is extremely pervasive in computing. They comprise
the core identifier infrastructure for many operating systems
(Microsoft Windows) and applications (the Mozilla browser) and in many
cases, become exposed to the Web in many non-standard ways. This
specification attempts to standardize that practice as openly as
possible and in a way that attempts to benefit the entire
Internet.

Security Considerations

Do not assume that UUIDs are hard to guess; they should not be used
as security capabilities (identifiers whose mere possession grants
access), for example. A predictable random number source will
exacerbate the situation.

Do not assume that it is easy to determine if a UUID has been
slightly transposed in order to redirect a reference to another
object. Humans do not have the ability to easily check the integrity
of a UUID by simply glancing at it.

Distributed applications generating UUIDs at a variety of hosts must
be willing to rely on the random number source at all hosts. If this
is not feasible, the namespace variant should be used.

Acknowledgments

This document draws heavily on the OSF DCE specification for UUIDs.
Ted Ts'o provided helpful comments, especially on the byte ordering
section which we mostly plagiarized from a proposed wording he
supplied (all errors in that section are our responsibility,
however).

We are also grateful to the careful reading and bit-twiddling of
Ralf S. Engelschall, John Larmouth, and Paul Thorpe. Professor
Larmouth was also invaluable in achieving coordination with ISO/IEC.

References

Procedures for the operation of OSI Registration Authorities:
Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components
(ITU-T Rec. X.687),
2004

Appendix

Appendix A - Sample Implementation

This implementation consists of 5 files: uuid.h, uuid.c, sysdep.h,
sysdep.c and utest.c. The uuid.* files are the system independent
implementation of the UUID generation algorithms described above, with
all the optimizations described above except efficient state sharing
across processes included. The code has been tested on Linux (Red Hat
4.0) with GCC (2.7.2), and Windows NT 4.0 with VC++ 5.0. The code
assumes 64-bit integer support, which makes it much clearer.

All the following source files should have the
following copyright notice included: