Network Working Group                                            F. Gont
Internet-Draft                                    SI6 Networks / UTN-FRH
Intended status: Best Current Practice                           I. Arce
Expires: September 1, 2018                                     Quarkslab
                                                       February 28, 2018
Security and Privacy Implications of Numeric Identifiers Employed in
Network Protocols
draft-gont-predictable-numeric-ids-02
Abstract
This document performs an analysis of the security and privacy
implications of different types of "numeric identifiers" used in IETF
protocols, and tries to categorize them based on their
interoperability requirements and the associated failure severity
when such requirements are not met. It describes a number of
algorithms that have been employed in real implementations to meet
such requirements and analyzes their security and privacy properties.
Additionally, it provides advice on possible algorithms that could be
employed to satisfy the interoperability requirements of each
identifier type, while minimizing the security and privacy
implications, thus providing guidance to protocol designers and
protocol implementers. Finally, it provides recommendations for
future protocol specifications regarding the specification of the
aforementioned numeric identifiers.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 1, 2018.
Gont & Arce Expires September 1, 2018 [Page 1]

Internet-Draft Predictable Numeric IDs February 2018
Recent history indicates that when new protocols are standardized or
new protocol implementations are produced, the security and privacy
properties of the associated identifiers tend to be overlooked and
inappropriate algorithms to generate identifier values are either
suggested in the specification or selected by implementers. As a
result, we believe that advice in this area is warranted.
This document contains a non-exhaustive survey of identifiers
employed in various IETF protocols, and aims to categorize such
identifiers based on their interoperability requirements, and the
associated failure severity when such requirements are not met.
Subsequently, it describes several algorithms that have been employed
in real implementations to meet such requirements, analyzes their
security and privacy properties, and provides advice on possible
algorithms that could be employed to satisfy the interoperability
requirements of each category, while minimizing the associated
security and privacy implications. Finally, it provides
recommendations for future protocol specifications regarding the
specification of the aforementioned numeric identifiers.
2. Terminology
Identifier:
A data object in a protocol specification that can be used to
definitely distinguish a protocol object (a datagram, network
interface, transport protocol endpoint, session, etc.) from all
other objects of the same type, in a given context. Identifiers
are usually defined as a series of bits and represented using
integer values. We note that different identifiers may have
additional requirements or properties depending on their specific
use in a protocol. We use the term "identifier" as a generic term
to refer to any data object in a protocol specification that
satisfies the identification property stated above.
Failure Severity:
The consequences of a failure to comply with the interoperability
requirements of a given identifier. Severity considers the worst
potential consequence of a failure, determined by the system
damage and/or time lost to repair the failure. In this document
we define two types of failure severity: "soft" and "hard".
Hard Failure:
A hard failure is a non-recoverable condition in which a protocol
does not operate in the prescribed manner or it operates with
excessive degradation of service. For example, an established TCP
connection that is aborted due to an error condition constitutes,
from the point of view of the transport protocol, a hard failure,
since it enters a state from which normal operation cannot be
recovered.
Soft Failure:
A soft failure is a recoverable condition in which a protocol does
not operate in the prescribed manner but normal operation can be
resumed automatically in a short period of time. For example, a
simple packet-loss event that is subsequently recovered with a
retransmission can be considered a soft failure.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. Threat Model
Throughout this document, we assume an attacker does not have
physical or logical access to the device(s) being attacked. We
assume the attacker can simply send any traffic to the target
devices, e.g., to sample identifiers employed by such devices.
4. Issues with the Specification of Identifiers
While assessing protocol specifications regarding the use of
identifiers, we found that most of the issues discussed in this
document arise as a result of one of the following:
o Protocol specifications which under-specify the requirements for
their identifiers
o Protocol specifications that over-specify their identifiers
o Protocol implementations that simply fail to comply with the
specified requirements
A number of protocol specifications (too many of them) simply
overlook the security and privacy implications of identifiers.
Examples include the specification of TCP port numbers in
[RFC0793], the specification of TCP sequence numbers in [RFC0793], or
the specification of the DNS TxID in [RFC1035].
On the other hand, there are a number of protocol specifications that
over-specify some of their associated protocol identifiers. For
example, [RFC4291] essentially results in link-layer addresses being
embedded in the IPv6 Interface Identifiers (IIDs) when the
interoperability requirement of uniqueness could be achieved in other
ways that do not result in negative security and privacy implications
[RFC7721]. Similarly, [RFC2460] suggests the use of a global counter
for the generation of Fragment Identification values, when the
interoperability properties of uniqueness per {Src IP, Dst IP} could
be achieved with other algorithms that do not result in negative
security and privacy implications.
Finally, there are protocol implementations that simply fail to
comply with existing protocol specifications. For example, some
popular operating systems (notably Microsoft Windows) still fail to
implement randomization of transport protocol ephemeral ports, as
specified in [RFC6056].
5. Timeline of Vulnerability Disclosures Related to Some Sample Identifiers
This section contains a non-exhaustive timeline of vulnerability
disclosures related to some sample identifiers and other work that
has led to advances in this area. The goal of this timeline is to
illustrate:
o That vulnerabilities related to how the values for some
identifiers are generated and assigned have affected
implementations for an extremely long period of time.
o That such vulnerabilities, even when addressed for a given
protocol version, were later reintroduced in new versions or new
implementations of the same protocol.
o That standardization efforts that discuss and provide advice in
this area can have a positive effect on protocol specifications
and protocol implementations.
5.1. IPv4/IPv6 Identification
December 1998:
[Sanfilippo1998a] finds that predictable IPv4 Identification
values can be leveraged to count the number of packets sent by a
target node. [Sanfilippo1998b] explains how to leverage the same
vulnerability to implement a port-scanning technique known as
dumb/idle scan. A tool that implements this attack is publicly
released.
November 1999:
[Sanfilippo1999] discusses how to leverage predictable IPv4
Identification to uncover the rules of a number of firewalls.
November 2002:
[Bellovin2002] explains how the IPv4 Identification field can be
exploited to count the number of systems behind a NAT.
December 2003:
[Zalewski2003] explains a technique to perform TCP data injection
attacks based on predictable IPv4 Identification values, which
requires less effort than TCP injection attacks performed with
bare TCP packets.
November 2005:
[Silbersack2005] discusses shortcomings in a number of techniques
to mitigate predictable IPv4 Identification values.
October 2007:
[Klein2007] describes a weakness in the pseudo-random number
generator (PRNG) in use for the generation of the IP
Identification by a number of operating systems.
June 2011:
[Gont2011] describes how to perform idle scan attacks in IPv6.
November 2011:
Linux mitigates predictable IPv6 Identification values
[RedHat2011] [SUSE2011] [Ubuntu2011].
December 2011:
[I-D.ietf-6man-predictable-fragment-id-08] describes the security
implications of predictable IPv6 Identification values, and
possible mitigations.
May 2012:
[Gont2012] notes that some major IPv6 implementations still employ
predictable IPv6 Identification values.
June 2015:
[I-D.ietf-6man-predictable-fragment-id-08] notes that some popular
host and router implementations still employ predictable IPv6
Identification values.
5.2. TCP Initial Sequence Numbers (ISNs)
September 1981:
[RFC0793] suggests the use of a global 32-bit ISN generator,
whose low-order bit is incremented roughly every 4 microseconds.
However, such an ISN generator makes it trivial to predict the ISN
that a TCP will use for new connections, thus allowing a variety
of attacks against TCP.
February 1985:
[Morris1985] was the first to describe how to exploit predictable
TCP ISNs for forging TCP connections that could then be leveraged
for trust relationship exploitation.
April 1989:
[Bellovin1989] discussed the security implications of predictable
ISNs (along with a range of other protocol-based vulnerabilities).
February 1995:
[Shimomura1995] reported a real-world exploitation of the attack
described in 1985 (ten years before) in [Morris1985].
May 1996:
[RFC1948] was the first IETF effort, authored by Steven Bellovin,
to address predictable TCP ISNs. The same concept specified in
this document for TCP ISNs was later proposed for TCP ephemeral
ports [RFC6056], TCP Timestamps, and eventually even IPv6
Interface Identifiers [RFC7217].
March 2001:
[Zalewski2001] provides a detailed analysis of statistical
weaknesses in some ISN generators, and includes a survey of the
algorithms in use by popular TCP implementations.
May 2001:
Vulnerability advisories [CERT2001] [USCERT2001] are released
regarding statistical weaknesses in some ISN generators, affecting
popular TCP/IP implementations.
March 2002:
[Zalewski2002] updates and complements [Zalewski2001]. It
concludes that "while some vendors [...] reacted promptly and
tested their solutions properly, many still either ignored the
issue and never evaluated their implementations, or implemented a
flawed solution that apparently was not tested using a known
approach".
February 2012:
[RFC6528], 27 years after Morris' original work [Morris1985],
formally updates [RFC0793] to mitigate predictable TCP ISNs.
August 2014:
[I-D.eddy-rfc793bis-04], the upcoming revision of the core TCP
protocol specification, incorporates the algorithm specified in
[RFC6528] as the recommended algorithm for TCP ISN generation.
6. Protocol Failure Severity
Section 2 defines the concept of "Failure Severity" and two types of
failures that we employ throughout this document: soft and hard.
Our analysis of the severity of a failure is performed from the point
of view of the protocol in question. However, the corresponding
severity on the upper application or protocol may not be the same as
that of the protocol in question. For example, a TCP connection that
is aborted may or may not result in a hard failure of the upper
application: if the upper application can establish a new TCP
connection without any impact on the application, a hard failure at
the TCP protocol may have no severity at the application level. On
the other hand, if a hard failure of a TCP connection results in
excessive degradation of service at the application layer, it will
also result in a hard failure at the application.
7. Categorizing Identifiers
This section includes a non-exhaustive survey of identifiers, and
proposes a number of categories that can accommodate these
identifiers based on their interoperability requirements and their
failure modes (soft or hard).
+------------+--------------------------------------+---------------+
| Identifier | Interoperability Requirements        | Failure       |
|            |                                      | Severity      |
+------------+--------------------------------------+---------------+
| IPv6 Frag  | Uniqueness (for IP address pair)     | Soft/Hard (1) |
| ID         |                                      |               |
+------------+--------------------------------------+---------------+
| IPv6 IID   | Uniqueness (and constant within IPv6 | Soft (3)      |
|            | prefix) (2)                          |               |
+------------+--------------------------------------+---------------+
| TCP SEQ    | Monotonically-increasing             | Hard (4)      |
+------------+--------------------------------------+---------------+
| TCP eph.   | Uniqueness (for connection ID)       | Hard          |
| port       |                                      |               |
+------------+--------------------------------------+---------------+
| IPv6 Flow  | Uniqueness                           | None (5)      |
| L.         |                                      |               |
+------------+--------------------------------------+---------------+
| DNS TxID   | Uniqueness                           | None (6)      |
+------------+--------------------------------------+---------------+
Table 1: Survey of Identifiers
Notes:
(1)
While a single collision of Fragment ID values would simply lead
to a single packet drop (and hence a "soft" failure), repeated
collisions at high data rates might trash the Fragment ID space,
leading to a hard failure [RFC4963].
(2)
While the interoperability requirements are simply that the
Interface ID results in a unique IPv6 address, for operational
reasons it is typically desirable that the resulting IPv6 address
(and hence the corresponding Interface ID) be constant within each
network [I-D.ietf-6man-default-iids] [RFC7217].
(3)
While IPv6 Interface IDs must result in unique IPv6 addresses,
IPv6 Duplicate Address Detection (DAD) [RFC4862] allows for the
detection of duplicate Interface IDs/addresses, and hence such
Interface ID collisions can be recovered.
(4)
In theory there are no interoperability requirements for TCP
sequence numbers, since the TIME-WAIT state and TCP's "quiet time"
take care of old segments from previous incarnations of the
connection. However, a widespread optimization allows for a new
incarnation of a previous connection to be created if the Initial
Sequence Number (ISN) of the incoming SYN is larger than the last
sequence number seen in that direction for the previous
incarnation of the connection. Thus, monotonically-increasing TCP
sequence numbers allow for such optimization to work as expected
[RFC6528].
(5)
The IPv6 Flow Label is typically employed for load sharing
[RFC7098], along with the Source and Destination IPv6 addresses.
Reuse of a Flow Label value for the same set {Source Address,
Destination Address} would typically cause both flows to be
multiplexed into the same link. However, as long as this does not
occur deterministically, it will not result in any negative
implications.
(6)
DNS TxIDs are employed, together with the Source Address,
Destination Address, Source Port, and Destination Port, to match
DNS requests and responses. However, since an implementation
knows which DNS requests were sent for that set of {Source
Address, Destination Address, Source Port, and Destination Port,
DNS TxID}, a collision of TxID would result, if anything, in a
small performance penalty (the response would be discarded when it
    /* Ephemeral port selection function */
    id_range = max_id - min_id + 1;
    next_id = min_id + (random() % id_range);
    /* Try each identifier in the range at most once */
    count = id_range;

    do {
        if (check_suitable_id(next_id))
            return next_id;

        if (next_id == max_id) {
            next_id = min_id;
        } else {
            next_id++;
        }

        count--;
    } while (count > 0);

    return ERROR;
Note:
random() is a function that returns a pseudo-random unsigned
integer number of appropriate size. Note that the output needs to
be unpredictable, and typical implementations of POSIX random()
function do not necessarily meet this requirement. See [RFC4086]
for randomness requirements for security.
The function check_suitable_id() can check, when possible, whether
this identifier is e.g. already in use. When already used, this
algorithm selects the next available protocol ID.
All the variables (in this and all the algorithms discussed in
this document) are unsigned integers.
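As an illustration only, the algorithm above could be sketched in
Python as follows. The names select_random_id() and is_suitable are
assumptions made for this example. The sketch uses
secrets.randbelow(), which draws from the operating system's CSPRNG
and avoids modulo bias, in place of random().

```python
import secrets


def select_random_id(min_id, max_id, is_suitable):
    """Pick a random starting point, then scan linearly for a suitable ID."""
    id_range = max_id - min_id + 1
    # Unpredictable starting point within [min_id, max_id].
    next_id = min_id + secrets.randbelow(id_range)
    # Visit every identifier in the range at most once.
    for _ in range(id_range):
        if is_suitable(next_id):
            return next_id
        next_id = min_id if next_id == max_id else next_id + 1
    return None  # no suitable identifier is available


# Hypothetical usage: three of the four identifiers are already in use.
in_use = {1024, 1025, 1026}
chosen = select_random_id(1024, 1027, lambda i: i not in in_use)
```

Because the scan is linear, the only free identifier (1027 in this
hypothetical usage) is found regardless of the random starting point.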
8.1.2. Another Simple Randomization Algorithm
The following pseudo-code illustrates another algorithm for selecting
a random identifier in which, in the event the identifier is found to
be not suitable (e.g., already in use), another identifier is
selected randomly:
    id_range = max_id - min_id + 1;
    next_id = min_id + (random() % id_range);
    count = id_range;

    do {
        if (check_suitable_id(next_id))
            return next_id;

        next_id = min_id + (random() % id_range);
        count--;
    } while (count > 0);

    return ERROR;
This algorithm might be unable to select an identifier (i.e., return
"ERROR") even if there are suitable identifiers available, when there
are a large number of identifiers "in use".
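To get a rough feel for how likely such a failure is, the following
sketch computes the probability that every attempt misses, assuming
(for this example only) that each attempt is an independent, uniform
draw over the identifier space. The function name
failure_probability() is hypothetical.

```python
def failure_probability(id_range, free_ids):
    """Chance that id_range independent uniform draws all miss the free IDs."""
    miss_one_draw = 1.0 - free_ids / id_range
    return miss_one_draw ** id_range


# One free identifier left in a 16384-value space.
p_one_free = failure_probability(16384, 1)
# Ten free identifiers left in the same space.
p_ten_free = failure_probability(16384, 10)
```

Under these assumptions, the failure probability is approximately
exp(-k) when k identifiers remain free: roughly 37% with a single free
identifier, but well below 0.01% with ten.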
8.2. Category #2: Uniqueness (hard failure)
One of the most trivial approaches for achieving uniqueness for an
identifier (with a hard failure mode) is to implement a linear
function. As a result, all of the algorithms described in
Section 8.4 are of use for complying with the requirements of this
identifier category.
8.3. Category #3: Uniqueness, constant within context (soft-failure)
The goal of this algorithm is to produce identifiers that are
constant for a given context, but that change when the aforementioned
context changes.
Keeping one value for each possible "context" may in many cases be
considered too onerous in terms of memory requirements. As a
workaround, the following algorithm employs a calculated technique
(as opposed to keeping state in memory) to maintain the constant
identifier for each given context.
In the following algorithm, the function F() provides (statelessly) a
constant identifier for each given context.
    /* Protocol ID selection function */
    id_range = max_id - min_id + 1;
    counter = 0;

    do {
        offset = F(CONTEXT, counter, secret_key);
        next_id = min_id + (offset % id_range);

        if (check_suitable_id(next_id))
            return next_id;

        counter++;
    } while (counter <= MAX_RETRIES);

    return ERROR;
The function F() provides a constant identifier for each given
CONTEXT. 'offset' may take any value within the storage type
range since we are restricting the resulting identifier to be in the
range [min_id, max_id] in a similar way as in the algorithm described
in Section 8.1.1. Collisions can be recovered by incrementing the
'counter' variable and recomputing F().
The function F() should be a cryptographic hash function like SHA-256
[FIPS-SHS]. Note: MD5 [RFC1321] is considered unacceptable for F()
[RFC6151]. CONTEXT is the concatenation of all the elements that
define a given context. For example, if this algorithm is expected
to produce identifiers that are unique per network interface card
(NIC) and SLAAC autoconfiguration prefix, the CONTEXT should be the
concatenation of e.g. the interface index and the SLAAC
autoconfiguration prefix (please see [RFC7217] for an implementation
of this algorithm for the generation of IPv6 IIDs).
The secret should be chosen to be as random as possible (see
[RFC4086] for recommendations on choosing secrets).
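A minimal sketch of this algorithm in Python follows. It uses
HMAC-SHA-256 as one possible keyed instantiation of F(); this choice,
the 64-bit truncation, the encoding of the counter, and the names
select_stable_id() and is_suitable are assumptions of this example
rather than requirements of this document.

```python
import hashlib
import hmac


def F(context, counter, secret_key):
    """Keyed hash of (context, counter), returned as an unsigned integer."""
    message = context + counter.to_bytes(4, "big")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")  # truncate to 64 bits


def select_stable_id(context, secret_key, min_id, max_id, is_suitable,
                     max_retries=3):
    """Return the same identifier for the same context, statelessly."""
    id_range = max_id - min_id + 1
    for counter in range(max_retries + 1):
        next_id = min_id + F(context, counter, secret_key) % id_range
        if is_suitable(next_id):
            return next_id
        # Collision: recompute F() with the next counter value.
    return None


# Hypothetical usage: the context is an interface name plus a prefix.
key = b"example-only-secret-key"  # placeholder; use a random secret
ctx = b"eth0|2001:db8:1::/64"
first = select_stable_id(ctx, key, 0, 2**64 - 1, lambda i: True)
again = select_stable_id(ctx, key, 0, 2**64 - 1, lambda i: True)
```

Calling the function twice with the same context and secret yields the
same identifier, without keeping any per-context state in memory.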
8.4. Category #4: Uniqueness, monotonically increasing within context
(hard failure)
8.4.1. Predictable Linear Identifiers Algorithm
One of the most trivial ways to achieve uniqueness with a low
identifier reuse frequency is to produce a linear sequence. This
obviously assumes that each identifier will be used for a similar
period of time.
For example, the following algorithm has been employed in a number of
operating systems for selecting IP fragment IDs, TCP ephemeral ports,
etc.
    /* Initialization at system boot time. Could be random */
    next_id = min_id;
    id_inc = 1;

    /* Identifier selection function */
    count = max_id - min_id + 1;

    do {
        if (next_id == max_id) {
            next_id = min_id;
        }
        else {
            next_id = next_id + id_inc;
        }

        if (check_suitable_id(next_id))
            return next_id;

        count--;
    } while (count > 0);

    return ERROR;
Note:
check_suitable_id() is a function that checks whether the
resulting identifier is acceptable (e.g., whether it is in use,
etc.).
For obvious reasons, this algorithm results in predictable sequences.
If a global counter is used (such as "next_id" in the example above),
a node that learns one protocol identifier can also learn or guess
values employed by past and future protocol instances. On the other
hand, when the value of increments is known (such as "1" in this
case), an attacker can sample two values, and learn the number of
identifiers that were generated in-between.
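The arithmetic behind this observation can be sketched as follows.
The function name ids_generated_between() is hypothetical, and the
sketch assumes a global counter with an increment of 1 and at most one
wrap-around between the two samples.

```python
def ids_generated_between(first_sample, second_sample, min_id, max_id):
    """Identifiers handed out to others between two sampled values."""
    id_range = max_id - min_id + 1
    # Modular distance between the samples, minus the second sample itself.
    return (second_sample - first_sample - 1) % id_range


# Two samples taken from a 16-bit identifier space.
plain = ids_generated_between(100, 105, 0, 65535)
wrapped = ids_generated_between(65534, 2, 0, 65535)
```

In the first case the attacker learns that four identifiers (101
through 104) were generated between the samples. In the second case
the counter wrapped, and three identifiers (65535, 0, and 1) were
generated.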
Where identifier reuse would lead to a hard failure, one typical
approach to generate unique identifiers (while minimizing the
security and privacy implications of predictable identifiers) is to
obfuscate the resulting protocol IDs by either:
o Replacing the global counter with multiple counters (initialized to
a random value)
o Randomizing the "increments"
Avoiding global counters essentially means that learning one
identifier for a given context (e.g., one TCP ephemeral port for a
given {src IP, Dst IP, Dst Port}) is of no use for learning or
guessing identifiers for a different context (e.g., TCP ephemeral
ports that involve other peers). However, this may imply keeping one
additional variable/counter per context, which may be prohibitive in
some environments. The choice of id_inc has implications not only for
the security and privacy properties of the resulting identifiers, but
also for the corresponding interoperability properties. On one hand,
minimizing the increments (as in "id_inc = 1" in our case) generally
minimizes the identifier reuse frequency, albeit at increased
predictability. On the other hand, if the increments are randomized,
predictability of the resulting identifiers is reduced, and the
information leakage produced by global constant increments is
mitigated.
8.4.2. Per-context Counter Algorithm
One possible way to achieve similar (or even lower) identifier reuse
frequency while still avoiding predictable sequences would be to
employ a per-context counter, as opposed to a global counter. Such
an algorithm could be described as follows:
    /* Initialization at system boot time. Could be random */
    id_inc = 1;

    /* Identifier selection function */
    count = max_id - min_id + 1;

    if (lookup_counter(CONTEXT) == ERROR) {
        create_counter(CONTEXT);
    }

    next_id = lookup_counter(CONTEXT);

    do {
        if (next_id == max_id) {
            next_id = min_id;
        }
        else {
            next_id = next_id + id_inc;
        }

        if (check_suitable_id(next_id)) {
            store_counter(CONTEXT, next_id);
            return next_id;
        }

        count--;
    } while (count > 0);

    store_counter(CONTEXT, next_id);
    return ERROR;
NOTE:
lookup_counter() returns the current counter for a given context,
or an error condition if such a counter does not exist.
create_counter() creates a counter for a given context, and
initializes such counter to a random value.
store_counter() saves (updates) the current counter for a given
context.
check_suitable_id() is a function that checks whether the
resulting identifier is acceptable (e.g., whether it is in use,
etc.).
Essentially, whenever a new identifier is to be selected, the
algorithm checks whether there is a counter for the
corresponding context. If there is, such counter is incremented to
obtain the new identifier, and the new identifier updates the
corresponding counter. If there is no counter for such context, a
new counter is created and initialized to a random value, and used as
the new identifier.
This algorithm produces a per-context counter, which results in one
linear function for each context. Since the origin of each "line" is
a random value, the resulting values are unknown to an off-path
attacker.
This algorithm has the following drawbacks:
o If, as a result of resource management, the counter for a given
context must be removed, the last identifier value used for that
context will be lost. Thus, if subsequently an identifier needs
to be generated for such context, that counter will need to be
recreated and reinitialized to a random value, thus possibly leading
to reuse/collision of identifiers.
o If the identifiers are predictable by the destination system
(e.g., the destination host represents the context), a vulnerable
host might possibly leak to third parties the identifiers used by
other hosts to send traffic to it (i.e., a vulnerable Host B could
leak to Host C the identifier values that Host A is using to send
packets to Host B). Appendix A of [RFC7739] describes one
possible scenario for such leakage in detail.
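For illustration, the per-context counter could be sketched in Python
as shown below. The class name PerContextCounter, the use of a
dictionary as the counter store, and the is_suitable callback are
assumptions of this example. A real implementation would also need a
policy for evicting counters, which is where the first drawback above
arises.

```python
import secrets


class PerContextCounter:
    def __init__(self, min_id, max_id):
        self.min_id = min_id
        self.max_id = max_id
        self.counters = {}  # context -> last identifier handed out

    def next_id(self, context, is_suitable=lambda i: True):
        id_range = self.max_id - self.min_id + 1
        if context not in self.counters:
            # New context: start the "line" at an unpredictable value.
            self.counters[context] = self.min_id + secrets.randbelow(id_range)
        candidate = self.counters[context]
        for _ in range(id_range):
            # Advance by one, wrapping from max_id back to min_id.
            if candidate == self.max_id:
                candidate = self.min_id
            else:
                candidate += 1
            if is_suitable(candidate):
                self.counters[context] = candidate
                return candidate
        self.counters[context] = candidate
        return None


# Hypothetical usage with two independent contexts.
gen = PerContextCounter(0, 9)
a1 = gen.next_id("host-A")
a2 = gen.next_id("host-A")
b1 = gen.next_id("host-B")
a3 = gen.next_id("host-A")
```

Identifiers generated for "host-B" do not advance the sequence seen by
"host-A", so sampling one context reveals nothing about activity in
the other.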
8.4.3. Simple Hash-Based Algorithm
The goal of this algorithm is to produce monotonically-increasing
sequences, with a randomized initial value, for each given context.
For example, if the identifiers being generated must be unique for
each {src IP, dst IP} set, then each possible combination of {src IP,
dst IP} should have a corresponding "next_id" value.
Keeping one value for each possible "context" may in many cases be
considered too onerous in terms of memory requirements. As a
workaround, the following algorithm employs a calculated technique
(as opposed to keeping state in memory) to maintain the random offset
for each possible context.
In the following algorithm, the function F() provides (statelessly) a
random offset for each given context.
    /* Initialization at system boot time. Could be random. */
    counter = 0;

    /* Protocol ID selection function */
    id_range = max_id - min_id + 1;
    offset = F(CONTEXT, secret_key);
    count = id_range;

    do {
        next_id = min_id +
                  (counter + offset) % id_range;

        counter++;

        if (check_suitable_id(next_id))
            return next_id;

        count--;
    } while (count > 0);

    return ERROR;
The function F() provides a "per-CONTEXT" fixed offset within the
identifier space. Both the 'offset' and 'counter' variables may take
any value within the storage type range since we are restricting the
resulting identifier to be in the range [min_id, max_id] in a similar
way as in the algorithm described in Section 8.1.1. This allows us
to simply increment the 'counter' variable and rely on the unsigned
integer to wrap around.
The function F() should be a cryptographic hash function like SHA-256
[FIPS-SHS]. Note: MD5 [RFC1321] is considered unacceptable for F()
[RFC6151]. CONTEXT is the concatenation of all the elements that
define a given context. For example, if this algorithm is expected
to produce identifiers that are monotonically-increasing for each set
(Source IP Address, Destination IP Address), the CONTEXT should be
the concatenation of these two values.
The secret should be chosen to be as random as possible (see
[RFC4086] for recommendations on choosing secrets).
It should be noted that, since this algorithm uses a global counter
("counter") for selecting identifiers, if an attacker could, e.g.,
force a client to periodically establish a new TCP connection to an
attacker-controlled machine (or through an attacker-observable
routing path), the attacker could substract consecutive source port
Gont & Arce Expires September 1, 2018 [Page 19]

Internet-Draft Predictable Numeric IDs February 2018
values to obtain the number of outgoing TCP connections established
globally by the target host within that time period (up to wrap-
around issues and five-tuple collisions, of course).
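The following Python sketch illustrates both the algorithm and the
information leak described above. The class name HashBasedIds, the
use of HMAC-SHA-256 for F(), the 32-bit truncation, and the context
strings are assumptions made for this example only.

```python
import hashlib
import hmac


class HashBasedIds:
    def __init__(self, secret_key, min_id, max_id):
        self.secret_key = secret_key
        self.min_id = min_id
        self.id_range = max_id - min_id + 1
        self.counter = 0  # global counter, shared by all contexts

    def _offset(self, context):
        """F(): a fixed, secret-dependent offset for each context."""
        digest = hmac.new(self.secret_key, context, hashlib.sha256).digest()
        return int.from_bytes(digest[:4], "big")

    def next_id(self, context, is_suitable=lambda i: True):
        offset = self._offset(context)
        for _ in range(self.id_range):
            candidate = self.min_id + (self.counter + offset) % self.id_range
            self.counter += 1
            if is_suitable(candidate):
                return candidate
        return None


# Hypothetical scenario: the attacker samples its own context twice,
# while the target talks to another peer twice in between.
gen = HashBasedIds(b"example-only-secret", 1024, 65535)
attacker_ctx = b"192.0.2.1|203.0.113.9"
other_ctx = b"192.0.2.1|198.51.100.7"
s1 = gen.next_id(attacker_ctx)
gen.next_id(other_ctx)
gen.next_id(other_ctx)
s2 = gen.next_id(attacker_ctx)
leaked = (s2 - s1 - 1) % gen.id_range
```

Although the attacker cannot predict identifiers for other contexts,
the difference between its two samples reveals that two identifiers
were generated elsewhere in the meantime.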
8.4.4. Double-Hash Algorithm
A trade-off between maintaining a single global 'counter' variable
and maintaining 2**N 'counter' variables (where N is the width of the
result of F()) could be achieved as follows. The system would keep
an array of TABLE_LENGTH integers, which would provide a separation
of the increment of the 'counter' variable. This improvement could
be incorporated into the algorithm from Section 8.4.3 as follows:
    /* Initialization at system boot time */
    for (i = 0; i < TABLE_LENGTH; i++)
        table[i] = random();

    id_inc = 1;

    /* Protocol ID selection function */
    id_range = max_id - min_id + 1;
    offset = F(CONTEXT, secret_key1);
    index = G(CONTEXT, secret_key2);
    count = id_range;

    do {
        next_id = min_id + (offset + table[index]) % id_range;
        table[index] = table[index] + id_inc;

        if (check_suitable_id(next_id))
            return next_id;

        count--;
    } while (count > 0);

    return ERROR;
'table[]' could be initialized with random values, as indicated by
the initialization code in pseudo-code above. The function G()
should be a cryptographic hash function. It should use the same
CONTEXT as F(), and a secret key value to compute a value between 0
and (TABLE_LENGTH-1). Alternatively, G() could take an "offset" as
input, and perform the exclusive-or (XOR) operation between all the
bytes in 'offset'.
The array 'table[]' assures that successive identifiers for a given
context will be monotonically-increasing. However, the increments
space is separated into TABLE_LENGTH different spaces, and thus
identifier reuse frequency will be (probabilistically) lower than
that of the algorithm in Section 8.4.3. That is, the generation of
an identifier for one given context will not necessarily result in
increments in the identifiers for other contexts.
It is interesting to note that the size of 'table[]' does not limit
the number of different identifier sequences, but rather separates
the *increments* into TABLE_LENGTH different spaces. The identifier
sequence will result from adding the corresponding entry of 'table[]'
to the variable 'offset', which selects the actual identifier
sequence (as in the algorithm from Section 8.4.3).
An attacker can perform traffic analysis for any "increment space"
into which the attacker has "visibility" -- namely, the attacker can
force a node to generate identifiers where G(offset) identifies the
target "increment space". However, the attacker's ability to perform
traffic analysis is greatly reduced when compared to the predictable
linear identifiers (described in Section 8.4.1) and the hash-based
identifiers (described in Section 8.4.3). Additionally, an
implementation can further limit the attacker's ability to perform
traffic analysis by further separating the increment space (that is,
using a larger value for TABLE_LENGTH) and/or by randomizing the
increments.
8.4.5. Random-Increments Algorithm
This algorithm offers a middle ground between the algorithms that
select identifiers randomly (such as those described in
Section 8.1.1 and Section 8.1.2), and those that offer obfuscation
but no randomization (such as those described in Section 8.4.3 and
Section 8.4.4).
/* Initialization code at system boot time. */
next_id = random(); /* Initialization value */
id_inc = 500;       /* Determines the trade-off */
/* Identifier selection function */
id_range = max_id - min_id + 1;
count = id_range;
do {
    /* Random increment */
    next_id = next_id + (random() % id_inc) + 1;
    /* Keep the identifier within acceptable range */
    next_id = min_id + (next_id % id_range);
    if (check_suitable_id(next_id))
        return next_id;
    count--;
} while (count > 0);
return ERROR;
This algorithm aims to produce a monotonically-increasing sequence
of identifiers, while avoiding the use of fixed increments, which
would lead to trivially predictable sequences. The value "id_inc"
allows for direct control of the trade-off between the level of
obfuscation and the ID reuse frequency. The smaller the value of
"id_inc", the more similar this algorithm is to a predictable, global
monotonically-increasing ID generation algorithm. The larger the
value of "id_inc", the more similar this algorithm is to the
algorithm described in Section 8.1.1 of this document.
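The behaviour described above can be sketched in Python as follows.
The class name, the is_suitable callback (standing in for
check_suitable_id()), and the use of the "secrets" module as the
random source are assumptions. As a design choice, the sketch tracks
the position relative to min_id, so that each step is exactly in the
range 1..id_inc regardless of the value of min_id.

```python
import secrets


class RandomIncrementIdGenerator:
    """Sketch of a random-increments generator; names are illustrative."""

    def __init__(self, min_id=0, max_id=0xFFFF, id_inc=500):
        self.min_id = min_id
        self.id_range = max_id - min_id + 1
        self.id_inc = id_inc
        # Position within the range, initialized randomly at start-up.
        self.current = secrets.randbelow(self.id_range)

    def next_id(self, is_suitable=lambda candidate: True):
        # Try at most id_range candidates, as in the pseudo-code.
        for _ in range(self.id_range):
            step = secrets.randbelow(self.id_inc) + 1  # 1..id_inc
            self.current = (self.current + step) % self.id_range
            candidate = self.min_id + self.current
            if is_suitable(candidate):
                return candidate
        return None  # corresponds to ERROR in the pseudo-code
```

With id_inc set to 1 the sketch degenerates into a global counter,
and with id_inc equal to the size of the identifier range each
identifier is close to a uniformly random selection.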
When the identifiers wrap, there is the risk of collisions of
identifiers (i.e., identifier reuse). Therefore, "id_inc" should be
selected according to the following criteria:
o It should maximize the wrapping time of the identifier space.
o It should minimize identifier reuse frequency.
o It should maximize obfuscation.
Clearly, these are competing goals, and the decision of which value
of "id_inc" to use is a trade-off. Therefore, the value of "id_inc"
should be configurable so that system administrators can make the
trade-off for themselves.
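A rough way to reason about this trade-off is to estimate how many
identifiers are produced before the identifier space wraps. The
following back-of-the-envelope sketch assumes that each step is
uniformly distributed over 1..id_inc (so the mean step is
(id_inc + 1) / 2) and ignores retries caused by unsuitable
identifiers; the function name is hypothetical.

```python
def expected_ids_per_wrap(id_range: int, id_inc: int) -> float:
    """Approximate number of identifiers generated per wrap of the space."""
    mean_step = (id_inc + 1) / 2  # mean of a uniform step in 1..id_inc
    return id_range / mean_step
```

Under these assumptions, a 16-bit identifier space with id_inc = 500
wraps after roughly 262 identifiers on average, whereas id_inc = 1
uses the whole space (65536 identifiers) before wrapping.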
9. Common Vulnerabilities Associated with Identifiers
This section analyzes common vulnerabilities associated with the
generation of identifiers for each of the categories identified in
Section 7.
9.1. Category #1: Uniqueness (soft failure)
Possible vulnerabilities associated with identifiers of this category
are:
o Use of trivial algorithms (e.g., global counters) that generate
predictable identifiers.
o Use of flawed PRNGs.
Since the only interoperability requirement for these identifiers is
uniqueness, the obvious approach to generate them is to employ a
PRNG. An implementer should consult [RFC4086] regarding randomness
requirements for security, and consult relevant documentation when
employing a PRNG provided by the underlying system.
The use of algorithms other than PRNGs for generating identifiers of
this category is discouraged.
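As an illustration of the recommended approach, the following Python
sketch selects a Category #1 identifier uniformly at random. The
function name and parameters are assumptions; "secrets" is the
standard-library interface to the operating system's
cryptographically-secure random source, whereas the general-purpose
"random" module is documented as unsuitable for security purposes.

```python
import secrets


def random_id(min_id: int, max_id: int) -> int:
    """Pick an identifier uniformly at random from [min_id, max_id]."""
    return min_id + secrets.randbelow(max_id - min_id + 1)
```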
9.2. Category #2: Uniqueness (hard failure)
As noted in Section 8.2, this category typically employs the same
algorithms as Category #4, since a monotonically-increasing sequence
tends to minimize the identifier reuse frequency. Therefore, the
vulnerability analysis of Section 9.4 applies to this case.
9.3. Category #3: Uniqueness, constant within context (soft failure)
There are two main vulnerabilities that may be associated with
identifiers of this category:
1. Use of algorithms or sources that result in predictable
identifiers.
2. Use of the same identifier across contexts in which constancy
is not required.
At times, an implementation or specification may be tempted to employ
a source for the identifier which is known to provide unique values.
However, while unique, the associated identifiers may have other
properties such as being predictable or leaking information about the
node in question. For example, as noted in [RFC7721], embedding
link-layer addresses for generating IPv6 IIDs not only results in
predictable values, but also leaks information about the manufacturer
of the network interface card.
On the other hand, using an identifier across contexts where
constancy is not required can be leveraged for correlation of
activities. One of the most trivial examples of this is the use of
IPv6 IIDs that are constant across networks (such as IIDs that embed
the underlying link-layer address).
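Both issues can be illustrated with a short Python sketch. The first
function builds a Modified EUI-64 IID from a 48-bit link-layer
address, which keeps the manufacturer's OUI visible and stays constant
across networks. The second function is a hypothetical alternative,
shown only for contrast: its name, its inputs, and the use of
HMAC-SHA256 are assumptions, not a scheme defined in this document.

```python
import hashlib
import hmac


def eui64_iid(mac: bytes) -> bytes:
    """Modified EUI-64 IID: flip the universal/local bit, insert 0xFFFE."""
    return bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:6]


def per_network_iid(prefix: bytes, interface: bytes,
                    secret_key: bytes) -> bytes:
    """Hypothetical IID: constant within a prefix, different across prefixes."""
    message = prefix + b"|" + interface
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return digest[:8]  # 64-bit interface identifier
```

In the first case, the second and third bytes of the IID are copied
verbatim from the link-layer address (and the first differs only in
one bit), so an observer can recover the OUI and correlate the node
across networks. In the second case, the IID changes whenever the
prefix changes.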
9.4. Category #4: Uniqueness, monotonically increasing within context
(hard failure)
A simple way to generalize algorithms employed for generating
identifiers of Category #4 would be as follows:
/* Identifier selection function */
count = max_id - min_id + 1;
do {
    linear(CONTEXT) = linear(CONTEXT) + increment();
    next_id = offset(CONTEXT) + linear(CONTEXT);
    if (check_suitable_id(next_id))
        return next_id;
    count--;
} while (count > 0);
return ERROR;
Essentially, an identifier (next_id) is generated by adding a linear
function (linear()) to an offset value, which is unknown to the
attacker and constant for a given context.
The following aspects of the algorithm should be considered:
o For the most part, it is the offset() function that results in
identifiers that are unpredictable by an off-path attacker. While
the resulting sequence will be monotonically-increasing, the use
of an offset value that is unknown to the attacker makes the
resulting values unknown to the attacker.
o The most straightforward "stateless" implementation of offset
would be that in which offset() is the result of a
cryptographically-secure hash-function that takes the values that
identify the context and a "secret" (not shown in the figure
above) as arguments.
o Another possible (but stateful) approach would be to simply
generate a random offset and store it in memory, and then look up
the corresponding context when a new identifier is to be selected.
The algorithm in Section 8.4.2 is essentially an implementation of
this type.
o The linear function is incremented according to increment(). In
the most trivial case, increment() could always return the constant
"1". But it could also return small integers such that the
increments are randomized.
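The aspects listed above can be sketched in Python as follows. This is
a minimal illustration of the generic scheme with a stateless offset()
and a per-context linear(); the 16-bit identifier space, the use of
HMAC-SHA256, the 1..4 increment range, and the final reduction modulo
ID_RANGE are assumptions, and the check_suitable_id() loop is omitted
for brevity.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

ID_RANGE = 2 ** 16                # assumed identifier space
SECRET = secrets.token_bytes(16)  # the "secret" fed to offset()
_linear = defaultdict(int)        # linear(CONTEXT): one counter per context


def offset(context: bytes) -> int:
    """Stateless offset: keyed hash of the values identifying the context."""
    digest = hmac.new(SECRET, context, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def increment() -> int:
    """Small random increment in 1..4 (assumed range)."""
    return secrets.randbelow(4) + 1


def next_id(context: bytes) -> int:
    _linear[context] += increment()
    return (offset(context) + _linear[context]) % ID_RANGE
```

Because both offset() and the counter are keyed by the context,
identifiers generated for one context neither reveal nor advance the
sequence of another.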
Considering the generic algorithm illustrated above we can identify
the following possible vulnerabilities:
o If the offset value spans more than the necessary context,
identifiers could be unnecessarily predictable by other parties,
since the offset value would be unnecessarily leaked to them. For
example, an implementation that means to produce a per-destination
counter but replaces offset() with a constant number (i.e.,
employs a global counter), will unnecessarily result in
predictable identifiers.
o The function linear() could be seen as representing the number of
identifiers that have so far been generated for a given context.
If linear() spans more than the necessary context, the
"increments" could be leaked to other parties, thus disclosing
information about the number of identifiers that have so far been
generated. For example, an implementation in which linear() is
implemented as a single global counter will unnecessarily leak
information about the number of identifiers that have been produced.
o increment() determines how linear() is incremented for each
identifier that is selected. In the most trivial case,
increment() will return the integer "1". However, an
implementation may have increment() return a "small" integer value
such that even if the current value employed by the generator is
guessed (see Appendix A of [RFC7739]), the exact next identifier
to be selected will be slightly harder to identify.
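The leak caused by a linear() that spans more than the necessary
context can be demonstrated with a short Python sketch. Here the
offsets are per-context and unknown to the attacker, but linear() is
a single global counter that always increments by 1. The context
names, offset values, and function names are arbitrary and chosen
only for illustration.

```python
ID_RANGE = 2 ** 16  # assumed identifier space

# Per-context offsets, unknown to the attacker (arbitrary example values).
OFFSETS = {"attacker": 40123, "victim": 9876}

global_linear = 0  # flawed: one counter shared by every context


def next_id_global(context: str) -> int:
    """Generic scheme with a global linear() and increment() == 1."""
    global global_linear
    global_linear += 1
    return (OFFSETS[context] + global_linear) % ID_RANGE


def ids_issued_between(first_sample: int, second_sample: int) -> int:
    """Identifiers issued to other contexts between two own samples."""
    return (second_sample - first_sample - 1) % ID_RANGE
```

An attacker that samples two identifiers for its own context can
subtract them to learn how many identifiers were generated for all
other contexts in between, without ever learning any offset.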
10. Security and Privacy Requirements for Identifiers
Protocol specifications that specify identifiers should: