¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The lost segment is probably a sign of congestion, and in
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† that situation the sender should be conservative about
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† retransmission. Furthermore, it is better to overestimate
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† than underestimate the RTT. An ACK for an out-of-order
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† segment should therefore contain the timestamp from the most
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† recent segment that advanced the window.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The same situation occurs if segments are re-ordered by the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† network.

¬†¬†¬†¬†¬†¬† (C) A filled hole in the sequence space.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The segment that fills the hole represents the most recent
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† measurement of the network characteristics. On the other
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† hand, an RTT computed from an earlier segment would probably
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† include the sender's retransmit time-out, badly biasing the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† sender's average RTT estimate. Thus, the timestamp from the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† latest segment (which filled the hole) must be echoed.

¬†¬†¬†¬†¬†¬† An algorithm that covers all three cases is described in the
¬†¬†¬†¬†¬†¬† following rules for Timestamps option processing on a synchronized
¬†¬†¬†¬†¬†¬† connection:

¬†¬†¬†¬†¬†¬† (1) The connection state is augmented with two 32-bit slots:
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TS.Recent holds a timestamp to be echoed in TSecr whenever a
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† segment is sent, and Last.ACK.sent holds the ACK field from
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† the last segment sent. Last.ACK.sent will equal RCV.NXT
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† except when ACKs have been delayed.
Jacobson, Braden, & Borman [Page 15]
¬†
RFC 1323 TCP Extensions for High Performance May 1992
       (2)  If Last.ACK.sent falls within the range of sequence numbers
            of an incoming segment:

               SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN

            then the TSval from the segment is copied to TS.Recent;
            otherwise, the TSval is ignored.

¬†¬†¬†¬†¬†¬† (3) When a TSopt is sent, its TSecr field is set to the current
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TS.Recent value.
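
            As an illustration only, rules (1) to (3) can be sketched in
            Python as follows. The names EchoState, on_segment_received
            and tsecr_for_outgoing are hypothetical, and rule (2) is read
            here as SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN,
            evaluated modulo 2**32; that reading is an assumption of the
            sketch, not a definitive implementation.

```python
from dataclasses import dataclass

SEQ_MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class EchoState:
    """Hypothetical per-connection state for rule (1)."""
    ts_recent: int = 0      # TS.Recent: timestamp to echo in TSecr
    last_ack_sent: int = 0  # Last.ACK.sent: ACK field of the last segment sent


def on_segment_received(state, seg_seq, seg_len, ts_val):
    """Rule (2): update TS.Recent only if Last.ACK.sent lies in the segment."""
    # Distance from the start of the segment to Last.ACK.sent, modulo 2**32.
    offset = (state.last_ack_sent - seg_seq) % SEQ_MOD
    if offset < seg_len:
        state.ts_recent = ts_val
    # Otherwise the TSval is ignored. Under this strict reading a
    # zero-length segment never updates TS.Recent.


def tsecr_for_outgoing(state, ack_field):
    """Rule (3): echo TS.Recent, and remember the ACK field being sent."""
    state.last_ack_sent = ack_field
    return state.ts_recent
```

            With delayed ACKs, a second in-sequence segment leaves
            TS.Recent untouched in this sketch, so the timestamp of the
            oldest unacknowledged segment is the one echoed.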

¬†¬†¬†¬†¬†¬† The following examples illustrate these rules. Here A, B, C...
¬†¬†¬†¬†¬†¬† represent data segments occupying successive blocks of sequence
¬†¬†¬†¬†¬†¬† numbers, and ACK(A),... represent the corresponding
¬†¬†¬†¬†¬†¬† acknowledgment segments. Note that ACK(A) has the same sequence
¬†¬†¬†¬†¬†¬† number as B. We show only one direction of timestamp echoing, for
¬†¬†¬†¬†¬†¬† clarity.
¬†¬†¬†¬†¬†¬† o Packets arrive in sequence, and some of the ACKs are delayed.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† By Case (A), the timestamp from the oldest unacknowledged
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† segment is echoed.

¬†¬†¬†¬†¬†¬† o Packets arrive out of order, and every packet is
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† acknowledged.

            By Case (B), the timestamp from the last segment that
            advanced the left window edge is echoed until the missing
            segment arrives; the timestamp of that segment is then
            echoed, according to Case (C). The same sequence would
            occur if segments B and D were lost and retransmitted.

       Section 4.2 describes a simple mechanism to reject old duplicate
       segments that might corrupt an open TCP connection; we call this
       mechanism PAWS (Protect Against Wrapped Sequence numbers). PAWS
       operates within a single TCP connection, using state that is saved
       in the connection control block. Section 4.3 and Appendix B
       discuss the implications of the PAWS mechanism for avoiding old
       duplicates from previous incarnations of the same connection.

¬†¬†¬† 4.2 The PAWS Mechanism

¬†¬†¬†¬†¬†¬† PAWS uses the same TCP Timestamps option as the RTTM mechanism
¬†¬†¬†¬†¬†¬† described earlier, and assumes that every received TCP segment
¬†¬†¬†¬†¬†¬† (including data and ACK segments) contains a timestamp SEG.TSval
¬†¬†¬†¬†¬†¬† whose values are monotone non-decreasing in time. The basic idea
¬†¬†¬†¬†¬†¬† is that a segment can be discarded as an old duplicate if it is
¬†¬†¬†¬†¬†¬† received with a timestamp SEG.TSval less than some timestamp
¬†¬†¬†¬†¬†¬† recently received on this connection.
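
       The "less than" test above has to tolerate wrap-around of the
       32-bit timestamp. A minimal sketch in Python, assuming that
       timestamps are compared the same way as TCP sequence numbers
       (modulo 2**32, with s older than t when 0 < t - s < 2**31); the
       function name ts_lt is hypothetical:

```python
TS_MOD = 2**32  # timestamps are 32-bit values that wrap around


def ts_lt(s, t):
    """Return True if timestamp s is older than t, allowing for wrap-around."""
    return 0 < ((t - s) % TS_MOD) < 2**31
```

       Under this assumption a value just before the wrap point, such as
       2**32 - 1, still compares as older than 0.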

       In both the PAWS and the RTTM mechanism, the "timestamps" are
       32-bit unsigned integers in a modular 32-bit space.

¬†¬†¬†¬†¬†¬† The choice of incoming timestamps to be saved for this comparison
¬†¬†¬†¬†¬†¬† must guarantee a value that is monotone increasing. For example,
¬†¬†¬†¬†¬†¬† we might save the timestamp from the segment that last advanced
¬†¬†¬†¬†¬†¬† the left edge of the receive window, ie, the most recent in-
¬†¬†¬†¬†¬†¬† sequence segment. Instead, we choose the value TS.Recent
¬†¬†¬†¬†¬†¬† introduced in Section 3.4 for the RTTM mechanism, since using a
¬†¬†¬†¬†¬†¬† common value for both PAWS and RTTM simplifies the implementation
¬†¬†¬†¬†¬†¬† of both. As Section 3.4 explained, TS.Recent differs from the
¬†¬†¬†¬†¬†¬† timestamp from the last in-sequence segment only in the case of
¬†¬†¬†¬†¬†¬† delayed ACKs, and therefore by less than one window. Either
¬†¬†¬†¬†¬†¬† choice will therefore protect against sequence number wrap-around.

¬†¬†¬†¬†¬†¬† RTTM was specified in a symmetrical manner, so that TSval
¬†¬†¬†¬†¬†¬† timestamps are carried in both data and ACK segments and are
¬†¬†¬†¬†¬†¬† echoed in TSecr fields carried in returning ACK or data segments.
¬†¬†¬†¬†¬†¬† PAWS submits all incoming segments to the same test, and therefore
¬†¬†¬†¬†¬†¬† protects against duplicate ACK segments as well as data segments.
¬†¬†¬†¬†¬†¬† (An alternative un-symmetric algorithm would protect against old
¬†¬†¬†¬†¬†¬† duplicate ACKs: the sender of data would reject incoming ACK
¬†¬†¬†¬†¬†¬† segments whose TSecr values were less than the TSecr saved from
¬†¬†¬†¬†¬†¬† the last segment whose ACK field advanced the left edge of the
¬†¬†¬†¬†¬†¬† send window. This algorithm was deemed to lack economy of
¬†¬†¬†¬†¬†¬† mechanism and symmetry.)

¬†¬†¬†¬†¬†¬† TSval timestamps sent on {SYN} and {SYN,ACK} segments are used to
¬†¬†¬†¬†¬†¬† initialize PAWS. PAWS protects against old duplicate non-SYN
¬†¬†¬†¬†¬†¬† segments, and duplicate SYN segments received while there is a
¬†¬†¬†¬†¬†¬† synchronized connection. Duplicate {SYN} and {SYN,ACK} segments
¬†¬†¬†¬†¬†¬† received when there is no connection will be discarded by the
¬†¬†¬†¬†¬†¬† normal 3-way handshake and sequence number checks of TCP.

¬†¬†¬†¬†¬†¬† It is recommended that RST segments NOT carry timestamps, and that
¬†¬†¬†¬†¬†¬† RST segments be acceptable regardless of their timestamp. Old
¬†¬†¬†¬†¬†¬† duplicate RST segments should be exceedingly unlikely, and their
¬†¬†¬†¬†¬†¬† cleanup function should take precedence over timestamps.

¬†¬†¬†¬†¬†¬† 4.2.1 Basic PAWS Algorithm

¬†¬†¬†¬†¬†¬†¬†¬†¬† The PAWS algorithm requires the following processing to be
¬†¬†¬†¬†¬†¬†¬†¬†¬† performed on all incoming segments for a synchronized
¬†¬†¬†¬†¬†¬†¬†¬†¬† connection:
¬†¬†¬†¬†¬†¬†¬†¬†¬† R1) If there is a Timestamps option in the arriving segment
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† and SEG.TSval < TS.Recent and if TS.Recent is valid (see
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† later discussion), then treat the arriving segment as not
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† acceptable:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Send an acknowledgement in reply as specified in
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† RFC-793 page 69 and drop the segment.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Note: it is necessary to send an ACK segment in order
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† to retain TCP's mechanisms for detecting and
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† recovering from half-open connections. For example,
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† see Figure 10 of RFC-793.

¬†¬†¬†¬†¬†¬†¬†¬†¬† R2) If the segment is outside the window, reject it (normal
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TCP processing)

¬†¬†¬†¬†¬†¬†¬†¬†¬† R3) If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† (see Section 3.4), then record its timestamp in TS.Recent.

¬†¬†¬†¬†¬†¬†¬†¬†¬† R4) If an arriving segment is in-sequence (ie, at the left
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† window edge), then accept it normally.

¬†¬†¬†¬†¬†¬†¬†¬†¬† R5) Otherwise, treat the segment as a normal in-window, out-
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† of-sequence TCP segment (eg, queue it for later delivery
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† to the user).
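
          Rules R1 to R5 can be sketched as follows. This is an
          illustration under stated assumptions, not a complete TCP
          receiver: the names Segment, Connection and paws_receive are
          hypothetical, timestamps are assumed to be compared modulo
          2**32, the window test of R2 looks only at the first byte of
          the segment, and an in-sequence segment is assumed to be
          acknowledged immediately.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32  # sequence numbers and timestamps are both 32-bit


def ts_lt(s, t):
    """Assumed wrap-aware comparison: s is older than t."""
    return 0 < ((t - s) % MOD) < 2**31


@dataclass
class Segment:
    seq: int                      # SEG.SEQ
    length: int                   # SEG.LEN
    ts_val: Optional[int] = None  # SEG.TSval, None if no Timestamps option


@dataclass
class Connection:
    rcv_nxt: int        # left edge of the receive window
    rcv_wnd: int        # receive window size in bytes
    last_ack_sent: int  # Last.ACK.sent
    ts_recent: int      # TS.Recent
    ts_recent_valid: bool = True


def paws_receive(conn, seg):
    """Apply R1-R5 to one arriving segment and return the action taken."""
    # R1: timestamp older than a valid TS.Recent -> not acceptable.
    if (seg.ts_val is not None and conn.ts_recent_valid
            and ts_lt(seg.ts_val, conn.ts_recent)):
        return "ack-and-drop"
    # R2 (simplified): only the first byte is tested against the window.
    if (seg.seq - conn.rcv_nxt) % MOD >= conn.rcv_wnd:
        return "reject"
    # R3: SEG.SEQ <= Last.ACK.sent, compared modulo 2**32.
    if seg.ts_val is not None and (conn.last_ack_sent - seg.seq) % MOD < 2**31:
        conn.ts_recent = seg.ts_val
        conn.ts_recent_valid = True
    # R4: in-sequence segment; assume it is acknowledged immediately.
    if seg.seq == conn.rcv_nxt:
        conn.rcv_nxt = (conn.rcv_nxt + seg.length) % MOD
        conn.last_ack_sent = conn.rcv_nxt
        return "accept"
    # R5: in-window but out of sequence.
    return "queue"
```

          The timestamp is tested once, on arrival; a segment that has
          been queued under R5 is not tested again by this sketch.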

¬†¬†¬†¬†¬†¬†¬†¬†¬† It is important to note that the timestamp is checked only when
¬†¬†¬†¬†¬†¬†¬†¬†¬† a segment first arrives at the receiver, regardless of whether
¬†¬†¬†¬†¬†¬†¬†¬†¬† it is in-sequence or it must be queued for later delivery.
¬†¬†¬†¬†¬†¬†¬†¬†¬† Consider the following example.

               Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has
               been sent, where the letter indicates the sequence number
               and the digit represents the timestamp. Suppose also that
               segment B.1 has been lost. The timestamp in TS.Recent is
               1 (from A.1), so C.1, ..., Z.1 are considered acceptable
               and are queued. When B is retransmitted as segment B.2
               (using the latest timestamp), it fills the hole and causes
               all the segments through Z to be acknowledged and passed
               to the user. The timestamps of the queued segments are
               *not* inspected again at this time, since they have
               already been accepted. When B.2 is accepted, TS.Recent is
               set to 2.

          This rule allows reasonable performance under loss. A full
          window of data is in transit at all times, and after a loss a
          full window less one packet will show up out-of-sequence to be
          queued at the receiver (eg, up to ~2**30 bytes of data); the
          timestamp option must not result in discarding this data.

          In certain unlikely circumstances, the algorithm of rules R1-R5
          could lead to discarding some segments unnecessarily, as shown
          in the following example:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† been sent in sequence and that segment B.1 has been lost.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Furthermore, suppose delivery of some of C.1, ... Z.1 is
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† delayed until AFTER the retransmission B.2 arrives at the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† receiver. These delayed segments will be discarded
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† unnecessarily when they do arrive, since their timestamps
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† are now out of date.

¬†¬†¬†¬†¬†¬†¬†¬†¬† This case is very unlikely to occur. If the retransmission was
¬†¬†¬†¬†¬†¬†¬†¬†¬† triggered by a timeout, some of the segments C.1, ... Z.1 must
¬†¬†¬†¬†¬†¬†¬†¬†¬† have been delayed longer than the RTO time. This is presumably
¬†¬†¬†¬†¬†¬†¬†¬†¬† an unlikely event, or there would be many spurious timeouts and
¬†¬†¬†¬†¬†¬†¬†¬†¬† retransmissions. If B's retransmission was triggered by the
¬†¬†¬†¬†¬†¬†¬†¬†¬† "fast retransmit" algorithm, ie, by duplicate ACKs, then the
¬†¬†¬†¬†¬†¬†¬†¬†¬† queued segments that caused these ACKs must have been received
¬†¬†¬†¬†¬†¬†¬†¬†¬† already.

¬†¬†¬†¬†¬†¬†¬†¬†¬† Even if a segment were delayed past the RTO, the Fast
¬†¬†¬†¬†¬†¬†¬†¬†¬† Retransmit mechanism [Jacobson90c] will cause the delayed
¬†¬†¬†¬†¬†¬†¬†¬†¬† packets to be retransmitted at the same time as B.2, avoiding
¬†¬†¬†¬†¬†¬†¬†¬†¬† an extra RTT and therefore causing a very small performance
¬†¬†¬†¬†¬†¬†¬†¬†¬† penalty.

¬†¬†¬†¬†¬†¬†¬†¬†¬† We know of no case with a significant probability of occurrence
¬†¬†¬†¬†¬†¬†¬†¬†¬† in which timestamps will cause performance degradation by
¬†¬†¬†¬†¬†¬†¬†¬†¬† unnecessarily discarding segments.

¬†¬†¬†¬†¬†¬† 4.2.2 Timestamp Clock

¬†¬†¬†¬†¬†¬†¬†¬†¬† It is important to understand that the PAWS algorithm does not
¬†¬†¬†¬†¬†¬†¬†¬†¬† require clock synchronization between sender and receiver. The
¬†¬†¬†¬†¬†¬†¬†¬†¬† sender's timestamp clock is used to stamp the segments, and the
¬†¬†¬†¬†¬†¬†¬†¬†¬† sender uses the echoed timestamp to measure RTT's. However,
¬†¬†¬†¬†¬†¬†¬†¬†¬† the receiver treats the timestamp as simply a monotone-
¬†¬†¬†¬†¬†¬†¬†¬†¬† increasing serial number, without any necessary connection to
¬†¬†¬†¬†¬†¬†¬†¬†¬† its clock. From the receiver's viewpoint, the timestamp is
¬†¬†¬†¬†¬†¬†¬†¬†¬† acting as a logical extension of the high-order bits of the
¬†¬†¬†¬†¬†¬†¬†¬†¬† sequence number.
¬†¬†¬†¬†¬†¬†¬†¬†¬† The receiver algorithm does place some requirements on the
¬†¬†¬†¬†¬†¬†¬†¬†¬† frequency of the timestamp clock.

¬†¬†¬†¬†¬†¬†¬†¬†¬† (a) The timestamp clock must not be "too slow".

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† It must tick at least once for each 2**31 bytes sent. In
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† fact, in order to be useful to the sender for round trip
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† timing, the clock should tick at least once per window's
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† worth of data, and even with the RFC-1072 window
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† extension, 2**31 bytes must be at least two windows.

               To make this more quantitative, any clock faster than 1
               tick/sec will reject old duplicate segments for link
               speeds of up to ~8 Gbps. A 1 ms timestamp clock will work
               at link speeds up to 8 Tbps (8*10**12 bps)!

¬†¬†¬†¬†¬†¬†¬†¬†¬† (b) The timestamp clock must not be "too fast".

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Its recycling time must be greater than MSL seconds.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Since the clock (timestamp) is 32 bits and the worst-case
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† MSL is 255 seconds, the maximum acceptable clock frequency
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† is one tick every 59 ns.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† However, it is desirable to establish a much longer
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† recycle period, in order to handle outdated timestamps on
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† idle connections (see Section 4.2.3), and to relax the MSL
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† requirement for preventing sequence number wrap-around.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† With a 1 ms timestamp clock, the 32-bit timestamp will
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† wrap its sign bit in 24.8 days. Thus, it will reject old
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† duplicates on the same connection if MSL is 24.8 days or
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† less. This appears to be a very safe figure; an MSL of
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† 24.8 days or longer can probably be assumed by the gateway
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† system without requiring precise MSL enforcement by the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TTL value in the IP layer.
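
          The two figures in (b) follow from simple arithmetic, which
          the short Python sketch below reproduces. The function names
          are hypothetical; the only inputs are the 32-bit timestamp
          size and the 255 second worst-case MSL given above.

```python
TIMESTAMP_BITS = 32
WORST_CASE_MSL_SECONDS = 255
SECONDS_PER_DAY = 86400


def shortest_tick_seconds():
    """Shortest tick for which a full 32-bit recycle takes 255 seconds."""
    return WORST_CASE_MSL_SECONDS / 2**TIMESTAMP_BITS


def sign_bit_wrap_days(tick_seconds):
    """Days until the timestamp has advanced by 2**31 ticks."""
    return 2**(TIMESTAMP_BITS - 1) * tick_seconds / SECONDS_PER_DAY


print(f"{shortest_tick_seconds() * 1e9:.2f} ns per tick")
print(f"{sign_bit_wrap_days(0.001):.2f} days with a 1 ms clock")
```

          The results are roughly 59.4 ns and 24.86 days, which the text
          quotes as 59 ns and 24.8 days.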

¬†¬†¬†¬†¬†¬†¬†¬†¬† Based upon these considerations, we choose a timestamp clock
¬†¬†¬†¬†¬†¬†¬†¬†¬† frequency in the range 1 ms to 1 sec per tick. This range also
¬†¬†¬†¬†¬†¬†¬†¬†¬† matches the requirements of the RTTM mechanism, which does not
¬†¬†¬†¬†¬†¬†¬†¬†¬† need much more resolution than the granularity of the
¬†¬†¬†¬†¬†¬†¬†¬†¬† retransmit timer, eg, tens or hundreds of milliseconds.

¬†¬†¬†¬†¬†¬†¬†¬†¬† The PAWS mechanism also puts a strong monotonicity requirement
¬†¬†¬†¬†¬†¬†¬†¬†¬† on the sender's timestamp clock. The method of implementation
¬†¬†¬†¬†¬†¬†¬†¬†¬† of the timestamp clock to meet this requirement depends upon
¬†¬†¬†¬†¬†¬†¬†¬†¬† the system hardware and software.

¬†¬†¬†¬†¬†¬†¬†¬†¬† * Some hosts have a hardware clock that is guaranteed to be
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† monotonic between hardware resets.

¬†¬†¬†¬†¬†¬†¬†¬†¬† * The timestamp clock may be derived from a system clock
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† that is subject to being abruptly changed, by adding a
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† variable offset value. This offset is initialized to
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† zero. When a new timestamp clock value is needed, the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† offset can be adjusted as necessary to make the new value
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† equal to or larger than the previous value (which was
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† saved for this purpose).
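
          The offset technique of the second bullet can be sketched as
          follows. The class name and the read_system_ticks callable are
          hypothetical stand-ins for whatever system clock the host
          provides; the sketch keeps an unwrapped internal count and
          returns the low 32 bits.

```python
class MonotonicTimestampClock:
    """Timestamp clock derived from a system clock that may jump backwards."""

    def __init__(self, read_system_ticks):
        self._read = read_system_ticks  # hypothetical: returns ticks as int
        self._offset = 0                # variable offset, initialized to zero
        self._previous = None           # last value handed out, unwrapped

    def now(self):
        value = self._read() + self._offset
        if self._previous is not None and value < self._previous:
            # The system clock was set back: enlarge the offset so the new
            # value equals the previous one instead of going backwards.
            self._offset += self._previous - value
            value = self._previous
        self._previous = value
        return value % 2**32  # the option carries a 32-bit timestamp
```

          For system clock readings 100, 105, 50, 60 this sketch returns
          100, 105, 105, 115: the backward step is absorbed into the
          offset and later readings continue from there.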
¬†¬†¬†¬†¬†¬† 4.2.3 Outdated Timestamps

¬†¬†¬†¬†¬†¬†¬†¬†¬† If a connection remains idle long enough for the timestamp
¬†¬†¬†¬†¬†¬†¬†¬†¬† clock of the other TCP to wrap its sign bit, then the value
¬†¬†¬†¬†¬†¬†¬†¬†¬† saved in TS.Recent will become too old; as a result, the PAWS
¬†¬†¬†¬†¬†¬†¬†¬†¬† mechanism will cause all subsequent segments to be rejected,
¬†¬†¬†¬†¬†¬†¬†¬†¬† freezing the connection (until the timestamp clock wraps its
¬†¬†¬†¬†¬†¬†¬†¬†¬† sign bit again).

          With the chosen range of timestamp clock frequencies (1 ms to
          1 sec per tick), the time to wrap the sign bit will be between
          24.8 days and 24800 days. A TCP connection that is idle for
          more than 24 days and then comes to life is exceedingly
          unusual. However, it is undesirable in principle to place any
          limitation on TCP connection lifetimes.

¬†¬†¬†¬†¬†¬†¬†¬†¬† We therefore require that an implementation of PAWS include a
¬†¬†¬†¬†¬†¬†¬†¬†¬† mechanism to "invalidate" the TS.Recent value when a connection
¬†¬†¬†¬†¬†¬†¬†¬†¬† is idle for more than 24 days. (An alternative solution to the
¬†¬†¬†¬†¬†¬†¬†¬†¬† problem of outdated timestamps would be to send keepalive
¬†¬†¬†¬†¬†¬†¬†¬†¬† segments at a very low rate, but still more often than the
¬†¬†¬†¬†¬†¬†¬†¬†¬† wrap-around time for timestamps, eg, once a day. This would
¬†¬†¬†¬†¬†¬†¬†¬†¬† impose negligible overhead. However, the TCP specification has
¬†¬†¬†¬†¬†¬†¬†¬†¬† never included keepalives, so the solution based upon
¬†¬†¬†¬†¬†¬†¬†¬†¬† invalidation was chosen.)

¬†¬†¬†¬†¬†¬†¬†¬†¬† Note that a TCP does not know the frequency, and therefore, the
¬†¬†¬†¬†¬†¬†¬†¬†¬† wraparound time, of the other TCP, so it must assume the worst.
¬†¬†¬†¬†¬†¬†¬†¬†¬† The validity of TS.Recent needs to be checked only if the basic
¬†¬†¬†¬†¬†¬†¬†¬†¬† PAWS timestamp check fails, ie, only if SEG.TSval <
¬†¬†¬†¬†¬†¬†¬†¬†¬† TS.Recent. If TS.Recent is found to be invalid, then the
¬†¬†¬†¬†¬†¬†¬†¬†¬† segment is accepted, regardless of the failure of the timestamp
¬†¬†¬†¬†¬†¬†¬†¬†¬† check, and rule R3 updates TS.Recent with the TSval from the
¬†¬†¬†¬†¬†¬†¬†¬†¬† new segment.

          To detect how long the connection has been idle, the TCP may
          update a clock or timestamp value associated with the
          connection whenever TS.Recent is updated, for example. The
          details will be implementation-dependent.
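
          One possible shape for the invalidation check is sketched
          below. The names PawsState, record_timestamp and paws_rejects
          are hypothetical, local time is assumed to be available in
          seconds, and timestamps are assumed to be compared modulo
          2**32. As described above, the age of TS.Recent is consulted
          only when the basic timestamp check fails.

```python
from dataclasses import dataclass

TS_MOD = 2**32
IDLE_LIMIT_SECONDS = 24 * 24 * 60 * 60  # 24 days, per the text above


def ts_lt(s, t):
    """Assumed wrap-aware comparison: s is older than t."""
    return 0 < ((t - s) % TS_MOD) < 2**31


@dataclass
class PawsState:
    ts_recent: int            # TS.Recent
    ts_recent_updated: float  # local time (seconds) of the last update


def record_timestamp(state, ts_val, now):
    """Store a new TS.Recent and note when it was stored."""
    state.ts_recent = ts_val
    state.ts_recent_updated = now


def paws_rejects(state, seg_ts_val, now):
    """Return True if the segment should be treated as an old duplicate."""
    if not ts_lt(seg_ts_val, state.ts_recent):
        return False  # basic check passes; validity is not examined
    if now - state.ts_recent_updated > IDLE_LIMIT_SECONDS:
        return False  # TS.Recent is too old to trust: accept the segment
    return True
```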

¬†¬†¬†¬†¬†¬† 4.2.4 Header Prediction

¬†¬†¬†¬†¬†¬†¬†¬†¬† "Header prediction" [Jacobson90a] is a high-performance
¬†¬†¬†¬†¬†¬†¬†¬†¬† transport protocol implementation technique that is most
¬†¬†¬†¬†¬†¬†¬†¬†¬† important for high-speed links. This technique optimizes the
¬†¬†¬†¬†¬†¬†¬†¬†¬† code for the most common case, receiving a segment correctly
¬†¬†¬†¬†¬†¬†¬†¬†¬† and in order. Using header prediction, the receiver asks the
¬†¬†¬†¬†¬†¬†¬†¬†¬† question, "Is this segment the next in sequence?" This
¬†¬†¬†¬†¬†¬†¬†¬†¬† question can be answered in fewer machine instructions than the
¬†¬†¬†¬†¬†¬†¬†¬†¬† question, "Is this segment within the window?"

          Adding header prediction to our timestamp procedure leads to
          the following recommended sequence for processing an arriving
          TCP segment:

          H1)  Check timestamp (same as step R1 above).

          H2)  Do header prediction: if segment is next in sequence and
               if there are no special conditions requiring additional
               processing, accept the segment, record its timestamp, and
               skip H3.

          H3)  Process the segment normally, as specified in RFC-793.
               This includes dropping segments that are outside the
               window and possibly sending acknowledgments, and queueing
               in-window, out-of-sequence segments.
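
          The ordering of the steps can be shown with a small dispatch
          sketch. H1 is taken to be the timestamp check of rule R1, as
          the discussion of step H1 in this section indicates. The
          function name and its three predicate arguments are
          hypothetical placeholders for the real checks.

```python
def process_arriving_segment(seg, timestamp_check_fails, is_next_in_sequence,
                             needs_special_processing):
    """Return which step handles the segment, testing H1 before H2."""
    # H1: the timestamp check runs first, on every segment.
    if timestamp_check_fails(seg):
        return "H1: ack and drop"
    # H2: header prediction, the fast path for the common case.
    if is_next_in_sequence(seg) and not needs_special_processing(seg):
        return "H2: fast path"
    # H3: everything else goes through normal processing.
    return "H3: normal processing"
```

          With H1 first, a segment that is next in sequence but carries
          a stale timestamp never reaches the fast path.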

¬†¬†¬†¬†¬†¬†¬†¬†¬† Another possibility would be to interchange steps H1 and H2,
¬†¬†¬†¬†¬†¬†¬†¬†¬† ie, to perform the header prediction step H2 FIRST, and
¬†¬†¬†¬†¬†¬†¬†¬†¬† perform H1 and H3 only when header prediction fails. This
¬†¬†¬†¬†¬†¬†¬†¬†¬† could be a performance improvement, since the timestamp check
¬†¬†¬†¬†¬†¬†¬†¬†¬† in step H1 is very unlikely to fail, and it requires interval
¬†¬†¬†¬†¬†¬†¬†¬†¬† arithmetic on a finite field, a relatively expensive operation.
¬†¬†¬†¬†¬†¬†¬†¬†¬† To perform this check on every single segment is contrary to
¬†¬†¬†¬†¬†¬†¬†¬†¬† the philosophy of header prediction. We believe that this
¬†¬†¬†¬†¬†¬†¬†¬†¬† change might reduce CPU time for TCP protocol processing by up
¬†¬†¬†¬†¬†¬†¬†¬†¬† to 5-10% on high-speed networks.

¬†¬†¬†¬†¬†¬†¬†¬†¬† However, putting H2 first would create a hazard: a segment from
¬†¬†¬†¬†¬†¬†¬†¬†¬† 2**32 bytes in the past might arrive at exactly the wrong time
¬†¬†¬†¬†¬†¬†¬†¬†¬† and be accepted mistakenly by the header-prediction step. The
¬†¬†¬†¬†¬†¬†¬†¬†¬† following reasoning has been introduced [Jacobson90b] to show
¬†¬†¬†¬†¬†¬†¬†¬†¬† that the probability of this failure is negligible.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† If all segments are equally likely to show up as old
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† duplicates, then the probability of an old duplicate
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† exactly matching the left window edge is the maximum
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† segment size (MSS) divided by the size of the sequence
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† space. This ratio must be less than 2**-16, since MSS
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† must be < 2**16; for example, it will be (2**12)/(2**32) =
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† 2**-20 for an FDDI link. However, the older a segment is,
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† the less likely it is to be retained in the Internet, and
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† under any reasonable model of segment lifetime the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† probability of an old duplicate exactly at the left window
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† edge must be much smaller than 2**-16.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The 16 bit TCP checksum also allows a basic unreliability
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† of one part in 2**16. A protocol mechanism whose
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† reliability exceeds the reliability of the TCP checksum
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† should be considered "good enough", ie, it won't
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† contribute significantly to the overall error rate. We
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† therefore believe we can ignore the problem of an old
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† duplicate being accepted by doing header prediction before
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† checking the timestamp.
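
          The ratio used in this argument can be checked with exact
          arithmetic. The sketch below assumes, as the argument does,
          that all segments are equally likely to appear as old
          duplicates; the function name is hypothetical.

```python
from fractions import Fraction

SEQUENCE_SPACE = 2**32  # size of the TCP sequence space


def left_edge_match_probability(mss):
    """MSS divided by the size of the sequence space, as an exact fraction."""
    return Fraction(mss, SEQUENCE_SPACE)


fddi = left_edge_match_probability(2**12)     # the FDDI example in the text
largest = left_edge_match_probability(2**16 - 1)  # largest MSS below 2**16
print(fddi, largest < Fraction(1, 2**16))
```

          For an MSS of 2**12 the ratio is exactly 2**-20, and for any
          MSS below 2**16 it stays below 2**-16, the figure compared
          with the 16-bit checksum.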

¬†¬†¬†¬†¬†¬†¬†¬†¬† However, this probabilistic argument is not universally
¬†¬†¬†¬†¬†¬†¬†¬†¬† accepted, and the consensus at present is that the performance
¬†¬†¬†¬†¬†¬†¬†¬†¬† gain does not justify the hazard in the general case. It is
¬†¬†¬†¬†¬†¬†¬†¬†¬† therefore recommended that H2 follow H1.

    4.3 Duplicates from Earlier Incarnations of Connection

       The PAWS mechanism protects against errors due to sequence number
       wrap-around on high-speed connections. Segments from an earlier
¬†¬†¬†¬†¬†¬† incarnation of the same connection are also a potential cause of
¬†¬†¬†¬†¬†¬† old duplicate errors. In both cases, the TCP mechanisms to
¬†¬†¬†¬†¬†¬† prevent such errors depend upon the enforcement of a maximum
¬†¬†¬†¬†¬†¬† segment lifetime (MSL) by the Internet (IP) layer (see Appendix of
¬†¬†¬†¬†¬†¬† RFC-1185 for a detailed discussion). Unlike the case of sequence
¬†¬†¬†¬†¬†¬† space wrap-around, the MSL required to prevent old duplicate
¬†¬†¬†¬†¬†¬† errors from earlier incarnations does not depend upon the transfer
¬†¬†¬†¬†¬†¬† rate. If the IP layer enforces the recommended 2 minute MSL of
¬†¬†¬†¬†¬†¬† TCP, and if the TCP rules are followed, TCP connections will be
¬†¬†¬†¬†¬†¬† safe from earlier incarnations, no matter how high the network
¬†¬†¬†¬†¬†¬† speed. Thus, the PAWS mechanism is not required for this case.

¬†¬†¬†¬†¬†¬† We may still ask whether the PAWS mechanism can provide additional
¬†¬†¬†¬†¬†¬† security against old duplicates from earlier connections, allowing
¬†¬†¬†¬†¬†¬† us to relax the enforcement of MSL by the IP layer. Appendix B
¬†¬†¬†¬†¬†¬† explores this question, showing that further assumptions and/or
¬†¬†¬†¬†¬†¬† mechanisms are required, beyond those of PAWS. This is not part
¬†¬†¬†¬†¬†¬† of the current extension.

¬†¬†¬† This memo presented a set of extensions to TCP to provide efficient
¬†¬†¬† operation over large-bandwidth*delay-product paths and reliable
¬†¬†¬† operation over very high-speed paths. These extensions are designed
¬†¬†¬† to provide compatible interworking with TCP's that do not implement
¬†¬†¬† the extensions.

¬†¬†¬† These mechanisms are implemented using new TCP options for scaled
¬†¬†¬† windows and timestamps. The timestamps are used for two distinct
¬†¬†¬† mechanisms: RTTM (Round Trip Time Measurement) and PAWS (Protect
¬†¬†¬† Against Wrapped Sequences).

¬†¬†¬† The Window Scale option was originally suggested by Mike St. Johns of
¬†¬†¬† USAF/DCA. The present form of the option was suggested by Mike
¬†¬†¬† Karels of UC Berkeley in response to a more cumbersome scheme defined
¬†¬†¬† by Van Jacobson. Lixia Zhang helped formulate the PAWS mechanism
¬†¬†¬† description in RFC-1185.

¬†¬†¬† Finally, much of this work originated as the result of discussions
¬†¬†¬† within the End-to-End Task Force on the theoretical limitations of
¬†¬†¬† transport protocols in general and TCP in particular. More recently,
    task force members and others on the end2end-interest list have made
¬†¬†¬† valuable contributions by pointing out flaws in the algorithms and
¬†¬†¬† the documentation. The authors are grateful for all these
¬†¬†¬† contributions.

¬†¬†¬† There are two cases to be considered: (1) a system crashing (and
¬†¬†¬† losing connection state) and restarting, and (2) the same connection
¬†¬†¬† being closed and reopened without a loss of host state. These will
¬†¬†¬† be described in the following two sections.

¬†¬†¬† B.1 System Crash with Loss of State

¬†¬†¬†¬†¬†¬† TCP's quiet time of one MSL upon system startup handles the loss
¬†¬†¬†¬†¬†¬† of connection state in a system crash/restart. For an
¬†¬†¬†¬†¬†¬† explanation, see for example "When to Keep Quiet" in the TCP
¬†¬†¬†¬†¬†¬† protocol specification [Postel81]. The MSL that is required here
¬†¬†¬†¬†¬†¬† does not depend upon the transfer speed. The current TCP MSL of 2
¬†¬†¬†¬†¬†¬† minutes seems acceptable as an operational compromise, as many
¬†¬†¬†¬†¬†¬† host systems take this long to boot after a crash.

¬†¬†¬†¬†¬†¬† However, the timestamp option may be used to ease the MSL
¬†¬†¬†¬†¬†¬† requirements (or to provide additional security against data
¬†¬†¬†¬†¬†¬† corruption). If timestamps are being used and if the timestamp
¬†¬†¬†¬†¬†¬† clock can be guaranteed to be monotonic over a system
¬†¬†¬†¬†¬†¬† crash/restart, ie, if the first value of the sender's timestamp
¬†¬†¬†¬†¬†¬† clock after a crash/restart can be guaranteed to be greater than
¬†¬†¬†¬†¬†¬† the last value before the restart, then a quiet time will be
¬†¬†¬†¬†¬†¬† unnecessary.

       To dispense totally with the quiet time would require that the
       host clock be synchronized to a time source that is stable over
       the crash/restart period, with an accuracy of one timestamp clock
       tick or better. We can back off from this strict requirement to
       take advantage of approximate clock synchronization. Suppose that
       the clock is always re-synchronized to within N timestamp clock
       ticks and that booting (extended with a quiet time, if necessary)
       takes more than N ticks. This will guarantee monotonicity of the
       timestamps, which can then be used to reject old duplicates even
       without an enforced MSL.

¬†¬†¬† B.2 Closing and Reopening a Connection

¬†¬†¬†¬†¬†¬† When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT
       state ties up the socket pair for 4 minutes (see Section 3.5 of
       [Postel81]). Applications built upon TCP that close one connection
¬†¬†¬†¬†¬†¬† and open a new one (eg, an FTP data transfer connection using
¬†¬†¬†¬†¬†¬† Stream mode) must choose a new socket pair each time. The TIME-
¬†¬†¬†¬†¬†¬† WAIT delay serves two different purposes:

¬†¬†¬†¬†¬†¬† (a) Implement the full-duplex reliable close handshake of TCP.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The proper time to delay the final close step is not really
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† related to the MSL; it depends instead upon the RTO for the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† FIN segments and therefore upon the RTT of the path. (It
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† could be argued that the side that is sending a FIN knows
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† what degree of reliability it needs, and therefore it should
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† be able to determine the length of the TIME-WAIT delay for
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† the FIN's recipient. This could be accomplished with an
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† appropriate TCP option in FIN segments.)

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Although there is no formal upper-bound on RTT, common
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† network engineering practice makes an RTT greater than 1
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† minute very unlikely. Thus, the 4 minute delay in TIME-WAIT
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† state works satisfactorily to provide a reliable full-duplex
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TCP close. Note again that this is independent of MSL
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† enforcement and network speed.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† The TIME-WAIT state could cause an indirect performance
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† problem if an application needed to repeatedly close one
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† connection and open another at a very high frequency, since
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† the number of available TCP ports on a host is less than
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† 2**16. However, high network speeds are not the major
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† contributor to this problem; the RTT is the limiting factor
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† in how quickly connections can be opened and closed.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Therefore, this problem will be no worse at high transfer
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† speeds.

¬†¬†¬†¬†¬†¬† (b) Allow old duplicate segments to expire.

            To replace this function of TIME-WAIT state, a mechanism
            would have to operate across connections. PAWS is defined
            strictly within a single connection; the last timestamp,
            TS.Recent, is kept in the connection control block, and
            discarded when a connection is closed.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† An additional mechanism could be added to the TCP, a per-host
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† cache of the last timestamp received from any connection.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† This value could then be used in the PAWS mechanism to reject
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† old duplicate segments from earlier incarnations of the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† connection, if the timestamp clock can be guaranteed to have
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† ticked at least once since the old connection was open. This
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† would require that the TIME-WAIT delay plus the RTT together
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† must be at least one tick of the sender's timestamp clock.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Such an extension is not part of the proposal of this RFC.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Note that this is a variant on the mechanism proposed by
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Garlick, Rom, and Postel [Garlick77], which required each
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† host to maintain connection records containing the highest
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† sequence numbers on every connection. Using timestamps
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† instead, it is only necessary to keep one quantity per remote
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† host, regardless of the number of simultaneous connections to
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† that host.
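The per-host cache described above might be sketched as follows. This is only an illustration of an extension that is not part of this RFC; the class name, method names, and the use of a host address string as the key are all hypothetical. Timestamps are compared in 32-bit modular arithmetic so that clock wraparound is tolerated.

```python
from typing import Dict

TS_MOD = 2**32


def ts_newer(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is strictly later than b (modular compare)."""
    return 0 < ((a - b) % TS_MOD) < 2**31


class PerHostTimestampCache:
    """Hypothetical cache: one timestamp per remote host, not per connection."""

    def __init__(self) -> None:
        self._last_ts: Dict[str, int] = {}

    def record(self, host: str, tsval: int) -> None:
        """Remember the latest timestamp seen from this host on any connection."""
        prev = self._last_ts.get(host)
        if prev is None or ts_newer(tsval, prev):
            self._last_ts[host] = tsval

    def accept_new_incarnation(self, host: str, syn_tsval: int) -> bool:
        """Accept a segment only if the timestamp clock has ticked since the
        cached value; otherwise treat it as a possible old duplicate."""
        prev = self._last_ts.get(host)
        return prev is None or ts_newer(syn_tsval, prev)


cache = PerHostTimestampCache()
cache.record("192.0.2.1", 1000)
print(cache.accept_new_incarnation("192.0.2.1", 1000))  # clock has not ticked
print(cache.accept_new_incarnation("192.0.2.1", 1001))  # clock has ticked
```

Only one integer is stored per remote host, regardless of how many connections to that host exist.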

¬†¬†¬† The protocol extensions defined in this document differ in several
¬†¬†¬† important ways from those defined in RFC-1072 and RFC-1185.

¬†¬†¬† (a) SACK has been deferred to a later memo.

¬†¬†¬† (b) The detailed rules for sending timestamp replies (see Section
¬†¬†¬†¬†¬†¬†¬†¬† 3.4) differ in important ways. The earlier rules could result
¬†¬†¬†¬†¬†¬†¬†¬† in an under-estimate of the RTT in certain cases (packets
¬†¬†¬†¬†¬†¬†¬†¬† dropped or out of order).

¬†¬†¬† (c) The same value TS.Recent is now shared by the two distinct
¬†¬†¬†¬†¬†¬†¬†¬† mechanisms RTTM and PAWS. This simplification became possible
¬†¬†¬†¬†¬†¬†¬†¬† because of change (b).

¬†¬†¬† (d) An ambiguity in RFC-1185 was resolved in favor of putting
¬†¬†¬†¬†¬†¬†¬†¬† timestamps on ACK as well as data segments. This supports the
¬†¬†¬†¬†¬†¬†¬†¬† symmetry of the underlying TCP protocol.

¬†¬†¬† (e) The echo and echo reply options of RFC-1072 were combined into a
¬†¬†¬†¬†¬†¬†¬†¬† single Timestamps option, to reflect the symmetry and to
¬†¬†¬†¬†¬†¬†¬†¬† simplify processing.

¬†¬†¬† (f) The problem of outdated timestamps on long-idle connections,
¬†¬†¬†¬†¬†¬†¬†¬† discussed in Section 4.2.2, was realized and resolved.

¬†¬†¬† (g) RFC-1185 recommended that header prediction take precedence over
¬†¬†¬†¬†¬†¬†¬†¬† the timestamp check. Based upon some scepticism about the
¬†¬†¬†¬†¬†¬†¬†¬† probabilistic arguments given in Section 4.2.4, it was decided
¬†¬†¬†¬†¬†¬†¬†¬† to recommend that the timestamp check be performed first.

¬†¬†¬† (h) The spec was modified so that the extended options will be sent
¬†¬†¬†¬†¬†¬†¬†¬† on <SYN,ACK> segments only when they are received in the
¬†¬†¬†¬†¬†¬†¬†¬† corresponding <SYN> segments. This provides the most
¬†¬†¬†¬†¬†¬†¬†¬† conservative possible conditions for interoperation with
¬†¬†¬†¬†¬†¬†¬†¬† implementations without the extensions.

¬†¬†¬† In addition to these substantive changes, the present RFC attempts to
¬†¬†¬† specify the algorithms unambiguously by presenting modifications to
¬†¬†¬† the Event Processing rules of RFC-793; see Appendix E.

¬†¬†¬†¬†¬†¬† If the foreign socket is specified, then change the connection
¬†¬†¬†¬†¬†¬† from passive to active, select an ISS. Send a SYN segment
¬†¬†¬†¬†¬†¬† containing the options: <TSval=my.TSclock> and
¬†¬†¬†¬†¬†¬† <WSopt=Rcv.Wind.Scale>. Set SND.UNA to ISS, SND.NXT to ISS+1.
¬†¬†¬†¬†¬†¬† Enter SYN-SENT state. ...

¬†¬†¬†¬†¬†¬†¬†¬† if the SYN bit is set, check the security. If the ...

¬†¬†¬†¬†¬†¬†¬†¬†¬† ...

¬†¬†¬†¬†¬†¬†¬†¬† If the SEG.PRC is less than the TCB.PRC then continue.

¬†¬†¬†¬†¬†¬†¬†¬† Check for a Window Scale option (WSopt); if one is found, save
¬†¬†¬†¬†¬†¬†¬†¬† SEG.WSopt in Snd.Wind.Scale and set Snd.WS.OK flag on.
¬†¬†¬†¬†¬†¬†¬†¬† Otherwise, set both Snd.Wind.Scale and Rcv.Wind.Scale to zero
¬†¬†¬†¬†¬†¬†¬†¬† and clear Snd.WS.OK flag.

¬†¬†¬†¬†¬†¬†¬†¬† Check for a TSopt option; if one is found, save SEG.TSval in the
¬†¬†¬†¬†¬†¬†¬†¬† variable TS.Recent and turn on the Snd.TS.OK bit.

¬†¬†¬†¬†¬†¬†¬†¬† Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any other
¬†¬†¬†¬†¬†¬†¬†¬† control or text should be queued for processing later. ISS
¬†¬†¬†¬†¬†¬†¬†¬† should be selected and a SYN segment sent of the form:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK>

¬†¬†¬†¬†¬†¬†¬†¬† If the Snd.WS.OK bit is on, include a WSopt option
¬†¬†¬†¬†¬†¬†¬†¬† <WSopt=Rcv.Wind.Scale> in this segment. If the Snd.TS.OK bit is
¬†¬†¬†¬†¬†¬†¬†¬† on, include a TSopt <TSval=my.TSclock,TSecr=TS.Recent> in this
¬†¬†¬†¬†¬†¬†¬†¬† segment. Last.ACK.sent is set to RCV.NXT.

¬†¬†¬†¬†¬†¬†¬†¬† SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection
¬†¬†¬†¬†¬†¬†¬†¬† state should be changed to SYN-RECEIVED. Note that any other
¬†¬†¬†¬†¬†¬†¬†¬† incoming control or data (combined with SYN) will be processed
¬†¬†¬†¬†¬†¬†¬†¬† in the SYN-RECEIVED state, but processing of SYN and ACK should
¬†¬†¬†¬†¬†¬†¬†¬† not be repeated. If the listen was not fully specified (i.e.,
¬†¬†¬†¬†¬†¬†¬†¬† the foreign socket was not fully specified), then the
¬†¬†¬†¬†¬†¬†¬†¬† unspecified fields should be filled in now.
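The option handling for a SYN arriving in LISTEN state can be sketched as below. This is a minimal sketch, not a full implementation: the Tcb class, its field names, the default shift of 7, and the dictionary used to represent outgoing options are hypothetical, and the security and precedence checks are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Tcb:
    """Hypothetical subset of the connection control block."""
    rcv_wind_scale: int = 7   # shift we would like to use (assumed value)
    snd_wind_scale: int = 0
    snd_ws_ok: bool = False
    snd_ts_ok: bool = False
    ts_recent: int = 0
    rcv_nxt: int = 0
    last_ack_sent: int = 0


def on_syn_in_listen(tcb: Tcb, seg_seq: int, wsopt: Optional[int],
                     tsval: Optional[int], my_tsclock: int) -> Dict[str, object]:
    """Record the peer's options and return the options for our <SYN,ACK>."""
    if wsopt is not None:
        tcb.snd_wind_scale = wsopt
        tcb.snd_ws_ok = True
    else:
        # Peer did not offer window scaling: disable it in both directions.
        tcb.snd_wind_scale = 0
        tcb.rcv_wind_scale = 0
        tcb.snd_ws_ok = False
    if tsval is not None:
        tcb.ts_recent = tsval
        tcb.snd_ts_ok = True

    tcb.rcv_nxt = (seg_seq + 1) % 2**32

    options: Dict[str, object] = {}
    if tcb.snd_ws_ok:
        options["WSopt"] = tcb.rcv_wind_scale
    if tcb.snd_ts_ok:
        options["TSopt"] = (my_tsclock, tcb.ts_recent)  # (TSval, TSecr)
    tcb.last_ack_sent = tcb.rcv_nxt
    return options


tcb = Tcb()
print(on_syn_in_listen(tcb, seg_seq=100, wsopt=3, tsval=5000, my_tsclock=42))
```

Each extended option is echoed in the <SYN,ACK> only when the corresponding option arrived in the <SYN>.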

¬†¬†¬†¬†¬†¬†¬†¬† If the SYN bit is on and the security/compartment and precedence
¬†¬†¬†¬†¬†¬†¬†¬† are acceptable then, RCV.NXT is set to SEG.SEQ+1, IRS is set to
         SEG.SEQ, and any segments on the retransmission queue
¬†¬†¬†¬†¬†¬†¬†¬† which are thereby acknowledged should be removed.

         Check for a Window Scale option (WSopt); if one is found, save
¬†¬†¬†¬†¬†¬†¬†¬† SEG.WSopt in Snd.Wind.Scale; otherwise, set both Snd.Wind.Scale
¬†¬†¬†¬†¬†¬†¬†¬† and Rcv.Wind.Scale to zero.

¬†¬†¬†¬†¬†¬†¬†¬† Check for a TSopt option; if one is found, save SEG.TSval in
¬†¬†¬†¬†¬†¬†¬†¬† variable TS.Recent and turn on the Snd.TS.OK bit in the
¬†¬†¬†¬†¬†¬†¬†¬† connection control block. If the ACK bit is set, use my.TSclock
¬†¬†¬†¬†¬†¬†¬†¬† - SEG.TSecr as the initial RTT estimate.

¬†¬†¬†¬†¬†¬†¬†¬† If SND.UNA > ISS (our SYN has been ACKed), change the connection
¬†¬†¬†¬†¬†¬†¬†¬† state to ESTABLISHED, form an ACK segment:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

         and send it. If the Snd.TS.OK bit is on, include a TSopt
¬†¬†¬†¬†¬†¬†¬†¬† option <TSval=my.TSclock,TSecr=TS.Recent> in this ACK segment.
¬†¬†¬†¬†¬†¬†¬†¬† Last.ACK.sent is set to RCV.NXT.

¬†¬†¬†¬†¬†¬†¬†¬† Data or controls which were queued for transmission may be
¬†¬†¬†¬†¬†¬†¬†¬† included. If there are other controls or text in the segment
¬†¬†¬†¬†¬†¬†¬†¬† then continue processing at the sixth step below where the URG
¬†¬†¬†¬†¬†¬†¬†¬† bit is checked, otherwise return.

         and send it. If the Snd.TS.OK bit is on, include a TSopt
         option <TSval=my.TSclock,TSecr=TS.Recent> in this segment. If
         the Snd.WS.OK bit is on, include a WSopt option
         <WSopt=Rcv.Wind.Scale> in this segment. Last.ACK.sent is set to
         RCV.NXT.

¬†¬†¬†¬†¬†¬†¬†¬† If there are other controls or text in the segment, queue them
¬†¬†¬†¬†¬†¬†¬†¬† for processing after the ESTABLISHED state has been reached,
¬†¬†¬†¬†¬†¬†¬†¬† return.

¬†¬†¬†¬†¬†¬† fifth, if neither of the SYN or RST bits is set then drop the
¬†¬†¬†¬†¬†¬† segment and return.
¬†¬†¬†¬† Otherwise,

¬†¬†¬†¬† First, check sequence number

¬†¬†¬†¬†¬†¬† SYN-RECEIVED STATE
¬†¬†¬†¬†¬†¬† ESTABLISHED STATE
¬†¬†¬†¬†¬†¬† FIN-WAIT-1 STATE
¬†¬†¬†¬†¬†¬† FIN-WAIT-2 STATE
¬†¬†¬†¬†¬†¬† CLOSE-WAIT STATE
¬†¬†¬†¬†¬†¬† CLOSING STATE
¬†¬†¬†¬†¬†¬† LAST-ACK STATE
¬†¬†¬†¬†¬†¬† TIME-WAIT STATE

¬†¬†¬†¬†¬†¬†¬†¬† Segments are processed in sequence. Initial tests on arrival
¬†¬†¬†¬†¬†¬†¬†¬† are used to discard old duplicates, but further processing is
¬†¬†¬†¬†¬†¬†¬†¬† done in SEG.SEQ order. If a segment's contents straddle the
¬†¬†¬†¬†¬†¬†¬†¬† boundary between old and new, only the new parts should be
¬†¬†¬†¬†¬†¬†¬†¬† processed.

¬†¬†¬†¬†¬†¬†¬†¬† Rescale the received window field:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† TrueWindow = SEG.WND << Snd.Wind.Scale,

¬†¬†¬†¬†¬†¬†¬†¬† and use "TrueWindow" in place of SEG.WND in the following steps.
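The rescaling step above amounts to a left shift of the 16-bit window field. A minimal sketch follows; the function name and the range checks are illustrative additions, and the limit of 14 is the maximum shift count that this RFC permits for window scaling.

```python
MAX_WINDOW_SHIFT = 14  # largest shift count permitted for window scaling


def true_window(seg_wnd: int, snd_wind_scale: int) -> int:
    """Return TrueWindow = SEG.WND << Snd.Wind.Scale."""
    if not 0 <= seg_wnd <= 0xFFFF:
        raise ValueError("SEG.WND must fit in the 16-bit window field")
    if not 0 <= snd_wind_scale <= MAX_WINDOW_SHIFT:
        raise ValueError("shift count out of range")
    return seg_wnd << snd_wind_scale


print(true_window(0xFFFF, 14))  # largest window expressible with scaling
print(true_window(1000, 0))     # scaling disabled: window is unchanged
```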

           If SEG.TSval < TS.Recent, then test whether the connection has
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† been idle less than 24 days; if both are true, then the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† segment is not acceptable; follow steps below for an
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† unacceptable segment.

           If SEG.SEQ is equal to Last.ACK.sent, then save SEG.TSval in
           variable TS.Recent.

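The two timestamp rules above (reject a segment whose timestamp is older than TS.Recent unless the connection has been idle 24 days or more, and update TS.Recent when the segment starts at Last.ACK.sent) might be sketched as follows. The function names are hypothetical, the idle time is assumed to be available in seconds, and the comparison is done in 32-bit modular arithmetic so that timestamp wraparound is handled.

```python
TS_MOD = 2**32
IDLE_LIMIT_SECONDS = 24 * 24 * 60 * 60  # 24 days


def ts_before(a: int, b: int) -> bool:
    """True if a < b in 32-bit modular (wraparound) arithmetic."""
    return 0 < ((b - a) % TS_MOD) < 2**31


def paws_rejects(seg_tsval: int, ts_recent: int, idle_seconds: float) -> bool:
    """Segment is unacceptable only if its timestamp is old AND the
    connection has been idle for less than 24 days."""
    return ts_before(seg_tsval, ts_recent) and idle_seconds < IDLE_LIMIT_SECONDS


def update_ts_recent(ts_recent: int, seg_seq: int, seg_tsval: int,
                     last_ack_sent: int) -> int:
    """Return the new TS.Recent after an acceptable segment arrives."""
    return seg_tsval if seg_seq == last_ack_sent else ts_recent


print(paws_rejects(seg_tsval=99, ts_recent=100, idle_seconds=10))
print(update_ts_recent(ts_recent=100, seg_seq=500, seg_tsval=200,
                       last_ack_sent=500))
```

A rejected segment is then handled by the steps for an unacceptable segment: an ACK is sent and the segment is dropped.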
¬†¬†¬†¬†¬†¬†¬†¬† There are four cases for the acceptability test for an incoming
¬†¬†¬†¬†¬†¬†¬†¬† segment:

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† ...

¬†¬†¬†¬†¬†¬†¬†¬† If an incoming segment is not acceptable, an acknowledgment
¬†¬†¬†¬†¬†¬†¬†¬† should be sent in reply (unless the RST bit is set, if so drop
¬†¬†¬†¬†¬†¬†¬†¬† the segment and return):

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

         If the Snd.TS.OK bit is on, include the Timestamps option
         <TSval=my.TSclock,TSecr=TS.Recent> in this ACK segment. Set
         Last.ACK.sent to SEG.ACK of the acknowledgment and send the ACK
         segment. After sending the acknowledgment, drop the unacceptable
         segment and return.

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† ...

¬†¬†¬†¬† fifth check the ACK field.

¬†¬†¬†¬†¬†¬† if the ACK bit is off drop the segment and return.

¬†¬†¬†¬†¬†¬† if the ACK bit is on

¬†¬†¬†¬†¬†¬†¬†¬† ...

¬†¬†¬†¬†¬†¬†¬†¬† ESTABLISHED STATE

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† If SND.UNA < SEG.ACK =< SND.NXT then, set SND.UNA <- SEG.ACK.
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† Also compute a new estimate of round-trip time. If Snd.TS.OK
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† bit is on, use my.TSclock - SEG.TSecr; otherwise use the
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† elapsed time since the first segment in the retransmission
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† queue was sent. Any segments on the retransmission queue
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† which are thereby entirely acknowledged...

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† ...
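The choice of RTT sample described above can be sketched as below. This is a simplified illustration: the function name and parameters are hypothetical, the timestamp-based result is in ticks of the local timestamp clock, and the fallback result is in whatever unit the caller uses for its wall-clock values. The subtraction is taken modulo 2**32 so that a timestamp clock that has wrapped still gives a small positive result.

```python
TS_MOD = 2**32


def rtt_sample(snd_ts_ok: bool, my_tsclock: int, seg_tsecr: int,
               now: float, first_unacked_sent_at: float) -> float:
    """Return one RTT measurement for an ACK that advances SND.UNA."""
    if snd_ts_ok:
        # Timestamps in use: my.TSclock - SEG.TSecr, in clock ticks.
        return (my_tsclock - seg_tsecr) % TS_MOD
    # No timestamps: elapsed time since the first segment on the
    # retransmission queue was sent.
    return now - first_unacked_sent_at


print(rtt_sample(True, my_tsclock=5, seg_tsecr=2**32 - 5,
                 now=0.0, first_unacked_sent_at=0.0))
print(rtt_sample(False, my_tsclock=0, seg_tsecr=0,
                 now=12.5, first_unacked_sent_at=12.0))
```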

¬†¬†¬†¬† Seventh, process the segment text.

¬†¬†¬†¬†¬†¬† ESTABLISHED STATE
¬†¬†¬†¬†¬†¬† FIN-WAIT-1 STATE
¬†¬†¬†¬†¬†¬† FIN-WAIT-2 STATE

¬†¬†¬†¬†¬†¬†¬†¬† If the Snd.TS.OK bit is on, include Timestamps option
¬†¬†¬†¬†¬†¬†¬†¬† <TSval=my.TSclock,TSecr=TS.Recent> in this ACK segment. Set
¬†¬†¬†¬†¬†¬†¬†¬† Last.ACK.sent to SEG.ACK of the acknowledgment, and send it.
¬†¬†¬†¬†¬†¬†¬†¬† This acknowledgment should be piggy-backed on a segment being
¬†¬†¬†¬†¬†¬†¬†¬† transmitted if possible without incurring undue delay.
¬†¬†¬†¬†¬†¬†¬†¬†¬† ...
Security Considerations