Network Working Group J. Roatch
Internet-Draft
Intended status: Informational C. Bormann
Expires: August 31, 2019 Universitaet Bremen TZI
February 27, 2019
Concise Binary Object Representation (CBOR) Tags for Typed Arrays
draft-ietf-cbor-array-tags-01
Abstract
The Concise Binary Object Representation (CBOR, RFC 7049) is a data
format whose design goals include the possibility of extremely small
code size, fairly small message size, and extensibility without the
need for version negotiation.
The present document makes use of this extensibility to define a
number of CBOR tags for typed arrays of numeric data, as well as
additional tags for multi-dimensional and homogeneous arrays. It is
intended as the reference document for the IANA registration of the
CBOR tags defined.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 31, 2019.
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
Roatch & Bormann Expires August 31, 2019 [Page 1]

Internet-Draft CBOR tags for typed arrays February 2019
1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
The term "byte" is used in its now customary sense as a synonym for
"octet". Where bit arithmetic is explained, this document uses the
notation familiar from the programming language C (including C++14's
0bnnn binary literals), except that the operator "**" stands for
exponentiation.
2. Typed Arrays
Typed arrays are homogeneous arrays of numbers, all of which are
encoded in a single form of binary representation. The concatenation
of these representations is encoded as a single CBOR byte string
(major type 2), enclosed by a single tag indicating the type and
encoding of all the numbers represented in the byte string.
2.1. Types of numbers
Three classes of numbers are of interest: unsigned integers (uint),
signed integers (two's complement, sint), and IEEE 754 binary
floating point numbers (which are always signed). For each of these
classes, there are multiple representation lengths in active use:
+-----------+--------+--------+-----------+
| Length ll | uint | sint | float |
+-----------+--------+--------+-----------+
| 0 | uint8 | sint8 | binary16 |
| 1 | uint16 | sint16 | binary32 |
| 2 | uint32 | sint32 | binary64 |
| 3 | uint64 | sint64 | binary128 |
+-----------+--------+--------+-----------+
Table 1: Length values
Here, sintN stands for a signed integer of exactly N bits (for
instance, sint16), and uintN stands for an unsigned integer of
exactly N bits (for instance, uint32). The name binaryN stands for
the number form of the same name defined in IEEE 754.
Since one objective of these tags is to be able to directly ship the
ArrayBuffers underlying the Typed Arrays without re-encoding them,
and these may be either in big endian (network byte order) or in
little endian form, we need to define tags for both variants.
In total, this leads to 24 variants. In the tag, we need to express
the choice between integer and floating point, the signedness (for
integers), the endianness, and one of the four length values.
In order to simplify implementation, a range of tags is being
allocated that allows retrieving all this information from the bits
of the tag: Tag values from TBD64 to TBD87.
The value is split up into 5 bit fields: TBD0b010_f_s_e_ll, as
detailed in Table 2.
+----------+-------------------------------------------------------+
| Field | Use |
+----------+-------------------------------------------------------+
| TBD0b010 | a constant such as '010', to be defined |
| f | 0 for integer, 1 for float |
| s | 0 for unsigned integer or float, 1 for signed integer |
| e | 0 for big endian, 1 for little endian |
| ll | A number for the length (Table 1). |
+----------+-------------------------------------------------------+
Table 2: Bit fields in the low 8 bits of the tag
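The split described in Table 2 can be sketched in C as follows. This
is a minimal illustration, not part of the specification: it assumes
that the tags are allocated as 64 to 87 (that is, that the "TBD" is
simply removed), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type; the field names follow Table 2. */
struct typed_array_fields {
    unsigned f;   /* 0 for integer, 1 for float */
    unsigned s;   /* 0 for unsigned integer or float, 1 for signed integer */
    unsigned e;   /* 0 for big endian, 1 for little endian */
    unsigned ll;  /* length value from Table 1 */
};

/*
 * Splits a typed array tag into its bit fields (0b010_f_s_e_ll).
 * Assumption: the tag range is 64..87, i.e. TBD64..TBD87 with the
 * "TBD" removed.  Returns 1 and fills *out on success, 0 otherwise.
 */
static int split_typed_array_tag(uint64_t tag, struct typed_array_fields *out)
{
    if (tag < 64 || tag > 87)
        return 0;
    out->f  = (unsigned)((tag >> 4) & 1);
    out->s  = (unsigned)((tag >> 3) & 1);
    out->e  = (unsigned)((tag >> 2) & 1);
    out->ll = (unsigned)(tag & 3);
    return 1;
}
```

Under that assumption, tag 65 (the uint16 array tag used in Figure 1)
splits into f = 0, s = 0, e = 0, ll = 1.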
The number of bytes in each array element can then be calculated by
"2**(f + ll)" (or "1 << (f + ll)" in a typical programming language).
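(Note that f and ll occupy the least significant bits of the two
nibbles (4-bit halves) of the byte: f is the lowest bit of the upper
nibble, and ll is the lowest two bits of the lower nibble.)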
In the CBOR representation, the total number of elements in the array
is not expressed explicitly, but implied from the length of the byte
string and the length of each representation. It can be computed
inversely to the previous formula: "bytelength >> (f + ll)".
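The two formulas above might be implemented along the following
lines. This is a sketch only: the function names are hypothetical,
and the rejection of byte strings whose length is not a multiple of
the element size is an assumption of this example, not a rule stated
in this section.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per element: 1 << (f + ll), with f in bit 4 and ll in bits 0..1. */
static size_t typed_array_element_size(uint64_t tag)
{
    unsigned f  = (unsigned)((tag >> 4) & 1);
    unsigned ll = (unsigned)(tag & 3);
    return (size_t)1 << (f + ll);
}

/*
 * Number of elements: bytelength >> (f + ll).
 * Assumption of this sketch: a byte string whose length is not a
 * multiple of the element size is treated as an error; *ok is set to 0
 * in that case and to 1 otherwise.
 */
static size_t typed_array_element_count(uint64_t tag, size_t bytelength,
                                        int *ok)
{
    unsigned f  = (unsigned)((tag >> 4) & 1);
    unsigned ll = (unsigned)(tag & 3);
    size_t size = (size_t)1 << (f + ll);

    *ok = (bytelength % size) == 0;
    return bytelength >> (f + ll);
}
```

For the 12-byte uint16 byte string of Figure 1 (f = 0, ll = 1), this
yields an element size of 2 bytes and a count of 6 elements.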
For the uint8/sint8 values, the endianness is redundant. Only the
big endian variant is used. The little endian variant of sint8 MUST
NOT be used; its tag is marked as reserved. As a special case, the
tag number that would have been the little endian variant of uint8 is
used to signify that the numbers in the array are using clamped
conversion from integers, as described in more detail in Section 7.1
of [TypedArrayUpdate].
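A decoder could single out these two special cases before applying
the general bit-field rules, roughly as sketched below. The enum and
function names are hypothetical, and the code again assumes that the
tags are allocated as 64 to 87.

```c
#include <stdint.h>

/* Hypothetical classification of a tag number for this sketch. */
enum typed_array_kind {
    TA_NOT_TYPED_ARRAY,  /* outside the assumed range 64..87 */
    TA_REGULAR,          /* ordinary typed array tag */
    TA_UINT8_CLAMPED,    /* would-be little endian uint8: clamped uint8 */
    TA_RESERVED          /* would-be little endian sint8: MUST NOT be used */
};

static enum typed_array_kind classify_typed_array_tag(uint64_t tag)
{
    unsigned f, s, e, ll;

    if (tag < 64 || tag > 87)
        return TA_NOT_TYPED_ARRAY;

    f  = (unsigned)((tag >> 4) & 1);
    s  = (unsigned)((tag >> 3) & 1);
    e  = (unsigned)((tag >> 2) & 1);
    ll = (unsigned)(tag & 3);

    /* One-byte integers (f = 0, ll = 0) with the little endian bit set. */
    if (f == 0 && e == 1 && ll == 0)
        return s ? TA_RESERVED : TA_UINT8_CLAMPED;

    return TA_REGULAR;
}
```

With the assumed numbering, the clamped uint8 tag is 68 and the
reserved tag is 76.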
3. Additional Array Tags
This specification defines three additional array tags. The Multi-
dimensional Array tags can be combined with classical CBOR arrays as
well as with Typed Arrays in order to build multi-dimensional arrays
with constant numbers of elements in the sub-arrays. The Homogeneous
Array tag can be used to facilitate the ingestion of homogeneous
classical CBOR arrays, providing performance advantages even when a
Typed Array does not apply.
3.1. Multi-dimensional Array
Tag: TBD40
Data Item: array (major type 4) of two arrays, one array (major type
4) of dimensions, and one array (major type 4, a Typed Array, or a
Homogeneous Array) of elements
A multi-dimensional array is represented as a tagged array that
contains two (one-dimensional) arrays. The first array defines the
dimensions of the multi-dimensional array (in the sequence of outer
dimensions towards inner dimensions) while the second array
represents the contents of the multi-dimensional array. If the
second array is itself tagged as a Typed Array then the element type
of the multi-dimensional array is known to be the same type as that
of the Typed Array. Data in the Typed Array byte string consists of
consecutive values where the last dimension is considered contiguous
(row-major order).
Figure 1 shows a declaration of a two-dimensional array in the C
language, followed by a representation of that array in CBOR using
both a multi-dimensional array tag and a typed array tag.
   uint16_t a[2][3] = {
     {2, 4, 8},   /* row 0 */
     {4, 16, 256},
   };

   <Tag TBD40> # multi-dimensional array tag
      82       # array(2)
        82      # array(2)
          02     # unsigned(2) 1st Dimension
          03     # unsigned(3) 2nd Dimension
        <Tag TBD65> # uint16 array
          4c     # byte string(12)
            0002 # unsigned(2)
            0004 # unsigned(4)
            0008 # unsigned(8)
            0004 # unsigned(4)
            0010 # unsigned(16)
            0100 # unsigned(256)
Figure 1: Multi-dimensional array in C and CBOR
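To illustrate the row-major layout, the following sketch reads
individual elements back out of the 12-byte payload of Figure 1. The
array and function names are hypothetical; the code assumes big
endian uint16 elements, as indicated by the typed array tag in the
figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Content of the byte string in Figure 1: six big endian uint16 values. */
static const uint8_t figure1_payload[12] = {
    0x00, 0x02, 0x00, 0x04, 0x00, 0x08,   /* row 0: 2, 4, 8    */
    0x00, 0x04, 0x00, 0x10, 0x01, 0x00    /* row 1: 4, 16, 256 */
};

/*
 * Reads element (row, col) of a two-dimensional big endian uint16
 * array stored in row-major order.  cols is the second (inner)
 * dimension, 3 in Figure 1.  No bounds checking is done in this sketch.
 */
static uint16_t read_u16be_row_major(const uint8_t *payload, size_t cols,
                                     size_t row, size_t col)
{
    size_t index = row * cols + col;       /* last dimension contiguous */
    const uint8_t *p = payload + 2 * index;
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Here, read_u16be_row_major(figure1_payload, 3, 1, 2) corresponds to
a[1][2] in the C declaration and evaluates to 256.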
4. Discussion
Support for both little- and big-endian representation may seem out
of character with CBOR, which is otherwise fully big endian. This
support is in line with the intended use of the typed arrays and the
objective not to require conversion of each array element.
This specification allocates a sizable chunk out of the single-byte
tag space. This use of code point space is justified by the wide use
of typed arrays in data interchange.
Providing a column-major order variant of the multi-dimensional array
may seem superfluous to some, and useful to others. It is cheap to
define the additional tag so it is available when actually needed.
Allocating it out of a different number space makes the preference
for row-major evident.
Applying a Homogeneous Array tag to a Typed Array would be redundant
and is therefore not provided by the present specification.
6. IANA Considerations
IANA is requested to allocate the tags in Table 3, with the present
document as the specification reference. (The reserved value is
reserved for a future revision of typed array tags.)
The allocations come out of the "specification required" space
(24..255), with the exception of TBD1040, which comes out of the
"first come first served" space (256..).
to the binary representation TBD0b010 in Section 2.1, which becomes
0b010 if the numbers are allocated as proposed. IANA note: To make
the calculations work, TBD64 to TBD87 need to come from a contiguous
range whose start is divisible by 32, which they do if the
"TBD" is simply removed.
7. Security Considerations
The security considerations of RFC 7049 apply; special attention is
drawn to the second paragraph of Section 8 of RFC 7049. The tags
introduced here are not expected to raise security considerations
beyond those.