Very similarly to Lucene41StoredFieldsFormat, this format is based
on compressed chunks of data, with document-level granularity so that a
document can never span across distinct chunks. Moreover, data is made as
compact as possible:

textual data is compressed using the very light
LZ4 compression algorithm,

binary data is written using fixed-size blocks of packed ints.

Term vectors are stored using two files:

a data file where terms, frequencies, positions, offsets and payloads
are stored,

an index file, loaded into memory, used to locate specific documents in
the data file.

Looking up term vectors for any document requires at most 1 disk seek.
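The two-file layout above can be sketched in a few lines; this is a hypothetical simplification (class and method names are invented here, not Lucene API) in which the in-memory index maps the first docID of each compressed chunk to that chunk's start offset in the data file, so a lookup is a binary search in memory followed by a single seek:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the in-memory index file: it maps the first docID
// of each compressed chunk to that chunk's start offset in the .tvd data file.
public class TermVectorsIndexSketch {
    private final NavigableMap<Integer, Long> chunkStarts = new TreeMap<>();

    // Record that the chunk beginning at firstDocId starts at filePointer.
    public void addChunk(int firstDocId, long filePointer) {
        chunkStarts.put(firstDocId, filePointer);
    }

    // Locate the data-file offset of the chunk containing docId.
    // The floor lookup runs entirely in memory, so reading the term
    // vectors for any document needs at most one seek into the data file.
    public long seekPointer(int docId) {
        return chunkStarts.floorEntry(docId).getValue();
    }
}
```

Because a document never spans chunks, the floor entry is always the right chunk to decompress.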

File formats

A vector data file (extension .tvd). This file stores terms,
frequencies, positions, offsets and payloads for every document. Upon writing
a new segment, it accumulates data into memory until the buffer used to store
terms and payloads grows beyond 4KB. Then it flushes all metadata, terms
and positions to disk using LZ4
compression for terms and payloads and
blocks of packed ints for positions.
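The accumulate-then-flush behaviour described above can be sketched as follows; this is a hedged illustration with invented names, and it replaces the real LZ4 and packed-ints encoding with a simple buffer reset:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the buffering policy: term and payload bytes are
// accumulated in memory, and once the buffer grows beyond 4 KB the pending
// chunk is flushed (the real writer LZ4-compresses terms and payloads and
// writes positions as blocks of packed ints; here flush() only resets).
public class ChunkedVectorWriterSketch {
    private static final int CHUNK_SIZE = 1 << 12; // 4 KB threshold
    private final ByteArrayOutputStream termBuffer = new ByteArrayOutputStream();
    public int flushCount = 0;

    public void addDocument(byte[] termBytes) {
        termBuffer.write(termBytes, 0, termBytes.length);
        // A document never spans chunks: the threshold is only checked
        // between documents, after the whole document has been buffered.
        if (termBuffer.size() >= CHUNK_SIZE) {
            flush();
        }
    }

    private void flush() {
        flushCount++;
        termBuffer.reset();
    }
}
```

Checking the threshold only at document boundaries is what guarantees the document-level granularity stated at the top of this section.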

TotalOffsets is the sum of the frequencies of all terms of all fields that have offsets enabled.

AvgCharsPerTerm: average number of chars per term, encoded as a 32-bit float on 4 bytes. It is not present if no field has both positions and offsets enabled.

StartOffsetDelta: (startOffset - previousStartOffset - AvgCharsPerTerm * PositionDelta), encoded using blocks of 64 packed ints. previousStartOffset is 0 for the first offset, and AvgCharsPerTerm is 0 if the field has no positions.
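The StartOffsetDelta formula can be illustrated with a small encode/decode roundtrip. This is a sketch, not the actual Lucene encoder; in particular, truncating AvgCharsPerTerm * PositionDelta toward zero is an assumption of this sketch:

```java
// Sketch of the StartOffsetDelta computation: subtracting the expected
// advance (AvgCharsPerTerm * PositionDelta) from the raw offset gap keeps
// the stored deltas small, so they pack into few bits per value.
public class StartOffsetDeltaSketch {

    public static int encode(int startOffset, int previousStartOffset,
                             float avgCharsPerTerm, int positionDelta) {
        // startOffset - previousStartOffset - AvgCharsPerTerm * PositionDelta
        // (int cast / truncation is an assumption of this sketch)
        return startOffset - previousStartOffset
                - (int) (avgCharsPerTerm * positionDelta);
    }

    public static int decode(int delta, int previousStartOffset,
                             float avgCharsPerTerm, int positionDelta) {
        // Inverse of encode: recover the absolute start offset.
        return delta + previousStartOffset
                + (int) (avgCharsPerTerm * positionDelta);
    }
}
```

For example, if terms average 5 chars and positions advance by 1, consecutive start offsets land near previousStartOffset + 5, so the encoded deltas cluster around 0.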