Compact

Introduced in: v3.4.5

Compacts the data of a collection
collection.compact()

Compacts the data of a collection in order to reclaim disk space. For the
MMFiles storage engine, the operation will reset the collection’s last
compaction timestamp, so it will become a candidate for compaction. For the
RocksDB storage engine, the operation will compact the document and index
data by rewriting the underlying .sst files and only keeping the relevant
entries.

Under normal circumstances, running a compact operation is not necessary,
as the collection data will eventually get compacted anyway. However, in
some situations, e.g. after running lots of update/replace or remove
operations, the disk data for a collection may contain a lot of outdated
data whose space should be reclaimed. In this case the compaction operation
can be used.
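
For example, after a large cleanup job you could trigger compaction
manually. A minimal sketch, assuming a collection named example with a
hypothetical obsolete attribute:

// remove outdated documents, then reclaim the disk space they occupied
db.example.removeByExample({ obsolete: true });
db.example.compact();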

Properties

gets or sets the properties of a collection
collection.properties()

Returns an object containing all collection properties.

waitForSync: If true, creating a document will only return
after the data has been synced to disk.

journalSize: The size of the journal in bytes.
This option is meaningful for the MMFiles storage engine only.

isVolatile: If true then the collection data will be
kept in memory only and ArangoDB will not write or sync the data
to disk.
This option is meaningful for the MMFiles storage engine only.

keyOptions (optional): additional options for key generation. This is
a JSON object containing the following attributes (note: some of the
attributes are optional):

type: the type of the key generator used for the collection.

allowUserKeys: if set to true, then it is allowed to supply
own key values in the _key attribute of a document. If set to
false, then the key generator will solely be responsible for
generating keys and supplying own key values in the _key attribute
of documents is considered an error.

increment: increment value for autoincrement key generator.
Not used for other key generator types.

offset: initial offset value for autoincrement key generator.
Not used for other key generator types.

indexBuckets: number of buckets into which indexes using a hash
table are split. The default is 16 and this number has to be a
power of 2 and less than or equal to 1024.
This option is meaningful for the MMFiles storage engine only.

For very large collections one should increase this to avoid long pauses
when the hash table has to be initially built or resized, since buckets
are resized individually and can be initially built in parallel. For
example, 64 might be a sensible value for a collection with
100,000,000 documents. Currently, only the edge index respects this
value, but other index types might follow in future ArangoDB versions.
Changes (see below) are applied when the collection is loaded the next
time.

In a cluster setup, the result will also contain the following attributes:

numberOfShards: the number of shards of the collection.

shardKeys: contains the names of document attributes that are used to
determine the target shard for documents.

replicationFactor: determines how many copies of each shard are kept
on different DBServers. Has to be in the range of 1-10 (Cluster only)

minReplicationFactor: determines the minimal number of shard copies kept on
different DBServers. A shard will refuse writes if fewer than this number of
copies are in sync. Has to be in the range of 1-replicationFactor (Cluster only)

shardingStrategy: the sharding strategy selected for the collection.
This attribute is only populated in cluster mode.
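
A minimal sketch of reading the properties, assuming a collection named
example:

var props = db.example.properties();
// e.g. check whether synchronous disk writes are enabled
print(props.waitForSync);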

collection.properties(properties)

Changes the collection properties. properties must be an object with
one or more of the following attributes:

waitForSync: If true, creating a document will only return
after the data has been synced to disk.

journalSize: The size of the journal in bytes.
This option is meaningful for the MMFiles storage engine only.

indexBuckets: See above, changes are only applied when the
collection is loaded the next time.
This option is meaningful for the MMFiles storage engine only.

replicationFactor: Change the number of shard copies kept on
different DBServers, valid values are integer numbers
in the range of 1-10 (Cluster only)

minReplicationFactor: Change the minimal number of shard copies to be in sync
on different DBServers. A shard will refuse writes if fewer than this number
of copies are in sync. Has to be in the range of 1-replicationFactor (Cluster only)

Note: some other collection properties, such as type, isVolatile,
keyOptions, numberOfShards or shardingStrategy cannot be changed once
the collection is created.
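
For example, to enable waitForSync for an existing collection (the
collection name example is hypothetical):

db.example.properties({ waitForSync: true });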

Figures

returns the figures of a collection
collection.figures()

alive.count: The number of currently active documents in all datafiles and
journals of the collection. Documents that are contained in the
write-ahead log only are not reported in this figure.

alive.size: The total size in bytes used by all active documents of the
collection. Documents that are contained in the write-ahead log only are
not reported in this figure.

dead.count: The number of dead documents. This includes document
versions that have been deleted or replaced by a newer version. Documents
deleted or replaced that are contained in the write-ahead log only are not
reported in this figure.

dead.size: The total size in bytes used by all dead documents.

dead.deletion: The total number of deletion markers. Deletion markers
only contained in the write-ahead log are not reported in this figure.

datafiles.count: The number of datafiles.

datafiles.fileSize: The total filesize of datafiles (in bytes).

journals.count: The number of journal files.

journals.fileSize: The total filesize of the journal files
(in bytes).

compactors.count: The number of compactor files.

compactors.fileSize: The total filesize of the compactor files
(in bytes).

shapefiles.count: The number of shape files. This value is
deprecated and kept for compatibility reasons only. The value will always
be 0 in ArangoDB 2.0 and higher.

shapefiles.fileSize: The total filesize of the shape files. This
value is deprecated and kept for compatibility reasons only. The value will
always be 0 in ArangoDB 2.0 and higher.

shapes.count: The total number of shapes used in the collection.
This includes shapes that are not in use anymore. Shapes that are contained
in the write-ahead log only are not reported in this figure.

shapes.size: The total size of all shapes (in bytes). This includes
shapes that are not in use anymore. Shapes that are contained in the
write-ahead log only are not reported in this figure.

attributes.count: The total number of attributes used in the
collection. Note: the value includes data of attributes that are not in use
anymore. Attributes that are contained in the write-ahead log only are
not reported in this figure.

attributes.size: The total size of the attribute data (in bytes).
Note: the value includes data of attributes that are not in use anymore.
Attributes that are contained in the write-ahead log only are not
reported in this figure.

indexes.count: The total number of indexes defined for the
collection, including the pre-defined indexes (e.g. primary index).

indexes.size: The total memory allocated for indexes in bytes.

lastTick: The tick of the last marker that was stored in a journal
of the collection. This might be 0 if the collection does not yet have
a journal.

uncollectedLogfileEntries: The number of markers in the write-ahead
log for this collection that have not been transferred to journals or
datafiles.

documentReferences: The number of references to documents in datafiles
that JavaScript code currently holds. This information can be used for
debugging compaction and unload issues.

waitingFor: An optional string value that contains information about
which object type is at the head of the collection’s cleanup queue. This
information can be used for debugging compaction and unload issues.

compactionStatus.time: The point in time the compaction for the collection
was last executed. This information can be used for debugging compaction
issues.

compactionStatus.message: The action that was performed when the compaction
was last run for the collection. This information can be used for debugging
compaction issues.

Note: collection data that is stored in the write-ahead log only is not
reported in the results. When the write-ahead log is collected, documents
might be added to journals and datafiles of the collection, which may modify
the figures of the collection. Also note that waitingFor and compactionStatus
may be empty when called on a Coordinator in a cluster.

Additionally, the filesizes of collection and index parameter JSON files are
not reported. These files should normally have a size of a few bytes
each. Please also note that the fileSize values are reported in bytes
and reflect the logical file sizes. Some filesystems may use optimizations
(e.g. sparse files) so that the actual physical file size is somewhat
different. Directories and sub-directories may also require space in the
file system, but this space is not reported in the fileSize results.

That means the figures reported do not reflect the actual disk usage of the
collection with 100% accuracy. The actual disk usage of a collection is
normally slightly higher than the sum of the reported fileSize values.
Still, the sum of the fileSize values can be used as a lower-bound
approximation of the disk usage.
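
A minimal sketch of retrieving the figures, assuming a collection named
example:

var figures = db.example.figures();
// e.g. inspect the space occupied by dead documents (MMFiles)
print(figures.dead.size);
// sum the reported datafile and journal sizes as a lower bound on disk usage
print(figures.datafiles.fileSize + figures.journals.fileSize);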

GetResponsibleShard

returns the responsible shard for the given document.
collection.getResponsibleShard(document)

Returns a string with the responsible shard’s ID. Note that the
returned shard ID is the ID of the shard responsible for the document’s
shard key values, and it will be returned even if no such document exists.

The getResponsibleShard() method can only be used on Coordinators
in clusters.
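
A minimal sketch, to be run on a Coordinator, assuming a sharded collection
named example that uses _key as its shard key:

// the document only needs to contain values for the shard key attributes;
// the document itself does not have to exist
var shardId = db.example.getResponsibleShard({ _key: "test" });
print(shardId);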

Shards

returns the available shards for the collection.
collection.shards(details)

If details is not set, or set to false, returns an array with the names of
the available shards of the collection.

If details is set to true, returns an object with the shard names as
object attribute keys, and the responsible servers as an array mapped to each
shard attribute key.

The leader shards are always first in the arrays of responsible servers.
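
A minimal sketch, assuming a cluster collection named example:

// array of shard names
var shards = db.example.shards();
// object mapping each shard name to its responsible servers, leader first
var detailed = db.example.shards(true);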

Revision

returns the revision id of a collection
collection.revision()

The revision id is updated when the document data is modified, either by
inserting, deleting, updating or replacing documents in the collection.

The revision id of a collection can be used by clients to check whether
data in a collection has changed or if it is still unmodified since a
previous fetch of the revision id.

The revision id returned is a string value. Clients should treat this value
as an opaque string, and only use it for equality/non-equality comparisons.
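
For example (the collection name example is hypothetical):

var before = db.example.revision();
db.example.insert({ value: 1 });
var after = db.example.revision();
// the two values differ, but no further meaning should be read into them
print(before !== after);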

Path

returns the physical path of the collection
collection.path()

The path operation returns a string with the physical storage path for
the collection data.

The path() method will return nothing meaningful in a cluster.
In a single-server ArangoDB, this method will only return meaningful data
for the MMFiles storage engine.
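
A minimal sketch for a single server using the MMFiles storage engine (the
collection name example is hypothetical):

// prints the directory in which the collection's datafiles are stored
print(db.example.path());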

Checksum

calculates a checksum for the data in a collection
collection.checksum(withRevisions, withData)

The checksum operation calculates an aggregate hash value for all document
keys contained in the collection.

If the optional argument withRevisions is set to true, then the
revision ids of the documents are also included in the hash calculation.

If the optional argument withData is set to true, then all user-defined
document attributes are also checksummed. Including the document data in
checksumming will make the calculation slower, but is more accurate.

The checksum calculation algorithm changed in ArangoDB 3.0, so checksums
calculated with 3.0 or later will differ from those calculated with earlier
versions for the same data.

The checksum() method cannot be used in clusters.
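
A minimal sketch on a single server, assuming a collection named example:

// checksum over the document keys only
var c1 = db.example.checksum();
// checksum over keys, revision ids and document data
var c2 = db.example.checksum(true, true);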

Unload

unloads a collection
collection.unload()

Starts unloading a collection from memory. Note that unloading is deferred
until all queries have finished.
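
For example (the collection name example is hypothetical):

db.example.unload();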

Rename

renames a collection
collection.rename(new-name)

Renames a collection using the new-name. The new-name must not
already be used for a different collection. new-name must also be a
valid collection name. For more information on valid collection names please
refer to the naming conventions.

If renaming fails for any reason, an error is thrown.
If renaming the collection succeeds, then the collection is also renamed in
all graph definitions inside the _graphs collection in the current
database.

Renaming requires appropriate user permissions:

To rename the collection in the first place, you need administrative rights
on the database.

To access the renamed collection afterwards, you either need access to all
collections of that database (*), or a system administrator has to grant you
access to the collection under its new name.
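
For example (both collection names are hypothetical):

db.example.rename("example-renamed");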

Rotate

rotates the current journal of a collection
collection.rotate()

Rotates the current journal of a collection. This operation makes the
current journal of the collection a read-only datafile so it may become a
candidate for garbage collection. If there is currently no journal available
for the collection, the operation will fail with an error.

The rotate() method is specific to the MMFiles storage engine and can
not be used in clusters.
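
A minimal sketch for a single server using the MMFiles storage engine (the
collection name example is hypothetical):

// make the current journal a read-only datafile,
// so the compaction can consider it
db.example.rotate();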
