The next problem that needed to be solved was how to make an inverted index
updatable without losing the benefits of immutability? The answer turned out
to be: use more than one index.

Instead of rewriting the whole inverted index, add new supplementary indices
to reflect more-recent changes. Each inverted index can be queried in turn—starting with the oldest—and the results combined.

To add to the confusion, a Lucene index is what we call a shard in
Elasticsearch, while an index in Elasticsearch is a collection of shards.
When Elasticsearch searches an index, it sends the query out to a copy of
every shard (Lucene index) that belongs to the index, and then reduces the
per-shards results to a global result set, as described in
Distributed Search Execution.

A new commit point is written to disk, which includes the name of the new
segment.

The disk is fsync’ed—all writes waiting in the filesystem cache are
flushed to disk, to ensure that they have been physically written.

The new segment is opened, making the documents it contains visible to search.

The in-memory buffer is cleared, and is ready to accept new documents.

Figure 17. A Lucene index with new documents in the in-memory buffer, ready to commit

Figure 18. After a commit, a new segment is added to the commit point and the buffer is cleared

When a query is issued, all known segments are queried in turn. Term
statistics are aggregated across all segments to ensure that the relevance of
each term and each document is calculated accurately. In this way, new
documents can be added to the index relatively cheaply.

Segments are immutable, so documents cannot be removed from older segments,
nor can older segments be updated to reflect a newer version of a document.
Instead, every commit point includes a .del file that lists which documents
in which segments have been deleted.

When a document is “deleted,” it is actually just marked as deleted in the
.del file. A document that has been marked as deleted can still match a
query, but it is removed from the results list before the final query results
are returned.

Document updates work in a similar way: when a document is updated, the old
version of the document is marked as deleted, and the new version of the
document is indexed in a new segment. Perhaps both versions of the document
will match a query, but the older deleted version is removed before the query
results are returned.

In Segment Merging, we show how deleted documents are purged from
the filesystem.