Note

The database server can call end_document multiple times without an intervening begin_document call. For example, if one
of the documents to be indexed is empty, the server may call end_document for that document without first calling
begin_document.
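
One way to tolerate repeated or unpaired end_document calls is to track whether a document is currently open and to make cleanup a no-op otherwise. The following is a minimal sketch only: the sketch_source structure, its fields, and the sketch_begin_document and sketch_end_document functions are hypothetical stand-ins, not the server's actual interface, and the structure is assumed to be zero-initialized before first use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-source state; NOT the server's real interface struct.
 * Assumed to be zero-initialized before the first call. */
typedef struct {
    int   in_document;  /* nonzero between begin and end */
    char *scratch;      /* per-document working buffer, or NULL */
    int   cleanups;     /* counts real cleanups, for illustration only */
} sketch_source;

/* Hypothetical begin hook: allocate per-document resources. */
int sketch_begin_document(sketch_source *src)
{
    assert(src != NULL);
    free(src->scratch);            /* free(NULL) is a harmless no-op */
    src->scratch = malloc(256);
    if (src->scratch == NULL) {
        src->in_document = 0;
        return 1;                  /* report allocation failure */
    }
    src->in_document = 1;
    return 0;
}

/* Hypothetical end hook: safe to call any number of times,
 * with or without a preceding begin call. */
int sketch_end_document(sketch_source *src)
{
    assert(src != NULL);
    if (!src->in_document) {
        return 0;                  /* nothing open, nothing to release */
    }
    free(src->scratch);
    src->scratch = NULL;
    src->in_document = 0;
    src->cleanups++;
    return 0;
}
```

With this guard, calling sketch_end_document on a freshly zeroed sketch_source, or calling it twice in a row, releases nothing a second time and still returns success.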

The get_next_piece function should filter out unnecessary data, such as formatting information and images, from the
incoming byte stream and return the next chunk of filtered data in a buffer that the function allocates itself.
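
The filtering behavior can be illustrated with a small sketch. It assumes, purely for illustration, that "formatting information" means angle-bracket markup tags and that an empty chunk signals the end of the input. The sketch_filter structure, the SKETCH_CHUNK size, and the signatures of sketch_get_next_piece and sketch_filter_free are hypothetical and are not taken from the server's interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CHUNK 8  /* deliberately tiny so chunking is visible */

/* Hypothetical filter state; NOT the server's real interface struct.
 * Assumed to be zero-initialized, then given input and input_len. */
typedef struct {
    const unsigned char *input;      /* incoming byte stream */
    size_t               input_len;
    size_t               pos;        /* next unread input byte */
    int                  in_tag;     /* nonzero while inside <...> */
    unsigned char       *out;        /* buffer owned by the filter */
    size_t               out_cap;
} sketch_filter;

/* Returns 0 on success, 1 on allocation failure.
 * On success *buffer points at filter-owned memory holding *len bytes;
 * *len == 0 means the input is exhausted (an assumption of this sketch). */
int sketch_get_next_piece(sketch_filter *f, unsigned char **buffer, size_t *len)
{
    assert(f != NULL && buffer != NULL && len != NULL);

    if (f->out == NULL) {            /* allocate the buffer on first use */
        f->out = malloc(SKETCH_CHUNK);
        if (f->out == NULL) {
            return 1;
        }
        f->out_cap = SKETCH_CHUNK;
    }

    size_t n = 0;
    while (f->pos < f->input_len && n < f->out_cap) {
        unsigned char c = f->input[f->pos++];
        if (c == '<') {
            f->in_tag = 1;           /* start skipping markup */
        } else if (c == '>' && f->in_tag) {
            f->in_tag = 0;           /* markup ended */
        } else if (!f->in_tag) {
            f->out[n++] = c;         /* keep ordinary text */
        }
    }

    *buffer = f->out;
    *len = n;
    return 0;
}

/* Releases the filter-owned buffer; safe to call more than once. */
void sketch_filter_free(sketch_filter *f)
{
    assert(f != NULL);
    free(f->out);
    f->out = NULL;
    f->out_cap = 0;
}
```

Because the filter owns the buffer and reuses it on every call, a caller in this sketch must consume or copy each chunk before requesting the next one. Whether the real interface has the same ownership rule is not stated here.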