Joe McDonnell has posted comments on this change.
Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................
Patch Set 2:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:
PS2, Line 213: dict_decoder_(parent_->dictionary_pool_.get()),
We should think about having a separate MemPool for parts of the dictionary that will not
be referenced by the RowBatch and can be freed at the end of the row group.
Since some datatypes are copied by value rather than pointing into the dictionary, this might
be used for those as well.
http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:
PS2, Line 229: memcpy(val_ptr, dict_[index], sizeof(T));
The memcpy is doing a dereference here and is not different from *val_ptr = *dict_[index];
PS2, Line 238: std::vector<T*> dict_;
Using a T* means that GetValue() must do a dereference to obtain the value. This is a very
performance sensitive codepath, so we need to avoid this extra dereference. This should maintain
the value directly in the vector.
--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-HasComments: Yes