If you use the ORC format and want to improve the split computation time, you can set
the value of this parameter to match the number of available processors. By default,
this parameter is set to 10.

This parameter controls the number of parallel
threads involved in computing splits. For Parquet, split computation is still
single-threaded, so it can take longer when Parquet data is stored in
S3/ADLS/WASB.

hive.orc.splits.include.file.footer

If you use the ORC format with the ETL file split strategy, you can set this parameter to
"true" to use existing file footer information in the split payload.

You can set these parameters using the --hiveconf option in the Hive CLI or using the
set command in Beeline.
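
As a minimal sketch, a Beeline session that applies the footer setting described above might look like the following. The property hive.exec.orc.split.strategy is not named in this section; it is included here on the assumption that it is how the ETL strategy is selected in your environment.

```sql
-- Assumption: the ETL split strategy is selected through
-- hive.exec.orc.split.strategy, which this section does not name.
set hive.exec.orc.split.strategy=ETL;

-- Reuse existing ORC file footer information in the split payload.
set hive.orc.splits.include.file.footer=true;

-- Issuing set with only the property name prints its current value.
set hive.orc.splits.include.file.footer;
```

The same properties can be passed at startup in the Hive CLI as --hiveconf name=value pairs.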

Query launches can be slightly slower if no stats are available or when
hive.stats.fetch.partition.stats=false. In such cases, Hive ends up
checking the file size of every file that it tries to access.

Tuning hive.metastore.fshandler.threads helps reduce the overall time taken for
the metastore operation.

fs.trash.interval

Dropping a table can be slow in object stores such as S3 because the action involves
moving files to trash (a copy + delete). To remedy this, you can set
fs.trash.interval=0 to skip trash completely.

You can set these parameters using the --hiveconf option in the Hive CLI or using the
set command in Beeline.
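
A minimal sketch of these settings in a Beeline session is shown below. The thread count is an illustrative value, not a recommendation from this guide, and whether a session-level value for hive.metastore.fshandler.threads takes effect may depend on how your metastore is deployed.

```sql
-- Skip trash so that dropping a table avoids the copy + delete
-- on the object store.
set fs.trash.interval=0;

-- Illustrative value only; tune it for your own workload.
set hive.metastore.fshandler.threads=30;

-- Print the current value to confirm the change.
set fs.trash.interval;
```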

Accelerating Inserts in Hive

When inserting data, Hive moves data from a temporary folder to the final location. This
move operation is actually a copy+delete action, which is expensive in object stores such as
S3; the more data is written to the object store, the more expensive the operation
becomes.

To accelerate the process, you can tune hive.mv.files.thread, which sets the number of
threads used to move files, depending on the size of your dataset (the default is 15).
You can set it in hive-site.xml.