Most often they were severe enough to cause the entire job to fail. This indicated that I needed to raise hbase.regionserver.lease.period, which controls how long a scanner's lease stays alive between calls to scanner.next(). That alone didn’t help, however: apparently you also need to raise hbase.rpc.timeout (the exception that indicated this was only logged at DEBUG level, so it took me a while to realise that).
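Since the two settings have to be raised together, here is a minimal sketch of a helper that keeps them in step. The class `ScannerTimeouts` and its `overrides` method are hypothetical names of my own, not part of HBase, and the 300000 ms value is purely illustrative. Setting both properties to the same value is also just an assumption made for simplicity. The sketch only builds the key/value pairs; they would still have to be applied wherever your cluster and client read their configuration, for example hbase-site.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: collects the two timeout
// properties discussed above so that they are always raised together.
class ScannerTimeouts {
    static final String LEASE_PERIOD = "hbase.regionserver.lease.period";
    static final String RPC_TIMEOUT = "hbase.rpc.timeout";

    // Returns both properties set to the same value, in milliseconds.
    // Using one value for both is an assumption made for simplicity.
    static Map<String, String> overrides(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(LEASE_PERIOD, Long.toString(timeoutMillis));
        props.put(RPC_TIMEOUT, Long.toString(timeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        // 300000 ms (5 minutes) is an illustrative value, not a recommendation.
        overrides(300_000L).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Generating both entries from one number makes it harder to repeat my mistake of raising the lease period while leaving the RPC timeout at its old value.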