Abstract: Deep Neural Networks (DNNs) have recently achieved impressive results
on a wide range of tasks. However, they often suffer from poor calibration. In
decision-making applications such as autonomous driving or medical diagnosis,
the confidence of a deep network plays an important role in making the system
trustworthy and reliable. To calibrate the confidence of deep networks, many
probabilistic and measure-based approaches have been proposed. Temperature Scaling
(TS) is a state-of-the-art measure-based calibration method that is effective
and has low time and memory complexity. In this paper, we
study TS and show that it does not work properly when the validation set it
uses for calibration is small or contains samples with noisy labels. TS also
calibrates highly accurate networks less well than less accurate ones.
Accordingly, we propose Attended Temperature Scaling (ATS), which preserves the
advantages of TS while improving calibration in the aforementioned challenging
situations. We provide theoretical justifications for ATS and assess its
effectiveness on a wide range of deep models and datasets. We also compare the
calibration results of TS and ATS on skin lesion detection, a practical
problem in which a well-calibrated system can play an important role in
decision making.
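
For readers unfamiliar with the baseline, the following is a minimal sketch of
standard TS only, not of ATS. It assumes the usual formulation in which a single
scalar temperature T divides the logits before the softmax and is chosen to
minimize the negative log-likelihood (NLL) on a held-out validation set. The
function names, the toy logits, and the use of golden-section search in place
of a gradient-based optimizer are illustrative assumptions, not details taken
from this paper.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log-probabilities of softmax(logits / temperature), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]

def calibrated_probs(logits, temperature):
    """Class probabilities after temperature scaling."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total -= log_softmax(logits, temperature)[y]
    return total / len(labels)

def fit_temperature(logits_batch, labels, lo=0.05, hi=20.0, iters=80):
    """Golden-section search for the T in [lo, hi] that minimizes validation NLL.

    The search bounds are an assumption of this sketch. The NLL is convex in
    1/T, hence unimodal in T, so a bracketing search is sufficient.
    """
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc = nll(logits_batch, labels, c)
    fd = nll(logits_batch, labels, d)
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = nll(logits_batch, labels, c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = nll(logits_batch, labels, d)
    return (a + b) / 2

# Hypothetical validation logits from an over-confident 3-class model:
# four confident correct predictions and two confident mistakes.
val_logits = [
    [4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0],
]
val_labels = [0, 1, 1, 2, 2, 1]

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T)
print("NLL before:", nll(val_logits, val_labels, 1.0))
print("NLL after: ", nll(val_logits, val_labels, T))
```

Because dividing by a positive T does not change the ordering of the logits,
the predicted class and therefore the accuracy are unchanged; only the
confidence is adjusted. The sketch also makes visible why the fitted T depends
entirely on the validation set, which is where a small or noisily labeled set
can hurt.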