Purpose :
To identify evidence-based criteria for assessing whether a visual field (VF) is likely to be reliable.

Methods :
A total of 10,262 VFs from 1,538 eyes of 909 subjects with manifest or suspected glaucoma and ≥5 tests were used to predict mean deviation (MD) with linear mixed-effects regression models. Differences between observed and predicted MD (ΔMD) were calculated as a reliability measure. Logistic regression analyses were used to identify factors that predicted unexpectedly low (ΔMD < -1 dB, “worse” field) or high (ΔMD > 1 dB, “better” field) sensitivity.
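
The ΔMD thresholds described above can be illustrated with a minimal sketch. It assumes a predicted MD is already available from some fitted model; the function name, the `threshold_db` parameter, and the "expected" label for fields within ±1 dB are hypothetical and not taken from the study.

```python
def classify_delta_md(observed_md, predicted_md, threshold_db=1.0):
    """Label a visual field by the gap between observed and predicted MD (dB).

    Hypothetical helper: the +/-1 dB cutoffs follow the abstract; the
    "expected" label for fields inside the cutoffs is an assumption.
    """
    delta_md = observed_md - predicted_md  # ΔMD = observed - predicted
    if delta_md < -threshold_db:
        return "worse"   # unexpectedly low sensitivity
    if delta_md > threshold_db:
        return "better"  # unexpectedly high sensitivity
    return "expected"


# Example: observed MD of -8.5 dB against a predicted MD of -6.0 dB
print(classify_delta_md(-8.5, -6.0))  # ΔMD = -2.5 dB, so "worse"
```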

Results :
Around 20% of VFs had abnormally high or low sensitivity in the absence of any false positives (FPs), false negatives (FNs) or fixation losses (FLs). VFs with moderate (-6 dB ≥ MD > -12 dB) or severe damage (MD ≤ -12 dB) had a greater likelihood of both unexpectedly high and low sensitivity compared to VFs with mild damage (MD > -6 dB) (OR>9.0, p<0.001 for all). Longer test duration (TD) predicted a higher likelihood of unexpectedly low sensitivity at all levels of VF damage (OR=1.37, 1.75, and 2.56 per 1-minute increase in TD for mild, moderate and severe damage, respectively; p<0.001). Unexpectedly low sensitivity was also more common in late afternoon tests (2-5 pm) compared to early morning tests (7-10 am) (OR=1.26, p=0.024). FNs predicted a higher likelihood of unexpectedly low sensitivity in eyes with mild or moderate VF damage (OR=1.72 and 1.35 per 10% increase in FNs, respectively; p<0.001), but not in eyes with severe VF damage (p=0.53). FPs predicted a greater likelihood of unexpectedly high sensitivity (OR=1.85, 2.69, and 2.44 per 10% increase in FPs for mild, moderate and severe damage, respectively; p<0.001). FLs were not significantly associated with a higher likelihood of unexpectedly high or low sensitivity at any level of VF damage.
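
As a rough illustration of how the per-minute odds ratios for TD might be read, the sketch below maps MD to the severity bands used above and scales an odds ratio to a multi-minute increase. It assumes TD enters the logistic model as a linear term, so that odds ratios multiply across units; the abstract reports the per-unit values but does not state the model form. All function and variable names are hypothetical.

```python
# Per-minute odds ratios for unexpectedly low sensitivity, as reported above.
TD_ODDS_RATIO_PER_MINUTE = {"mild": 1.37, "moderate": 1.75, "severe": 2.56}


def severity(md_db):
    """Map MD (dB) to the severity bands used in the abstract."""
    if md_db > -6:
        return "mild"
    if md_db > -12:
        return "moderate"
    return "severe"


def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, assuming a log-linear effect."""
    return or_per_unit ** units


# Example: a test running 2 minutes longer in an eye with MD = -14 dB
band = severity(-14.0)
print(band, scaled_odds_ratio(TD_ODDS_RATIO_PER_MINUTE[band], 2))
```

Under this assumption, a 2-minute increase in TD for an eye with severe damage corresponds to an odds ratio of 2.56² ≈ 6.55. This applies to the odds of unexpectedly low sensitivity, not to its probability.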

Conclusions :
FLs are a poor indicator of whether a VF is likely to be reliable. FNs, FPs, TD, and time of testing, considered in conjunction with the severity of VF damage, can help predict whether a VF is likely to be unreliable, though around one in five tests with no FPs, FNs or FLs still has poor reliability. Our results provide a framework for gauging the likelihood that a given VF is reliable, though the imperfect nature of the models highlights the need to integrate VF data with other testing and clinical data when making treatment decisions.

This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.