Purpose :
Glaucoma is a challenging and usually silent disease that often presents with advanced optic nerve damage and visual field defects. Glaucoma and its subtypes constitute a group of diseases that are the leading cause of irreversible blindness worldwide. In the current study, we present a novel population-based data mining analysis that applies big data methods to gain a deeper understanding of glaucoma risk factors.

Methods :
Patient data from a large-scale electronic health record database (Health Facts® database) were divided into two groups: patients with a formal diagnosis of glaucoma and patients without glaucoma. Data from a total of 830,125 ophthalmology health records were acquired, from which a subset of 134,545 glaucoma patients was extracted and analyzed separately. We applied a novel contrast pattern data mining method, implemented with big data technologies, to identify patterns whose frequency differed between the groups (differential growth patterns). The degree of difference was measured by the growth rate, the ratio of a pattern's support in one class to its support in the other.
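The Methods score each pattern by the ratio of its support in the two classes, which the contrast (emerging) pattern literature calls the growth rate. The following is a minimal Python sketch of that measure, assuming each patient record is reduced to a set of attribute codes. The helper names, thresholds, and toy records are hypothetical, and brute-force enumeration of small itemsets stands in for the authors' big data method, which the abstract does not describe.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]  # hypothetical: one patient record as a set of attribute codes


def support(itemset: Transaction, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(itemset: Transaction, target: List[Transaction],
                background: List[Transaction]) -> float:
    """Support in the target class divided by support in the background class.

    Usual emerging-pattern convention: 0 when the pattern is absent from both
    classes, infinity when it occurs only in the target class.
    """
    s_target = support(itemset, target)
    s_background = support(itemset, background)
    if s_background == 0.0:
        return 0.0 if s_target == 0.0 else float("inf")
    return s_target / s_background


def contrast_patterns(target: List[Transaction], background: List[Transaction],
                      max_size: int = 2, min_support: float = 0.1,
                      min_growth: float = 2.0) -> List[Tuple[Transaction, float, float]]:
    """Brute-force search for itemsets that are frequent in `target` and whose
    growth rate against `background` is at least `min_growth`."""
    items = sorted(set().union(*target, *background))
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset, target)
            if s < min_support:
                continue  # too rare in the target class to be of interest
            g = growth_rate(itemset, target, background)
            if g >= min_growth:
                found.append((itemset, s, g))
    # Strongest contrasts first.
    return sorted(found, key=lambda row: row[2], reverse=True)


# Hypothetical toy records (not Health Facts data).
glaucoma = [frozenset(r) for r in [
    {"hypertension", "smoker"},
    {"hypertension", "smoker", "diabetes"},
    {"hypertension"},
    {"diabetes"},
]]
control = [frozenset(r) for r in [
    {"hypertension", "smoker"},
    {"hypertension"},
    {"smoker"},
    {"diabetes"},
    {"diabetes", "smoker"},
    set(), set(), set(),
]]

for itemset, s, g in contrast_patterns(glaucoma, control):
    print(sorted(itemset), f"support={s:.3f}", f"growth={g:.2f}")
```

In the toy data the hypertension-plus-smoking pattern has support 0.5 among glaucoma records and 0.125 among controls, giving a growth rate of 4.0. Patterns that never occur in the control class receive an infinite growth rate, the convention used for so-called jumping emerging patterns.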

Results :
The percentage distribution analysis showed no gender difference between patients with and without glaucoma. However, Apriori association rule mining revealed differing growth rates between patients diagnosed with glaucoma and the control group. The glaucoma group showed strongly contrasting patterns: hypertension occurred in 20.4% of glaucoma patients, compared with only 1.9% of the control group. Furthermore, in the sub-group analysis, glaucoma patients with hypertension combined with a smoking history displayed a strongly contrasting pattern (growth 22.602; confidence 0.772; support 0.102) compared with non-glaucoma control patients (growth 0.030; confidence 0.145; support 0.003), suggesting that this sub-group has a much higher risk of developing glaucoma.
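The Results rely on Apriori association rule mining together with support and confidence. A minimal Python sketch of textbook Apriori (level-wise candidate generation with subset pruning) and of rule confidence, supp(A ∪ C) / supp(A), follows. It assumes each patient record is a set of attribute codes with the diagnosis included as an item; the function names, the threshold, and the toy records are hypothetical, and the abstract does not state exactly how its confidence values were computed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori: return every itemset with support >= min_support."""
    n = len(transactions)
    if n == 0:
        return {}

    def frequent_among(candidates: Iterable[Itemset]) -> Dict[Itemset, float]:
        # One pass over the data counts every candidate of the current size.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    current = frequent_among({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    size = 2
    while current:
        prev = list(current)
        # Join: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == size}
        # Prune: every (size-1)-subset must itself be frequent (Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        size += 1
    return frequent


def confidence(antecedent: Itemset, consequent: Itemset,
               frequent: Dict[Itemset, float]) -> float:
    """Confidence of antecedent -> consequent, i.e. supp(A | C) / supp(A).

    Raises KeyError if A | C was not frequent at the chosen threshold.
    """
    return frequent[antecedent | consequent] / frequent[antecedent]


# Hypothetical toy records; "glaucoma" is included as an item so that
# rules with the diagnosis as the consequent can be scored.
records = [frozenset(r) for r in [
    {"glaucoma", "hypertension", "smoker"},
    {"glaucoma", "hypertension", "smoker"},
    {"glaucoma", "hypertension", "smoker"},
    {"glaucoma", "hypertension"},
    {"glaucoma"},
    {"hypertension", "smoker"},
    {"hypertension"},
    {"smoker"},
    set(),
    set(),
]]

frequent = apriori(records, min_support=0.2)
conf = confidence(frozenset({"hypertension", "smoker"}), frozenset({"glaucoma"}), frequent)
print(round(conf, 3))  # 0.75
```

The pruning step is what makes Apriori practical: a candidate is counted only if all of its smaller subsets were already frequent. At the scale of 830,125 records a distributed implementation would be needed; this in-memory version only illustrates the logic.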

Conclusions :
The present study is one of the first to apply big data analysis to glaucoma risk factors in a large-scale, population-based retrospective study. Sub-group analysis showed a strong trend toward higher glaucoma prevalence among hypertensive smokers. This study reaffirms the considerable potential of data mining methods for assessing glaucoma risk factors using large-scale electronic health record data.

This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.