Automatic QSO Selection Algorithm Using Time Series Analysis and Machine Learning

Kim, Dae-Won

We present a new QSO selection algorithm using time series analysis and supervised machine learning. To characterize the lightcurves, we extracted multiple time series features such as period, amplitude, and autocorrelation function. We then used a Support Vector Machine (SVM), a supervised machine learning algorithm, to separate QSOs from variable stars (e.g., RR Lyrae stars, Cepheids, eclipsing binaries, long-period variables, and Be stars), microlensing events, and non-variable stars.
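As an illustration of the kind of time series features mentioned above, the following minimal sketch computes an amplitude, a scatter, and a lag-1 autocorrelation from a list of magnitudes. The function names (`autocorrelation`, `extract_features`), the particular feature definitions, and the assumption of evenly sampled data are ours for illustration and are not taken from the work itself; real survey lightcurves are unevenly sampled, and period estimation is omitted here.

```python
from statistics import mean, pstdev


def autocorrelation(values, lag):
    """Sample autocorrelation of an evenly sampled series at a given lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: define the autocorrelation as zero
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return num / denom


def extract_features(mags):
    """Hypothetical feature vector for one lightcurve (illustrative only)."""
    return {
        "amplitude": (max(mags) - min(mags)) / 2.0,  # half the full range
        "std": pstdev(mags),                         # population scatter
        "autocorr_lag1": autocorrelation(mags, 1),   # smoothness indicator
    }
```

A slowly varying lightcurve gives a lag-1 autocorrelation close to one, while a noise-dominated or rapidly alternating one gives a value near zero or negative, which is one reason autocorrelation can help distinguish source classes.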

To train the QSO SVM model, we used 58 known QSOs, 1,600 variable stars, and 4,300 non-variable stars from the MAssive Compact Halo Objects (MACHO) database. Cross-validation shows that the model identifies 80% of the known QSOs with a 25% false positive rate. Most of the false positives in the cross-validation are Be stars, which are known to show variability characteristics similar to those of QSOs.
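The cross-validation figures quoted above can be computed from true and predicted labels as in the sketch below. The function name `selection_metrics` is hypothetical. Because "false positive rate" is sometimes used for the fraction of non-QSOs that are selected and sometimes for the fraction of selected candidates that are not QSOs, the sketch reports both; which definition the quoted 25% refers to is not stated here.

```python
def selection_metrics(y_true, y_pred):
    """Recall and two false-positive measures for binary labels (1 = QSO)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return {
        # fraction of known QSOs that the model recovers
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # fraction of non-QSOs wrongly selected
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # fraction of selected candidates that are not QSOs
        "false_fraction_of_selected": fp / (fp + tp) if fp + tp else 0.0,
    }
```

With a training set as imbalanced as the one described (tens of QSOs against thousands of other sources), the two false-positive measures can differ by a large factor, so it is worth stating which one is meant.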

We applied the trained QSO SVM model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of a total of 40 million time series, and found 1,097 QSO candidates. We crossmatched the candidates with several astronomical catalogs, including the Spitzer SAGE (Surveying the Agents of a Galaxy's Evolution) LMC catalog and various X-ray catalogs. The results suggest that most of the candidates are likely true QSOs.
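A positional crossmatch of the kind described above can be sketched as a nearest-neighbour search within a matching radius. The function names, the dictionary-based catalog layout, and the brute-force search are illustrative assumptions and do not describe the procedure actually used; the matching radius is left as a free parameter because it is not given here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def crossmatch(candidates, catalog, radius_arcsec):
    """Map each candidate id to its nearest catalog id within the radius.

    Both inputs are dicts of id -> (ra_deg, dec_deg). Candidates with no
    catalog source inside the radius are left out of the result.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for cand_id, (c_ra, c_dec) in candidates.items():
        best_id, best_sep = None, radius_deg
        for src_id, (s_ra, s_dec) in catalog.items():
            sep = angular_separation_deg(c_ra, c_dec, s_ra, s_dec)
            if sep <= best_sep:
                best_id, best_sep = src_id, sep
        if best_id is not None:
            matches[cand_id] = best_id
    return matches
```

The double loop scales with the product of the two catalog sizes, which is acceptable for about a thousand candidates against a modest catalog; for larger catalogs a spatial index would be the usual choice.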