1 State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, Research Center on Flood and Drought Disaster Reduction of the Ministry of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing 100038, P.R. China
2 College of Hydrology and Water Resources, Hohai University, Nanjing 210098, P.R. China
3 State Key Laboratory of Hydroscience and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing 100084, P.R. China
4 Department of Civil Engineering and Environmental Science, University of Oklahoma, Norman, OK, USA

With economic development and industrial construction, air quality in China has deteriorated dramatically and seriously threatens public health. To investigate which factors most affect air quality, and to provide a tool that assists the prediction and early warning of air pollution in urban areas, we applied sensor-observed air quality big data, information theory-based predictor significance identification, and PEK-based machine learning to the analysis and prediction of the air quality index (AQI). We found that the stability of air quality is strongly related to absolute air quality, so that improving air quality also improves its stability. Air quality in southern and western cities is better than in northern and eastern cities. The AQI time series of geographically closer cities are more strongly related to one another. PM2.5, PM10, and SO2 are the most important impact factors. The machine learning-based model is useful for AQI prediction and early warning. The tool could be applied to air quality monitoring and early warning in other cities to further verify its effectiveness and robustness. Finally, we suggest that future research use training data samples of better quality and representativeness to further improve the performance of the AQI prediction model.
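The idea of information theory-based predictor significance identification can be illustrated with a minimal sketch that ranks candidate pollutants by their mutual information with AQI. This is not the estimator used in the paper, which the abstract does not specify. It assumes a simple equal-width histogram estimate, and the function names (`mutual_information`, `rank_predictors`), the bin count, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter


def _bin_indices(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information I(X; Y) in nats."""
    n = len(x)
    bx, by = _bin_indices(x, bins), _bin_indices(y, bins)
    joint = Counter(zip(bx, by))
    cx, cy = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (cx[i] * cy[j]))
    return mi


def rank_predictors(predictors, target, bins=10):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {name: mutual_information(series, target, bins)
              for name, series in predictors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic illustration only (not observed data): AQI is built to depend
# on the "PM2.5" series and to be independent of the "O3" series.
rng = random.Random(0)
n_samples = 2000
pm25 = [rng.gammavariate(2.0, 30.0) for _ in range(n_samples)]
o3 = [rng.gammavariate(2.0, 20.0) for _ in range(n_samples)]
aqi = [1.2 * p + rng.gauss(0.0, 5.0) for p in pm25]

ranking = rank_predictors({"PM2.5": pm25, "O3": o3}, aqi)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

A predictor that shares more information with AQI receives a higher score, so the dependent series ranks first here by construction. Histogram estimates are biased upward for small samples and depend on the bin count, so the scores are only comparable across predictors evaluated with the same settings.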