Accurate road detection and centerline extraction from very high resolution (VHR) remote sensing imagery are of central importance in a wide range of applications. Due to the complex backgrounds and occlusions of trees and cars, most road detection methods bring in the heterogeneous segments; besides for the centerline extraction task, most current approaches fail to extract a wonderful centerline network that appears smooth, complete, as well as single-pixel width. To address the above-mentioned complex issues, we propose a novel deep model, i.e., a cascaded end-to-end convolutional neural network (CasNet), to simultaneously cope with the road detection and centerline extraction tasks. Specifically, CasNet consists of two networks. One aims at the road detection task, whose strong representation ability is well able to tackle the complex backgrounds and occlusions of trees and cars. The other is cascaded to the former one, making full use of the feature maps produced formerly, to obtain the good centerline extraction. Finally, a thinning algorithm is proposed to obtain smooth, complete, and single-pixel width road centerline network. Extensive experiments demonstrate that CasNet outperforms the state-of-the-art methods greatly in learning quality and learning speed. That is, CasNet exceeds the comparing methods by a large margin in quantitative performance, and it is nearly 25 times faster than the comparing methods. Moreover, as another contribution, a large and challenging road centerline data set for the VHR remote sensing image will be publicly available for further studies.