We propose NCEM, a new clustering protocol for noisy cryo-EM images based on a network structural similarity metric. We first construct a complex network over all the cryo-EM single-particle images, where each image is represented as a node. The similarity between two images is then refined using the structural geometry of the network. By extending the similarity measurement from two independent images to their corresponding neighbor sets in the network, NCEM is more resistant to noise than a direct comparison of two images, because it uses the structural information of the network. This study is published in Yin et al., Journal of Chemical Information and Modeling, 2019.
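The neighbor-set idea can be illustrated with a minimal sketch. This is not the NCEM implementation: the k-nearest-neighbor graph, the Euclidean distance, the Jaccard overlap, and all function names below are assumptions chosen only to show how comparing neighborhoods differs from comparing two items directly.

```python
from math import sqrt

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_sets(vectors, k):
    """For each item, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: euclidean(v, vectors[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def neighbor_set_similarity(i, j, neighbors):
    """Jaccard overlap of the closed neighborhoods of nodes i and j.

    Assumption: each node is counted as a member of its own neighborhood.
    """
    a = neighbors[i] | {i}
    b = neighbors[j] | {j}
    return len(a & b) / len(a | b)

# Toy "images" as 2-D feature vectors forming two well-separated groups.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
neighbors = knn_sets(vectors, k=2)
print(neighbor_set_similarity(0, 1, neighbors))  # same group
print(neighbor_set_similarity(0, 3, neighbors))  # different groups
```

Because the score depends on shared neighbors rather than on a single pairwise distance, one noisy comparison has less influence on the result.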

We develop AnnoFly, a new annotator for fruit fly embryonic images. Driven by an attention-enhanced RNN model, it weighs images of different qualities so as to focus on the most informative image patterns. We assess the new model on three standard data sets. The experimental results show that the attention-based model provides a transparent approach for identifying the images that matter most for labeling, and that it substantially improves accuracy compared with existing annotation methods, including both single-instance and multi-instance learning methods. This study is published in Yang et al., Bioinformatics, 2019.
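How attention can weigh a bag of images is sketched below in a simplified form. It is not the AnnoFly model: the scores are hard-coded stand-ins for what a trained network would produce, and the function names are hypothetical. The sketch only shows softmax-weighted pooling, where higher-scoring images contribute more to the bag-level representation.

```python
from math import exp

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Weighted average of per-image feature vectors.

    features: list of equal-length feature vectors, one per image
    scores:   one relevance score per image (assumed to come from a model)
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return pooled, weights

# Three images of one gene; the second gets the highest (made-up) score.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(features, scores=[0.1, 2.0, 0.1])
print(weights)  # the second weight is the largest
```

Inspecting `weights` is what makes this kind of model transparent: it indicates which images drove the annotation.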

We propose a deep learning-based method, iDeepS, to simultaneously identify binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first one-hot encode both the sequence and the predicted secondary structure to enable the subsequent convolution operations. To reveal the hidden binding knowledge in the observed sequences, the CNNs are applied to learn abstract features. Considering the close relationship between sequence and predicted structure, we use the BLSTM to capture possible long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict the RBP binding sites. This study is published in Pan et al., BMC Genomics, 2018.
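The one-hot encoding step can be sketched as follows. The alphabets are assumptions for illustration: `ACGU` for the sequence and dot-bracket symbols for the structure, which may differ from the structure annotation iDeepS actually uses. Mapping unknown symbols to an all-zero row is likewise a choice made here, not something stated in the paper.

```python
def one_hot(sequence, alphabet):
    """Encode each character as a 0/1 row with one column per alphabet symbol.

    Characters outside the alphabet (e.g. 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in sequence:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows

# Hypothetical inputs: an RNA fragment and a dot-bracket structure string.
seq_matrix = one_hot("ACGUN", "ACGU")
struct_matrix = one_hot("((.))", "().")
print(seq_matrix)
print(struct_matrix)
```

Each matrix has one row per position, so a 1-D convolution can slide along the sequence axis and treat the alphabet columns as input channels.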

How to measure the resolution of a reconstructed 3D density map is an important problem in Single-Particle Reconstruction (SPR) of cryo-EM images, and it plays a critical role in promoting methodology development for SPR and structural biology. Because no benchmark map exists when a new structure is generated, estimating the resolution of a new map is still an open question. We propose a new self-reference-based resolution estimation protocol, SRes, which requires only a single reconstructed 3D map to measure resolution. The core idea of SRes is to perform a multi-scale spectral analysis of the map by segmenting it with multiple masks of variable size. SRes provides a new way of measuring the resolution from a single density map. This study is published in Yang et al., Journal of Chemical Information and Modeling, 2018.

lncLocator is a new ensemble classifier-based predictor of lncRNA subcellular localization developed by the Shen lab. Long non-coding RNAs (lncRNAs) have been a hot topic in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machines and random forests. This study is published in Cao et al., Bioinformatics, 2018.
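The k-mer features mentioned above can be illustrated with a short sketch. The choice of alphabet, the normalization to frequencies, and the skipping of windows with ambiguous bases are assumptions made here and are not necessarily what lncLocator does.

```python
from itertools import product

def kmer_frequencies(sequence, k, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed, lexicographic order.

    Windows containing characters outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    valid_windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            valid_windows += 1
    if valid_windows == 0:
        return [0.0] * len(kmers)
    return [counts[km] / valid_windows for km in kmers]

features = kmer_frequencies("ACGTACGT", k=2)
print(len(features))  # 4**2 = 16 features for k = 2
```

The resulting fixed-length vector can be fed to any standard classifier, such as a support vector machine or a random forest, regardless of the length of the input sequence.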

Enriched RNA-protein binding motifs revealed by the new iDeepE model. RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. We present a deep learning-based method, iDeepE, that predicts RBP binding sites from sequences alone by fusing local multi-channel convolutional neural networks and global convolutional neural networks. It can efficiently mine new binding motifs from large data pools. This study is published in Pan and Shen, Bioinformatics, 2018.

We have developed a new cell tracking approach, Hift, to construct cell lineages. Quantitative analysis of cell population trajectories has wide applications in revealing the complex mechanisms of microscopic biological systems, such as microtubules, stem cells and embryos. For instance, to understand how a drug affects cells, to study the propagation of embryo cells, or to analyze the cell cycle, it is critical to track the cell population accurately and extract its motion features. Accurate cell tracking and lineage construction under microscopy play an important role in analyzing cell migration, mitosis and proliferation. In the last decade, labor-intensive manual analysis has gradually been replaced by automated cell tracking methods. The new hierarchical tracking method Hift is robust to variations in cell morphology and staining. The paper is published in Zhi et al., Neurocomputing, 2018.

Inter-residue contacts in proteins have been widely acknowledged to be valuable for protein 3D structure prediction. Accurate prediction of long-range transmembrane inter-helix residue contacts can significantly improve the quality of simulated membrane protein models. We found that a deep convolutional neural network can mine latent residue contact patterns and thus improve inter-helix residue contact prediction. The new MemBrain is a two-stage inter-helix contact predictor. The first stage takes sequence-based features as inputs and outputs coarse contact probabilities for each residue pair. In the second stage, these probabilities are fed into a convolutional neural network together with predictions from three direct-coupling analysis approaches. The study is published in Jing Yang and Hong-Bin Shen, Bioinformatics, 2018, 34: 230-238.

AdipoCount, a new adipocyte segmentation and counting system. Obesity has spread worldwide and become a common health problem in modern society. One typical feature of obesity is the excessive accumulation of fat in adipocytes, which occurs through two physiological phenomena: hyperplasia (increase in quantity) and hypertrophy (increase in size) of adipocytes. In clinical and scientific research, accurate quantification of the number and diameter of adipocytes is necessary for assessing obesity. We have developed a new bioimage-understanding-based automatic adipocyte counting system, AdipoCount, which is accurate and supports further manual interaction. The outputs of this system are the labels and the statistical data of all adipose cells in the image. AdipoCount is published in Zhi et al., Frontiers in Physiology, 2018, 9: 85.
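The kind of output described above (a label per cell plus summary statistics) can be illustrated with a minimal sketch. It is not the AdipoCount pipeline: it assumes a binary mask is already available in which each cell is a separate 4-connected region, and it reports the equivalent circular diameter in pixels. All names are hypothetical.

```python
from collections import deque
from math import pi, sqrt

def label_regions(mask):
    """Label 4-connected foreground regions; return (labels, areas).

    mask: 2-D list of 0/1 values, where 1 marks a cell pixel.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                areas.append(0)
                label = len(areas)  # labels start at 1
                labels[r][c] = label
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    areas[-1] += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = label
                            queue.append((ny, nx))
    return labels, areas

def equivalent_diameter(area):
    """Diameter of a circle with the same area, in pixels."""
    return 2 * sqrt(area / pi)

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, areas = label_regions(mask)
print(len(areas), [round(equivalent_diameter(a), 2) for a in areas])
```

Real adipocyte images need a segmentation step that separates touching cells before any such counting; this sketch covers only the counting and measurement stage.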

The Shen Group's project "Artificial Intelligence Algorithm Development for Biological Medical Big Data Understanding and Its Online Prediction Application Systems" has been selected for the final list of the SAIL award of Artificial Intelligence World Innovations 2018.