Sepp Hochreiter has made numerous contributions in the fields of machine learning and bioinformatics. He developed the long short-term memory (LSTM), for which the first results were reported in his diploma thesis in 1991.[2] The main LSTM paper appeared in 1997[3] and is considered a milestone in the timeline of machine learning. He applied biclustering methods to drug discovery and toxicology. He extended support vector machines to handle kernels that are not positive definite with the "Potential Support Vector Machine" (PSVM) model, and applied this model to feature selection, especially to gene selection for microarray data.[4] Also in biotechnology, he developed "Factor Analysis for Robust Microarray Summarization" (FARMS).[5]

In addition to his research contributions, Sepp Hochreiter is broadly active within his field: he launched the Bioinformatics Working Group at the Austrian Computer Society; he is a founding board member of several bioinformatics start-up companies; he was program chair of the conference Bioinformatics Research and Development;[6] he is a conference chair of the conference Critical Assessment of Massive Data Analysis (CAMDA); and he is an editor, program committee member, and reviewer for international journals and conferences. As a faculty member at Johannes Kepler University Linz, he founded the Bachelors Program in Bioinformatics,[7] which is a cross-border, double-degree study program together with the University of South-Bohemia in České Budějovice (Budweis), Czech Republic. He also established the Masters Program in Bioinformatics,[8] and he is still the acting dean of both programs.

Neural networks are simplified mathematical models of biological neural networks such as those in the human brain. In feedforward neural networks (NNs) the information moves forward in only one direction, from the input layer that receives information from the environment, through the hidden layers, to the output layer that supplies the information to the environment. Unlike feedforward NNs, recurrent neural networks (RNNs) can use their internal memory to process arbitrary sequences of inputs. If data mining is based on neural networks, overfitting reduces the network's capability to correctly process future data. To avoid overfitting, Sepp Hochreiter developed algorithms for finding low-complexity neural networks, such as "Flat Minimum Search" (FMS),[18] which searches for a "flat" minimum, that is, a large connected region in the parameter space where the network function is constant. The network parameters can therefore be given with low precision, which corresponds to a low-complexity network that avoids overfitting. Low-complexity neural networks are well suited for deep learning because they control the complexity in each network layer and, therefore, learn hierarchical representations of the input.[19][20] Sepp Hochreiter's group introduced "exponential linear units" (ELUs), which speed up learning in deep neural networks and lead to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs), and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to ReLUs because their negative values push mean unit activations closer to zero. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect.[21]
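The difference between ReLU and ELU activations described above can be illustrated with a minimal sketch. It assumes the commonly used ELU definition (identity for positive inputs, alpha * (exp(x) - 1) otherwise) with alpha = 1; the function names and the sample inputs are chosen here for illustration only.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: identity for positive inputs, zero otherwise."""
    return x if x > 0.0 else 0.0


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit (assumed standard form, alpha is a hyperparameter).

    Identity for positive inputs; for non-positive inputs the output
    saturates smoothly toward -alpha.
    """
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


# Illustrative pre-activations, symmetric around zero (made-up values).
inputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

relu_mean = mean([relu(x) for x in inputs])
elu_mean = mean([elu(x) for x in inputs])

# ReLU discards all negative inputs, so its mean activation is pulled
# upward; the negative ELU outputs pull the mean back toward zero.
print(f"mean ReLU activation: {relu_mean:.4f}")
print(f"mean ELU activation:  {elu_mean:.4f}")
```

On these inputs the mean ELU activation lies closer to zero than the mean ReLU activation, which is the effect the paragraph attributes to the negative ELU values.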

For unsupervised deep learning, he developed rectified factor networks (RFNs)[22][23] to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means.

The pharma industry sees many chemical compounds (drug candidates) fail in late phases of the drug development pipeline. These failures are caused by insufficient efficacy on the biomolecular target (on-target effect), undesired interactions with other biomolecules (off-target or side effects), or unpredicted toxic effects. The deep learning and biclustering methods developed by Sepp Hochreiter identified novel on- and off-target effects in various drug design projects.[24] In 2013 Sepp Hochreiter's group won the DREAM subchallenge of predicting the average toxicity of compounds.[25] In 2014 the group continued this success with deep learning by winning the "Tox21 Data Challenge" of NIH, FDA and NCATS.[26][27][28] The goal of the Tox21 Data Challenge was to correctly predict the off-target and toxic effects of environmental chemicals in nutrients, household products and drugs. These results suggest that deep learning may be superior to other virtual screening methods.[29][30]

Sepp Hochreiter developed "Factor Analysis for Bicluster Acquisition" (FABIA)[31] for biclustering, that is, simultaneously clustering the rows and columns of a matrix. A bicluster in transcriptomic data is a pair of a gene set and a sample set for which the genes are similar to each other on the samples and vice versa. In drug design, for example, the effects of compounds may be similar only on a subgroup of genes. FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails and utilizes well-understood model selection techniques such as a variational approach in the Bayesian framework. FABIA supplies the information content of each bicluster to separate spurious biclusters from true biclusters. Sepp Hochreiter edited the reference book on biclustering, which presents the most relevant biclustering algorithms, typical applications of biclustering, visualization and evaluation of biclusters, and software in R.[32]
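The idea of a multiplicative bicluster model can be sketched as follows. The sketch assumes that a single bicluster is represented as the outer product of a sparse gene vector and a sparse sample vector, and it omits the noise term and all model fitting; the variable names and numbers are hypothetical and this is not the FABIA software itself.

```python
def outer(u, v):
    """Outer product of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


# Hypothetical toy data: 4 genes (rows) and 5 samples (columns).
# A non-zero entry means the gene or sample belongs to the bicluster.
gene_loadings = [2.0, 0.0, -1.5, 0.0]
sample_factors = [0.0, 1.0, 0.0, 1.0, 0.0]

# Multiplicative model: the signal of one bicluster is the outer product
# of the two sparse vectors (noise is left out in this sketch).
signal = outer(gene_loadings, sample_factors)

# The bicluster is the pair (gene set, sample set) with non-zero entries.
rows_in_bicluster = [i for i, g in enumerate(gene_loadings) if g != 0.0]
cols_in_bicluster = [j for j, s in enumerate(sample_factors) if s != 0.0]

for row in signal:
    print(row)
print("genes in bicluster:  ", rows_in_bicluster)
print("samples in bicluster:", cols_in_bicluster)
```

Only the entries at the intersection of the selected genes and samples are non-zero, which is why sparsity in both vectors yields a bicluster rather than a cluster over all rows or all columns.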

Support vector machines (SVMs) are supervised learning methods used for classification and regression analysis by recognizing patterns and regularities in the data. Standard SVMs require a positive definite kernel to generate a square kernel matrix from the data. Sepp Hochreiter proposed the "Potential Support Vector Machine" (PSVM),[33] which can be applied to non-square kernel matrices and can be used with kernels that are not positive definite. For PSVM model selection he developed an efficient sequential minimal optimization algorithm.[34] The PSVM minimizes a new objective which ensures theoretical bounds on the generalization error and automatically selects features which are used for classification or regression.
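The two kinds of matrices mentioned above, a matrix that is not positive definite and a matrix that is not square, can be illustrated with a small sketch. The matrices and names below are hypothetical examples; the sketch only shows what these properties mean and does not implement the PSVM.

```python
def quadratic_form(matrix, x):
    """Compute x^T K x for a square matrix K given as a list of rows."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


# A symmetric similarity matrix that is NOT positive semi-definite:
# positive semi-definiteness would require x^T K x >= 0 for every x.
indefinite_matrix = [[1.0, 2.0],
                     [2.0, 1.0]]
witness = [1.0, -1.0]
value = quadratic_form(indefinite_matrix, witness)
print("x^T K x =", value)  # negative, so K is not positive semi-definite

# A non-square matrix: similarities between 3 data points (rows) and
# 2 separate reference objects (columns), here computed by dot products.
data_points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reference_objects = [[1.0, 2.0], [3.0, 4.0]]
rectangular_matrix = [[dot(p, r) for r in reference_objects] for p in data_points]
print("shape:", len(rectangular_matrix), "x", len(rectangular_matrix[0]))
```

A standard SVM could not use either matrix directly as its kernel matrix, which is the restriction the PSVM is described as lifting.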

Sepp Hochreiter applied the PSVM to feature selection, especially to gene selection for microarray data.[4][35][36] The PSVM and standard support vector machines were applied to extract features that are indicative of coiled-coil oligomerization.[37]

Sepp Hochreiter's research group is a member of the SEQC/MAQC-III consortium, coordinated by the US Food and Drug Administration. This consortium examined Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites regarding RNA sequencing (RNA-seq) performance.[40] Within this project, standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments have been defined.[41] For analyzing the structural variation of the DNA, Sepp Hochreiter's research group proposed "cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation data with a low false discovery rate"[42] for detecting copy number variations in next-generation sequencing data. cn.MOPS estimates the local DNA copy number, is suited for both whole genome sequencing and exome sequencing, and can be applied not only to diploid and haploid genomes but also to polyploid genomes. For identifying differentially expressed transcripts in RNA-seq data, Sepp Hochreiter's group suggested "DEXUS: Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions".[43] In contrast to other RNA-seq methods, DEXUS can detect differential expression in RNA-seq data for which the sample conditions are unknown and for which biological replicates are not available. In the group of Sepp Hochreiter, sequencing data was analyzed to gain insights into chromatin remodeling. The reorganization of the cell's chromatin structure was determined via next-generation sequencing of resting and activated T cells. The analyses of these T cell chromatin sequencing data identified GC-rich long nucleosome-free regions that are hot spots of chromatin remodeling.[44]
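The intuition behind using Poisson distributions for copy number detection can be sketched as follows. This is a strongly simplified illustration, not the cn.MOPS algorithm: it assumes that the expected read count at a genomic segment scales linearly with the copy number (relative to a diploid baseline), treats a single sample in isolation, weights all copy numbers equally, and uses a hypothetical small factor `eps` for copy number zero so that the Poisson mean stays positive. All names and numbers are made up for the example.

```python
import math


def poisson_log_pmf(count: int, mean: float) -> float:
    """Log probability of observing `count` under a Poisson with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def most_likely_copy_number(read_count: int, diploid_mean: float,
                            max_copy_number: int = 4, eps: float = 0.05) -> int:
    """Return the copy number whose Poisson component best explains the count.

    Assumption of this sketch: copy number c has expected read count
    diploid_mean * c / 2, and copy number 0 has diploid_mean * eps.
    """
    best_copy_number = 0
    best_log_likelihood = -math.inf
    for copy_number in range(max_copy_number + 1):
        scale = copy_number / 2.0 if copy_number > 0 else eps
        log_likelihood = poisson_log_pmf(read_count, diploid_mean * scale)
        if log_likelihood > best_log_likelihood:
            best_copy_number = copy_number
            best_log_likelihood = log_likelihood
    return best_copy_number


# Hypothetical segment with about 100 reads expected at copy number 2.
for observed in (0, 52, 100, 148):
    print(observed, "reads ->", most_likely_copy_number(observed, 100.0))
```

A full method would combine read counts across many samples and use prior information rather than a per-sample maximum-likelihood choice; the sketch only shows why a deviation of the read count from the diploid expectation points to a gain or a loss.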

Sepp Hochreiter developed "Factor Analysis for Robust Microarray Summarization" (FARMS).[5] FARMS has been designed for preprocessing and summarizing high-density oligonucleotide DNA microarrays at probe level to analyze RNA gene expression. FARMS is based on a factor analysis model which is optimized in a Bayesian framework by maximizing the posterior probability. On Affymetrix spiked-in and other benchmark data, FARMS outperformed all other methods. A highly relevant feature of FARMS is its informative/non-informative (I/NI) calls.[45] The I/NI call is a Bayesian filtering technique which separates signal variance from noise variance. The I/NI call offers a solution to the main problem of high dimensionality when analyzing microarray data by selecting genes which are measured with high quality.[46][47] FARMS has been extended to cn.FARMS[48] for detecting DNA structural variants such as copy number variations with a low false discovery rate.