Mining LINE-1 Characteristics That Mediate Gene Expression

Abstract

We propose using data mining to identify LINE-1 (L1) characteristics associated with gene expression in bladder cancer. The data were collected from L1Base and GSE3167. A memory-efficient data structure, the FP-Tree, was employed to enumerate all frequent item sets. The frequent item sets were then used to produce rules for predicting “down regulation” and “not down.” Each rule was assigned a p-value by means of a chi-square test. No statistically significant rules were found for “down”; in contrast, 692 rules for “not down” were significant, with odds ratios ranging from 1.68 to 1.98. All the significant rules involved only 20 characteristics. From these rules, we were able to infer the L1 characteristics that down-regulate genes: the number of L1 elements in host genes, full-length intactness, the number of CpG islands, a conserved 5’UTR, and a mutated ORF2.
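
As an illustration of how a single rule might be scored, the following minimal sketch computes an odds ratio and a chi-square p-value from a 2x2 contingency table. It is not the study's implementation: the function name `score_rule`, the table layout, and the counts in the usage lines are assumptions made up for illustration, and the test shown is a plain Pearson chi-square with one degree of freedom and no continuity correction, which may differ from the variant the authors used.

```python
import math


def score_rule(a, b, c, d):
    """Score one rule from a 2x2 contingency table (hypothetical layout).

                           outcome    not outcome
    genes matching rule       a            b
    genes not matching        c            d

    Returns (odds_ratio, chi_square, p_value).
    """
    n = a + b + c + d
    # Odds of the outcome among matching genes over odds among the rest.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table, no continuity correction.
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Invented counts, for illustration only (not data from the study).
odds_ratio, chi_square, p_value = score_rule(60, 40, 45, 55)
print(f"odds ratio = {odds_ratio:.2f}, "
      f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A rule would be kept as significant when its p-value falls below a chosen threshold; because many rules are tested at once, a multiple-testing correction may also be appropriate.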