RACER: Regression Analysis of Combinatorial Expression Regulation


Gene expression is a combinatorial function of multiple factors, including copy number variation (CNV), DNA methylation (DM), transcription factor (TF) binding, and microRNA (miRNA) regulation. As high-throughput technologies have matured, large amounts of such data have become available. In particular, the occupancy of ~100 TFs measured by chromatin immunoprecipitation followed by sequencing (ChIP-seq) is available from the Encyclopedia of DNA Elements (ENCODE), and The Cancer Genome Atlas (TCGA) has generated mRNA/miRNA expression, DM, and CNV profiles, measured by sequencing or microarrays, across diverse cancer samples. Accordingly, there is intense interest in developing integrative models that take full advantage of these data. To this end, we developed RACER (Regression Analysis of Combinatorial Expression Regulation), which fits mRNA expression as the response, using as explanatory variables the TF binding signals (TFBS) from ENCODE and the CNV, DM, and miRNA expression signals from TCGA. Briefly, RACER first infers sample-specific TF/miRNA regulatory activities, which are then used as inputs to infer specific TF/miRNA-gene interactions. This two-stage regression circumvents the problem of integrating the non-sample-specific ENCODE TFBS with the sample-specific TCGA measurements.
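The two-stage idea can be illustrated with a minimal sketch on simulated, noise-free toy data. This is not the RACER code in racer.zip: it uses plain ordinary least squares, omits miRNA, and every name in it (`ols`, `solve`, `B`, `act`, `w_hat`, the toy dimensions) is a hypothetical choice made for illustration. The released scripts may use a different fitting procedure. Stage 1 regresses expression across genes within each sample to estimate TF activities; stage 2 regresses expression across samples within each gene, using those activities as inputs.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients for y ~ X via the normal equations."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

random.seed(0)
G, S, T = 40, 12, 2  # toy numbers of genes, samples, TFs (assumptions)

# Non-sample-specific TF binding scores (the role ENCODE TFBS plays).
B = [[random.uniform(0, 1) for _ in range(T)] for _ in range(G)]
# Sample-specific measurements (the role TCGA CNV and DM play).
cnv = [[random.uniform(-1, 1) for _ in range(S)] for _ in range(G)]
dm = [[random.uniform(-1, 1) for _ in range(S)] for _ in range(G)]
# Hidden sample-specific TF activities used to simulate expression.
act = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(S)]
expr = [[0.5 * cnv[g][s] - 0.3 * dm[g][s]
         + sum(B[g][t] * act[s][t] for t in range(T))
         for s in range(S)] for g in range(G)]

# Stage 1: one regression per sample, across genes.
# The coefficients on the binding scores are the estimated TF activities.
act_hat = []
for s in range(S):
    X = [[1.0, cnv[g][s], dm[g][s]] + B[g] for g in range(G)]
    y = [expr[g][s] for g in range(G)]
    act_hat.append(ols(X, y)[3:])

# Stage 2: one regression per gene, across samples.
# The coefficients on the activities are gene-specific TF weights.
w_hat = []
for g in range(G):
    X = [[1.0, cnv[g][s], dm[g][s]] + act_hat[s] for s in range(S)]
    y = [expr[g][s] for s in range(S)]
    w_hat.append(ols(X, y)[3:])
```

Because the toy expression values are generated without noise, stage 1 recovers the simulated activities and stage 2 recovers weights equal to the simulated binding scores. With real data neither would be exact, and some form of regularization or feature selection would likely be needed when the number of regulators is large relative to the number of samples.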

RACER source code:

  1. racer.zip

  2. The above archive contains the script for the RACER model and the script for feature selection discussed in the paper.

Full datasets used in the manuscript:

  1. laml_combined_input.RData

  2. The above data, used in the paper, are in RData format and can be loaded directly in R.


  1. Li Y, Liang M, Zhang Z (2014) Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia. PLoS Comput Biol 10(10): e1003908. doi:10.1371/journal.pcbi.1003908