Description

This function preprocesses the design matrix by removing columns
that contain NAs or are all zero. It also standardizes
non-binary columns to have mean zero and variance one.
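
The steps described on this page can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name preprocess_sketch is hypothetical, and "binary" is assumed to mean a column with exactly two distinct values. The sketch does not handle constant non-zero columns, and it does not add an intercept column.

```r
# Hypothetical helper, not the package's code: mimics the documented steps.
preprocess_sketch <- function(X) {
  # Drop columns that contain any NA or are entirely zero.
  has_na   <- colSums(is.na(X)) > 0
  all_zero <- colSums(X != 0, na.rm = TRUE) == 0
  X <- X[, !(has_na | all_zero), drop = FALSE]

  # Assumption: a column is binary when it takes exactly two distinct values.
  is_binary <- apply(X, 2, function(col) length(unique(col)) == 2)

  # Standardize non-binary columns to mean zero and variance one.
  if (any(!is_binary)) {
    X[, !is_binary] <- scale(X[, !is_binary, drop = FALSE])
  }

  # Move binary columns to the end of the matrix.
  X <- cbind(X[, !is_binary, drop = FALSE], X[, is_binary, drop = FALSE])
  list(X = X, gnames = colnames(X))
}

# Small usage example on a toy matrix with made-up column names.
set.seed(1)
X_toy <- cbind(a = rnorm(10, 5, 3),      # continuous, kept and standardized
               b = rep(c(0, 1), 5),      # binary, kept and moved to the end
               c = 0,                    # all zero, removed
               d = c(NA, rnorm(9)),      # contains NA, removed
               e = rnorm(10))            # continuous, kept and standardized
out <- preprocess_sketch(X_toy)
out$gnames
```

In this toy example the surviving columns are reordered so that the binary column b comes last.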

Usage

PreProcess(X)

Arguments

X

The n by p design matrix. The columns should represent genes
and the rows should represent observations. The column names are
used as gene names, so they should not be left as NULL. Note that the
input matrix X should NOT contain a vector of 1's representing
the intercept.
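
The requirements on X can be checked before calling the function. The following is a minimal sketch; the helper name check_design_matrix is hypothetical and is not part of the package.

```r
# Hypothetical helper for illustration; not part of the package.
check_design_matrix <- function(X) {
  stopifnot(is.matrix(X))
  if (is.null(colnames(X))) {
    stop("X needs column names (they are used as gene names).")
  }
  # Flag any column that is identically 1, i.e. a candidate intercept column.
  is_intercept <- apply(X, 2, function(col) all(!is.na(col) & col == 1))
  if (any(is_intercept)) {
    stop("X should not contain an intercept column of 1's.")
  }
  invisible(TRUE)
}

# Usage on a small matrix with made-up gene names.
X_small <- matrix(rnorm(20), ncol = 4)
colnames(X_small) <- paste("gene_", 1:4, sep = "")
check_design_matrix(X_small)
```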

Value

A list with the following components:

X

The filtered design matrix, which can be used in the variable selection
procedure. Binary columns are moved to the end of the design matrix.

gnames

Gene names read from the column names of the filtered design
matrix.

Author(s)

Amir Nikooienejad

Examples

### Constructing a synthetic design matrix for the purpose of preprocessing
### imposing columns with different scales
n <- 40
p1 <- 50
p2 <- 150
p <- p1 + p2
X1 <- matrix(rnorm(n*p1, 1, 2), ncol = p1)
X2 <- matrix(rnorm(n*p2), ncol = p2)
X <- cbind(X1, X2)

### putting NA elements in the matrix
X[3,85] <- NA
X[25,85] <- NA
X[35,43] <- NA
X[15,128] <- NA
colnames(X) <- paste("gene_", c(1:p), sep = "")

### Running the function. Note the intercept column that is added as the
### first column in the "logistic" family
Xout <- PreProcess(X)
dim(Xout$X)[2] == (p + 1)
## 1 is added because intercept column is included
## This is FALSE because of the removal of columns with NA elements