Patterns (version 1.4)

geneSelection: Methods for selecting genes

Description

Selection of differentially expressed genes.

Usage

# S4 method for omics_array,omics_array,numeric
geneSelection(
  x,
  y,
  tot.number,
  data_log = TRUE,
  wanted.patterns = NULL,
  forbidden.patterns = NULL,
  peak = NULL,
  alpha = 0.05,
  Design = NULL,
  lfc = 0
)

# S4 method for list,list,numeric
geneSelection(
  x,
  y,
  tot.number,
  data_log = TRUE,
  alpha = 0.05,
  cont = FALSE,
  lfc = 0,
  f.asso = NULL,
  return.diff = FALSE
)

# S4 method for omics_array,numeric
genePeakSelection(
  x,
  peak,
  y = NULL,
  data_log = TRUE,
  durPeak = c(1, 1),
  abs_val = TRUE,
  alpha_diff = 0.05
)

Value

An omics_array object.

Arguments

x

either an omics_array object or a list of omics_array objects. In the first case, the omics_array object represents the stimulated measurements. In the second case, the unstimulated control data (if present) should be the first element of the list.

y

either an omics_array object or a list of strings. In the first case, the omics_array object represents the unstimulated (control) measurements, as in the examples where `y=micro_US`. In the second case, the list specifies the contrast:

First element:

condition, condition&time or pattern. The condition specification is used when the overall goal is to compare two conditions. The condition&time specification is used when comparing two conditions at precise time points. The pattern specification lets you decide at which time points genes should be differentially expressed.

Second element:

a vector of length 2 giving the two conditions to compare. If a condition is used as control, it should be the first element of the vector. However, if this control is not measured through time, the option cont=TRUE should be used.

Third element:

depends on the first element. It is not needed if condition has been specified. If condition&time has been specified, then this is a vector containing the time points at which the comparison should be done. If pattern has been specified, then this is a vector of 0 and 1 of length T, where T is the number of time points. The time points with desired differential expression are marked with 1.
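The three documented forms of the contrast list can be written down as below. This is only a sketch: referring to conditions by their position in `x` (control first), the time point value `2`, and the helper `check_contrast` are assumptions made for illustration and are not part of Patterns. The helper merely encodes the rules stated above.

```r
# Hypothetical contrast specifications for the list form of `y`.
# Assumption: 4 time points, conditions identified by position in `x`.
contrast_condition <- list("condition", c(1, 2))
contrast_cond_time <- list("condition&time", c(1, 2), c(2))
contrast_pattern   <- list("pattern", c(1, 2), c(0, 1, 1, 0))

# Hypothetical validator (not a Patterns function) for the documented rules.
check_contrast <- function(contrast, n_time) {
  kind <- contrast[[1]]
  # First element: one of the three documented specifications.
  stopifnot(kind %in% c("condition", "condition&time", "pattern"))
  # Second element: the two conditions to compare.
  stopifnot(length(contrast[[2]]) == 2)
  if (kind == "condition&time") {
    # Third element: the time point(s) of the comparison.
    stopifnot(length(contrast) >= 3)
  } else if (kind == "pattern") {
    # Third element: a 0/1 vector with one entry per time point.
    pattern <- contrast[[3]]
    stopifnot(length(pattern) == n_time, all(pattern %in% c(0, 1)))
  }
  invisible(TRUE)
}

check_contrast(contrast_condition, n_time = 4)
check_contrast(contrast_cond_time, n_time = 4)
check_contrast(contrast_pattern, n_time = 4)
```

A pattern whose length differs from the number of time points, such as `c(0, 1)` with four time points, fails the check.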

tot.number

a number. The number of selected genes. If tot.number < 0, all differentially expressed genes are selected. If tot.number > 1, tot.number is the maximum number of differentially expressed genes that will be selected. If 0 < tot.number < 1, tot.number represents the proportion of differentially expressed genes that are selected.
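The three regimes of tot.number can be illustrated with a small base R helper. This is a sketch of the documented rule only: `resolve_tot_number` and `n_diff` (the number of differentially expressed genes found) are hypothetical names, rounding the proportion down with `floor` is an assumption, and so is treating a value of exactly 1 as a cap. The package's internal handling may differ.

```r
# Hypothetical helper (not a Patterns function): how many genes are kept
# for a given tot.number when n_diff genes are differentially expressed.
resolve_tot_number <- function(tot.number, n_diff) {
  if (tot.number < 0) {
    # Negative value: keep every differentially expressed gene.
    n_diff
  } else if (tot.number < 1) {
    # Value in (0, 1): keep that proportion (rounding down is an assumption).
    floor(tot.number * n_diff)
  } else {
    # Otherwise: tot.number is an upper bound on the number of genes kept.
    min(tot.number, n_diff)
  }
}

resolve_tot_number(-1, n_diff = 200)    # 200: all genes
resolve_tot_number(0.25, n_diff = 200)  # 50: a quarter of the genes
resolve_tot_number(50, n_diff = 200)    # 50: capped at tot.number
resolve_tot_number(50, n_diff = 30)     # 30: fewer genes than the cap
```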

data_log

logical (defaults to TRUE); should the data be log-transformed?

wanted.patterns

a matrix with wanted patterns [only for geneSelection].

forbidden.patterns

a matrix with forbidden patterns [only for geneSelection].
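In the examples on this page, each row of a pattern matrix is a 0/1 vector with one entry per time point, where 1 marks a time point with differential expression. For more than a handful of patterns, the matrix can be generated rather than typed by hand. The sketch below uses base R only and assumes four time points (60, 90, 210, 390); the names `all_patterns`, `early` and `wanted` are illustrative.

```r
# Time points of the experiment (assumed, matching the examples on this page).
time_points <- c(60, 90, 210, 390)

# Enumerate all 2^4 = 16 possible 0/1 patterns, one row per pattern.
all_patterns <- as.matrix(expand.grid(rep(list(0:1), length(time_points))))
colnames(all_patterns) <- paste0("t", time_points)

# Keep patterns with differential expression at t60 and/or t90 only.
early <- all_patterns[, "t210"] == 0 &
  all_patterns[, "t390"] == 0 &
  rowSums(all_patterns) > 0

# Drop dimnames so the result has the same plain form as rbind(c(...), ...).
wanted <- unname(all_patterns[early, , drop = FALSE])
wanted
```

The resulting three rows are `c(1,0,0,0)`, `c(0,1,0,0)` and `c(1,1,0,0)`, the same set of patterns as the hand-written `rbind(...)` call in the examples, and could be supplied as `wanted.patterns`.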

peak

integer. The time point measurement at which the genes should be selected [optional for geneSelection].

alpha

float; the risk level. Defaults to `alpha=0.05`.

Design

the design matrix of the experiment. Defaults to `NULL`.

lfc

log fold change value used in limma's `topTable`. Defaults to 0.

cont

use contrasts. Defaults to `FALSE`.

f.asso

function used to assess the association between the genes. The default value `NULL` implies the use of the usual `mean` function.

return.diff

logical (defaults to FALSE); if TRUE, the function returns the stimulated expression of the differentially expressed genes.

durPeak

vector of size 2 (defaults to c(1,1)); the first element gives the length of the peak on the left, the second on the right. [only for genePeakSelection]

abs_val

logical (defaults to TRUE); should genes be selected on the basis of the absolute value of their expression? [only for genePeakSelection]

alpha_diff

float; the risk level. Defaults to `alpha_diff=0.05`. [only for genePeakSelection]

Author

Frédéric Bertrand, Myriam Maumy-Bertrand.

Examples

# \donttest{
  if(require(CascadeData)){
	data(micro_US)
	micro_US<-as.omics_array(micro_US,time=c(60,90,210,390),subject=6)
	data(micro_S)
	micro_S<-as.omics_array(micro_S,time=c(60,90,210,390),subject=6)

  #Basically, to find the 50 most significantly differentially expressed genes, use:
  Selection_1<-geneSelection(x=micro_S,y=micro_US,
  tot.number=50,data_log=TRUE)
  summary(Selection_1)
  
  #If we want to select genes that are differentially expressed
  #at time t60 or t90:
  Selection_2<-geneSelection(x=micro_S,y=micro_US,tot.number=30,
  wanted.patterns=
  rbind(c(0,1,0,0),c(1,0,0,0),c(1,1,0,0)))
  summary(Selection_2)

  #To select genes that have a differential maximum of expression at a specific time point.
  
  Selection_3<-genePeakSelection(x=micro_S,y=micro_US,peak=1,
  abs_val=FALSE,alpha_diff=0.01)
  summary(Selection_3)
  }

  if(require(CascadeData)){
data(micro_US)
micro_US<-as.omics_array(micro_US,time=c(60,90,210,390),subject=6)
data(micro_S)
micro_S<-as.omics_array(micro_S,time=c(60,90,210,390),subject=6)
#Genes with differential expression at t1
Selection1<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(1,0,0,0)))
#Genes with differential expression at t2
Selection2<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,1,0,0)))
#Genes with differential expression at t3
Selection3<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,0,1,0)))
#Genes with differential expression at t4
Selection4<-geneSelection(x=micro_S,y=micro_US,20,wanted.patterns= rbind(c(0,0,0,1)))
#Genes with global differential expression 
Selection5<-geneSelection(x=micro_S,y=micro_US,20)

#We then merge these selections:
Selection<-unionOmics(list(Selection1,Selection2,Selection3,Selection4,Selection5))
print(Selection)

#Prints the correlation graphics Figure 4:
summary(Selection,3)

##Uncomment this code to retrieve geneids.
#library(org.Hs.eg.db)
#
#ff<-function(x){substr(x, 1, nchar(x)-3)}
#ff<-Vectorize(ff)
#
##Here is the function to transform the probeset names to gene ID.
#
#library("hgu133plus2.db")
#
#probe_to_id<-function(n){  
#x <- hgu133plus2SYMBOL
#mp<-mappedkeys(x)
#xx <- unlist(as.list(x[mp]))
#genes_all = xx[(n)]
#genes_all[is.na(genes_all)]<-"unknown"
#return(genes_all)
#}
#Selection@name<-probe_to_id(Selection@name)
  }
	# }
