localScore (version 2.0.3)

karlin: Karlin [p-value] [iid]

Description

karlin calculates an approximate p-value for a given local score value and a long sequence length, under the independent and identically distributed (i.i.d.) model for the sequence. See also mcc for another approximation in the i.i.d. model that improves on the one given by karlin, or daudin for an exact calculation.
karlin_parameters is an auxiliary function returning the parameters \(\lambda\), \(K^+\) and \(K^*\) defined in Karlin and Dembo (1992).

Usage

karlin(
  local_score,
  sequence_length,
  score_probabilities,
  sequence_min = NULL,
  sequence_max = NULL,
  score_values = NULL
)

karlin_parameters(
  score_probabilities,
  sequence_min = NULL,
  sequence_max = NULL,
  score_values = NULL
)

Value

A double representing the probability of observing a local score at least as high as the one given as argument.

Arguments

local_score

the observed local score

sequence_length

length of the sequence

score_probabilities

the probabilities for each score, ordered from lowest to greatest score (optionally with the scores as names)

sequence_min

minimum score (optional if score_values OR names(score_probabilities) is defined)

sequence_max

maximum score (optional if score_values OR names(score_probabilities) is defined)

score_values

vector of integer score values, associated with score_probabilities (optional if sequence_min and sequence_max OR names(score_probabilities) are defined)

Details

The longer the sequence, the better this method works.

Important note: computing the parameters of the distribution requires finding the roots of a polynomial that depends on the score distribution, of degree max(score)-min(score). For polynomials of degree 5 or higher, only numerical root-finding methods are available, with no guarantee of a reliable solution. The roots found are checked internally by the function, and an error is thrown if they are inconsistent. In that case, you can try changing your score scheme (if the scores come from a discretization) or use the function karlinMonteCarlo.

This function implements the formulae given in Karlin and Dembo (1992), pages 115-116. As the score is discrete here (lattice score function), the local score has no limiting distribution as the sequence length grows, but a lower and an upper bound are given. The output of this function is conservative, as it gives the upper bound for the p-value. The lower bound is easy to obtain: it is the same function call with local_score+1 as the local score.
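
To make the polynomial step concrete, here is a minimal base-R sketch of how \(\lambda\) can be obtained for a small score scheme. It is not the package's internal code. It assumes the standard Karlin and Dembo setting (negative mean score, positive maximum score, \(\lambda\) defined as the positive root of \(E[e^{\lambda X}] = 1\)) and consecutive integer scores. The helper name mgf_minus_one and the tolerances are illustrative choices.

```r
# Illustrative score scheme (same values as in the Examples): scores -3..2
score_values <- -3:2
score_probabilities <- c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1)

# Assumed setting: negative mean score and a positive maximum score
stopifnot(sum(score_values * score_probabilities) < 0, max(score_values) > 0)

# lambda is the positive root of E[exp(lambda * X)] - 1 (hypothetical helper)
mgf_minus_one <- function(lambda) {
  sum(score_probabilities * exp(lambda * score_values)) - 1
}
lambda <- uniroot(mgf_minus_one, interval = c(1e-6, 10), tol = 1e-12)$root

# Same equation as a polynomial in z = exp(lambda), after multiplying by
# z^(-min(score)); its degree is max(score) - min(score).
# Assumes score_values are consecutive integers.
coefs <- score_probabilities
shift <- -min(score_values)
coefs[shift + 1] <- coefs[shift + 1] - 1
z <- polyroot(coefs)

# Keep the real root strictly greater than 1 (z = 1 is the trivial root)
z_real <- Re(z[abs(Im(z)) < 1e-8 & Re(z) > 1 + 1e-6])
lambda_poly <- log(z_real)
```

Both routes should give the same \(\lambda\). The polynomial here has degree 5, so polyroot already relies on a purely numerical method, which is the situation the note above warns about.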

See Also

mcc, daudin, karlinMonteCarlo, monteCarlo

Examples

karlin(150, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -5, 5)
p1 <- karlin(local_score = 15, sequence_length = 5000, 
       score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
       sequence_min = -3, sequence_max = 2)
p2 <- karlin(local_score = 15, sequence_length = 5000, 
       score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
       score_values = -3:2)
p1 == p2 # TRUE

prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
score_values <- which(prob != 0) - 6 # keep only non null probability scores
prob0 <- prob[prob != 0]             # and associated probability
p <- karlin(150, 10000, prob, sequence_min = -5, sequence_max =  5)
p0 <- karlin(150, 10000, prob0, score_values = score_values) 
names(prob0) <- score_values
p1 <- karlin(150, 10000, prob0)
p == p0 # TRUE
p == p1 # TRUE
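
The Details section notes that karlin returns the upper bound of the p-value and that the lower bound comes from the same call with local_score+1. A short sketch of that pair of calls follows, reusing the score scheme from the examples above. The requireNamespace guard and the variable names upper and lower are illustrative, and no output values are claimed here.

```r
# Runs only if the localScore package is installed
if (requireNamespace("localScore", quietly = TRUE)) {
  prob <- c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1)

  # Upper (conservative) bound: the value karlin returns for the observed score
  upper <- localScore::karlin(local_score = 15, sequence_length = 5000,
                              score_probabilities = prob, score_values = -3:2)

  # Lower bound: same call with local_score + 1
  lower <- localScore::karlin(local_score = 15 + 1, sequence_length = 5000,
                              score_probabilities = prob, score_values = -3:2)

  print(c(lower = lower, upper = upper))
}
```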
