Learn R Programming

localScore (version 2.0.1)

karlinMonteCarlo: Monte Carlo - Karlin [p-value]

Description

Estimates p-value of the local score based on a Monte Carlo estimation of Gumble parameters from simulations of smaller sequences with same distribution. Appropriate for great sequences with length > 10^3, for i.i.d and markovian sequence models.

Usage

karlinMonteCarlo(
  local_score,
  sequence_length,
  simulated_sequence_length,
  FUN,
  ...,
  numSim = 1000,
  plot = TRUE,
  keepSimu = FALSE
)

karlinMonteCarlo_double( local_score, sequence_length, simulated_sequence_length, FUN, ..., numSim = 1000, plot = TRUE, keepSimu = FALSE )

Value

If keepSimu is FALSE, returns a list containing:

p_valueProbability to obtain a local score with a value greater or equal to the parameter local_score
K*Parameter \(K^*\) defined in Karlin and Dembo (1990)
lambdaParameter \(\lambda\) defined in Karlin and Dembo (1990)

If keepSimu is TRUE, returns a list containing:

p_valueProbability to obtain a local score with a value greater or equal to the parameter local_score
K*Parameter \(K^*\) defined in Karlin and Dembo (1990)
lambdaParameter \(\lambda\) defined in Karlin and Dembo (1990)
localScoresVector of size numSim containing the simulated local scores for sequence size of simulated_sequence_length

Arguments

local_score

local score observed in a sequence.

sequence_length

length of the sequence

simulated_sequence_length

length of simulated sequences produced by

FUN

function to simulate similar sequences with.

...

parameters for FUN

numSim

number of sequences to create for estimation FUN

plot

boolean value if to display plots for cumulated function, density and linearization of cumulative density function

keepSimu

Boolean, default to FALSE. If TRUE, the simulated local scores are returned as the localScores element of the output list.

Details

The length of the simulated sequences is an argument specific to the function provided for simulation. Thus, it has to be provided also in the parameter simulated_sequence_length in the arguments of the "Monte Carlo - Karlin" function. It is a crucial detail as it influences precision and computation time of the result. Note that to get an appropriate estimation, the given average score must be non-positive. Be careful that the parameters names of the function FUN should differ from those of karlinMonteCarlo function.
Methods - Parameters \(K^\star\) and \(\lambda\) of Karlin and Dembo (1990) are estimated by a linear regression on the log(-log(cumulative distribution function of the local scores)) on shorter sequences (size simulated_sequence_length). The formula used are : \(\hat{\lambda} = -\hat{b}\) and \(\hat{K^\star} = exp(\hat{a})/simulated\_sequence\_length\) where \(\hat{a}\) is the intercept of the regression and \(\hat{b}\) is the slope of the regression. Then p-value is given by \(p = exp(-K^\star * exp(-\lambda*x ))\) where \(x = local\_score - \log(sequence\_length)/\lambda\).
The density plot produced by plot == TRUE depends on the type of the simulated local scores: if they are integer, a barplot of relative frequency is used, else plot(density(...)) is used.
This function calls localScoreC which type of the output depends on the type of the input. To be efficient, be aware to use a simulating function FUN that return a vector of adequate type ("integer" or "numeric"). Warning: in R, typeof(c(1,3,4,10)) == "double". You can set a type of a vector with mode() or as.integer() functions for example.
karlinMonteCarlo_double() is deprecated. At this point, it is just a call to karlinMonteCarlo() function.

See Also

monteCarlo localScoreC

Examples

Run this code
# \donttest{
mySeq <- sample(-7:6, replace = TRUE, size = 100000)
#MonteCarlo taking random sample from the input sequence itself
karlinMonteCarlo(local_score = 160, sequence_length = 100000,
               simulated_sequence_length = 1000,
               FUN = function(x, sim_length) {
                        return(sample(x = x,
                               size = sim_length,
                               replace = TRUE))
                     },
               x = mySeq,
               sim_length = 1000,
               numSim = 1000)
# }
# \donttest{
#Markovian example (longer computation)
MyTransMat_reels <-  matrix(c(0.3, 0.1, 0.1, 0.1, 0.4,
                              0.2, 0.2, 0.1, 0.2, 0.3,
                              0.3, 0.4, 0.1, 0.1, 0.1,
                              0.3, 0.3, 0.1, 0.2, 0.1,
                              0.2, 0.1, 0.2, 0.4, 0.1),
                              ncol = 5, byrow=TRUE)
karlinMonteCarlo(local_score = 18.5, sequence_length = 200000,
                 simulated_sequence_length = 1500,
                 FUN = transmatrix2sequence,
                 matrix = MyTransMat_reels,
                 score =c(-1.5,-0.5,0,0.5,1), length = 1500,
                 plot=TRUE, numSim = 1500)
# }

Run the code above in your browser using DataLab