PST (version 0.94.1)

cmine: Mining contexts

Description

Extracts the contexts in a probabilistic suffix tree (PST) that satisfy user-defined criteria.

Usage

# S4 method for PSTf
cmine(object, l, pmin, pmax, state, as.tree=FALSE, delete=TRUE)

Value

If as.tree=TRUE, a PST, that is, an object of class PSTf, which can be printed and plotted. If as.tree=FALSE, a list of contexts with their associated next-symbol probability distributions, that is, an object of class cprobd.list, for which a plot method is available. Subscripts can be used to select subsets of the contexts; see the examples.

Arguments

object

A probabilistic suffix tree, i.e., an object of class "PSTf" as returned by the pstree, prune or tune function.

l

length of the context to search for.

pmin

numeric. Minimal probability for selecting the (sub)sequence.

pmax

numeric. Maximal probability for selecting the (sub)sequence.

state

character. One or several states of the alphabet for which the (cumulated) probability is greater than pmin or less than pmax.

as.tree

logical. If TRUE, the cmine method returns a subtree of the input PST containing the selected contexts (including their parent nodes, even if these do not satisfy the defined criterion). If FALSE, the output is the list of selected contexts. See Value.

delete

logical. If as.tree=TRUE and delete=FALSE, the pruned nodes are not removed from the tree but tagged as pruned=FALSE, so that when the pruned tree is plotted these nodes will appear surrounded by red lines (the color can be changed).

Details

The cmine function searches the tree for nodes fulfilling certain characteristics, for example contexts that are highly likely to be followed by a given state (see example 1). One can also mine for contexts whose cumulated probability over several states reaches a minimum or stays below a maximum (see example 2). For more details, see Gabadinho & Ritschard (2016).
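The selection criterion described above can be illustrated with a small base R sketch. It is not the package's implementation: the helper mine_contexts, the toy contexts, and their probabilities are hypothetical, and treating pmin and pmax as inclusive bounds is an assumption. The sketch only shows what a cumulated probability over one or several states means.

```r
# Toy next-symbol distributions, one row per context (made-up numbers,
# not taken from the SRH data). Columns are states of the alphabet.
probs <- rbind(
  "G1-G1" = c(G1 = 0.70, G2 = 0.20, M = 0.06, B1 = 0.04),
  "M-B1"  = c(G1 = 0.05, G2 = 0.15, M = 0.40, B1 = 0.40),
  "G2"    = c(G1 = 0.30, G2 = 0.50, M = 0.15, B1 = 0.05)
)

# Hypothetical helper (not part of PST): keep the contexts whose
# probability, summed over the requested states, is at least pmin
# and/or at most pmax. Inclusive bounds are an assumption.
mine_contexts <- function(probs, state, pmin = NULL, pmax = NULL) {
  cum <- rowSums(probs[, state, drop = FALSE])
  keep <- rep(TRUE, nrow(probs))
  if (!is.null(pmin)) keep <- keep & cum >= pmin
  if (!is.null(pmax)) keep <- keep & cum <= pmax
  probs[keep, , drop = FALSE]
}

# Single state: contexts likely to be followed by G1
high_g1 <- mine_contexts(probs, state = "G1", pmin = 0.5)

# Several states: contexts likely to be followed by M or B1
low_health <- mine_contexts(probs, state = c("M", "B1"), pmin = 0.5)

rownames(high_g1)
rownames(low_health)
```

With several states, the comparison applies to the sum of their probabilities, so a context can be selected even when no single state reaches pmin on its own.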

Author

Alexis Gabadinho

References

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.

Examples

## Loading the PST package and the SRH.seq sequence object
library(PST)
data(SRH)

## Learning the model
SRH.pst <- pstree(SRH.seq, nmin=30, ymin=0.001)

## Example 1: searching for all contexts yielding a probability of the 
## state G1 (very good health) of at least pmin=0.5
cm1 <- cmine(SRH.pst, pmin=0.5, state="G1")
cm1[1:10]

## Example 2: contexts associated with a high probability of 
## medium or lower self rated health 
cm2 <- cmine(SRH.pst, pmin=0.5, state=c("B1", "B2", "M"))
plot(cm2, tlim=0, main="(a) p(B1,B2,M)>0.5")