PST (version 0.94.1)

Probabilistic Suffix Trees and Variable Length Markov Chains

Description

Provides a framework for analysing state sequences with probabilistic suffix trees (PST), the structure that stores variable length Markov chains (VLMC). Besides functions for learning and optimizing VLMC models, the PST library includes many additional tools for analysing sequence data with these models: visualization tools, functions for sequence prediction and for generating artificial sequences, as well as for context and pattern mining. The package is specifically adapted to the social sciences: VLMC models can be learned from sets of individual sequences that may contain missing values, and case weights are taken into account. The library also computes the probabilistic divergence between two models and fits segmented VLMC, where sub-models fitted to distinct strata of the learning sample are stored in a single PST. This software results from research carried out within the framework of the Swiss National Centre of Competence in Research LIVES, which is financed by the Swiss National Science Foundation. The authors are grateful to the Swiss National Science Foundation for its financial support.

Install

install.packages('PST')
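
Once the package is installed, a first model can be learned from the bundled example data. The lines below are a minimal sketch rather than an official recipe: they assume that TraMineR (which supplies `seqdef()`) is installed alongside PST, and the maximum depth `L = 3` is an arbitrary value chosen for illustration.

```r
# Sketch only: assumes PST and TraMineR are both installed.
library(TraMineR)
library(PST)

# 's1' is the example sequence data set listed among the package's functions.
data(s1)

# seqdef() (from TraMineR) converts the raw data into a state sequence object.
s1.seq <- seqdef(s1)

# Learn a probabilistic suffix tree; L = 3 is an illustrative maximum depth.
S1 <- pstree(s1.seq, L = 3)

# Print the fitted model.
print(S1, digits = 3)
```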

Monthly Downloads

230

Version

0.94.1

License

GPL (>= 2)

Last Published

December 14th, 2023

Functions in PST (0.94.1)

nobs

Extract the number of observations to which a VLMC model is fitted

cprob

Empirical conditional probability distributions of order L

cmine

Mining contexts

predict

Compute the probability of categorical sequences using a probabilistic suffix tree

impute

Impute missing values using a probabilistic suffix tree

cplot

Plot single nodes of a probabilistic suffix tree

prune

Prune a probabilistic suffix tree

ppplot

Plotting a branch of a probabilistic suffix tree

pqplot

Prediction quality plot

subtree

Extract a subtree from a segmented PST

plot-PSTr

Plot a PST

pstree

Build a probabilistic suffix tree

summary-methods

Summary of variable length Markov chain model

pmine

PST based pattern mining

print

Print method for objects of class PSTf and PSTr

query

Retrieve counts or next symbol probability distribution

nodenames

Retrieve the node labels of a PST

s1

Example sequence data set

tune

AIC, AICc or BIC based model selection

pdist

Compute probabilistic divergence between two PST

SRH

Longitudinal data on self rated health

logLik

Log-Likelihood of a variable length Markov chain model

PSTf-class

Flat representation of a probabilistic suffix tree

PSTr-class

Nested representation of a probabilistic suffix tree

generate

Generate sequences using a probabilistic suffix tree
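
Several of the functions listed above can be chained into a short workflow: build a tree, prune it, inspect its fit, then use it for prediction and sequence generation. The following is a hedged sketch, not a reference implementation. It assumes TraMineR is installed for `seqdef()`, and the tuning values (`L = 3`, the gain function `"G2"`, a cutoff derived from the 95% chi-squared quantile, and the sizes passed to `generate()`) are illustrative choices, not recommendations.

```r
# Sketch only: assumes PST and TraMineR are both installed.
library(TraMineR)
library(PST)

# Build a tree on the bundled example data.
data(s1)
s1.seq <- seqdef(s1)
S1 <- pstree(s1.seq, L = 3)

# Prune the tree; the cutoff C is an illustrative choice
# (half the 95% chi-squared quantile with 1 degree of freedom).
C95 <- qchisq(0.95, 1) / 2
S1.p <- prune(S1, gain = "G2", C = C95)

# Log-likelihood and number of observations of the pruned model.
logLik(S1.p)
nobs(S1.p)

# Probability of the learning sequence under the pruned model.
predict(S1.p, s1.seq)

# Generate 5 artificial sequences of length 10.
gen <- generate(S1.p, l = 10, n = 5, method = "prob")
gen
```

Pruning before prediction or generation keeps only the contexts that carry enough information under the chosen gain function, so the resulting model is usually smaller than the unpruned tree.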