
TraMineR (version 2.2-10)

seqtree: Tree structured analysis of a state sequence object.

Description

Facility for growing a regression tree for a state sequence object.

Usage

seqtree(formula, data = NULL, weighted = TRUE, min.size = 0.05,
  max.depth = 5, R = 1000, pval = 0.01, weight.permutation = "replicate",
  seqdist.args = list(method = "LCS", norm = "auto"), diss = NULL,
  squared = FALSE, first = NULL, minSize, maxdepth, seqdist_arg)

Value

A seqtree object with the same attributes as disstree objects.

The leaf membership is in the first column of the fitted attribute. For example, the leaf memberships for a tree dt are in dt$fitted[,1].
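
As a minimal sketch of how the leaf memberships might be used, the snippet below grows a small tree on the mvad data shipped with TraMineR and tabulates the first column of the fitted attribute. The choice of covariates, the seed, and the small R and pval values are illustrative assumptions made only to keep the run short; they are not recommended settings.

```r
library(TraMineR)

data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])

## Arbitrary seed, only so that the permutation tests are repeatable
set.seed(1)

## Small R and large pval to save computation time (illustration only)
seqt <- seqtree(mvad.seq ~ male + Grammar + funemp, data = mvad,
    R = 10, pval = 0.1)

## Leaf membership of each sequence, one entry per case
leaf <- seqt$fitted[, 1]

## Number of sequences falling in each terminal node
table(leaf)
```

The resulting vector has one entry per sequence, so it can be stored alongside the covariates or used as a grouping variable in further analyses.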

Arguments

formula

A formula where the left-hand side is a state sequence object (see seqdef) and the right-hand side specifies the candidate variables for partitioning the set of sequences.

weighted

Logical. If TRUE, use the weights of the state sequence object.

data

A data frame in which the variables in the formula are looked up.

min.size

Minimum number of cases in a node. A value less than 1 is interpreted as a proportion of the total number of cases.

max.depth

Maximum depth of the tree.

R

Number of permutations used to assess the significance of the split.

pval

Maximum p-value allowed for a split. The default of 0.01 corresponds to 1 percent.

weight.permutation

Weight permutation method: "diss" (attach weights to the dissimilarity matrix), "replicate" (replicate cases according to the weights), "rounded-replicate" (replicate cases according to the rounded weights), or "random-sampling" (randomly assign covariate profiles to the objects using distributions defined by the weights).

seqdist.args

List of arguments passed directly to seqdist. Only used if diss = NULL.

diss

An optional dissimilarity matrix. If not provided, a dissimilarity matrix is computed using seqdist and seqdist.args.

squared

Logical. If TRUE, the dissimilarity matrix is squared.

first

Character. An optional variable name to force the first split.

minSize

Deprecated. Use min.size instead.

maxdepth

Deprecated. Use max.depth instead.

seqdist_arg

Deprecated. Use seqdist.args instead.

Author

Matthias Studer (with Gilbert Ritschard for the help page)

Details

The function provides a simplified interface for applying disstree on state sequence objects.

The seqtree objects can be "plotted" with seqtreedisplay. A print method is also available which prints the medoid sequence for each terminal node.
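
To illustrate what "simplified interface" means, the sketch below performs the two steps by hand: it computes a dissimilarity matrix with seqdist and then passes it to disstree. It assumes that seqtree essentially does the same with its default seqdist.args; the exact internal handling of weights and other arguments is not reproduced here. The covariates, the seed, and the small R and pval values are illustrative choices only.

```r
library(TraMineR)

data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])

## Step 1: dissimilarities between sequences, using the same method
## and normalization as the seqdist.args default of seqtree
mvad.lcs <- seqdist(mvad.seq, method = "LCS", norm = "auto")

## Arbitrary seed, only so that the permutation tests are repeatable
set.seed(1)

## Step 2: grow the tree directly on the dissimilarity matrix
## (small R and large pval to save computation time, illustration only)
dt <- disstree(mvad.lcs ~ male + Grammar + funemp, data = mvad,
    R = 10, pval = 0.1)
print(dt)
```

Calling seqtree instead avoids the intermediate matrix in user code and returns an object that the sequence-specific print and seqtreedisplay methods can work with.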

References

Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2011). Discrepancy analysis of state sequences, Sociological Methods and Research, Vol. 40(3), 471-510, doi:10.1177/0049124111415372.

See Also

seqtreedisplay, disstree

Examples

data(mvad)

## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Growing a seqtree from Hamming distances:
##   Warning: The R=10 used here to save computation time is
##   much too small and will produce highly unstable results.
##   We recommend setting R to at least 1000.
##   To accommodate this small R value, we set pval = 0.1.
seqt <- seqtree(mvad.seq ~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
    data=mvad, R=10, pval=0.1, seqdist.args=list(method="HAM", norm="auto"))
print(seqt)

## Growing a seqtree from an existing distance matrix
mvad.dhd <- seqdist(mvad.seq, method="DHD")
seqt <- seqtree(mvad.seq~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
    data=mvad, R=10, pval=0.1, diss=mvad.dhd)
print(seqt)


### The following commands only work if GraphViz is properly installed
if (FALSE) {
seqtreedisplay(seqt, type="d", border=NA)
seqtreedisplay(seqt, type="I", sortv=cmdscale(mvad.dhd, k=1))
}
