
seewave (version 2.2.3)

SAX: Symbolic Aggregate approXimation

Description

This function converts a numeric time series into a series of letters with a specified length and alphabet.

Usage

SAX(x, alphabet_size, PAA_number,
breakpoints = "gaussian", collapse = NULL)

Value

A character vector of length PAA_number (when collapse is NULL), or a single character string made of PAA_number letters joined by collapse (when collapse is not NULL).

Arguments

x

a numeric vector.

alphabet_size

a numeric vector of length 1 setting the size of the alphabet.

PAA_number

a numeric vector of length 1 setting the number of elements (subsequences) of the Piecewise Aggregate Approximation (PAA).

breakpoints

either a character string, "gaussian" (default) or "quantiles", or a numeric vector specifying the sorted values of the breakpoints along the distribution of x. See Details and Examples.

collapse

a character vector of length 1 giving the separator used to collapse the output letters into a single string (see paste). By default (NULL), the letters are returned as separate elements.
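As a minimal illustration, assuming collapse behaves like the collapse argument of base R's paste() (as the description above suggests), the three output formats used in the examples can be reproduced on a hypothetical vector of letters. The name sax_out is illustrative only and is not produced by seewave:

```r
# Hypothetical letters standing in for a SAX() result with PAA_number = 5
sax_out <- c("a", "c", "e", "d", "b")

# collapse = NULL: one letter per element of a character vector
sax_out

# collapse = "": letters joined into a single string
paste(sax_out, collapse = "")   # "acedb"

# collapse = "-": letters joined with "-" as separator
paste(sax_out, collapse = "-")  # "a-c-e-d-b"
```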

Author

Laurent Lellouch, with an improvement by Pavel Senin.

Details

The SAX method was developed to reduce the dimensionality of a numeric series to a short string of characters. SAX follows a two-step process: (1) Piecewise Aggregate Approximation (PAA) and (2) conversion of the PAA sequence into a series of letters.

PAA consists of a Z-normalisation of the series, a segmentation of the normalised series of length n into w segments (w being PAA_number), and the computation of the mean of each segment.
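The PAA step can be sketched in base R as follows. This is a minimal sketch, not seewave's code: paa_sketch() is a hypothetical helper, and it assumes that n is a multiple of w, whereas SAX() may handle unequal segment lengths differently.

```r
# Hypothetical helper, not part of seewave: Piecewise Aggregate Approximation.
# x: numeric series of length n; w: number of segments (cf. PAA_number).
# Assumes n is a multiple of w so that every segment holds n / w values.
paa_sketch <- function(x, w) {
  n <- length(x)
  stopifnot(is.numeric(x), w >= 1, n %% w == 0)
  # Z-normalisation: zero mean and unit standard deviation
  z <- (x - mean(x)) / sd(x)
  # Segment label 1..w for each value, n / w consecutive values per segment
  segment <- rep(seq_len(w), each = n / w)
  # Mean of each segment, returned as a plain numeric vector
  as.vector(tapply(z, segment, mean))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
paa_sketch(x, w = 4)
```

Here each segment averages two consecutive normalised values, so the eight-point series is reduced to four PAA values, about -1.22, -0.41, 0.41 and 1.22.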

The conversion of the PAA into a series of letters assigns each PAA value to a letter using breakpoints that divide the standard Gaussian distribution into alphabet_size equiprobable regions. This process therefore assumes that the numeric series x follows a Gaussian distribution. To relax this normality assumption, the function can also work directly on the quantiles of the original data distribution, or on user-specified breakpoints along the distribution of x. See the examples.
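The letter-assignment step can be sketched as below, again with a hypothetical helper (sax_letters_sketch()) rather than seewave's implementation. It assumes the PAA values are on the Z-normalised scale, uses qnorm() for the equiprobable Gaussian breakpoints and quantile() for the "quantiles" alternative, and leaves out user-specified numeric breakpoints, whose exact interpretation in SAX() is not spelled out here.

```r
# Hypothetical helper, not part of seewave: maps PAA values to letters.
# paa:           numeric vector of PAA values (Z-normalised segment means)
# alphabet_size: number of letters, between 2 and 26
# breakpoints:   "gaussian"  -> equiprobable regions of N(0, 1)
#                "quantiles" -> equiprobable regions of the empirical
#                distribution of `ref` (by default the PAA values themselves)
sax_letters_sketch <- function(paa, alphabet_size,
                               breakpoints = c("gaussian", "quantiles"),
                               ref = paa) {
  breakpoints <- match.arg(breakpoints)
  stopifnot(alphabet_size >= 2, alphabet_size <= 26)
  # alphabet_size + 1 cut points delimit alphabet_size equiprobable regions
  probs <- seq(0, 1, length.out = alphabet_size + 1)
  cuts <- if (breakpoints == "gaussian") {
    qnorm(probs)                          # runs from -Inf to Inf
  } else {
    quantile(ref, probs, names = FALSE)   # runs from min(ref) to max(ref)
  }
  # Index of the region each value falls into; include.lowest keeps the minimum
  region <- cut(paa, breaks = cuts, labels = FALSE, include.lowest = TRUE)
  letters[region]
}

paa <- c(-1.2, -0.3, 0.1, 0.9, 2.0)
sax_letters_sketch(paa, alphabet_size = 4)   # "a" "b" "c" "d" "d"
sax_letters_sketch(paa, alphabet_size = 4, breakpoints = "quantiles")
```

With alphabet_size = 4, the Gaussian breakpoints are the quartiles of N(0, 1), approximately -0.674, 0 and 0.674, which is why the two largest values share the letter "d". With "quantiles", the breakpoints instead follow the observed values, so the letters are spread more evenly over the data.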

References

Kasten, E.P., Gage, S.H., Fox, J. & Joo, W. (2012). The remote environmental assessment laboratory's acoustic library: an archive for studying soundscape ecology. Ecological Informatics, 12, 50-67.

Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003). A symbolic representation of time series with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, California, USA, June 2003.

See Also

discrets, symba, soundscapespec

Examples

library(seewave)
data(tico)
# soundscape spectrum of the tico recording; keep its second column
spec <- soundscapespec(tico, plot = FALSE)[, 2]
SAX(spec, alphabet_size = 5, PAA_number = 10)

# change breakpoints
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints = "quantiles")
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints = c(0, 0.5, 0.75, 1))
SAX(spec, alphabet_size = 5, PAA_number = 10, breakpoints = c(0, 0.33, 0.66, 1))

# different output formats
SAX(spec, alphabet_size = 5, PAA_number = 10, collapse = "")
SAX(spec, alphabet_size = 5, PAA_number = 10, collapse = "-")
