corrDim: Correlation dimension

Description

Estimates the correlation dimension by forming a delay embedding of a time series, calculating correlation summation curves (one per embedding dimension), and subsequently estimating the slopes of these curves on a log-log scale with a robust linear regression model. If the slopes converge at a given embedding dimension $E$, then $E$ is taken as the proper embedding dimension and the (convergent) slope value is an estimate of the correlation dimension for the data.

Usage

corrDim(x, dimension=5,
    tlag=timeLag(x, method="acfdecor"), olag=0, resolution=2)

Arguments

x

a vector containing a uniformly sampled, real-valued time series, or a matrix containing an embedding with each column representing a different coordinate. If the latter, the dimension input is set to the number of columns and the tlag input is ignored.

dimension

the maximal embedding dimension. Default: 5.

olag

the number of points along the trajectory of the current point that must be exceeded in order for another point in the phase space to be considered a neighbor candidate. This argument helps attenuate temporal correlation in the embedding, which can lead to spuriously low correlation dimension estimates. The orbital lag must be non-negative. Default: length(x)/10 or 500, whichever is smaller.

resolution

an integer representing the spatial resolution factor. A value of P increases the number of effective scales by a factor of P at the cost of raising the $\ell_\infty$ norm to the Pth power. For example, setting the resolution to 2 doubles the number of scales while imposing an additional multiplication operation. The resolution must exceed unity. Default: 2.

tlag

the time delay between coordinates. Default: timeLag(x, method="acfdecor"), the decorrelation time of the autocorrelation function.
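The delay embedding built from x, dimension, and tlag can be sketched in base R as follows. This is a minimal illustration under the assumption that each embedded point is stored as one row; the helper name delayEmbed is hypothetical and is not part of the package.

```r
# Hypothetical helper (not the package's code): delay embedding of a scalar
# series x with embedding dimension E and time lag tlag.
# Row i is the point X_i = (x[i], x[i + tlag], ..., x[i + (E - 1) * tlag]).
delayEmbed <- function(x, E, tlag) {
  n <- length(x) - (E - 1) * tlag          # number of complete points
  if (n < 1) stop("series too short for this E and tlag")
  # n x E matrix of indices: column k is shifted by (k - 1) * tlag samples
  idx <- outer(seq_len(n), (0:(E - 1)) * tlag, "+")
  matrix(x[idx], nrow = n)
}

X <- delayEmbed(1:10, E = 3, tlag = 2)
dim(X)    # 6 rows, 3 columns
X[1, ]    # 1 3 5
```

Building the index matrix with outer keeps the result a matrix even when E is 1, which a column-by-column sapply would not.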

Value

an object of class chaoticInvariant.

S3 METHODS

eda.plot

plots an extended data analysis plot, which graphically summarizes the process of obtaining a correlation dimension estimate. A time history, a phase plane embedding, the correlation summation curves, and the slopes of the correlation summation curves as a function of scale are plotted.

plot

plots the correlation summation curves on a log-log scale. The following options may be used to adjust the plot components:

type

Character string denoting the type of data to be plotted. The "stat" option plots the correlation summation curves, while the "dstat" option plots a 3-point estimate of the derivatives of the correlation summation curves. The "slope" option plots the estimated slope of the correlation summation curves as a function of embedding dimension. Default: "stat".

fit

Logical flag. If TRUE, a regression line is overlaid for each curve. Default: TRUE.

grid

Logical flag. If TRUE, a grid is overlaid on the plot. Default: TRUE.

legend

Logical flag. If TRUE, a legend of the estimated slopes as a function of embedding dimension is displayed. Default: TRUE.

...

Additional plot arguments (set internally by the par function).

print

prints a qualitative summary of the results.

Details

To estimate the correlation dimension, correlation summation curves must be generated and subsequently fit with a robust linear regression model to obtain the slopes of these curves on a log-log plot. The dimension at which these slope estimates (appear to) converge reveals the proper embedding dimension for the data, and the slope at this (and higher) embedding dimensions is an estimate of the correlation dimension. The function used to fit the correlation summation curves is lmsreg, which fits a robust linear model to the data using least median of squares regression. See the online help for the lmsreg function: in R, lmsreg is found in the MASS package, while in S-PLUS it is built in and resides in the splus database.
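The robust fitting step for a single embedding dimension can be sketched with MASS::lmsreg. The curve below is synthetic and hypothetical (a slope of 2 in log2 scale with two corrupted small-scale points); it only illustrates why a least median of squares fit is preferred over ordinary least squares here, and is not the package's internal fitting code.

```r
library(MASS)   # lmsreg: least median of squares regression

# Hypothetical synthetic curve for one embedding dimension: log2 C2 rises with
# slope 2 in log2(scale), plus a small deterministic wiggle
log2eps <- seq(-6, 0, by = 0.5)
log2C   <- 2 * log2eps + 0.02 * sin(seq_along(log2eps))
log2C[1:2] <- log2C[1:2] + 3          # corrupt the two smallest scales

# Robust LMS fit; its slope serves as the correlation dimension estimate at this E
slope <- unname(coef(lmsreg(log2C ~ log2eps))[2])

# Ordinary least squares on the same points, for contrast
olsSlope <- unname(coef(lm(log2C ~ log2eps))[2])
```

The LMS slope stays close to 2 because the two outliers do not affect the median residual, whereas the least squares slope is pulled noticeably lower. Repeating such a fit for each embedding dimension and checking where the slope stops changing with E is the convergence check described above.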

The correlation summation at scale $\varepsilon$ for a given embedding dimension is defined as $$C_2(\varepsilon)={ 2 \over (N - \gamma)(N - \gamma - 1) } \sum_{i=1}^N\sum_{j=i+\gamma+1}^N\Theta(\varepsilon - \| \mathbf{X}_i - \mathbf{X}_j \|_\infty),$$ where $\Theta(\cdot)$ is the Heaviside function $$ \Theta(x)=\left\{ \begin{array}{ll} 0,& \mbox{if $x \le 0$;}\\ 1,& \mbox{otherwise} \end{array} \right.$$

and $\mathbf{X}_i$ is the $i$th point of a collection of $N$ points in the phase space. The parameter $\gamma$ is the orbital lag (olag).
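The definition above can be transcribed directly as an O(N^2) loop. This is a minimal sketch for checking small cases, assuming the points are the rows of a matrix; the helper name corrSum is hypothetical, and the package's compiled routine is organized differently (see the efficiency notes that follow).

```r
# Hypothetical helper: correlation summation C2(eps) for the rows of X,
# using the l-infinity norm and orbital lag gamma. Only pairs with
# j >= i + gamma + 1 are counted, matching the normalization in the formula.
corrSum <- function(X, eps, gamma = 0) {
  X <- as.matrix(X)
  N <- nrow(X)
  if (N - gamma < 2) stop("too few points for this orbital lag")
  npairs <- (N - gamma) * (N - gamma - 1) / 2  # number of admissible pairs
  count <- 0
  for (i in seq_len(N - gamma - 1)) {
    j <- (i + gamma + 1):N                      # skip temporally close points
    diffs <- abs(sweep(X[j, , drop = FALSE], 2, X[i, ]))
    d <- apply(diffs, 1, max)                   # l-infinity distances
    count <- count + sum(d < eps)               # Theta(eps - d) = 1 iff d < eps
  }
  count / npairs
}

pts <- matrix(c(0, 1, 3, 6), ncol = 1)
corrSum(pts, eps = 2.5)             # 2 of the 6 pairs are closer than 2.5
corrSum(pts, eps = 3.5, gamma = 1)  # only pairs with j >= i + 2 are counted
```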

The algorithm used to calculate the correlation summation is made computationally efficient by using:

1: The $\ell_\infty$ norm to calculate the distance between neighbors in the phase space, as opposed to (say) the $\ell_2$ norm, which requires comparatively expensive squaring and square root operations. The $\ell_\infty$ norm of the difference between two points in the phase space is the maximum absolute difference between their respective coordinates, i.e., if $\mathbf{X}=[z_1, z_2, z_3]^T$ then $\|\mathbf{X}\|_\infty \equiv \max_i |z_i|$.
2: Bitwise masking and shift operations to reveal the radix-2 exponent of the $\ell_\infty$ norm. Obtaining the exponent directly yields the associated scale of the distance between neighbors in the phase space while avoiding costly logarithm operations. The bitwise mask and shift factors are based on the IEEE 754 standard for binary floating-point arithmetic; initial tests in the code verify that the current machine follows this standard.
3: A computationally efficient routine to raise a float to a positive integer power. Specifically, the $\ell_\infty$ norm is raised to an integer power $P$ (the resolution) to increase the spatial resolution by a factor of $P$.
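Items 2 and 3 can be illustrated in R as a sketch, assuming a positive, finite, normal IEEE 754 double. The helper names ieeeExponent, intPower, and scaleIndex are hypothetical; this is not the package's compiled code, and in R the byte extraction is slower than log2. It only shows which bits the masks select and how raising the distance to the power P splits each octave of scale into P bins.

```r
# Hypothetical helper: unbiased radix-2 exponent of a positive normal double,
# read from its IEEE 754 bit pattern with masks and shifts.
ieeeExponent <- function(x) {
  b <- as.integer(writeBin(x, raw(), size = 8, endian = "little"))
  # exponent bits 62..52: low 7 bits of byte 8, high 4 bits of byte 7
  biased <- bitwShiftL(bitwAnd(b[8], 127L), 4L) + bitwShiftR(b[7], 4L)
  biased - 1023L                       # remove the double-precision bias
}

# Hypothetical helper: x raised to a positive integer power p by repeated squaring
intPower <- function(x, p) {
  result <- 1
  while (p > 0) {
    if (p %% 2 == 1) result <- result * x
    x <- x * x
    p <- p %/% 2
  }
  result
}

# Scale bin of a distance d at resolution P: the exponent of d^P, so bin
# boundaries fall at d = 2^(k / P), i.e. P bins per octave of d
scaleIndex <- function(d, P) ieeeExponent(intPower(d, P))

scaleIndex(1.2, 1); scaleIndex(1.5, 1)   # same bin at resolution 1
scaleIndex(1.2, 2); scaleIndex(1.5, 2)   # different bins at resolution 2
```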

Given the correlation summation curves $C_2(E,\varepsilon)$, where $E$ is the embedding dimension and $\varepsilon$ is the scale, the correlation dimension curves $D_2(E,\varepsilon)$ can be calculated by $$D_2(E,\varepsilon) ={\ln C_2(E,2\varepsilon) - \ln C_2(E,\varepsilon/2) \over \ln(2\varepsilon) - \ln(\varepsilon/2)} ={1 \over 2} \log_2{ C_2(E,2\varepsilon) \over C_2(E,\varepsilon/2) },$$ since $\ln(2\varepsilon) - \ln(\varepsilon/2) = \ln 4 = 2\ln 2$. This three-point formulation helps suppress the numerical instabilities present in other numerical derivative schemes, such as a first-order difference.
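The three-point estimate can be sketched as follows, assuming the correlation summation has been sampled on scales spaced exactly one octave apart, so that $C_2(2\varepsilon)$ and $C_2(\varepsilon/2)$ are the neighbors of $C_2(\varepsilon)$ in the vector. The helper name d2ThreePoint is hypothetical; on a grid refined by a resolution factor P the neighbors would instead lie P positions away.

```r
# Hypothetical helper: D2 at each interior scale from C2 values sampled on
# one-octave spacing (eps[k + 1] = 2 * eps[k]), using
# D2 = 0.5 * log2(C2(2 * eps) / C2(eps / 2)) = 0.5 * log2(C2[k + 1] / C2[k - 1]).
d2ThreePoint <- function(C2) {
  K <- length(C2)
  if (K < 3) stop("need at least three scales")
  0.5 * log2(C2[3:K] / C2[1:(K - 2)])
}

eps <- 2^(-6:0)
d2ThreePoint(eps^2)     # a pure power law with exponent 2 gives D2 = 2 at every interior scale
```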

As a caveat, the slope estimates of the correlation summation curves typically display a fair amount of variability, and the range of scales over which the curves are approximately linear may be small. Consequently, the correlation dimension estimate should always be interpreted as a subjective summary statistic, even when the original time series is representative of a truly noise-free chaotic response.

References

Peter Grassberger and Itamar Procaccia (1983), Measuring the strangeness of strange attractors, Physica D, 9, 189--208.

Holger Kantz and Thomas Schreiber (1997), Nonlinear Time Series Analysis, Cambridge University Press.

Peter Grassberger and Itamar Procaccia (1983), Characterization of strange attractors, Physical Review Letters, 50(5), 346--349.

Peter J. Rousseeuw (1984), Least median of squares regression, Journal of the American Statistical Association, 79, 871--880.

Examples


library(fractal)  # package providing corrDim and the beamchaos data

## calculate the correlation dimension estimates
## for chaotic beam data using a delay
## embedding for dimensions 1 through 10, an
## orbital lag of 10, and a spatial resolution
## of 4.
beam.d2 <- corrDim(beamchaos, dimension=10, olag=10, resolution=4)

## print a summary of the results
print(beam.d2)

## plot the correlation summation curves
plot(beam.d2, fit=FALSE, legend=FALSE)

## plot an extended data analysis plot
eda.plot(beam.d2)
