Learn R Programming

dynCorr (version 1.1.0)

dynamicCorrelation: Dynamic Correlation

Description

Computes dynamical correlation estimates for pairs of longitudinal responses, including consideration of lags and derivatives, following a local polynomial regression smoothing step.

Usage

dynamicCorrelation(dataFrame, depVar, indepVar, subjectVar,
                   function.choice, width.range, width.place,
                   min.obs, points.length, points.by, boundary.trunc,
                   lag.input, byOrder, by.deriv.only, full.lag.output)

Arguments

dataFrame

The data frame that contains the dependent variables/responses, the independent variable (often time), and the subject/individual identification; there should be one row entry for each combination of subject/individual and indepVar (often time).

depVar

Dependent variables/responses; at least two are necessary for purposes of calculating at least one dynamical correlation estimate of interest; there should be a unique column within depVar for each response.

indepVar

Independent variable, typically the discrete recorded time points at which the dependent variables were collected; note that this is the independent variable for purposes of curve creation leading into estimating the dynamical correlations between pairs of dependent variables; must be contained in a single column.

subjectVar

Column name of the individuals; there should be one row entry for each combination of subject/individual and indepVar.

function.choice

A vector of length 3 to indicate which derivatives desired for local polynomial curve smoothing; 1st entry is for 0th derivative (i.e., original function), 2nd entry is for 1st derivative, 3rd is for 2nd derivative; 1=yes, 0=no. e.g., c(1,0,1) would be specified if interest is in looking at derivative 0 and derivative 2, c(1,0,0) for looking at original function (0th derivative) only, etc.

width.range

Bandwidth for local polynomial regression curve smoothing for each dependent variable/response; it can be a list that specifies a distinct bandwidth two-element vector for each response, or a single two-element vector that is used for each of the responses <U+2014> the program is currently set up to allow linearly increasing or decreasing bandwidths, specified by this two-element vector with the increase (or decrease) occurring from the first argument in width.place to its second argument; the lpepa function within the lpridge package is called, which uses Epanecknikov kernel weighting, and the specifications of bandwidth in width.range will be used there; the default bandwidth is the range of indepVar (usually time) divided by 4, i.e., a constant global bandwidth.

width.place

Endpoints for width change assuming a non-constant bandwidth requested <U+2014> the program is currently set up to allow linearly increasing or decreasing bandwidths, specified in width.range, with the increase (or decrease) occurring from the first argument in width.place to its second argument; it can be a list that specifies different endpoints for each response, or a single vector of endpoints that is used for each of the responses; default is no endpoints specified which means constant global bandwidth throughout the range of indepVar.

min.obs

Minimum oberservation (follow-up period) required. If specified, individuals whose follow-up period shorter than min.obs will be removed from calculation. Default = NA (use whole dataset provided).

points.length

Number of indep (time) points for dynamic correlation calculation for each response of each individual. This is the number of points between time 0 and minimum of maximum of individual follow-up periods (max_common_obs). Note that each individual<U+2019>s full follow-up time span is used in the local polynomial regression curve smoothing step, but only the first points.length number of time points (from time 0 to max_common_obs) is used in the following dynamic correlation calculation. Default points.length value is set to 100; points.length takes precedence unless points.by is specified.

points.by

Interval between indep (time) points for local polynomial regression curve smoothing for each response of each individual. Both integer and non-integer value could be specified, and time grid will be computed accordingly. Note that points.length takes precedence (default = 100) unless points.by is specified.

boundary.trunc

Indicate the boundary of indepVar that should be truncated after smoothing; this may be done in case of concerns about estimating dynamical correlation at the boundaries of indepVar; this is a two element vector, where the first argument is how much to truncate from the right of the minimum value of indep.var and the second argument is how much to truncate from the left of the maximum value of indep var, within an individual and specific response; default is no truncation, i.e., c(0,0).

lag.input

Values of lag to be considered; can be a vector of requested lags, for which a dynamical correlation estimate will be produced for each pair of responses at each lag; a positive value of lag.input means that the first entry for the dynamical correlation leads (occurs before) the second entry <U+2014> conversely, a negative value means that the second entry for the correlation leads the first entry; default is no lag at all considered.

byOrder

A vector that specifies the order of the variables/responses and derivatives (if any) to be the leading variable in the calculations; this will have an effect on how lag.input is to be interpreted; default is to use the order as specified in the depVar argument.

by.deriv.only

If TRUE, the inter-dynamical correlations between different derivatives are not computed, which can save computation time (e.g., when function.choice=c(1,0,1) is specified and by.deriv.only=T, then only dynamical correlations will be calculated within (and not across) the 0th and 2nd derivative estimates, respectively); default is TRUE.

full.lag.output

If TRUE, the dynamical correlation values for each pair of responses and requested derivative will be stored in vectors corresponding to different lag values, which enables plotting the correlations as a function of lag values; all the vectors will be stored in the returned attribute resultMatrix; default is FALSE.

Details

This function will provide smooth estimates (curves/functions or their derivatives) of longitudinal responses of interest per individual, then generate dynamical correlation estimates for each pair of responses. Lags of interest can be specified, using the lag.input argument. For smoothing, the function uses local polynomial smoothing using Epanecknikov kernel weighting (by calling the function lpepa within the lpridge package). The default global bandwidth is generated by taking the range of indepVar (usually time) and dividing by 4. This, by default, will be a constant global bandwidth, but proper specification of the width.range and width.place arguments can allow for a more flexible bandwidth choice, including different specification for each response in depVar. Note that the correlation will only be calculated as far as min of max observation time (max_common_obs) across all individuals, by default, hence caution should be taken if there is large heterogeneity among max observation times across individuals. Details of the methodology for dynamical correlation can be found in Dubin and Muller (2005).

See Also

lpepa

Examples

Run this code
# NOT RUN {
## Example 1: using default smoothing parameters, obtain dynamical
##            correlation estimates for all three pairs of responses,
##            for both original function and the first derivative.
##            Note that in dynCorrData, mininum of maximum obs. time
##            = 120, results should be the same if either points.by
##            = 1 or points.length = 120 is specified.

examp1_by <- dynamicCorrelation(dataFrame = dynCorrData,
                                depVar = c('resp1', 'resp2', 'resp3'),
                                indepVar = 'time',
                                points.by = 1,
                                subjectVar = 'subject',
                                function.choice = c(1,1,0))

examp1_len <- dynamicCorrelation(dataFrame = dynCorrData,
                                 depVar = c('resp1', 'resp2', 'resp3'),
                                 indepVar = 'time',
                                 points.length = 120,
                                 subjectVar = 'subject',
                                 function.choice = c(1,1,0))
examp1_by
examp1_len

## Example 1a: re-run Example 1 using original dataset, but with min.obs
##             set to 200. Range in lengths of follow-up periods between
##             individuals is reduced.

examp1a <- dynamicCorrelation(dataFrame = dynCorrData,
                              depVar = c('resp1', 'resp2', 'resp3'),
                              indepVar = 'time',
                              min.obs = 150,
                              points.by = 1,
                              subjectVar = 'subject',
                              function.choice = c(1,0,0))
examp1a

## Example 2: using default smoothing parameters, obtain dynamical
##            correlation estimates for all three pairs of responses,
##            looking at range of lags between -10 and +10, for original
##            functions only

examp2 <- dynamicCorrelation(dataFrame = dynCorrData,
                             depVar = c('resp1', 'resp2', 'resp3'),
                             indepVar = 'time',
                             points.by = 1,
                             subjectVar = 'subject',
                             function.choice = c(1,0,0),
                             lag.input = seq(-20,20, by=1))
examp2

## note: output includes zero lag correlations, as well as maximum
##       correlation (in absolute value) in max.dynCorr and and its
##       corresponding lag value in max.dynCorrLag

## Example 3: re-rerun example 2, but set up for plotting of specified
##            lagged correlations

examp3 <- dynamicCorrelation(dataFrame = dynCorrData,
                             depVar = c('resp1', 'resp2', 'resp3'),
                             indepVar = 'time',
                             subjectVar = 'subject',
                             points.by = 1,
                             function.choice = c(1,0,0),
                             lag.input = seq(-20,10, by=1),
                             full.lag.output = TRUE)

# conduct plotting, with one panel for each pair of responses considered;
# the ylim adjustment is made here for the different magnitude of the
# correlations between the two pairs

par(mfrow=c(1,2))
plot(seq(-20,10, by=1),
     examp3$lagResultMatrix[[1]][1,],
     type='b',
     xlab = 'lag order (in days)',
     ylab = 'lagged correlations',
     ylim = c(-0.4, -0.2),
     main = 'dyncorr b/t resp1 and resp2 as a function of lag')

abline(v = examp3$max.dynCorrLag[[1]][1,2], lty = 2)

plot(seq(-20,10, by=1),
     examp3$lagResultMatrix[[1]][2,],
     type='b',
     xlab = 'lag order (in days)',
     ylab = 'lagged correlations',
     ylim = c(0.3, 0.5),
     main = 'dyncorr b/t resp1 and resp3 as a function of lag')

abline(v = examp3$max.dynCorrLag[[1]][1,3], lty = 2)

## Example 4: same as the original function piece of Example 1,
##            except now adjust the constant global bandwidth
##            from the default to 40

examp4 <- dynamicCorrelation(dataFrame = dynCorrData,
                             depVar = c('resp1', 'resp2', 'resp3'),
                             indepVar = 'time',
                             points.by = 1,
                             subjectVar = 'subject',
                             function.choice = c(1,0,0),
                             width.range = c(40, 40))
examp4

# }

Run the code above in your browser using DataLab