StatMatch (version 1.4.3)

rho.bounds: Estimates plausible values of Pearson's correlation coefficient between two variables observed in distinct samples that refer to the same target population.

Description

This function assesses the uncertainty in estimating Pearson's correlation coefficient between y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that share a number of common predictors.

Usage

rho.bounds(data.rec, data.don,
           match.vars, y.rec, z.don,
           w.rec = NULL, w.don = NULL)

Value

A vector with three values: the estimated lower bound for Pearson's correlation coefficient between y.rec (Y) and z.don (Z); the estimated upper bound; and the mid-point of the interval, which corresponds to the estimate of Pearson's correlation coefficient under the conditional independence assumption (i.e. the correlation between Y and Z is fully explained by the available X variables, match.vars).

Arguments

data.rec

dataframe including the Xs (predictors, listed in match.vars) and y.rec (response; target variable in this dataset).

data.don

dataframe including the Xs (predictors, listed in match.vars) and z.don (response; target variable in this dataset).

match.vars

vector with the names of the X variables to be used, jointly with y.rec and z.don, in estimating the correlation matrix. If match.vars includes one or more factor variables, these are replaced with the corresponding dummies before the correlation matrix is estimated.

y.rec

character string giving the name of the Y target variable in data.rec. It should be a numeric variable.

z.don

character string giving the name of the Z target variable in data.don. It should be a numeric variable.

w.rec

name of the variable with units' weights in data.rec, if available (default NULL); the weights, if provided, are used in estimating the bounds.

w.don

name of the variable with units' weights in data.don, if available (default NULL); the weights, if provided, are used in estimating the bounds.

Author

Marcello D'Orazio mdo.statmatch@gmail.com

Details

This function evaluates the uncertainty in the estimation of Pearson's correlation coefficient between y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that refer to the same target population and share a set of common predictors X (match.vars). Evaluating the uncertainty means estimating the lower and upper bounds of the correlation coefficient between Y and Z, given the available data. The method uses the expressions proposed by Rodgers and DeVol (1982).

Note that the correlations between the X variables common to both samples (match.vars) are estimated after pooling the samples. Factor variables, if present in match.vars, are replaced by the corresponding dummies before the correlations are estimated; this approach suffers from a number of critical problems related to the estimation of biserial correlation and the underlying assumption of a Gaussian distribution.

The correlation matrix between Y and the Xs is estimated on data.rec, while the correlation matrix between Z and the Xs is estimated on data.don. This way of working can in some cases give unreliable estimates because of problems with the samples (usually when they are not representative of the same target population).
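The logic described above can be sketched in base R for numeric Xs only. This is an illustration under stated assumptions, not the code used by rho.bounds: it applies the standard bounds that follow from requiring the joint correlation matrix of (X, Y, Z) to be positive semidefinite, and it ignores weights and factor variables. The object names (xs, R.xx, r.xy, r.xz, bounds.sketch) are hypothetical, and the values returned by rho.bounds may differ in detail.

```r
# Sketch only: bounds for cor(Y, Z) given common numeric Xs, base R.
set.seed(11335577)
pos  <- sample(x = 1:150, size = 60, replace = FALSE)
ir.A <- iris[pos,  c(1:3, 5)]    # recipient: Xs and Petal.Length (Y)
ir.B <- iris[-pos, c(1:2, 4:5)]  # donor: Xs and Petal.Width (Z)

xs <- c("Sepal.Length", "Sepal.Width")  # numeric matching variables

# Correlations among the Xs, estimated on the pooled samples
R.xx <- cor(rbind(ir.A[, xs], ir.B[, xs]))
# Correlations of the Xs with Y (recipient only) and with Z (donor only)
r.xy <- cor(ir.A[, xs], ir.A[, "Petal.Length"])
r.xz <- cor(ir.B[, xs], ir.B[, "Petal.Width"])

# Mid-point: cor(Y, Z) implied by conditional independence given the Xs,
# i.e. r.xy' R.xx^{-1} r.xz
mid <- drop(t(r.xy) %*% solve(R.xx, r.xz))

# Shares of the variance of Y and of Z explained by the Xs
r2.y <- drop(t(r.xy) %*% solve(R.xx, r.xy))
r2.z <- drop(t(r.xz) %*% solve(R.xx, r.xz))

# Half-width of the interval; max(0, .) guards against values slightly
# above 1, which can occur because the pieces come from different samples
half <- sqrt(max(0, 1 - r2.y) * max(0, 1 - r2.z))

bounds.sketch <- c(low = mid - half, up = mid + half, mid = mid)
bounds.sketch
```

The interval narrows as the Xs explain more of the variance of Y and Z, which is why adding informative matching variables reduces the uncertainty about the correlation.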

References

D'Orazio, M. (2024). Is Statistical Matching feasible? Note, https://www.researchgate.net/publication/387699016_Is_statistical_matching_feasible.

Rodgers, W.L. and DeVol, E.B. (1982). An evaluation of statistical matching. Report Submitted to the Income Survey Development Program, Dept. of Health and Human Services, Institute for Social Research, University of Michigan.

See Also

mixed.mtc.

Examples

set.seed(11335577)
pos <- sample(x = 1:150, size = 60, replace = FALSE)
ir.A <- iris[pos, c(1:3, 5)]
ir.B <- iris[-pos, c(1:2, 4:5)]

intersect(colnames(ir.A), colnames(ir.B)) # shared Xs

# Xs without Species (factor)
out.1 <- rho.bounds(data.rec=ir.A, data.don=ir.B,
                    match.vars=c("Sepal.Length", "Sepal.Width"),
                    y.rec="Petal.Length", z.don="Petal.Width")
out.1

# Xs with Species (factor)
out.2 <- rho.bounds(data.rec=ir.A, data.don=ir.B, 
                    match.vars=c("Sepal.Length", "Sepal.Width", "Species"),
                    y.rec="Petal.Length", z.don="Petal.Width")
out.2