StatMatch (version 1.4.2)

plotCont: graphical comparison of the estimated distributions for the same continuous variable.

Description

Graphically compares the distributions of the same continuous variable estimated from two different data sources.

Usage

plotCont(data.A, data.B, xlab.A, xlab.B=NULL, w.A=NULL, w.B=NULL,
         type="density", ref=FALSE)

Value

The required graphical representation is drawn using the ggplot2 facilities.

Arguments

data.A

A data frame or matrix containing the variable of interest xlab.A and, if available, the associated survey weights w.A.

data.B

A data frame or matrix containing the variable of interest xlab.B and, if available, the associated survey weights w.B.

xlab.A

Character string providing the name of the variable in data.A whose distribution should be represented graphically and compared with that estimated from data.B.

xlab.B

Character string providing the name of the variable in data.B whose distribution should be represented graphically and compared with that estimated from data.A. If xlab.B=NULL (default), it is assumed that xlab.B=xlab.A.

w.A

Character string providing the name of the optional weighting variable in data.A. When supplied, the weights are used to estimate the distribution of xlab.A.

w.B

Character string providing the name of the optional weighting variable in data.B. When supplied, the weights are used to estimate the distribution of xlab.B.

type

A character string indicating the type of graphical representation that should be used to compare the estimated distributions of xlab.A and xlab.B. By default (type="density") density plots are used. Other possible options are “ecdf”, “qqplot”, “qqshift” and “hist”. See Details for more information.

ref

Logical, indicating whether the distribution estimated from data.B should be considered the reference. Default is ref=FALSE. When ref=TRUE, the estimation of the histograms, the density and the empirical cumulative distribution function is guided by the data in data.B.

Author

Marcello D'Orazio mdo.statmatch@gmail.com

Details

This function graphically compares the distributions of the same variable estimated from two different data sources. The comparison can be done in different ways. When type="hist", the continuous variable is categorized and the corresponding histograms, estimated from data.A and data.B, are compared. When present, the weights are used in estimating the relative frequencies. Note that the breaks used to categorize the variable are chosen according to the Freedman-Diaconis rule (nclass); in this case, with ref=TRUE the IQR is estimated solely on data.B, whereas with ref=FALSE it is estimated on the two data sources joined together.
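The effect of ref on the histogram breaks can be illustrated with a minimal base-R sketch of the Freedman-Diaconis bin width, 2 * IQR / n^(1/3). The helper fd_width and the toy vectors x.A and x.B are hypothetical, and pairing the IQR with the sample size of the same data is an assumption; this is not plotCont's internal code.

```r
# Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3).
# fd_width is a hypothetical helper, not part of StatMatch.
fd_width <- function(x) 2 * IQR(x) / length(x)^(1/3)

set.seed(1)
x.A <- rnorm(200, mean = 40, sd = 12)  # toy stand-in for xlab.A in data.A
x.B <- rnorm(300, mean = 42, sd = 15)  # toy stand-in for xlab.B in data.B

# ref = TRUE: the IQR comes from data.B only (assumed: n is that of data.B too)
width_ref <- fd_width(x.B)

# ref = FALSE: the IQR comes from the two sources joined together
width_pooled <- fd_width(c(x.A, x.B))

c(ref_TRUE = width_ref, ref_FALSE = width_pooled)
```

The two widths generally differ, so the same pair of samples can be binned differently depending on ref.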

With type="density", density plots are drawn; when available, the weights are used in estimating the densities, which are based on the histograms (as suggested by Bellhouse and Stafford, 1999). When type="ecdf", the comparison relies on the empirical cumulative distribution function, which can be estimated taking the weights into account. Note that when ref=TRUE the estimation of the density and of the empirical cumulative distribution function is guided by the data in data.B.
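As an illustration of how weights can enter the empirical cumulative distribution function, the following base-R sketch evaluates the share of total weight carried by observations not exceeding a given value. The helper wecdf and the toy data are hypothetical; plotCont may compute the weighted ECDF differently.

```r
# Weighted ECDF: share of total weight carried by observations <= q.
# wecdf is a hypothetical helper, not part of StatMatch.
wecdf <- function(x, w = rep(1, length(x))) {
  function(q) vapply(q, function(v) sum(w[x <= v]) / sum(w), numeric(1))
}

x <- c(3, 1, 2)   # toy values
w <- c(1, 1, 2)   # toy survey weights

F_w <- wecdf(x, w)  # weighted version
F_u <- wecdf(x)     # equal weights, matches stats::ecdf(x)

F_w(c(0, 1, 2, 3))
F_u(c(0, 1, 2, 3))
```

With equal weights the function reduces to the usual ECDF, which is why the weighted and unweighted curves coincide when no weighting variable is supplied.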

The comparison is based on percentiles with type="qqplot" and type="qqshift". In the first case, the function draws a scatterplot (red dots) of the estimated percentiles of xlab.A vs. those of xlab.B; the dashed line indicates the ideal situation of equal percentiles (points lying on the line). When type="qqshift", the scatterplot shows (percentiles.A - percentiles.B) vs. percentiles.B; in this case, points lying on the horizontal line passing through 0 indicate equality (difference equal to 0).

Note that the number of estimated percentiles depends on the smaller of the two sample sizes. Only quartiles are calculated when min(n.A, n.B)<=50; quintiles are estimated when min(n.A, n.B)>50 and min(n.A, n.B)<=150; deciles are estimated when min(n.A, n.B)>150 and min(n.A, n.B)<=250; finally, quantiles for probs=seq(from = 0.05,to = 0.95,by = 0.05) are estimated when min(n.A, n.B)>250. When survey weights are available (indicated through w.A and/or w.B), they are used in estimating the quantiles by calling the function wtd.quantile in the package Hmisc.
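The rule for choosing the percentiles, and the quantities plotted by the two percentile-based types, can be sketched in base R as follows. The helper pick_probs and the toy samples are hypothetical; the sketch assumes only interior cut points are used and ignores weights (it calls quantile rather than Hmisc's wtd.quantile), so it approximates the unweighted case only.

```r
# Choose the probabilities from the smaller sample size, as described above.
# pick_probs is a hypothetical helper, not part of StatMatch.
pick_probs <- function(n.A, n.B) {
  n <- min(n.A, n.B)
  if (n <= 50)       c(0.25, 0.50, 0.75)                    # quartiles
  else if (n <= 150) seq(from = 0.2, to = 0.8, by = 0.2)     # quintiles
  else if (n <= 250) seq(from = 0.1, to = 0.9, by = 0.1)     # deciles
  else               seq(from = 0.05, to = 0.95, by = 0.05)  # every 5%
}

set.seed(1)
x.A <- rnorm(120, mean = 40, sd = 12)  # toy stand-in for xlab.A
x.B <- rnorm(300, mean = 42, sd = 15)  # toy stand-in for xlab.B

probs <- pick_probs(length(x.A), length(x.B))  # min is 120, so quintiles
q.A <- quantile(x.A, probs = probs)
q.B <- quantile(x.B, probs = probs)

# type = "qqplot": percentiles of A paired with percentiles of B
qq <- data.frame(q.B = q.B, q.A = q.A)

# type = "qqshift": (percentiles.A - percentiles.B) paired with percentiles.B
shift <- data.frame(q.B = q.B, diff = q.A - q.B)
shift
```

Differences close to 0 in shift correspond to points near the horizontal reference line in the qqshift plot.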

References

Bellhouse, D. R. and Stafford, J. E. (1999). “Density Estimation from Complex Surveys”. Statistica Sinica, 9, 407-424.

See Also

comp.cont

Examples

library(StatMatch)
data(samp.A, samp.B)  # example samples shipped with StatMatch
plotCont(data.A = samp.A, data.B = samp.B, xlab.A = "age")
plotCont(data.A = samp.A, data.B = samp.B, xlab.A = "age", w.A = "ww")
