Frechet.bounds.cat: Frechet bounds of cells in a contingency table

Description

This function derives the bounds for the cell probabilities of the table Y vs. Z, starting from the marginal tables (X vs. Y) and (X vs. Z) and from the joint distribution of the X variables.

Usage

Frechet.bounds.cat(tab.x, tab.xy, tab.xz, print.f="tables", tol= 0.0001)

Arguments

tab.x

An R table crossing the X variables. This table must be obtained with the function xtabs or table, e.g. tab.x <- xtabs(~x1+x2+x3, data

tab.xy

An R table of the X variables vs. the Y variable. This table must be obtained with the function xtabs or table, e.g. tab.xy <- xtabs(~x1+x2+x3+y, data=

tab.xz

An R table of the X variables vs. the Z variable. This table must be obtained with the function xtabs or table, e.g. tab.xz <- xtabs(~x1+x2+x3+z, data=da

print.f

A string: when print.f="tables" (default), all the cell estimates are saved as tables in a list; when print.f="data.frame", they are saved as columns of a data.frame.

tol

Tolerance used when comparing the joint distributions of the X variables (default tol=0.0001); the joint distribution of the X variables computed from tab.xy and tab.xz should be equal to t

Value

When print.f="tables" (default), a list with the following components:

low.u

The estimated lower bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.

up.u

The estimated upper bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.

CIA

The estimated relative frequencies in the table Y vs. Z under the Conditional Independence Assumption (CIA).

low.cx

The estimated lower bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.

up.cx

The estimated upper bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.

uncertainty

The uncertainty associated with the input data, summarized as the average width of the uncertainty bounds with and without conditioning on the X variables, together with the overall uncertainty estimated following the suggestion in Conti et al. (2012) (see Fbwidths.by.x for more details).

When print.f="data.frame", the output list contains just two components:

bounds

A data.frame whose columns report the estimated uncertainty bounds.

uncertainty

The uncertainty associated with the input data, summarized as the average width of the uncertainty bounds with and without conditioning on the X variables, together with the overall uncertainty estimated following the suggestion in Conti et al. (2012) (see Fbwidths.by.x for more details).

Details

This function computes the Frechet bounds for the relative frequencies in the contingency table of Y vs. Z, starting from the distributions P(Y|X), P(Z|X) and P(X). The bounds for the relative frequencies $p_{j,k}$ in the table Y vs. Z are:

$$p^{(low)}_{Y=j,Z=k} = \sum_{i} p_{X=i}\max (0; p_{Y=j|X=i} + p_{Z=k|X=i} - 1 )$$

$$p^{(up)}_{Y=j,Z=k} = \sum_{i} p_{X=i} \min ( p_{Y=j|X=i}; p_{Z=k|X=i})$$

The relative frequencies $p_{X=i}=n_i/n$ are computed from the frequencies in tab.x; the relative frequencies $p_{Y=j|X=i}=n_{ij}/n_{i+}$ are computed from tab.xy; finally, $p_{Z=k|X=i}=n_{ik}/n_{i+}$ are derived from tab.xz.
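As a rough illustration of the two sums above, the following base R sketch computes the conditional bounds by hand for a single X variable. The counts are hypothetical toy values, the names low.cx and up.cx merely mirror the output components, and tab.x is taken here from the row totals of tab.xy; this is not the package's internal code.

```r
# Hypothetical toy counts: X has 2 categories, Y has 2, Z has 3
tab.xy <- matrix(c(30, 20,
                   10, 40), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
tab.xz <- matrix(c(25, 15, 10,
                    5, 20, 25), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2", "z3")))
tab.x <- rowSums(tab.xy)                   # assumption: X totals taken from tab.xy

p.x   <- tab.x / sum(tab.x)                # p_{X=i}
p.y.x <- prop.table(tab.xy, margin = 1)    # p_{Y=j|X=i}
p.z.x <- prop.table(tab.xz, margin = 1)    # p_{Z=k|X=i}

# Accumulate the weighted bounds over the categories of X
low.cx <- up.cx <- matrix(0, nrow = ncol(p.y.x), ncol = ncol(p.z.x),
                          dimnames = list(Y = colnames(p.y.x),
                                          Z = colnames(p.z.x)))
for (i in seq_along(p.x)) {
  s  <- outer(p.y.x[i, ], p.z.x[i, ], "+")   # p_{Y=j|X=i} + p_{Z=k|X=i}
  mn <- outer(p.y.x[i, ], p.z.x[i, ], pmin)  # min of the two conditionals
  low.cx <- low.cx + p.x[i] * pmax(s - 1, 0)
  up.cx  <- up.cx  + p.x[i] * mn
}

low.cx
up.cx
```

Each cell of low.cx is no larger than the corresponding cell of up.cx, as expected for a pair of bounds.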

It is assumed that the marginal distribution of the X variables is the same in all the input tables: tab.x, tab.xy and tab.xz. If this is not the case, a warning message is issued.
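One way to picture this check is the sketch below, which compares the X marginals of the three tables against the tolerance tol. It assumes a maximum absolute difference criterion and uses hypothetical toy tables; the comparison actually performed inside Frechet.bounds.cat() may differ.

```r
# Hypothetical toy tables with one X variable
tab.x  <- as.table(c(x1 = 50, x2 = 50))
tab.xy <- as.table(matrix(c(30, 20,
                            10, 40), nrow = 2, byrow = TRUE,
                          dimnames = list(X = c("x1", "x2"),
                                          Y = c("y1", "y2"))))
# X margin of tab.xz is 49/51, slightly off from tab.x
tab.xz <- as.table(matrix(c(24, 15, 10,
                             5, 20, 26), nrow = 2, byrow = TRUE,
                          dimnames = list(X = c("x1", "x2"),
                                          Z = c("z1", "z2", "z3"))))

tol <- 0.0001
p.x    <- prop.table(tab.x)
p.x.xy <- prop.table(margin.table(tab.xy, 1))  # X distribution implied by tab.xy
p.x.xz <- prop.table(margin.table(tab.xz, 1))  # X distribution implied by tab.xz

# Assumed criterion: largest absolute difference must not exceed tol
same.xy <- max(abs(p.x.xy - p.x)) <= tol
same.xz <- max(abs(p.x.xz - p.x)) <= tol

if (!same.xy) message("X marginals of tab.xy differ from tab.x")
if (!same.xz) message("X marginals of tab.xz differ from tab.x")
```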

Note that the cell bounds for the relative frequencies in the contingency table of Y vs. Z are also computed without considering the X variables:

$$\max \{0; p_{Y=j} + p_{Z=k} - 1\} \leq p_{Y=j,Z=k} \leq \min \{ p_{Y=j}; p_{Z=k}\}$$
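The unconditional bounds need only the two marginal distributions. A minimal base R sketch, with hypothetical marginal probabilities and illustrative object names low.u and up.u chosen to echo the output components:

```r
# Hypothetical marginal distributions of Y and Z
p.y <- c(y1 = 0.40, y2 = 0.60)
p.z <- c(z1 = 0.30, z2 = 0.35, z3 = 0.35)

# max{0; p_Y + p_Z - 1} and min{p_Y; p_Z} for every (j, k) cell
low.u <- pmax(outer(p.y, p.z, "+") - 1, 0)
up.u  <- outer(p.y, p.z, pmin)

low.u
up.u

# Average width of the unconditional bounds across the cells
av.width <- mean(up.u - low.u)
av.width
```

With these toy marginals no pair of probabilities sums to more than 1, so every lower bound is 0 and the width of each interval equals its upper bound.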

Finally, the contingency table of Y vs. Z estimated under the Conditional Independence Assumption (CIA) is obtained by considering:

$$p_{Y=j,Z=k} = \sum_{i} p_{Y=j|X=i} \times p_{Z=k|X=i} \times p_{X=i}.$$
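The CIA estimate can be sketched in base R as a weighted product of the two conditional distributions, summed over the categories of X. The probabilities below are hypothetical and the object name cia is illustrative only.

```r
# Hypothetical distributions for a single X variable with 2 categories
p.x   <- c(x1 = 0.5, x2 = 0.5)                           # p_{X=i}
p.y.x <- matrix(c(0.6, 0.4,
                  0.2, 0.8), nrow = 2, byrow = TRUE,     # p_{Y=j|X=i}
                dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
p.z.x <- matrix(c(0.5, 0.3, 0.2,
                  0.1, 0.4, 0.5), nrow = 2, byrow = TRUE, # p_{Z=k|X=i}
                dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2", "z3")))

# Row i of p.y.x is scaled by p.x[i]; the matrix product then sums over i
cia <- t(p.y.x * p.x) %*% p.z.x

cia
sum(cia)        # the estimated cells form a proper distribution
rowSums(cia)    # recovers the marginal distribution of Y
```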

References

Ballin, M., D'Orazio, M., Di Zio, M., Scanu, M. and Torelli, N. (2009) Statistical Matching of Two Surveys with a Common Subset. Working Paper, 124. Dip. Scienze Economiche e Statistiche, Univ. di Trieste, Trieste.

Conti, P.L., Marella, D. and Scanu, M. (2012) Uncertainty Analysis in Statistical Matching. Journal of Official Statistics, 28, pp. 69-88.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006) Statistical Matching: Theory and Practice. Wiley, Chichester.

Examples

library(StatMatch) # provides Frechet.bounds.cat(), comp.prop() and harmonize.x()
data(quine, package="MASS") # loads quine from MASS
str(quine)

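# split quine in two non-overlapping subsets of 70 and N-70 rows
# (sampling without replacement, so that the weights defined later
# match the actual subset sizes)
set.seed(7654)
lab.A <- sample(nrow(quine), 70, replace=FALSE)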
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]

# compute the tables required by Frechet.bounds.cat()
freq.x <- xtabs(~Sex+Age, data=quine.A)
freq.xy <- xtabs(~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(~Sex+Age+Lrn, data=quine.B)

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz

#compare marg. distribution of Xs in A and B
comp.prop(p1=margin.table(freq.xy,c(1,2)), p2=margin.table(freq.xz,c(1,2)), 
          n1=nrow(quine.A), n2=nrow(quine.B))

# harmonize distr. of Sex vs. Age before applying
# Frechet.bounds.cat()

N <- nrow(quine)
quine.A$pop <- N
quine.A$f <- N/70 # reciprocal sampling fraction
quine.B$pop <- N
quine.B$f <- N/(N-70)

# derive the table of Sex vs. Age related to the whole data set
tot.sex.age <- colSums(model.matrix(~Sex*Age-1, data=quine))
tot.sex.age

# use harmonize.x() to harmonize the Sex vs. Age between
# quine.A and quine.B

# create svydesign objects
library(survey) # provides svydesign()
svy.qA <- svydesign(~1, weights=~f, fpc=~pop, data=quine.A)
svy.qB <- svydesign(~1, weights=~f, fpc=~pop, data=quine.B)

# apply harmonize.x 
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, form.x=~Sex*Age-1, x.tot=tot.sex.age)

# compute the new tables required by Frechet.bounds.cat()
freq.x <- xtabs(out.hz$weights.A~Sex+Age, data=quine.A)
freq.xy <- xtabs(out.hz$weights.A~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(out.hz$weights.B~Sex+Age+Lrn, data=quine.B)

#compare marg. distribution of Xs in A and B
comp.prop(p1=margin.table(freq.xy,c(1,2)), p2=margin.table(freq.xz,c(1,2)), 
          n1=nrow(quine.A), n2=nrow(quine.B))

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz
