pw.assoc(formula, data, weights=NULL, freq0c=NULL)
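formula: a formula of the type y~x1+x2, where y denotes the name of the categorical variable (a factor in R) that plays the role of the response, and x1 and x2 are the names of the predictors.
data: the data frame containing the variables named in formula.
weights: the name of the variable in data which provides the units' weights. Weights are used to estimate frequencies (a cell frequency is estimated by summing the weights of the units that present the given characteristics). Default is NULL.
freq0c: when NULL (default), a cell with zero frequency is replaced with 1/N^2, where N is the sample size.
Value: a list object with four components.
Details: association measures are computed between the response and each predictor in formula. The following association measures are considered:
Cramer's V: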
$$V=\sqrt{\frac{\chi^2}{N \times \min\left[I-1,J-1\right]}}$$
where $\chi^2$ is the Pearson chi-square statistic, N is the sample size, I is the number of rows and J is the number of columns. Cramer's V ranges from 0 to 1.
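As a rough illustration of the formula above, the following sketch computes Cramer's V by hand in base R. The table `tab` and all variable names are hypothetical and are not taken from the package; this is not the internal code of pw.assoc.

```r
# Hypothetical 2 x 3 table of counts (rows = response, columns = predictor)
tab <- matrix(c(10, 20, 30,
                30, 20, 10), nrow = 2, byrow = TRUE)

N <- sum(tab)                                       # sample size
expected <- outer(rowSums(tab), colSums(tab)) / N   # counts under independence
chi2 <- sum((tab - expected)^2 / expected)          # Pearson chi-square

# Cramer's V: chi-square scaled by N times min(I - 1, J - 1)
V <- sqrt(chi2 / (N * min(nrow(tab) - 1, ncol(tab) - 1)))
V
```

Because the denominator uses min(I-1, J-1), V is symmetric: swapping rows and columns leaves it unchanged.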
Goodman--Kruskal $\lambda(R|C)$:
$$\lambda(R|C) = \frac{\sum_{j=1}^J \max_{i}(p_{ij}) - \max_{i}(p_{i+})}{1-\max_{i}(p_{i+})}$$
It ranges from 0 to 1 and measures how much knowing the column variable (the predictor) reduces the error in predicting the values of the row variable.
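The lambda formula can be sketched in a few lines of base R. The table `tab` is a hypothetical example and the variable names are illustrative only; the code follows the formula as written, not the package source.

```r
# Hypothetical 2 x 3 table of counts (rows = response R, columns = predictor C)
tab <- matrix(c(10, 20, 30,
                30, 20, 10), nrow = 2, byrow = TRUE)

p <- tab / sum(tab)                  # joint proportions p_ij
max_row_marg <- max(rowSums(p))      # max_i p_{i+}: best guess ignoring C
sum_col_max <- sum(apply(p, 2, max)) # sum_j max_i p_ij: best guess within each column

lambda <- (sum_col_max - max_row_marg) / (1 - max_row_marg)
lambda
```

Here the modal row is guessed correctly half of the time without the predictor and two thirds of the time with it, so one third of the prediction error is removed.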
Goodman--Kruskal $\tau(R|C)$:
$$\tau(R|C) = \frac{ \sum_{i=1}^I \sum_{j=1}^J p^2_{ij}/p_{+j} - \sum_{i=1}^I p_{i+}^2}{1 - \sum_{i=1}^I p_{i+}^2}$$
It takes values in the interval [0,1] and has the same proportional reduction in error (PRE) interpretation as $\lambda$.
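A minimal base R sketch of the tau formula, again on a hypothetical table `tab` with illustrative variable names (not the package's own code):

```r
# Hypothetical 2 x 3 table of counts (rows = response R, columns = predictor C)
tab <- matrix(c(10, 20, 30,
                30, 20, 10), nrow = 2, byrow = TRUE)

p <- tab / sum(tab)                  # joint proportions p_ij
row_marg <- rowSums(p)               # p_{i+}
col_marg <- colSums(p)               # p_{+j}

# sum_i sum_j p_ij^2 / p_{+j}: divide each column of p^2 by its column total
within <- sum(sweep(p^2, 2, col_marg, "/"))

tau <- (within - sum(row_marg^2)) / (1 - sum(row_marg^2))
tau
```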
Theil's Uncertainty coefficient:
$$U(R|C) = \frac{\sum_{i=1}^I \sum_{j=1}^J p_{ij} \log(p_{ij}/p_{+j}) - \sum_{i=1}^I p_{i+} \log p_{i+}}{- \sum_{i=1}^I p_{i+} \log p_{i+}}$$
It takes values in the interval [0,1] and measures the reduction of uncertainty in the row variable due to knowing the column variable.
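The uncertainty coefficient can be sketched as an entropy ratio. The helper `theil_u` below is a hypothetical name introduced for illustration, and it assumes a table with no zero cells, since log(0) is not finite.

```r
# Hypothetical helper: U(R|C) for a table of counts with no zero cells
theil_u <- function(tab) {
  p <- tab / sum(tab)                      # joint proportions p_ij
  row_marg <- rowSums(p)                   # p_{i+}
  cond <- sweep(p, 2, colSums(p), "/")     # p_ij / p_{+j}
  h_r <- -sum(row_marg * log(row_marg))    # entropy of R
  h_r_given_c <- -sum(p * log(cond))       # conditional entropy of R given C
  (h_r - h_r_given_c) / h_r
}

# Hypothetical 2 x 3 table of counts (rows = response R, columns = predictor C)
tab <- matrix(c(10, 20, 30,
                30, 20, 10), nrow = 2, byrow = TRUE)
theil_u(tab)

# A table with independent rows and columns gives a value of (numerically) zero
theil_u(outer(c(1, 2), c(1, 2, 3)))
```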
It is worth noting that $\lambda$, $\tau$ and U are asymmetric measures: each expresses the proportional reduction in the variance of the row variable when passing from its marginal distribution to its conditional distribution given the column variable. All three are obtained from the general expression (cf. Agresti, 2002, p. 56):
$$\frac{V(R) - E[V(R|C)]}{V(R)}$$
They differ in the way variance is measured; in fact, there is no generally accepted definition of the variance of a categorical variable.
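To make the general expression concrete, the sketch below plugs three common variation measures into it: classification error, Gini concentration and Shannon entropy. The pairing of these measures with $\lambda$, $\tau$ and U follows the standard textbook treatment and is an assumption about what the text refers to; the function names and the table are hypothetical.

```r
# Candidate "variance" measures for a probability vector q
v_class   <- function(q) 1 - max(q)                       # classification error
v_gini    <- function(q) 1 - sum(q^2)                     # Gini concentration
v_entropy <- function(q) -sum(q[q > 0] * log(q[q > 0]))   # Shannon entropy

# (V(R) - E[V(R|C)]) / V(R) for a given variation measure v
pre <- function(tab, v) {
  p <- tab / sum(tab)
  col_marg <- colSums(p)
  v_marginal <- v(rowSums(p))
  # average the within-column variation, weighting column j by p_{+j}
  v_conditional <- sum(sapply(seq_len(ncol(p)), function(j) {
    col_marg[j] * v(p[, j] / col_marg[j])
  }))
  (v_marginal - v_conditional) / v_marginal
}

# Hypothetical 2 x 3 table of counts (rows = response R, columns = predictor C)
tab <- matrix(c(10, 20, 30,
                30, 20, 10), nrow = 2, byrow = TRUE)

lambda <- pre(tab, v_class)    # matches the lambda formula
tau    <- pre(tab, v_gini)     # matches the tau formula
U      <- pre(tab, v_entropy)  # matches the U formula
c(lambda = lambda, tau = tau, U = U)
```

Writing the three measures through one function makes the point of the paragraph visible: only the definition of variation changes, not the structure of the ratio.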
data(quine, package="MASS") #loads quine from MASS
str(quine)
# Lrn is the response variable
pw.assoc(Lrn~Age+Sex+Eth, data=quine)
# usage of units' weights
quine$ww <- runif(nrow(quine), 1, 4) # random weights drawn uniformly from [1, 4]
pw.assoc(Lrn~Age+Sex+Eth, data=quine, weights="ww")
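For intuition about the weights argument, the sketch below shows how a cell frequency can be estimated by summing unit weights, and how a zero cell could be replaced with 1/N^2. The data frame `df` and its columns are made up for illustration, and taking N as the number of rows is an assumption; how pw.assoc carries out these steps internally is not shown in this page.

```r
# Hypothetical data: response y, predictor x, unit weights w
df <- data.frame(
  y = factor(c("a", "a", "b", "b", "b")),
  x = factor(c("u", "u", "u", "v", "v")),
  w = c(1.5, 2, 1, 0.5, 3)
)

# Weighted cell frequencies: each cell is the sum of the weights of its units
wtab <- xtabs(w ~ y + x, data = df)
wtab

# Assumed handling of empty cells: replace zero frequencies with 1/N^2,
# taking N as the number of units in the sample
N <- nrow(df)
adj <- wtab
adj[adj == 0] <- 1 / N^2
adj
```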