Given a grouping variable (treatment assignment, exposure status, etc.)
and variables on which to compare the groups, compare averages across groups
and test the hypothesis of no selection into groups on the basis of those variables.
The multivariate test is the method of combined differences discussed by
Hansen and Bowers (2008, Statist. Sci.), a variant of Hotelling's T-squared
test; the univariate tests are presented with multiplicity adjustments, the
details of which can be controlled by the user. Clustering, weighting and/or
stratification variables can be provided, and are addressed by the tests.
The function assembles various univariate descriptive statistics
for the groups to be compared: (weighted) means of treatment and
control groups; differences of these (adjusted differences); and
adjusted differences as multiples of a pooled s.d. of the variable
in the treatment and control groups (standard differences). Pooled
s.d.s are calculated with weights but without attention to clustering,
and ordinarily without attention to stratification. (If the user does
not request unstratified comparisons, overriding the default setting,
then pooled s.d.s are calculated with weights corresponding to the first
stratification for which comparison is requested. In this case as
in the default setting, the same pooled s.d.s are used for standardization
under each stratification considered. This facilitates comparison of
standard differences across stratification schemes.) Means
are contrasted separately for each provided stratifying factor and, by
default, for the unstratified comparison, in each case with weights
reflecting a standardization appropriate to the designated (post-)
stratification of the sample. In the case without stratification
or clustering, the only weighting used to calculate treatment and
control group means is that provided by the user as unit.weights;
in the absence of such an argument, these means are unweighted.
When there are strata, within-stratum means of treatment or of
control observations are calculated using unit.weights, if provided,
and then these are combined across strata according to an ‘effect of
treatment on treated’-type weighting scheme. (The function's
stratum.weights argument figures in the function's inferential
calculations but not in these descriptive calculations.) To figure a
stratum's effect of treatment on treated weight, the sum of all
unit.weights associated with treatment or control group observations
within the stratum is multiplied by the fraction of clusters in that
stratum that are associated with the treatment rather than the
control condition. (The exception is a stratum in which this fraction
is 0 or 1; such a stratum is downweighted to 0.)
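The descriptive weighting just described can be sketched in base R. This is an illustration only, not the package's internal code: the data frame, its column names (stratum, z, x, w) and the helper group_mean are hypothetical, and each unit is assumed to be its own cluster, so that the fraction of clusters treated equals the fraction of units treated.

```r
# Toy data (hypothetical names): one row per unit, each unit its own cluster.
d <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  z       = c(1, 0, 0, 1, 1, 0, 0, 1, 1),   # 1 = treatment, 0 = control
  x       = c(5, 3, 4, 10, 8, 7, 9, 2, 6),  # covariate to compare
  w       = c(1, 1, 2, 1, 1, 1, 1, 1, 1),   # unit weights
  stringsAsFactors = FALSE
)
by_stratum <- split(d, d$stratum)

# ETT-type weight: total unit weight in the stratum times the fraction
# treated, set to 0 when the stratum holds only one of the two groups.
ett_weight <- sapply(by_stratum, function(s) {
  frac_treated <- mean(s$z == 1)
  if (frac_treated == 0 || frac_treated == 1) 0 else sum(s$w) * frac_treated
})

# Unit-weighted mean of x within a stratum for one group (NA if group absent).
group_mean <- function(s, grp) {
  rows <- s$z == grp
  if (!any(rows)) return(NA_real_)
  weighted.mean(s$x[rows], s$w[rows])
}
treated_means <- sapply(by_stratum, group_mean, grp = 1)
control_means <- sapply(by_stratum, group_mean, grp = 0)

# Combine across strata, skipping strata that received zero weight.
keep <- ett_weight > 0
treated_mean <- weighted.mean(treated_means[keep], ett_weight[keep])
control_mean <- weighted.mean(control_means[keep], ett_weight[keep])
adj_diff <- treated_mean - control_mean
```

In this toy example stratum "c" contains only treated units, so it drops out of both combined means; the remaining strata receive weights 4/3 and 2.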
The function also calculates univariate and multivariate inferential
statistics, targeting the hypothesis that assignment was random within strata. These
calculations also pool unit.weights-weighted, within-stratum group means across strata,
but the default weighting of strata differs from that of the descriptive calculations.
With stratum.weights=harmonic_times_mean_weight (the default), each stratum
is weighted in proportion to the product of the stratum mean of unit.weights
and the harmonic mean \(1/[(1/a + 1/b)/2] = 2ab/(a+b)\) of the number of
treated units (a) and control units (b) in the stratum; this weighting is optimal
under certain modeling assumptions (discussed in Kalton 1968 and Hansen and
Bowers 2008, Sections 3.2 and 5). The multivariate assessment is based on a Mahalanobis-type
distance that combines the univariate mean differences while accounting
for correlations among them. It is similar to Hotelling's T-squared statistic,
except that it is standardized using a permutation covariance. See Hansen and Bowers (2008).
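As a rough illustration of the default stratum weighting, the following base R sketch computes weights proportional to the harmonic mean of treated and control counts times the stratum mean of unit weights. The stratum summaries and column names are made up for the example, and normalizing the weights to sum to 1 is an assumption made here for readability, not a statement about what the package does internally.

```r
# Hypothetical stratum-level summaries: counts of treated (a) and control (b)
# units, and the mean of the unit weights within each stratum.
strata <- data.frame(
  stratum = c("s1", "s2", "s3"),
  a       = c(2, 5, 1),
  b       = c(2, 5, 9),
  mean_wt = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Harmonic mean of a and b: 1 / [(1/a + 1/b) / 2] = 2ab / (a + b).
harmonic_mean <- 2 * strata$a * strata$b / (strata$a + strata$b)

# Weight proportional to (harmonic mean) x (mean unit weight);
# normalized to sum to 1 (an assumption of this sketch).
raw <- harmonic_mean * strata$mean_wt
stratum_weight <- setNames(raw / sum(raw), strata$stratum)
```

Note how stratum s3, with 10 units but only one treated, gets a harmonic mean of 1.8, less than that of the 4-unit balanced stratum s1; it is only its larger mean unit weight that raises its final weight.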
In contrast to the earlier function xBalance that it is intended to replace,
balanceTest accepts only binary assignment variables (for now).
stratum.weights must be a function of a single argument,
a data frame containing the variables in data and additionally
Tx.grp, stratum.code, and unit.weights,
returning a named numeric vector of non-negative weights identified by stratum.
(For an example, enter getFromNamespace("harmonic", "RItools").)
the data stratum.weights
function.
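For orientation, a function with the shape described for stratum.weights might look like the sketch below. The function name total_weight_by_stratum and the toy data frame are hypothetical, the numeric coding of Tx.grp is an assumption, and the sketch has not been checked against balanceTest itself (for instance, whether the weights need any particular normalization); it only shows the input and output contract stated above.

```r
# Hypothetical weighting function: weight each stratum by its total unit weight.
# Takes one data frame argument; returns a named numeric vector, one entry per stratum.
total_weight_by_stratum <- function(data) {
  totals <- tapply(data$unit.weights, data$stratum.code, sum)
  # tapply returns a 1-d array; convert it to a plain named numeric vector.
  setNames(as.numeric(totals), names(totals))
}

# Toy input carrying the three added columns named in the text
# (the 0/1 coding of Tx.grp is an assumption of this sketch).
toy <- data.frame(
  Tx.grp       = c(1, 0, 1, 0, 0),
  stratum.code = factor(c("a", "a", "b", "b", "b")),
  unit.weights = c(1, 2, 1, 1, 1)
)
w <- total_weight_by_stratum(toy)
```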
If the stratifying factor has NAs, these cases are dropped. On the other
hand, if NAs are found in a covariate, then those observations are dropped for descriptive
calculations and "imputed" to the stratum mean of the variable for inferential calculations.
When covariate values are dropped due to missingness, proportions of observations not missing on
that variable are recorded and returned. The printed output presents non-missing proportions alongside
the variables themselves, distinguishing the former by placing them at the bottom of the list and enclosing the
variable's name in parentheses. If a variable shares a missingness pattern with another variable,
its missingness information may be labeled with the name of the other variable in the output.
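The two treatments of a missing covariate value can be illustrated in base R. This is a simplified sketch with hypothetical data and column names: the descriptive mean here is unweighted and unstratified, and the code is not taken from the package.

```r
# Toy data (hypothetical names) with two missing values of the covariate x.
d <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b"),
  z       = c(1, 0, 0, 1, 0, 1),
  x       = c(4, NA, 6, 10, 12, NA),
  stringsAsFactors = FALSE
)

# Descriptive side: drop observations missing on x, and record the
# proportion of observations that are not missing.
observed <- !is.na(d$x)
prop_not_missing <- mean(observed)
treated_mean_desc <- mean(d$x[observed & d$z == 1])

# Inferential side: replace each NA with the mean of the observed
# values of x in the same stratum.
stratum_mean <- ave(d$x, d$stratum, FUN = function(v) mean(v, na.rm = TRUE))
x_imputed <- ifelse(observed, d$x, stratum_mean)
```

Here the missing value in stratum "a" is filled with 5 and the one in stratum "b" with 11, while the descriptive treated-group mean uses only the two observed treated values.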