asterdata: Object Describing Saturated Aster Model

Description

Functions to construct and test conformance to the contract for objects of class "asterdata". All other functions in this package take model descriptions of this form.

Usage

asterdata(data, vars, pred, group, code, families, delta,
  response.name = "resp", varb.name = "varb",
  tolerance = 8 * .Machine$double.eps)
validasterdata(object, tolerance = 8 * .Machine$double.eps)
is.validasterdata(object, tolerance = 8 * .Machine$double.eps)

Value

an object of class "asterdata" is a list containing the following components

redata

a data frame having nrow(data) * length(vars) rows and containing variables having names in setdiff(names(data), vars) and also the names "id", response.name, and varb.name. Produced from data using the reshape function. Each variable in setdiff(names(data), vars) is repeated length(vars) times. The variable named response.name is the concatenation of the variables in data with names in vars. The variable named varb.name is a factor having levels vars that says which of the variables in the data frame data correspond to which components of the response vector. The variable named "id" is an integer vector that says which of the individuals (which rows of data) correspond to which rows of redata. Not all objects of class "asterdata" need have an id variable, although all those constructed by this function do.
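The layout described above can be reproduced on a toy data frame with base R's reshape function. This is only a sketch: the data frame toy, its column names, and the exact reshape arguments are assumptions made for illustration and are not taken from the package source.

```r
# Hypothetical toy data: two individuals, one covariate, two response variables.
toy <- data.frame(x = c(0.5, 1.5), y1 = c(1, 0), y2 = c(3, 0))
vars <- c("y1", "y2")

# Stack the response variables into a single column (long format).
# reshape creates the "id" column because toy has no column of that name.
redata <- reshape(toy, varying = list(vars), v.names = "resp",
    timevar = "varb", times = vars, idvar = "id", direction = "long")
redata$varb <- factor(redata$varb, levels = vars)

# One row per (individual, response variable) pair.
dim_ok <- nrow(redata) == nrow(toy) * length(vars)
```

In this sketch the rows are stacked one response variable at a time: all individuals for y1 come first, followed by all individuals for y2.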

repred

an integer vector satisfying length(repred) == nrow(redata) specifying the arrows of the graph of the aster model for all individuals. Must be nonnegative and satisfy all(repred < seq(along = repred)). A zero value of repred[j] indicates the predecessor of node j is an initial node (formerly called root node) of the graph. A nonzero value of repred[j] indicates the predecessor of node j is node repred[j]. In either case there is an arrow in the graph from predecessor node to successor node.

Note that repred is determined by pred but is quite different from it. Firstly, the lengths differ. Secondly, repred is not just a repetition of pred. The numbers in pred, if nonzero, are indices for the vector vars whereas the numbers in repred, if nonzero, are row indices for the data frame redata.
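The difference between pred and repred can be illustrated with a small sketch. It assumes that the rows of redata are stacked one response variable at a time (all individuals for vars[1], then all individuals for vars[2], and so on); the variable names n, id, and predvar are hypothetical and the code is not taken from the package source.

```r
n <- 4L                  # hypothetical number of individuals
pred <- c(0L, 1L, 1L)    # per-individual subgraph: nodes 2 and 3 follow node 1

id <- rep(seq_len(n), times = length(pred))   # individual for each row
predvar <- rep(pred, each = n)                # value of pred for each row

# The predecessor row belongs to the same individual and lies in the
# block of rows for variable pred[k]; zero still marks an initial node.
repred <- ifelse(predvar == 0L, 0L, (predvar - 1L) * n + id)

constraint_ok <- all(repred >= 0) && all(repred < seq_along(repred))
```

Under these assumptions repred has length n * length(pred), and its nonzero entries are row indices rather than indices into vars.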

initial

a numeric vector specifying constants associated with initial nodes (formerly called root nodes) of the graphical model for all individuals. If repred[j] == 0 then the predecessor of node j is an initial node associated with the constant initial[j], which must be a positive integer unless the family associated with the arrow from this initial node to node j is infinitely divisible (the only such family currently implemented being Poisson), in which case initial[j] must be a strictly positive and finite real number. If repred[j] != 0, then initial[j] is ignored and may be any numeric value, including NA or NaN. This function always makes initial equal to rep(1, nrow(redata)) but the more general description above is valid for objects of class "asterdata" constructed “by hand”.

regroup

an integer vector satisfying length(regroup) == nrow(redata) specifying the lines of the graph of the aster model for all individuals, which in turn specify the dependence groups. Must be nonnegative and satisfy all(regroup < seq(along = regroup)). Nonzero elements of regroup indicate nodes of the graph that are connected by a line and hence are in the same dependence group: nodes j and regroup[j] are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires repred[regroup[j]] == repred[j]. Since nodes in the same dependence group must be in the same family, this requires recode[regroup[j]] == recode[j].

It also requires that the dimension of the family specified by recode[j] be the same as the number of nodes in the dependence group. Zero elements of regroup indicate nothing about dependence groups.

The lines indicate a transitive relation. If there is a line from node j1 to node j2 and a line from node j2 to node j3 then there is also a line from node j1 to node j3, but this line need not be specified by the regroup vector, and indeed cannot. If there is a dependence group with d nodes, then there are choose(d, 2) lines connecting these nodes, but the regroup vector can only specify d - 1 lines which imply the rest.

For example, if nodes j1, j2, j3, and j4 are to make up a four-dimensional dependence group and j1 < j2, j2 < j3, and j3 < j4, we must have regroup[j1] == 0, regroup[j2] == j1, regroup[j3] == j2, and regroup[j4] == j3. This is forced by the requirement all(regroup < seq(along = regroup)).
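Because each nonzero regroup[j] points to an earlier node, the dependence groups implied by the chain of lines can be recovered in a single forward pass. The helper group_root below is a hypothetical illustration, not a function of this package.

```r
# Label every node with the first node of its dependence group.
# Relies on regroup[j] < j, so root[regroup[j]] is already final
# by the time node j is visited.
group_root <- function(regroup) {
    root <- seq_along(regroup)
    for (j in seq_along(regroup)) {
        if (regroup[j] != 0L) root[j] <- root[regroup[j]]
    }
    root
}

# Nodes 2, 3, and 4 form one group; nodes 1 and 5 are on their own.
regroup <- c(0L, 0L, 2L, 3L, 0L)
roots <- group_root(regroup)
sizes <- table(roots)    # number of nodes in each dependence group
```

The group sizes computed this way are what must match the dimension of the family assigned to each group.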

Note that regroup is determined by group but is quite different from it. Firstly, the lengths differ. Secondly, regroup is not just a repetition of group. The numbers in group, if nonzero, are indices for the vector vars whereas the numbers in regroup, if nonzero, are row indices for the data frame redata.

recode

an integer vector satisfying length(recode) == nrow(redata) specifying the families corresponding to the dependence groups. This requires

all(recode %in% seq(along = families))

Node j is in a dependence group with family described by families[recode[j]].

Note that regroup[j] == k requires recode[j] == recode[k] when regroup[j] != 0. Also note that recode is determined by code but is different from it. Firstly, the lengths differ. Secondly, recode need not be just a repetition of code. This function always makes recode equal to rep(code, each = nrow(data)), which has length nrow(redata), but the more general description above is valid for objects of class "asterdata" constructed “by hand”.

families

a copy of the argument of the same name of this function except that any character string abbreviations are converted to objects of class "astfam".

redelta

a numeric vector satisfying length(redelta) == nrow(redata) specifying the degeneracies of the aster model for all individuals. If not the zero vector, the degenerate model specified is the limit as \(s \to \infty\) of nondegenerate models having conditional canonical parameter vector \(\theta + s \delta\) (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used).

Note that redelta is determined by delta but is different from it. Firstly, the lengths differ. Secondly, redelta need not be just a repetition of delta. This function always makes redelta equal to rep(delta, each = nrow(data)), which has length nrow(redata), but the more general description above is valid for objects of class "asterdata" constructed “by hand”.

response.name

a character string giving the name of the response variable in redata. For this function, a copy of the argument response.name.

varb.name

a character string giving the name of the “varb” variable in redata. For this function, a copy of the argument varb.name.

In addition an object of class "asterdata" may contain (and those constructed by this function do contain) components

pred, group, and code, which are copies of the arguments of the same names of this function. Objects of class "asterdata" not constructed by this function need not contain these additional components, since they may make no sense if the graph for all individuals is not the repetition of isomorphic subgraphs, one for each individual.

Arguments

data

a data frame containing response and predictor variables for the aster model.

vars

a character vector containing names of variables in the data frame data that are components of the response vector of the aster model.

pred

an integer vector satisfying length(pred) == length(vars) specifying the arrows of the subgraph of the aster model corresponding to a single individual. Must be nonnegative and satisfy all(pred < seq(along = pred)). A zero value of pred[j] indicates the predecessor of node j is an initial node (formerly called root node) of the subgraph. A nonzero value of pred[j] indicates the predecessor of node j is node pred[j]. In either case there is an arrow in the subgraph from predecessor node to successor node.

group

an integer vector satisfying length(group) == length(vars) specifying the lines of the subgraph of the aster model corresponding to a single individual, which in turn specify the dependence groups. Must be nonnegative and satisfy all(group < seq(along = group)). Nonzero elements of group indicate nodes of the subgraph that are connected by a line and hence are in the same dependence group: nodes j and group[j] are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires pred[group[j]] == pred[j]. Since nodes in the same dependence group must be in the same family, this requires code[group[j]] == code[j]. It also requires that the dimension of the family specified by code[j] be the same as the number of nodes in the dependence group. Zero elements of group indicate nothing about dependence groups.

For example, if nodes j1, j2, j3, and j4 are to make up a four-dimensional dependence group and j1 < j2, j2 < j3, and j3 < j4, we must have group[j1] == 0, group[j2] == j1, group[j3] == j2, and group[j4] == j3. This is forced by the requirement all(group < seq(along = group)).
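The constraints on group stated above can be checked directly for a given set of arguments. The sketch below uses the pred, group, and code values from the Examples section of this page; the names of the logical results are hypothetical.

```r
pred  <- c(0, 1, 1)
group <- c(0, 0, 2)
code  <- c(1, 2, 2)

j <- which(group != 0)    # nodes connected by a line to an earlier node

# group must be nonnegative and point only to earlier nodes.
ordering_ok <- all(group >= 0) && all(group < seq_along(group))
# Nodes joined by a line must share a predecessor and a family.
same_pred <- all(pred[group[j]] == pred[j])
same_code <- all(code[group[j]] == code[j])
```

Here node 3 is joined to node 2, both have node 1 as predecessor, and both use the second family, so all three checks succeed.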

code

an integer vector satisfying length(code) == length(vars) specifying the families corresponding to the dependence groups. This requires

all(code %in% seq(along = families))

Node j is in a dependence group with family described by families[code[j]].

Note that group[j] == k requires code[j] == code[k] when k != 0.

families

a list of family specifications (see families). Specifications of families not having hyperparameters may be abbreviated as character strings, for example, "binomial" rather than fam.binomial().

delta

a numeric vector satisfying length(delta) == length(vars) specifying the degeneracies of the aster model for a single individual. The model specified is the limit as \(s \to \infty\) of nondegenerate models having conditional canonical parameter vector \(\theta + s \delta\) (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used). May be missing (and usually is) in which case \(\delta = 0\) is implied, meaning the limit is trivial (same as not taking a limit).

response.name

a character string giving the name of the response vector.

varb.name

a character string giving the name of the factor covariate that says which of the variables in the data frame data correspond to which components of the response vector.

tolerance

numeric >= 0. Relative errors smaller than tolerance are not considered in checking validity of normal location-scale data.

object

an object of class "asterdata". The function validasterdata always returns TRUE or throws an error with an informative message. The function is.validasterdata never throws an error unless object has the wrong class, returning TRUE or FALSE according to whether object does or does not conform to the contract for class "asterdata".
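The relationship between a validator that throws and a predicate that returns TRUE or FALSE can be sketched as follows. The functions check_pred and is_ok_pred are hypothetical and check only the conditions stated for pred; whether the package implements is.validasterdata by wrapping validasterdata in this way is an assumption.

```r
# Throws an informative error on the first violated condition,
# otherwise returns TRUE.
check_pred <- function(pred) {
    if (!is.numeric(pred)) stop("pred must be numeric")
    if (any(pred < 0)) stop("pred must be nonnegative")
    if (!all(pred < seq_along(pred))) stop("pred[j] must be less than j")
    TRUE
}

# Converts any error from the validator into FALSE.
is_ok_pred <- function(pred) {
    tryCatch(check_pred(pred), error = function(e) FALSE)
}

good <- is_ok_pred(c(0, 1, 1))   # conforms
bad  <- is_ok_pred(c(0, 2, 1))   # pred[2] == 2 is not less than 2
```

The throwing form is convenient interactively because the message names the violated condition, while the predicate form is convenient inside if statements.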

Details

Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.

This function only handles the usual case where the subgraph for every individual is isomorphic to the subgraph for every other individual and all initial nodes (formerly called root nodes) correspond to the constant one. Each row of data is the data for one individual. The vectors vars, pred, group, code, and delta (if not missing) describe the subgraph for one individual (which is the same for all individuals).

In other cases, for which this function does not have the flexibility to construct the appropriate object of class "asterdata", such an object will have to be constructed “by hand”, either using R statements not involving this function or by modifying an object produced by this function. See the Value section for a description of such objects. The functions validasterdata and is.validasterdata can be used to check whether objects constructed “by hand” have been constructed correctly.

Examples

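library(aster2)
data(test1)
# Three nodes per individual: m1 is a Bernoulli node; n1 and n2 both follow
# m1 and form one normal location-scale dependence group.
fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1),
    group = c(0, 0, 2), code = c(1, 2, 2),
    families = list("bernoulli", "normal.location.scale"))
# TRUE if fred conforms to the contract for class "asterdata"
is.validasterdata(fred)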