Functions to construct and test conformance to the contract for objects of class "asterdata". All other functions in this package take model descriptions of this form.
asterdata(data, vars, pred, group, code, families, delta,
response.name = "resp", varb.name = "varb",
tolerance = 8 * .Machine$double.eps)
validasterdata(object, tolerance = 8 * .Machine$double.eps)
is.validasterdata(object, tolerance = 8 * .Machine$double.eps)
The function asterdata returns an object of class "asterdata". Such an object is a list containing the following components:
redata: a data frame having nrow(data) * length(vars) rows and containing variables having names in setdiff(names(data), vars) and also the names "id", response.name, and varb.name. It is produced from data using the reshape function. Each variable in setdiff(names(data), vars) is repeated length(vars) times. The variable named response.name is the concatenation of the variables in data with names in vars. The variable named varb.name is a factor having levels vars that says which of the variables in the data frame data correspond to which components of the response vector. The variable named "id" is an integer vector that says which of the individuals (which rows of data) correspond to which rows of redata. Not all objects of class "asterdata" need have an id variable, although all those constructed by this function do.
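A minimal base-R sketch of this reshaping step follows. It is not the code asterdata runs internally: the toy data frame, its column trt, its values, and the particular reshape() arguments are assumptions, with "resp" and "varb" used as the default response.name and varb.name.

```r
# Hypothetical toy data: one row per individual, response columns m1, n1, n2
data <- data.frame(trt = c("a", "b", "a"),
                   m1 = c(1, 0, 1), n1 = c(2.5, 0, 1.2), n2 = c(3.1, 0, 0.7))
vars <- c("m1", "n1", "n2")

# Wide-to-long reshape: all individuals for vars[1], then for vars[2], ...
redata <- reshape(data, direction = "long",
                  varying = list(vars), v.names = "resp",
                  timevar = "varb", times = vars,
                  idvar = "id", ids = seq_len(nrow(data)))

# reshape() stores the time variable as character; make it a factor
redata$varb <- factor(redata$varb, levels = vars)

nrow(redata)   # nrow(data) * length(vars), here 9
redata
```

Because reshape() stacks the rows variable by variable, the value of vars[k] for individual i lands in row (k - 1) * nrow(data) + i, and the non-response column trt is repeated once per variable.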
repred: an integer vector satisfying length(repred) == nrow(redata) specifying the arrows of the graph of the aster model for all individuals. Must be nonnegative and satisfy all(repred < seq(along = repred)). A zero value of repred[j] indicates the predecessor of node j is an initial node (formerly called root node) of the graph. A nonzero value of repred[j] indicates the predecessor of node j is node repred[j]. In either case there is an arrow in the graph from predecessor node to successor node.

Note that repred is determined by pred but is quite different from it. Firstly, the lengths differ. Secondly, repred is not just a repetition of pred. The numbers in pred, if nonzero, are indices for the vector vars, whereas the numbers in repred, if nonzero, are row indices for the data frame redata.
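To make the difference concrete, the following sketch expands a per-individual pred into a repred over all rows. It assumes that row (k - 1) * n + i of redata holds variable vars[k] for individual i, where n is nrow(data); the helper expand_pred() is hypothetical and not part of aster.

```r
# Hypothetical helper (not part of aster): expand per-individual pred into
# repred, assuming row (k - 1) * n + i holds variable k for individual i.
expand_pred <- function(pred, n) {
  k <- rep(seq_along(pred), each = n)          # variable held by each row
  i <- rep(seq_len(n), times = length(pred))   # individual for each row
  p <- pred[k]
  # zero stays zero (initial node); otherwise point at the predecessor's
  # row for the same individual
  ifelse(p == 0, 0L, as.integer((p - 1) * n + i))
}

pred <- c(0, 1, 1)            # m1 is the predecessor of n1 and n2
repred <- expand_pred(pred, n = 3)
repred                        # [1] 0 0 0 1 2 3 1 2 3
stopifnot(length(repred) == 3 * length(pred),
          all(repred >= 0), all(repred < seq_along(repred)))
```

The values 1, 2, 3 in repred are rows of redata (m1 for individuals 1 to 3), not the index 1 into vars that pred uses.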
initial: a numeric vector specifying constants associated with initial nodes (formerly called root nodes) of the graphical model for all individuals. If repred[j] == 0, then the predecessor of node j is an initial node associated with the constant initial[j], which must be a positive integer unless the family associated with the arrow from this initial node to node j is infinitely divisible (the only such family currently implemented being Poisson), in which case initial[j] must be a strictly positive and finite real number. If repred[j] != 0, then initial[j] is ignored and may be any numeric value, including NA or NaN. This function always makes initial equal to rep(1, nrow(redata)), but the more general description above is valid for objects of class "asterdata" constructed “by hand”.
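The rule for initial can be written as a small check. This is a hypothetical sketch, not aster's validation code: check_initial() and its logical argument inf_div, which marks rows whose arrow from the initial node uses an infinitely divisible family such as Poisson, are illustrative assumptions.

```r
# Hypothetical check (not aster's code). Only rows with repred == 0 are
# examined; inf_div[j] says whether the family on the arrow from the
# initial node to node j is infinitely divisible (e.g. Poisson).
check_initial <- function(initial, repred, inf_div) {
  root <- repred == 0
  x <- initial[root]
  ok_real <- is.finite(x) & x > 0        # strictly positive and finite
  ok_int  <- ok_real & x == round(x)     # positive integer
  all(ifelse(inf_div[root], ok_real, ok_int))
}

repred <- c(0, 0, 1, 2)
# NA and NaN are allowed where repred != 0
check_initial(c(1, 1, NA, NaN), repred, inf_div = rep(FALSE, 4))    # TRUE
check_initial(c(2.5, 1, NA, NA), repred, inf_div = rep(FALSE, 4))   # FALSE
check_initial(c(2.5, 1, NA, NA), repred,
              inf_div = c(TRUE, FALSE, FALSE, FALSE))               # TRUE
```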
regroup: an integer vector satisfying length(regroup) == nrow(redata) specifying the lines of the graph of the aster model for all individuals, which in turn specify the dependence groups. Must be nonnegative and satisfy all(regroup < seq(along = regroup)). Nonzero elements of regroup indicate nodes of the graph that are connected by a line and hence are in the same dependence group: nodes j and regroup[j] are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires repred[regroup[j]] == repred[j]. Since nodes in the same dependence group must be in the same family, this requires recode[regroup[j]] == recode[j]. It also requires that the dimension of the family specified by recode[j] be the same as the number of nodes in the dependence group. Zero elements of regroup indicate nothing about dependence groups.
The lines indicate a transitive relation. If there is a line from node j1 to node j2 and a line from node j2 to node j3, then there is also a line from node j1 to node j3, but this line need not be specified by the regroup vector, and indeed cannot be. If there is a dependence group with d nodes, then there are choose(d, 2) lines connecting these nodes, but the regroup vector can only specify d - 1 lines, which imply the rest. For example, if nodes j1, j2, j3, and j4 are to make up a four-dimensional dependence group and j1 < j2, j2 < j3, and j3 < j4, we must have regroup[j1] == 0, regroup[j2] == j1, regroup[j3] == j2, and regroup[j4] == j3. This is forced by the requirement all(regroup < seq(along = regroup)).
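Because every line points from a node back to an earlier node, the full dependence groups can be recovered in a single forward pass that follows each line back to the first node of its group. The helper group_roots() below is a hypothetical sketch, not an aster function.

```r
# Hypothetical helper: label each node with the first node of its
# dependence group. Since regroup[j] < j, one forward pass suffices.
group_roots <- function(regroup) {
  root <- seq_along(regroup)
  for (j in seq_along(regroup)) {
    if (regroup[j] != 0) root[j] <- root[regroup[j]]
  }
  root
}

# Nodes 2, 3, 5, 6 form a four-dimensional group specified by
# d - 1 = 3 lines: 2-3, 3-5, 5-6 (the other 3 of choose(4, 2) are implied)
regroup <- c(0, 0, 2, 0, 3, 5)
roots <- group_roots(regroup)
roots                           # [1] 1 2 2 4 2 2
split(seq_along(roots), roots)  # the dependence groups
```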
Note that regroup is determined by group but is quite different from it. Firstly, the lengths differ. Secondly, regroup is not just a repetition of group. The numbers in group, if nonzero, are indices for the vector vars, whereas the numbers in regroup, if nonzero, are row indices for the data frame redata.
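The sketch below builds repred and regroup from pred and group for a small number of individuals and checks that nodes joined by a line share a predecessor. It assumes the same row layout as redata (row (k - 1) * n + i holds variable k for individual i); expand_index() is a hypothetical helper.

```r
# Hypothetical helper: map a per-individual index vector (pred or group)
# to row indices of redata for n individuals; zeros stay zero.
expand_index <- function(idx, n) {
  k <- rep(seq_along(idx), each = n)          # variable held by each row
  i <- rep(seq_len(n), times = length(idx))   # individual for each row
  out <- (idx[k] - 1) * n + i
  out[idx[k] == 0] <- 0
  out
}

pred  <- c(0, 1, 1)
group <- c(0, 0, 2)   # n1 and n2 form one dependence group
n <- 2
repred  <- expand_index(pred, n)    # 0 0 1 2 1 2
regroup <- expand_index(group, n)   # 0 0 0 0 3 4

# regroup must point backward, and nodes joined by a line must share
# a predecessor
linked <- which(regroup != 0)
stopifnot(all(regroup < seq_along(regroup)),
          all(repred[regroup[linked]] == repred[linked]))
```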
recode: an integer vector satisfying length(recode) == nrow(redata) specifying the families corresponding to the dependence groups. This requires all(recode %in% seq(along = families)). Node j is in a dependence group with family described by families[recode[j]]. Note that regroup[j] == k requires recode[j] == recode[k] when regroup[j] != 0.

Also note that recode is determined by code but is different from it. Firstly, the lengths differ. Secondly, recode need not be just a repetition of code. This function always makes recode equal to rep(code, each = nrow(data)), but the more general description above is valid for objects of class "asterdata" constructed “by hand”.
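For objects built by this function, recode is a simple expansion of code. The sketch below assumes a hypothetical nrow(data) of 4 and uses character strings as placeholders for the family specifications.

```r
code <- c(1, 2, 2)
# placeholders standing in for the family specifications
families <- list("bernoulli", "normal.location.scale")
n <- 4   # hypothetical nrow(data)

recode <- rep(code, each = n)
recode   # [1] 1 1 1 1 2 2 2 2 2 2 2 2
stopifnot(length(recode) == n * length(code),
          all(recode %in% seq_along(families)))

families[[recode[5]]]   # family of node 5 (variable n1 of individual 1)
```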
families: a copy of the argument of the same name of this function, except that any character string abbreviations are converted to objects of class "astfam".
redelta: a numeric vector satisfying length(redelta) == nrow(redata) specifying the degeneracies of the aster model for all individuals. If not the zero vector, the degenerate model specified is the limit as \(s \to \infty\) of nondegenerate models having conditional canonical parameter vector \(\theta + s \delta\) (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used).

Note that redelta is determined by delta but is different from it. Firstly, the lengths differ. Secondly, redelta need not be just a repetition of delta. This function always makes redelta equal to rep(delta, each = nrow(data)), but the more general description above is valid for objects of class "asterdata" constructed “by hand”.
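As a one-node illustration of the degenerate limit, consider a Bernoulli node whose predecessor equals one, so that its mean is plogis(theta) when theta is its conditional canonical parameter. The numbers below are arbitrary; the sketch only shows the mean moving toward 1 when delta is positive and toward 0 when it is negative, so that the limiting distribution is concentrated at a single point.

```r
# Bernoulli node with predecessor 1: the mean is plogis(theta), where
# theta is the conditional canonical parameter (values are arbitrary).
theta <- 0.3
s <- c(0, 5, 20, 50)
plogis(theta + s * 1)    # delta = 1: mean increases toward 1
plogis(theta + s * -1)   # delta = -1: mean decreases toward 0

# redelta as this function builds it, for a hypothetical nrow(data) of 3
delta <- c(0, 1, 0)      # arbitrary per-individual delta
redelta <- rep(delta, each = 3)
redelta                  # [1] 0 0 0 1 1 1 0 0 0
```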
response.name: a character string giving the name of the response variable in redata. For this function, a copy of the argument response.name.
varb.name: a character string giving the name of the “varb” variable in redata. For this function, a copy of the argument varb.name.
In addition, an object of class "asterdata" may contain (and those constructed by this function do contain) components pred, group, and code, which are copies of the arguments of the same names of this function. Objects of class "asterdata" not constructed by this function need not contain these additional components, since they may make no sense if the graph for all individuals is not the repetition of isomorphic subgraphs, one for each individual.
data: a data frame containing response and predictor variables for the aster model.
vars: a character vector containing names of variables in the data frame data that are components of the response vector of the aster model.
pred: an integer vector satisfying length(pred) == length(vars) specifying the arrows of the subgraph of the aster model corresponding to a single individual. Must be nonnegative and satisfy all(pred < seq(along = pred)). A zero value of pred[j] indicates the predecessor of node j is an initial node (formerly called root node) of the subgraph. A nonzero value of pred[j] indicates the predecessor of node j is node pred[j]. In either case there is an arrow in the subgraph from predecessor node to successor node.
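The arrows encoded by pred can be listed directly. This sketch uses the vars and pred from the asterdata example in this documentation; the label "initial" for an initial node is an arbitrary choice.

```r
vars <- c("m1", "n1", "n2")
pred <- c(0, 1, 1)
stopifnot(length(pred) == length(vars), all(pred >= 0),
          all(pred < seq_along(pred)))

# Tail of each arrow: "initial" for an initial node, else vars[pred[j]]
# (indexing vars by pred directly would silently drop the zeros)
from <- rep("initial", length(pred))
from[pred > 0] <- vars[pred[pred > 0]]
arrows <- paste(from, "->", vars)
arrows   # "initial -> m1" "m1 -> n1" "m1 -> n2"
```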
group: an integer vector satisfying length(group) == length(vars) specifying the lines of the subgraph of the aster model corresponding to a single individual, which in turn specify the dependence groups. Must be nonnegative and satisfy all(group < seq(along = group)). Nonzero elements of group indicate nodes of the subgraph that are connected by a line and hence are in the same dependence group: nodes j and group[j] are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires pred[group[j]] == pred[j]. Since nodes in the same dependence group must be in the same family, this requires code[group[j]] == code[j]. It also requires that the dimension of the family specified by code[j] be the same as the number of nodes in the dependence group. Zero elements of group indicate nothing about dependence groups.
The lines indicate a transitive relation. If there is a line from node j1 to node j2 and a line from node j2 to node j3, then there is also a line from node j1 to node j3, but this line need not be specified by the group vector, and indeed cannot be. If there is a dependence group with d nodes, then there are choose(d, 2) lines connecting these nodes, but the group vector can only specify d - 1 lines, which imply the rest. For example, if nodes j1, j2, j3, and j4 are to make up a four-dimensional dependence group and j1 < j2, j2 < j3, and j3 < j4, we must have group[j1] == 0, group[j2] == j1, group[j3] == j2, and group[j4] == j3. This is forced by the requirement all(group < seq(along = group)).
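The constraints on group, pred, and code can be checked together, as in the hypothetical check_group() below. It is not aster's validation code, and the family dimensions it is given (1 for Bernoulli, 2 for normal location-scale) are assumptions made for the illustration.

```r
# Hypothetical check (not aster's code) of the rules for one individual.
# fam_dim[c] is the dimension of family c; the values used below are assumed.
check_group <- function(pred, group, code, fam_dim) {
  stopifnot(length(group) == length(pred), length(code) == length(pred),
            all(group >= 0), all(group < seq_along(group)))
  linked <- which(group != 0)
  # nodes joined by a line must share a predecessor and a family
  stopifnot(all(pred[group[linked]] == pred[linked]),
            all(code[group[linked]] == code[linked]))
  # label each node by the first node of its group (group[j] < j, so one pass)
  root <- seq_along(group)
  for (j in linked) root[j] <- root[group[j]]
  # the number of nodes in each group must equal its family's dimension
  sizes <- tabulate(root, nbins = length(group))
  stopifnot(all(sizes[root] == fam_dim[code]))
  invisible(TRUE)
}

# Example from this page: Bernoulli m1, then n1 and n2 in one
# normal location-scale group (dimensions 1 and 2 assumed)
check_group(pred = c(0, 1, 1), group = c(0, 0, 2), code = c(1, 2, 2),
            fam_dim = c(1, 2))
```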
code: an integer vector satisfying length(code) == length(vars) specifying the families corresponding to the dependence groups. This requires all(code %in% seq(along = families)). Node j is in a dependence group with family described by families[code[j]]. Note that group[j] == k requires code[j] == code[k] when k != 0.
families: a list of family specifications (see families). Specifications of families not having hyperparameters may be abbreviated as character strings, for example, "binomial" rather than fam.binomial().
delta: a numeric vector satisfying length(delta) == length(vars) specifying the degeneracies of the aster model for a single individual. The model specified is the limit as \(s \to \infty\) of nondegenerate models having conditional canonical parameter vector \(\theta + s \delta\) (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used). May be missing (and usually is), in which case \(\delta = 0\) is implied, meaning the limit is trivial (the same as not taking a limit).
response.name: a character string giving the name of the response vector.
varb.name: a character string giving the name of the factor covariate that says which of the variables in the data frame data correspond to which components of the response vector.
tolerance: numeric >= 0. Relative errors smaller than tolerance are not considered in checking validity of normal location-scale data.
object: an object of class "asterdata".

The function validasterdata always returns TRUE or throws an error with an informative message. The function is.validasterdata never throws an error unless object has the wrong class, returning TRUE or FALSE according to whether object does or does not conform to the contract for class "asterdata".
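The division of labor between the two functions resembles a common R pattern: one function signals an error, and a predicate wraps it with tryCatch(). The toy functions validate_toy() and is_valid_toy() and the class "toydata" below are hypothetical and check only the repred rule; they are not how aster implements validasterdata or is.validasterdata.

```r
# Hypothetical validator: returns TRUE or stops with an informative message.
validate_toy <- function(object) {
  if (!is.numeric(object$repred)) stop("component repred missing or not numeric")
  if (any(object$repred < 0)) stop("repred must be nonnegative")
  if (!all(object$repred < seq_along(object$repred)))
    stop("repred[j] must be less than j for every j")
  TRUE
}

# Hypothetical predicate: wrong class is still an error; otherwise any
# validation failure is turned into FALSE.
is_valid_toy <- function(object) {
  if (!inherits(object, "toydata")) stop("object is not of class \"toydata\"")
  tryCatch(validate_toy(object), error = function(e) FALSE)
}

good <- structure(list(repred = c(0, 0, 1)), class = "toydata")
bad  <- structure(list(repred = c(0, 3, 1)), class = "toydata")
is_valid_toy(good)   # TRUE
is_valid_toy(bad)    # FALSE
```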
Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear is the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.
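Using the group vector of the example on this page, the position of each node within its dependence group, and hence which canonical statistic it supplies, can be computed as follows. The computation is a sketch and not part of aster.

```r
vars  <- c("m1", "n1", "n2")
group <- c(0, 0, 2)   # n1 and n2 form one dependence group

# first node of each node's dependence group (group[j] < j, one pass)
root <- seq_along(group)
for (j in which(group != 0)) root[j] <- root[group[j]]

# position within the group, counting in response-vector order
pos <- ave(seq_along(group), root, FUN = seq_along)
setNames(pos, vars)   # m1: 1, n1: 1, n2: 2
```

Here n1 would supply the first canonical statistic of its group and n2 the second.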
This function only handles the usual case where the subgraph for every individual is isomorphic to the subgraph for every other individual and all initial nodes (formerly called root nodes) correspond to the constant one. Each row of data is the data for one individual. The vectors vars, pred, group, code, and delta (if not missing) describe the subgraph for one individual (which is the same for all individuals).

In other cases, for which this function does not have the flexibility to construct the appropriate object of class "asterdata", such an object will have to be constructed “by hand”, either using R statements not involving this function or by modifying an object produced by this function. See the description of the components of an object of class "asterdata" for what such objects must contain. The functions validasterdata and is.validasterdata can be used to check whether objects constructed “by hand” have been constructed correctly.
See also families and subset.asterdata.
library(aster)
data(test1)
fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1),
group = c(0, 0, 2), code = c(1, 2, 2),
families = list("bernoulli", "normal.location.scale"))
is.validasterdata(fred)