Function for comfortably creating a D-optimal design with or without blocking based on functions optFederov or optBlock from package AlgDesign; this functionality is still somewhat experimental.
Dopt.design(nruns, data=NULL, formula=~., factor.names=NULL, nlevels=NULL,
digits=NULL, constraint=NULL, center=FALSE, nRepeats=5, seed=NULL, randomize=TRUE,
blocks=1, block.name="Blocks", wholeBlockData=NULL, qual=NULL, ...)
The function returns a data frame of S3 class design with attributes attached.
The data frame contains the experimental settings.
The matrix attached as attribute desnum contains the
model matrix of the design, using the formula as specified in the call.
Function Dopt.augment preserves additional variables (e.g. responses) that
have been added to the design before augmenting. Note, however, that
the response data are NOT used in deciding which points to augment the design with.
The attribute run.order provides the run number in standard order (as returned from
function optFederov in package AlgDesign) as well
as the randomized actual run order. The third column is always identical to the first.
The attribute design.info is a list of various design properties, with type resolving to “Dopt”,
“Dopt.blocked” or “Dopt.splitplot”.
In addition to the standard list elements (cf. design), the element
quantitative is a vector of nfactor logical values or NAs,
and the optional digits element indicates the number of digits to
which the data were rounded.
For blocked and splitplot designs, the list contains additional information on numbers and sizes of blocks or plots,
as well as the number of whole plot factors (which are always the first few factors) and split-plot factors.
The list contains a list of optimality criteria (as calculated by function optFederov,
see documentation there) with elements D, Dea, A and G.
(Note that replications is always 1 and repeat.only is always FALSE;
these elements are only present to fulfill the formal requirements for class design.
Note, however, that blocked designs do in fact repeat experimental runs if nruns
and blocks imply this.)
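As a rough illustration of the D element among the optimality criteria mentioned above, the following base R sketch computes det(M)^(1/k) with M = X'X/N for a toy design. It assumes that this is the definition used by optFederov (as described in the AlgDesign documentation); the toy design and all variable names are hypothetical and not taken from Dopt.design itself.

```r
# Hypothetical toy design: 2^2 full factorial in coded units (-1/+1)
toy <- expand.grid(A = c(-1, 1), B = c(-1, 1))

# Model matrix for a main effects model (analogous in spirit to the desnum attribute)
X <- model.matrix(~ A + B, data = toy)
N <- nrow(X)   # number of runs
k <- ncol(X)   # number of model terms, including the intercept

# Normalized information matrix and the assumed D criterion
M <- crossprod(X) / N
D <- det(M)^(1 / k)
D
```

For an orthogonal design in coded units such as this one, M is the identity matrix, so the value is 1; less efficient designs give smaller values.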
number of runs in the requested design
data frame or matrix of candidate design points;
if data is specified, factor.names and nlevels are ignored
a model formula (starting with a tilde),
for the estimation of which a D-optimal design is sought;
it can contain all column names from data
or elements or element names from factor.names, respectively;
usage of the “.”-notation for “all variables” from data or factor.names is possible.
The default formula linearly includes all main effects for columns of data
or factors from factor.names, respectively, by using the “.”-notation.
Note that the variables from wholeBlockData must be explicitly included in the formula
and are not covered by the “.”-notation for “all variables”. (Thus, the default formula
does not work if wholeBlockData is used.)
For quantitative factors, functions quad() and cubic() describe the
full quadratic or full cubic model in the listed variables (cf. the examples
and the expand.formula function from package AlgDesign).
factor.names is used for creating a candidate set (for the within block factors)
with the help of function fac.design, if data is not specified. It is a
list of vectors which contain
- individual levels
- or (in case of numerical values combined with nlevels) lower and upper scale end values
for each factor.
The element names are used as variable names;
if the list is not named, the variable names are A, B and so forth (from function fac.design).
factor.names can also be a character vector.
In this case, nlevels must be specified, and levels are automatically assigned
as integers starting with 1, which implies quantitative factors,
unless qual=TRUE is specified.
nlevels can be omitted if the list factor.names explicitly
lists all factor levels (which of course defines the number of levels).
For numeric factors for which factor.names only specifies the
two scale ends, these are filled with equally-spaced intermediate points,
using the nlevels entry as the length.out argument to function seq.
If factor.names is a character vector of factor names only,
nlevels is required, and default levels are created.
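The filling of intermediate points described above can be pictured with plain base R. This is only a sketch of the mechanism, not the package's internal code; the scale ends are borrowed from the first example on this page, and the variable names are hypothetical. The last line illustrates rounding of such automatically created values, as an assumption about what the digits option amounts to.

```r
# Scale ends as they would be given in factor.names, with 4 requested levels
scale_ends <- c(100, 250)
levels_eins <- seq(scale_ends[1], scale_ends[2], length.out = 4)
levels_eins   # 100 150 200 250

# Intermediate values that are not round numbers, rounded to 2 digits
levels_unit <- round(seq(0, 1, length.out = 4), digits = 2)
levels_unit   # 0.00 0.33 0.67 1.00
```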
digits is used for creating a candidate set if data is not specified.
It specifies the digits to which numeric design columns are rounded in case of
automatic creation of intermediate values. It can consist of one single value
(the same for all such factors) or a numeric vector of the same length
as factor.names with integer entries.
a condition (character string!) used for reducing the candidate
set to admissible points only.
constraint is evaluated on the specified data set or after automatic creation
of a full factorial candidate data set.
The variable names from data or factor.names can be used by the constraint.
The variable names from wholeBlockData can NOT be used.
See Syntax and Logic for an explanation of the syntax of general and especially logical
R expressions.
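To get a feeling for the effect of a constraint string, the following base R sketch evaluates one on a hand-made full factorial candidate set. It is an illustration under the assumption that the constraint acts as a row-wise logical filter; it does not reproduce the package's actual implementation, and the object names are hypothetical. The factor settings and the constraint are those of the first example on this page.

```r
# Full factorial candidate set with 4 x 3 x 6 = 72 points
candidates <- expand.grid(eins = c(100, 150, 200, 250),
                          zwei = c(10, 20, 30),
                          drei = seq(-25, 25, length.out = 6))

# Constraint given as a character string, as required by the constraint argument
constraint <- "!(eins>=200 & zwei==30 & drei==25)"

# Evaluate the string within the candidate set and keep admissible rows only
keep <- eval(parse(text = constraint), envir = candidates)
admissible <- candidates[keep, ]

nrow(candidates)   # 72
nrow(admissible)   # 70: the two points with eins >= 200, zwei == 30, drei == 25 are dropped
```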
requests that optimization is run for the centered model; the design is nevertheless output in non-centered coordinates
number of independent repeats of the design optimization process; increasing this number may improve the chance of finding a global optimum, but will also increase search time
seed for generation and randomization of the design (integer number);
here, the seed is needed even if the design is not randomized, because the
generation process for the optimum design involves random numbers, even if the
order of the final design is not randomized;
if a reproducible design is needed, it is therefore recommended to specify a seed.
In R version 3.6.0 and later, the default behavior of function sample
has changed. If you work in a new (i.e., >= 3.6.0) R version and want to reproduce
a randomized design from an earlier R version (before 3.6.0),
you have to change the RNGkind setting by
RNGkind(sample.kind="Rounding")
before running function Dopt.design.
It is recommended to change the setting back to the new recommended way afterwards:
RNGkind(sample.kind="default")
For an example, see the documentation of the example data set VSGFS.
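The switching of the sampling method can be tried out in base R without creating a design. The sketch below assumes R >= 3.6.0 and a session that starts with the default RNG settings; sample(10) merely stands in for the randomization step, and the object names are hypothetical.

```r
# Current default sampling method (R >= 3.6.0)
set.seed(42)
new_way <- sample(10)

# Switch to the pre-3.6.0 method; R issues a warning about the
# non-uniform sampler, which is suppressed here
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(42)
old_way <- sample(10)   # what an R version before 3.6.0 would have produced

# Switch back to the recommended default afterwards
RNGkind(sample.kind = "default")
set.seed(42)
restored <- sample(10)
identical(restored, new_way)   # TRUE
```

A call to Dopt.design with a fixed seed would take the place of sample(10) when reproducing an old design.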
logical deciding whether or not the design should be randomized;
if it is TRUE, the design (or the additional portion of the design) returned by the
workhorse function optFederov is brought
into random order after generation. Note that the generation process
itself contains a random element per default; if exact repeatability for the
returned design is desired, it is necessary to specify a seed (option seed)
even in the case randomize=FALSE.
a single integer giving the number of blocks (default 1, if no blocking is needed)
OR
a vector of block sizes, which enables blocks of different sizes;
for a scalar value, nruns must be divisible by blocks, so that all blocks are of equal size;
for a vector value, the block sizes must add up to nruns.
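The two requirements on blocks can be checked by simple arithmetic before calling the function. The following lines are a sketch in base R with hypothetical variable names; the numbers for the scalar case are taken from the blocked example on this page, and the unequal block sizes are made up for illustration.

```r
nruns <- 480

# Scalar case: 96 blocks, which must divide nruns evenly
blocks <- 96
scalar_ok <- nruns %% blocks == 0
scalar_ok        # TRUE
nruns / blocks   # 5 runs per block

# Vector case: block sizes of different length must add up to nruns
block_sizes <- c(rep(5, 90), rep(6, 5))
vector_ok <- sum(block_sizes) == nruns
vector_ok        # TRUE
```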
If blocking is requested, the following two options are potentially important.
character string: name of the blocking variable (used only if blocks are requested)
optional matrix or data frame that specifies the whole block characteristics;
can only be used if blocks are requested; if used, it must have as many rows as there are blocks.
If this is specified, the resulting design is a split-plot design with the whole-plot
factors specified in wholeBlockData and the split-plot factors specified in data.
Note that usage of this option makes it necessary to explicitly specify a formula.
Since wholeBlockData must be completely specified by the user, optimization is for the split-plot portion of the design only. The rationale is (presumably) that the characteristics of the available blocks are known. If this is not the case, users may want to try out various possible whole block setups, or to proceed sequentially by first optimizing a whole block design for a model with the whole block factors only and subsequently using this model for adding split-plot factors.
optional logical (length 1 or same as number of factors); ignored if data
is specified; overrides automatic determination of whether or not factors are quantitative;
if neither qual nor data are specified, factors are per default quantitative,
unless they have non-numeric levels in a list-valued factor.names
additional arguments to functions optFederov or optBlock (if blocking is requested)
from package AlgDesign;
interesting arguments for optFederov: maxIteration and
nullify (calculates a good starting design, especially when set to 1,
in which case nRepeats is set to 1);
arguments criterion and augment are not available, neither
are evaluateI, space, or rows, and args does not have an effect.
Since R version 3.6.0, the behavior of function sample has changed
(correction of a biased previous behavior that should not be relevant for the randomization of designs).
For reproducing a design that was produced with an earlier R version,
please follow the steps described with the argument seed.
Ulrike Groemping
Function Dopt.design creates a D-optimal design, optionally with blocking,
and even as a split-plot design. If no blocks are required, calculations are carried
out through function optFederov from package AlgDesign.
In case of blocked designs, function optBlock from package AlgDesign
is behind the calculations. By specifying wholeBlockData, a blocked design becomes
a split-plot design. The model formula can refer to both the within block data (only those
are referred to by the “.” notation) and the whole block data and interactions between both.
In comparison to direct usage of package AlgDesign, the function adds the possibility
of automatically creating the candidate points on the fly, with or without constraints.
Furthermore, it embeds the D-optimal designs into the class design.
On the other hand, it sacrifices some of AlgDesign's flexibility; of course, users
can still use AlgDesign directly.
D-optimal designs are particularly useful if the classical regular designs are too demanding in run size requirements, or if constraints preclude automatic generation of orthogonal designs. Note, however, that the best design in few runs can still be very bad in absolute terms!
When specifying the design without the data option, a full factorial in the
requested factors is the default candidate set of design points. For some situations - especially
with many factors - it may be better to start from a restricted candidate set. Such a candidate set
can be produced with another R function, e.g. oa.design or FrF2,
or can be manually created.
If there are doubts whether the process has delivered a design close to the absolute optimum,
nRepeats can be increased.
For unblocked designs, it is additionally possible to increase maxIteration.
Also, improving the starting value by nullify=1 or nullify=2
may lead to an improved design.
These options are handed through to function optFederov
from package AlgDesign and are documented there.
Atkinson, A.C. and Donev, A.N. (1992). Optimum experimental designs. Clarendon Press, Oxford.
Fedorov, V.V. (1972). Theory of optimal experiments. Academic Press, New York.
Wheeler, R.E. (2004). Comments on algorithmic design. Vignette accompanying package AlgDesign. ../../AlgDesign/doc/AlgDesign.pdf.
See also optFederov, fac.design, quad, cubic, Dopt.augment.
Furthermore, unrelated to function Dopt.design,
see also function gen_design from package skpr
for a new general R package for creating D-optimal or other letter optimal designs.
## a full quadratic model with constraint in three quantitative factors
plan <- Dopt.design(36,factor.names=list(eins=c(100,250),zwei=c(10,30),drei=c(-25,25)),
nlevels=c(4,3,6),
formula=~quad(.),
constraint="!(eins>=200 & zwei==30 & drei==25)")
plan
cor(plan)
y <- rnorm(36)
r.plan <- add.response(plan, y)
plan2 <- Dopt.augment(r.plan, m=10)
plot(plan2)
cor(plan2)
## designs with qualitative factors and blocks for
## an experiment on assessing stories of social situations
## where each subject is a block and receives a deck of 5 stories
plan.v <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"),
consequences=c("alone","children","sick spouse"),
gender=c("Female","Male"),
Age=c("young","medium","old")),
blocks=96,
constraint="!(Age==\"young\" & consequences==\"children\")",
formula=~.+cause:consequences+gender:consequences+Age:cause)
## an experiment on assessing stories of social situations
## with the whole block (=whole plot) factor gender of the assessor
## not run for saving test time on CRAN
if (FALSE) plan.v.splitplot <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"),
consequences=c("alone","children","sick spouse"),
gender.story=c("Female","Male"),
Age=c("young","medium","old")),
blocks=96,
wholeBlockData=cbind(gender=rep(c("Female","Male"),each=48)),
constraint="!(Age==\"young\" & consequences==\"children\")",
formula=~.+gender+cause:consequences+gender.story:consequences+
gender:consequences+Age:cause+gender:gender.story)