These functions generate fractional polynomial (FP) transformations of a variable, similar to fracgen in Stata. transform_vector_acd() generates the ACD transformation of a variable.
transform_vector_fp(
  x,
  power = 1,
  scale = 1,
  shift = 0,
  name = NULL,
  check_binary = TRUE
)

transform_vector_acd(
  x,
  power = c(1, 1),
  shift = 0,
  powers = NULL,
  scale = 1,
  acd_parameter = NULL,
  name = NULL
)
Returns a matrix of transformed variable(s). The number of columns depends on the number of powers provided, and the number of rows equals the length of x. The columns are sorted by increasing power. If all powers are NA, this function returns NULL.
If an ACD transformation is applied, the output is a list with two entries. The first entry, acd, is the matrix of transformed variables; the ACD term is returned as its last column (i.e., if the power for x itself, as opposed to the ACD term, is NA, the ACD term is the only column in the matrix). The second entry, acd_parameter, is a list of estimated parameters for the ACD transformation, or simply the input acd_parameter if it was not NULL.
x: a vector of a predictor variable.

power: a numeric vector indicating the FP power(s). Default is 1 (linear). NA values are ignored, unless an ACD transformation is applied, in which case power must be a numeric vector of length 2 and NA values determine which parts are used for the final FP (a part whose power is NA is left out).

scale: scaling factor for x. Must be a positive integer or NULL. Default is 1, meaning no scaling is applied. If NULL, the scaling factor is estimated automatically.

shift: shift required to make x positive. Default is 0, meaning no shift is applied. If NULL, the shift is estimated automatically using the Royston and Sauerbrei formula if and only if any x <= 0.

name: character string used to define names for the output matrix. Default is NULL, meaning the output has unnamed columns.

check_binary: a logical indicating whether the input x is checked for being a binary variable (i.e., having only two distinct values). The default TRUE usually only needs to be changed when this function is used to transform data for predictions. See Details.

powers: passed to fit_acd().

acd_parameter: a list, usually returned by fit_acd(). In particular, it must have components beta0, beta1, power, shift and scale, which are applied when using the ACD transformation on new data.
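The role of the acd_parameter components can be illustrated with a minimal base-R sketch of how stored parameters might be applied to new data. It assumes, as in Royston's ACD construction, that the ACD term equals pnorm(beta0 + beta1 * FP1(x)), where FP1 applies the stored power (0 meaning log) to the shifted and scaled data. The helper apply_acd_sketch() and the parameter values are hypothetical; this is not the package's implementation.

```r
# Hypothetical helper (not the package code): apply stored ACD parameters
# to new data. Assumption: acd(x) = pnorm(beta0 + beta1 * FP1(x')), where
# x' = (x + shift) / scale and power 0 means log(x').
apply_acd_sketch <- function(x, acd_parameter) {
  p <- acd_parameter
  xs <- (x + p$shift) / p$scale                          # shift first, then scale
  fp1 <- if (p$power == 0) log(xs) else xs^p$power       # FP1 term
  pnorm(p$beta0 + p$beta1 * fp1)                         # map onto (0, 1)
}

# Made-up parameter values, for illustration only
params <- list(beta0 = -2, beta1 = 0.5, power = 1, shift = 0, scale = 1)
apply_acd_sketch(c(2, 4, 6), params)
```

Because only the stored values are used, the same transformation is reproduced exactly on new data, which is why acd_parameter is needed for prediction.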
transform_vector_acd(): function to generate the ACD transformation.
An important note on data processing: variables are shifted and scaled before being transformed by any powers, to ensure positive values and reasonable scales. Note that scaling does not change the estimated powers; see also find_scale_factor().
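Why scaling leaves the estimated powers unchanged can be checked numerically: (x/s)^p = s^(-p) x^p only rescales a column, and log(x/s) = log(x) - log(s) is absorbed by the intercept, so every candidate set of powers gives the same residual sum of squares. The following base-R sketch uses simulated data, a hypothetical scaling factor s = 100 and a hypothetical helper fp_cols() (distinct powers only).

```r
set.seed(1)
x <- runif(100, 1, 1000)              # positive predictor (simulated)
y <- log(x) + rnorm(100, sd = 0.1)    # simulated outcome
s <- 100                              # hypothetical scaling factor

# Hypothetical helper: FP columns for distinct powers, power 0 meaning log
fp_cols <- function(v, powers) sapply(powers, function(p) if (p == 0) log(v) else v^p)

# Residual sum of squares of a linear model for a given set of powers
rss <- function(v, pw) sum(resid(lm(y ~ fp_cols(v, pw)))^2)

pairs <- list(c(-1, 0), c(0, 0.5), c(1, 2))   # candidate FP2 powers
rss_raw <- sapply(pairs, function(pw) rss(x, pw))
rss_scaled <- sapply(pairs, function(pw) rss(x / s, pw))
all.equal(rss_raw, rss_scaled)
```

Since the fit criterion is identical for every candidate, the best-fitting powers are the same with or without scaling.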
Variables may, however, be centered after transformation; this is not done by these functions. Centering after, rather than before, transformation keeps the correlation between variables intact, since centering before transformation would affect it. This is described in Sauerbrei et al. (2006), as well as in the Stata manual for mfp. Centering is not recommended in general, and should only be done for the final model if desired.
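A small base-R illustration of why the order matters, using the made-up values x = 1:10 and the FP2 powers (1, 2). The integer power 2 is used because centered values can be negative, which fractional powers could not handle.

```r
x <- 1:10

# Centering BEFORE transformation changes the transformed variables
xc <- x - mean(x)
cor(x, x^2)       # correlation between the FP columns x and x^2
cor(xc, xc^2)     # essentially 0: the relationship has changed

# Centering AFTER transformation leaves the correlation unchanged
x1_after <- x - mean(x)
x2_after <- x^2 - mean(x^2)
cor(x1_after, x2_after)   # equals cor(x, x^2)
```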
The FP transformation generally transforms x as follows. For each $p_i$ in power $= (p_1, p_2, \ldots, p_n)$, it creates the variable $x^{p_i}$ and returns the collection of variables as a matrix, where by the usual FP convention $x^0$ denotes $\log(x)$. The data may be shifted and scaled beforehand as desired. Centering, if desired, has to be done after the data are transformed using these functions.
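As a rough illustration of this description, here is a minimal base-R sketch for distinct powers. The function fp_basic_sketch() is hypothetical and is not the package's implementation: it drops NA powers, returns NULL if none remain, sorts the powers, shifts and scales x first, and uses log(x) for power 0.

```r
# Hypothetical sketch (not the package code); assumes distinct powers.
fp_basic_sketch <- function(x, power = 1, shift = 0, scale = 1) {
  power <- power[!is.na(power)]          # NA powers are ignored
  if (length(power) == 0) return(NULL)   # all powers NA -> NULL
  power <- sort(power)                   # columns ordered by increasing power
  xs <- (x + shift) / scale              # shift and scale before any powers
  out <- sapply(power, function(p) if (p == 0) log(xs) else xs^p)
  matrix(out, nrow = length(x))          # keep a matrix even for a single x
}

fp_basic_sketch(1:5, power = c(2, -1))   # columns 1/x and x^2
fp_basic_sketch(1:5, power = NA)         # NULL
```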
A special case is repeated powers, i.e., when some $p_i = p_j$. In this case, the FP transformations are given by $x^{p_i}$ and $x^{p_i} \log(x)$. If more than two powers are repeated, they are successively multiplied by further $\log(x)$ factors; e.g., $p_i = p_j = p_k$ leads to $x^{p_i}$, $x^{p_i} \log(x)$ and $x^{p_i} \log(x)^2$.
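In general, if a power $p$ occurs $m$ times among the powers, the corresponding columns are (with the usual FP convention $x^0 = \log(x)$):

$$
x^{p},\quad x^{p}\log(x),\quad x^{p}\log(x)^{2},\quad \ldots,\quad x^{p}\log(x)^{m-1}.
$$

For example, power = c(1, 1, 2) yields the columns $x$, $x\log(x)$ and $x^2$, and power = c(0, 0) yields $\log(x)$ and $\log(x)^2$.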
Note that the powers $p_i$ are assumed to be sorted; that is, this function sorts them and then computes the transformation. For example, the output is the same for power = c(1, 1, 2) and power = c(1, 2, 1). This is done to make sense of repeated powers and to uniquely define FPs. If an ACD transformation is used, the powers are processed in a specific order, which is always the same (but not necessarily sorted).
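A minimal base-R sketch of why sorting matters: once the powers are sorted, repeated powers form runs, and the k-th repeat (counting from 0) of power p can be mapped to x^p * log(x)^k. The function fp_sorted_sketch() is hypothetical, is not the package's implementation, and omits shifting and scaling.

```r
# Hypothetical sketch (not the package code): sort powers, then give each
# repeat of a power an extra log(x) factor.
fp_sorted_sketch <- function(x, power) {
  power <- sort(power)
  # 0 for the first occurrence of each power, 1 for the second, and so on
  reps <- ave(seq_along(power), power, FUN = seq_along) - 1
  cols <- mapply(function(p, k) {
    base <- if (p == 0) log(x) else x^p   # power 0 means log(x)
    base * log(x)^k                       # extra log(x) factor per repeat
  }, power, reps)
  matrix(cols, nrow = length(x))
}

x <- c(1.5, 2, 3)
a <- fp_sorted_sketch(x, c(1, 1, 2))
b <- fp_sorted_sketch(x, c(1, 2, 1))
identical(a, b)  # TRUE: both give the columns x, x * log(x), x^2
```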
Thus, throughout the package, powers are always given and processed in either sorted or ACD-specific order, and the columns of the matrix returned by this function always align with the powers used throughout the package.
Binary variables are not transformed, unless check_binary is set to FALSE. This is usually not necessary; the only case in which it should be set to FALSE is when a single value is to be transformed during prediction (e.g., to transform a reference value). In that case, binary variables are still returned unchanged, but a single value of a continuous variable is transformed as specified by the fitted transformations. For model fitting, check_binary should always be left at its default value.
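The effect of check_binary on a single value can be sketched in base R. This sketch assumes, without confirmation from the package source, that a variable counts as binary when it has at most two distinct values, so a single value would also be left untransformed unless check_binary = FALSE. The function fp1_sketch() is hypothetical and handles one power only.

```r
# Hypothetical sketch (not the package code). Assumption: "binary" means at
# most two distinct values, so a single value is caught by the check too.
fp1_sketch <- function(x, power, check_binary = TRUE) {
  if (check_binary && length(unique(x)) <= 2) return(x)  # left unchanged
  if (power == 0) log(x) else x^power
}

fp1_sketch(c(0, 1, 1, 0), power = 2)              # binary: returned as is
fp1_sketch(4, power = 0.5)                         # single value: returned as is
fp1_sketch(4, power = 0.5, check_binary = FALSE)   # transformed: 4^0.5
```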
Sauerbrei, W., Meier-Hirmer, C., Benner, A. and Royston, P. (2006). Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50(12), 3464-3485.
z = 1:10
transform_vector_fp(z)
transform_vector_acd(z)