prepareCGPairedDifferenceData: Prepare data object from a data frame for Paired Samples evaluations

Description

The function prepareCGPairedDifferenceData reads in a data frame and settings in order to create a cgPairedDifferenceData object. The created object is designed to have exploratory and fit methods applied to it.

Usage

prepareCGPairedDifferenceData(dfr, format = "listed", analysisname = "",
 endptname = "", endptunits = "", logscale = TRUE, zeroscore = NULL,
 addconstant = NULL, digits = NULL, expunitname= "",
 refgrp = NULL, stamps = FALSE)

Value

A cgPairedDifferenceData object is returned, with the following slots:

dfr

The original input data frame that is the specified value of the dfr argument in the function call.

dfru

Processed version of the input data frame, which will be used for the various evaluation methods.

dfr.gcfmt

A groupcolumns version of the input data frame with an additional column of the differences between groups, where the regfrp column of values is the subtrahend (second term) in the subtraction.

settings

A list of properties associated with the data frame:

analysisname: Drawn from the input argument value of analysisname.

endptname

Drawn from the input argument value of endptname, and set to "Endpoint" if input was left at the default "".

endptunits

Drawn from the input argument value of endptunits.

endptscale

Has the value of "log" if logscale=TRUE and "original" if logscale=FALSE.

zeroscore

Has the value of NULL if the input argument was NULL. Otherwise has the derived (from zeroscore="estimate") or specified numeric value.

addconstant

Has the value of NULL if the input argument was NULL. Otherwise has the specified or derived numeric value.

digits

Has the value of the input argument digits or is set to the determined value of digits from the input data. Will be an integer of 0, 1, 2, 3, or 4.

grpnames

Of length 2 and determined from the single factor identified of the group names. The order is determined by the first occurence in the input data frame header in dfr and the refgrp specification.

expunitname

Drawn from the input argument value of expunitname and processing of the data frame.

refgrp

Drawn from the input argument of refgrp.

stamps

Drawn from the input argument of stamps.

Arguments

dfr

A valid data frame, see the format argument.

format

Default value of "listed". Either "listed" or "groupcolumns" must be used. Abbreviations of "l" or "g", respectively, or otherwise sufficient matching values can be used:

"listed": At least two columns, with the two and only two levels of a factor to represent the samples. These factor levels would need to be in the first column and response values in the second column. If there are three columns, then an experimental unit identifier need to be defined in the first column instead, with the second column having the two level factor, and the third column having the response values. See the Details Input Data Frame section below.

"groupcolumns"

At least two columns and no more than three are permitted. In the two columns case, each column must uniquely represent one of the two samples, implying a factor with two and only two levels. The levels of the factor make up the column headers. The values in the data frame are for the response. Each row assumes the pairing of the observation within an experimental unit, such as the same subject. If there are three columns, then an experimental unit identifier need to be defined in the first column instead, with the second and third column having the response values and headers to represent the two factor levels. See the Details Input Data Frame section below.

analysisname

Optional, a character text or math-valid expression that will be set for default use in graph title and table methods. The default value is the empty "".

endptname

Optional, a character text or math-valid expression that will be set for default use as the y-axis label of graph methods, and also used for table methods. The default value is the empty "".

endptunits

Optional, a character text or math-valid expression that can be used in combination with the endptname argument. Parentheses are automatically added to this input, which will be added to the end of the endptname character value or expression. The default value is the empty "".

logscale

Apply a log-transformation to the data for evaluations. The default value is TRUE.

zeroscore

Optional, replace response values of zero with a derived or specified numeric value, as an approach to overcome the presence of zeroes when evaluation in the logarithmic scale (logscale=TRUE) is specified. The default value is NULL. To derive a score value to replace zero, "estimate" can be specified, see Details below on the algorithm used.

addconstant

Optional, add a numeric constant to all response values, as an approach to overcome the presence of zeroes when evaluation in the logarithmic scale logscale=TRUE is desired. The default value is NULL. A positive numeric value can be specified to be added, or a "simple" algorthm specified to estimate a value to add. See Details secion below on the algorithm used.

digits

Optional, for output display purposes in graphs and table methods, values will be rounded to this numeric value. Only the integers of 0, 1, 2, 3, and 4 are accepted. No rounding is done during any calculations. The default value is NULL, which will examine each individual data value and choose the one that has the maximum number of digits after any trailing zeroes are ignored. The max number of digits will be 4.

expunitname

Optional, a character text that will be set for default use as the experimental unit label of graph methods, and also used for table methods. The default value is the empty "".

refgrp

Optional, specify one of the factor levels to be the “reference group”, such as a “control” group. The default value is NULL, which will just use the first level determined in the data frame.

stamps

Optional, specify a time stamp in graphs, along with cg package version identification. The default value is FALSE.

Author

Bill Pikounis [aut, cre, cph], John Oleynick [aut], Eva Ye [ctb]

Details

Input Data Frame

The input data frame dfr can be of the format "listed" or "groupcolumns".

If format="listed" for dfr is specified, then there must be three columns for an input data frame. The first column needs to be the experimental unit identifier, the second column needs to be the group identifier, and the third is the endpoint. The first column of the listed input data format, needs to have two sets of distinct values since it is the experimental unit identifier of response pairs. The second column of the listed input data format needs to have exactly 2 distinct values since it is the group identifier.

If format="groupcolumns" for dfr is specified, then there can be two columns or three columns.

two columns: The column headers specify the two paired group names. Each row contains the experimental unit of paired numeric values under those two groups. In the course of creating the cgPairedDifferenceData object, another column will be binded from the left and become the first column, with the column header of expunitname is specified, and "expunit" if the default expunitname="" is specified. A sequence of integers starting with 1 up to the number of pairs/rows will be generated to uniquely identify each experimental unit pair.
three columns: The first column needs to be unique experimental unit identifiers of the paired numeric values in the second and third columns. The second and third column headers will be used to identify the two paired group names. Each row's second and third column needs to contain the experimental unit of paired numeric values under those two groups. The name of the first column will be assigned to the expunitname setting if expunitname is not explicity specified to something else instead of its default expunitname="".

As the evaluation data set is prepared for cgPairedDifferenceData object, any experimental unit pairs/rows with missing values in the endpoint are flagged. This includes a check to make sure that each experimental unit identified has a complete pair of numeric observations.

zeroscore

If zeroscore="estimate" is specified, a number close to zero is derived to replace all zeroes for subsequent log-scale analyses. A spline fit (using spline and method="natural") of the log of the response vector on the original response vector is performed. The zeroscore is then derived from the log-scale value of the spline curve at the original scale value of zero. This approach comes from the concept of arithmetic-logarithmic scaling discussed in Tukey, Ciminera, and Heyse (1985).

addconstant

If addconstant="simple" is specified, a number is derived and added to all response values. The approach taken is from the "white" book on S (Chambers and Hastie, 1992), page 68. The range (max - min) of the response values is

multiplied by 0.0001 to derive the number to add to all the response values.

References

Tukey, J.W., Ciminera, J.L., and Heyse, J.F. (1985). "Testing the Statistical Certainty of a Response to Increasing Doses of a Drug," Biometrics, Volume 41, 295-301.

Chambers, J.M, and Hastie, T.R. (1992), Statistical Modeling in S. Chapman&Hall/CRC.

Examples

Run this code

data(anorexiaFT)
anorexiaFT.data <- prepareCGPairedDifferenceData(anorexiaFT, format="groupcolumns",
                                                 analysisname="Anorexia FT",
                                                 endptname="Weight",
                                                 endptunits="lbs",
                                                 expunitname="Patient",
                                                 digits=1, logscale=TRUE)

Run the code above in your browser using DataLab