prep() aggregates a single dataset in long format according to any number
of grouping variables. This makes prep() suitable for aggregating data from
various types of experimental designs, such as between-subjects,
within-subjects (i.e., repeated measures), and mixed designs (i.e.,
experimental designs that include both between- and within-subjects
independent variables). prep() returns a data frame with a number of
dependent measures for further analysis for each aggregated cell (i.e.,
experimental cell) according to the provided grouping variables (i.e.,
independent variables). Dependent measures for each experimental cell
include, among others: means before and after rejecting observations
according to a flexible standard deviation criterion; the number and the
proportion of observations rejected under that criterion; the number of
observations before rejection; means after rejecting observations according
to the procedures described in Van Selst & Jolicoeur (1994; suitable when
measuring reaction times); standard deviations; medians; values at any
percentile (e.g., 0.05, 0.25, 0.75, 0.95); and harmonic means. The data
frame prep() returns can also be exported as a txt or csv file for
statistical analysis in other programs.
prep(
dataset = NULL
, file_name = NULL
, file_path = NULL
, id = NULL
, within_vars = c()
, between_vars = c()
, dvc = NULL
, dvd = NULL
, keep_trials = NULL
, drop_vars = c()
, keep_trials_dvc = NULL
, keep_trials_dvd = NULL
, id_properties = c()
, sd_criterion = c(1, 1.5, 2)
, percentiles = c(0.05, 0.25, 0.75, 0.95)
, outlier_removal = NULL
, keep_trials_outlier = NULL
, decimal_places = 4
, notification = TRUE
, dm = c()
, save_results = TRUE
, results_name = "results.txt"
, results_path = NULL
, save_summary = TRUE
)
Arguments

dataset: The data frame that contains the merged long-format table, for
example the output of file_merge(). Either dataset or file_name must be
provided. Default is NULL.

file_name: A string with the name of a file (e.g., "my_data.txt") with the
merged table in case the user already merged the individual data files.
Either dataset or file_name must be provided. Default is NULL.

file_path: A string with the path of the folder in which file_name is
located. If file_name is used, then file_path must be provided. Default is
NULL.

id: A string with the name of the column in file_name or in dataset that
contains the variable specifying the case identifier (i.e., the variable
upon which the measurement took place; e.g., "subject_number"). This should
be a unique value per case. Values in this column must be numeric. This
argument must be provided. Default is NULL.
within_vars: The names of the columns in file_name or in dataset that
contain independent variables manipulated (or observed) within-ids (i.e.,
within-subjects, repeated measures). Single or multiple values must be
specified as a string (e.g., c("SOA", "condition")) in the hierarchical
order you want. The order of the names in within_vars matters because
prep() aggregates the data for the dependent measures by first dividing
them according to the levels of the first grouping variable in within_vars,
then within each of those levels according to the next variable in
within_vars, and so forth. Values in these columns must be numeric. Either
within_vars or between_vars (or both) must be provided. Default is c().

between_vars: The names of the columns in file_name or in dataset that
contain independent variables manipulated (or observed) between-ids (i.e.,
between-subjects). Single or multiple values must be specified as a string
(e.g., c("order")). The order of the names in between_vars does not matter.
Values in these columns must be numeric. Either between_vars or within_vars
(or both) must be provided. Default is c().
dvc: A string with the name of the column in file_name or in dataset that
contains the dependent variable (e.g., "rt" for reaction time as a
dependent variable). Values in this column must be on an interval or ratio
scale. Either dvc or dvd (or both) must be provided. Default is NULL.

dvd: A string with the name of the column in file_name or in dataset that
contains the dependent variable (e.g., "ac" for accuracy as a dependent
variable). Values in this column must be numeric and discrete (e.g., 0 and
1). Either dvc or dvd (or both) must be provided. Default is NULL.
keep_trials: A string that selects which observations to keep from
file_name or dataset according to logical conditions. For example, if the
dataset contains practice trials for each subject, these trials should not
be included in the aggregation. The user should remove them by specifying
how they were coded in the raw data (i.e., the data before aggregation). For
example, if practice trials are the ones for which the "block" column in
the raw data equals zero, the keep_trials argument should be
"raw_data$block != 0". raw_data is the internal object in prep()
representing the merged table. All logical conditions in keep_trials should
be put in the same string and joined by & or |. Logical conditions for this
argument can relate to different columns in the merged table. Note that all
further arguments of prep() relate to the observations that remain in the
merged table. Default is NULL.

drop_vars: The names of the columns to remove from file_name or dataset.
Single or multiple values must be specified as a string (e.g.,
c("font_size")). The order of the names in drop_vars does not matter. Note
that all further arguments of prep() relate to the variables that remain in
the merged table. Default is c().
keep_trials_dvc: A string that selects which observations to keep from
file_name or dataset for calculations and aggregation of the dependent
variable in dvc, according to logical conditions. Logical conditions should
be specified as a string, as in the keep_trials argument (e.g.,
"raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"). All dependent
measures for dvc, except for those specified in outlier_removal, will be
calculated on the remaining observations. Default is NULL.

keep_trials_dvd: A string that selects which observations to keep from
file_name or dataset for calculations and aggregation of the dependent
variable in dvd, according to logical conditions. Logical conditions should
be specified as a string, as in the keep_trials argument (e.g.,
"raw_data$rt > 100 & raw_data$rt < 3000"). All dependent measures for dvd
(i.e., "mdvd" and "merr") will be calculated on the remaining observations.
Default is NULL.
id_properties: The names of the columns in dataset or in file_name that
describe the ids (e.g., subjects) in the data and were not manipulated
within- or between-ids. For example, if the user also logged the age and
the gender of the subject for each observation and each id in an
experiment, this argument will be c("age", "gender"). The order of the
names in id_properties does not matter. Single or multiple values must be
specified as a string. Values in these columns must be numeric. Default is
c().
sd_criterion: A numeric vector of standard deviation criteria. For each
criterion, prep() will calculate the mean dvc for each cell in the
finalized table after rejecting observations that did not meet the
criterion (e.g., rejecting observations that were more than 2 standard
deviations above or below the mean of that cell). Values in this vector
must be numeric. Default is c(1, 1.5, 2).

percentiles: A numeric vector of the percentiles to calculate for dvc.
Values in this vector must be decimal numbers between 0 and 1. Percentiles
are calculated according to type = 7 (see quantile for more information).
Default is c(0.05, 0.25, 0.75, 0.95).

outlier_removal: A number that selects which outlier removal procedure to
calculate for dvc, according to the procedures described by Van Selst &
Jolicoeur (1994). If 1, the non-recursive procedure is calculated; if 2,
the modified recursive procedure is calculated; if 3, the hybrid recursive
procedure is calculated. The moving criterion follows Table 4 in Van Selst
& Jolicoeur (1994). If an experimental cell has 4 trials or fewer, the
result is NA. Default is NULL.

keep_trials_outlier: A string that selects which observations to keep from
file_name or dataset for calculations and aggregation of the outlier
removal procedures by Van Selst & Jolicoeur (1994). Logical conditions
should be specified as a string, as in the keep_trials argument (e.g.,
"raw_data$ac == 1"). The outlier_removal procedure will be calculated on
the remaining observations. Default is NULL.
decimal_places: The number of decimal places written in results_name for
each value of the dependent measures for dvc. Value must be numeric.
Default is 4.

notification: Logical. If TRUE, prints messages about the progress of the
function. Default is TRUE.

dm: The names of the dependent measures the function should return. If
empty (i.e., c()), the function returns a data frame with all possible
dependent measures in prep(). Values in this vector must be strings from
the following list: "mdvc", "sdvc", "meddvc", "tdvc", "ntr", "ndvc", "ptr",
"prt", "rminv", "mdvd", "merr". Default is c(). See the Value section below
for more details.

save_results: Logical. If TRUE, the data frame prep() returns is saved to a
file. Default is TRUE.

results_name: A string with the name of the file that stores the data frame
prep() returns when save_results is TRUE. The file extension can be txt or
csv and should be included. Default is "results.txt".

results_path: A string with the path of the folder in which results_name
will be saved. Default is the path provided in file_path. If no path was
provided in file_path, results_path must be provided.

save_summary: Logical. If TRUE, creates a summary file in the same format
as results_name. Default is TRUE.
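The condition strings accepted by keep_trials, keep_trials_dvc, keep_trials_dvd and keep_trials_outlier can be tried out on a small table before they are passed to prep(). The sketch below is illustrative only: the toy table and its values are made up, and evaluating the string with eval(parse()) is an assumption about how such a string can be turned into a row filter, not a description of what prep() does internally.

```r
# Toy stand-in for the merged table. The name raw_data is used because the
# condition strings must refer to the table by that name.
raw_data <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  block   = c(0, 1, 1, 0, 1, 1),
  rt      = c(450, 90, 620, 510, 3500, 700),
  ac      = c(1, 1, 0, 1, 1, 1)
)

# Condition strings written the way the keep_trials* arguments expect them.
keep_trials     <- "raw_data$block != 0"
keep_trials_dvc <- "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"

# Assumption: each string is evaluated as an R expression that yields one
# TRUE/FALSE value per row; rows with TRUE are kept.
keep_rows <- eval(parse(text = keep_trials))
raw_data  <- raw_data[keep_rows, ]

# The dvc filter is applied to the rows that survived keep_trials.
keep_dvc <- eval(parse(text = keep_trials_dvc))
dvc_rows <- raw_data[keep_dvc, ]

print(dvc_rows)
```

Checking a condition this way makes it easy to spot a misspelled column name or an operator that R does not accept before running the full aggregation.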
Value

A data frame with dependent measures for dvc and dvd by id and grouping
variables.

The first column in the finalized table is the id column. If id_properties
was used, the next columns will be the value of each of the id_properties
for each id. If between_vars was used, the next columns will be the value
of each of the between_vars for each id.

The next columns of the finalized table contain the dependent measures
according to the design specified. If within_vars was used, the data for
each dependent measure was first divided according to the levels of the
first grouping variable in within_vars, and then within each of those
levels prep() divided the data according to the next variable in
within_vars, and so forth.
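The hierarchical division by within_vars can be illustrated with base R alone. The sketch below computes one mean per id and cell and lays the cells out with the first within variable varying slowest. The toy data, the helper object names, and the column names (e.g., mdvc_block1_target1) are hypothetical and chosen for illustration; they are not the names prep() produces.

```r
# Toy long-format data: 2 subjects x 2 blocks x 2 target types x 2 trials.
trials <- data.frame(
  subject     = rep(1:2, each = 8),
  block       = rep(rep(1:2, each = 4), times = 2),
  target_type = rep(rep(1:2, each = 2), times = 4),
  rt          = c(500, 520, 610, 630, 480, 500, 590, 610,
                  550, 570, 660, 680, 530, 550, 640, 660)
)

# Order matters: block is the outer grouping variable, target_type the inner.
within_vars <- c("block", "target_type")

# Mean rt per subject and cell, as a subject x block x target_type array.
cell_means <- tapply(trials$rt, trials[c("subject", within_vars)], mean)

# One row per subject; columns are added block by block and, within each
# block, target type by target type.
wide <- data.frame(subject = as.numeric(dimnames(cell_means)$subject))
for (b in dimnames(cell_means)$block) {
  for (tt in dimnames(cell_means)$target_type) {
    col_name <- paste0("mdvc_block", b, "_target", tt)  # hypothetical name
    wide[[col_name]] <- as.numeric(cell_means[, b, tt])
  }
}

print(wide)
```

Swapping the two names in within_vars would group the columns by target type first and by block second, which is the only thing the order of within_vars changes in this sketch.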
The dependent measures in the finalized table are:

mdvc: mean dvc.
sdvc: SD for dvc.
meddvc: median dvc.
tdvc: mean dvc after rejecting observations that exceed each standard
deviation criterion specified in sd_criterion.
ntr: number of observations rejected for each standard deviation criterion
specified in sd_criterion.
ndvc: number of observations before rejection.
ptr: proportion of observations rejected for each standard deviation
criterion specified in sd_criterion.
rminv: harmonic mean of dvc.
prt: dvc according to each of the percentiles specified in percentiles.
mdvd: mean dvd.
merr: mean error.
nrmc: mean dvc according to the non-recursive procedure with moving
criterion.
nnrmc: number of observations rejected for dvc according to the
non-recursive procedure with moving criterion.
pnrmc: percent of observations rejected for dvc according to the
non-recursive procedure with moving criterion.
tnrmc: total number of observations upon which the non-recursive procedure
with moving criterion was applied.
mrmc: mean dvc according to the modified-recursive procedure with moving
criterion.
nmrmc: number of observations rejected for dvc according to the
modified-recursive procedure with moving criterion.
pmrmc: percent of observations rejected for dvc according to the
modified-recursive procedure with moving criterion.
tmrmc: total number of observations upon which the modified-recursive
procedure with moving criterion was applied.
hrmc: mean dvc according to the hybrid-recursive procedure with moving
criterion.
nhrmc: number of observations rejected for dvc according to the
hybrid-recursive procedure with moving criterion.
thrmc: total number of observations upon which the hybrid-recursive
procedure with moving criterion was applied.
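To make several of the measures above concrete, the following sketch computes them by hand for the observations of a single experimental cell. The reaction times are made up. Using the sample standard deviation from sd() and an inclusive, two-sided cutoff around the cell mean are assumptions made for illustration; the exact rules prep() applies may differ. The percentiles use quantile() with type = 7, as stated for the percentiles argument.

```r
# Made-up reaction times (ms) for one id in one experimental cell.
rt <- c(420, 450, 480, 500, 510, 530, 560, 600, 650, 1400)

mdvc   <- mean(rt)      # mean before rejection
sdvc   <- sd(rt)        # sample standard deviation (assumption)
meddvc <- median(rt)    # median
ndvc   <- length(rt)    # number of observations before rejection

# Keep observations within sd_criterion standard deviations of the cell mean.
sd_criterion <- 2
keep <- abs(rt - mdvc) <= sd_criterion * sdvc
tdvc <- mean(rt[keep])  # mean after rejection
ntr  <- sum(!keep)      # number of rejected observations
ptr  <- ntr / ndvc      # proportion of rejected observations

# Percentiles with the same quantile type the documentation names.
prt <- quantile(rt, probs = c(0.05, 0.25, 0.75, 0.95), type = 7)

# Harmonic mean: reciprocal of the mean of reciprocals.
rminv <- 1 / mean(1 / rt)

print(c(mdvc = mdvc, meddvc = meddvc, tdvc = tdvc, ntr = ntr, ptr = ptr))
```

With these values only the 1400 ms observation falls outside the 2 SD band, so one of ten observations is rejected and the trimmed mean is computed on the remaining nine.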
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The Quarterly Journal of Experimental Psychology, 47(3), 631-650.
data(stroopdata)
finalized_stroopdata <- prep(
dataset = stroopdata
, file_name = NULL
, file_path = NULL
, id = "subject"
, within_vars = c("block", "target_type")
, between_vars = c("order")
, dvc = "rt"
, dvd = "ac"
, keep_trials = NULL
, drop_vars = c()
, keep_trials_dvc = "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"
, keep_trials_dvd = "raw_data$rt > 100 & raw_data$rt < 3000"
, id_properties = c()
, sd_criterion = c(1, 1.5, 2)
, percentiles = c(0.05, 0.25, 0.75, 0.95)
, outlier_removal = 2
, keep_trials_outlier = "raw_data$ac == 1"
, decimal_places = 0
, notification = TRUE
, dm = c()
, save_results = FALSE
, results_name = "results.txt"
, results_path = NULL
, save_summary = FALSE
)
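When save_results is FALSE, as in the call above, the returned data frame can still be written to a txt or csv file by hand. The sketch below uses base R only. The small data frame stands in for the object prep() returns, its column names are hypothetical, and the tab delimiter is an assumption, not necessarily the format prep() writes.

```r
# Hypothetical stand-in for the data frame returned by prep().
finalized <- data.frame(subject = c(1, 2), mdvc = c(512.25, 498.5))

# Write a tab-delimited text file to a temporary folder.
out_file <- file.path(tempdir(), "results.txt")
write.table(finalized, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read the file back to confirm that the table survived the round trip.
reloaded <- read.table(out_file, header = TRUE, sep = "\t")
print(reloaded)
```

For a csv file, write.csv(finalized, file = ..., row.names = FALSE) is the equivalent base R call.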