- data
data.frame of the dataset to be used.
- genes
data.frame of the variables inside the genetic score G (can be any sort of variable, doesn't even have to be genetic).
- env
data.frame of the variables inside the environmental score E (can be any sort of variable, doesn't even have to be environmental).
- formula
Model formula. Use E for the environmental score and G for the genetic score. Do not manually code interactions; write them in the formula instead (ex: G*E*z or G:E:z).
- cv_iter
Number of cross-validation iterations (Default = 5).
- cv_folds
Number of cross-validation folds (Default = 10). Using cv_folds=NROW(data) will lead to leave-one-out cross-validation.
- folds
Optional list of vectors containing the fold number for each observation. Bypasses cv_iter and cv_folds. Setting your own folds could be important for certain data types like time series or longitudinal data.
- Huber_p
Parameter controlling the Huber cross-validation error (Default = 1.345).
- classification
Set to TRUE if you are doing classification (binary outcome).
- start_genes
Optional starting points for genetic score (must be the same length as the number of columns of genes).
- start_env
Optional starting points for environmental score (must be the same length as the number of columns of env).
- eps
Threshold for convergence (.01 for quick batch simulations, .0001 for accurate results).
- maxiter
Maximum number of iterations.
- family
Outcome distribution and link function (Default = gaussian).
- ylim
Optional vector containing the known min and max of the outcome variable. Even if your outcome is known to be in [a,b], if you assume a Gaussian distribution, predict() could return values outside this range. This parameter ensures that this never happens. This is not necessary with a distribution that already assumes the proper range (ex: [0,1] with binomial distribution).
- seed
Seed for cross-validation folds.
- id
Optional id of observations, can be a vector or data.frame (only used when returning list of possible outliers).
- crossover
If not NULL, estimates the crossover point of E using the provided value as starting point (to test for diathesis-stress vs differential susceptibility).
- crossover_fixed
If TRUE, instead of estimating the crossover point of E, we force/fix it to the value of "crossover" (used when creating a diathesis-stress model) (Default = FALSE).
- lme4
If TRUE, uses lme4::lmer or lme4::glmer. Note that this is an experimental feature; bugs may arise and certain functions may fail. Currently only summary(), plot(), GxE_interaction_test(), LEGIT(), LEGIT_cv() work. Also note that the AIC and certain elements ignore the existence of the genes and environment variables, thus the AIC may not be used for variable selection of the genes and the environment. However, the AIC can still be used to compare models with the same genes and environments. (Default=FALSE).
- test_only
If TRUE, only uses the first fold for training and predicts the other folds; does not train on the other folds. So instead of cross-validation, this gives you train/test and you get the test R-squared as output.
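
As an illustration of the folds argument described above, here is a minimal base-R sketch of building fold assignments by hand for ordered data such as a time series. The helper make_block_folds is hypothetical (not part of the package), and the layout of one vector of fold numbers per cross-validation iteration is an assumption based on the description of folds; check it against the package before relying on it.

```r
# Hypothetical helper (not part of the package): assign n observations,
# assumed to be sorted in time order, to k contiguous blocks so that
# neighbouring observations end up in the same fold.
make_block_folds <- function(n, k) {
  ceiling(seq_len(n) * k / n)
}

n <- 20  # number of observations (rows of data)

# Assumed layout: a list with one vector per cross-validation iteration,
# each vector holding the fold number of every observation.
folds <- list(make_block_folds(n, k = 4))

# Inspect how many observations fall in each fold.
table(folds[[1]])
```

A list built this way would then be supplied through the folds argument, in which case cv_iter and cv_folds are bypassed.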