MasterBayes (version 2.58)

varPed: Transforms Variables for a Multinomial Log-Linear Model

Description

Creates offspring-specific design matrices, the columns of which refer to the explanatory variables of the linear model.

Usage

varPed(x, gender=NULL, lag=c(0,0), relational=FALSE, 
  lag_relational=c(0,0), restrict=NULL, keep=FALSE, 
  USvar=NULL, merge=FALSE, NAvar=NULL)

Arguments

x

predictor variable; numeric or factor

gender

the gender of the parent to which x applies

lag

numeric vector of length 2. The time interval over which x is evaluated relative to a record of the offspring.

relational

a character string. If "OFFSPRING", the Euclidean distance between x in the parents and x in the offspring is calculated. If "MATE", the Euclidean distance between x in the two parental sexes is calculated. Specifying "OFFSPRINGV" and "MATEV" is similar, although the signed vector is calculated rather than the Euclidean distance. The signed vector is calculated by subtracting offspring phenotype from parental phenotype in the case of "OFFSPRINGV", and by subtracting the phenotype of the sex NOT specified in gender from the phenotype of the sex specified in gender, in the case of "MATEV". If x is a factor then both the Euclidean distance and the signed vector are 1 if the factor levels for offspring and parent (or the two parental sexes) match, and zero otherwise. If FALSE, x is untransformed.

lag_relational

numeric vector of length 2. If relational is not FALSE then the time interval over which x is evaluated in the relational category relative to the offspring record.

restrict

character string designating parents with a zero prior probability of parentage. Only parents for which x matches restrict have non-zero probabilities of parentage. When relational="OFFSPRING" is specified, restrict can take on the inequalities "==", "!=", ">", ">=", "<" and "<=". Parents for which the inequalities are satisfied have non-zero probabilities of parentage, with the parental value of x on the left-hand side of the inequality and the offspring value on the right-hand side. If a number appears on the right-hand side of the inequality (e.g. "<=10") then the distance between parent and offspring appears on the left-hand side of the inequality. restrict is not implemented when relational="MATE".

keep

logical; if TRUE then the design matrices for parents excluded using the argument restrict are retained in the estimation of beta

USvar

if NULL, the phenotypes of unsampled parents are assumed to be drawn from the same statistical population as the sampled parents. If x is a factor then USvar can be a level of that factor to which unsampled parents belong. If x is numeric then USvar can be the value for unsampled parents. Sampled individuals for which there are missing covariate data will also take on USvar if specified.

merge

logical; if TRUE then beta is the log odds ratio of an offspring's parent belonging to category \(A\) compared to category \(B\), where \(A\) and \(B\) are levels of x. If FALSE then beta is the log odds ratio of an individual belonging to category \(A\) being the parent of an offspring compared to an individual of category \(B\). When relational=="MATE", relational=="MATEV" or male and female variables are interacted keep must be FALSE.

NAvar

numeric; replacement for missing values in the predictors.

Value

list containing the design matrix for variable x, the identity of retained parents and the gender of the parents

Details

The design matrix for each offspring represents the state of each parental (dam/sire) combination for each explanatory variable. The number of rows in the design matrix (the number of parental combinations) is free to vary across offspring, but the number of explanatory variables remains the same. As with standard generalised linear modelling, the columns of the design matrices take on numerical values or indicator values for continuous and categorical variables, respectively. When relational=FALSE, elements of the design matrices referring to specific parental combinations will not vary across offspring (unless longitudinal data are being used) and the associated vector of parameters will relate the explanatory variables to overall fecundity. For these variables the model is essentially the multinomial analogue of the more familiar Poisson model often used to analyse such data. However, the counts of the multinomial are not known with certainty because uncertainty exists around the maternity and/or paternity of each offspring.

Additional variables can be fitted that relate specific parental combinations to specific offspring, or specific dams to specific sires. Elements of the design matrices referring to specific parental combinations are then free to vary across offspring. The most obvious variable of this type is the Mendelian transition probability obtained from the genetic data themselves. However, by specifying relational="OFFSPRING", relational="OFFSPRINGV", relational="MATE" or relational="MATEV", non-genetic variables are free to vary across offspring. When x is numeric, the Euclidean distances between parents and offspring, or between mates, enter into the design matrix when relational="OFFSPRING" or relational="MATE", respectively. When relational="OFFSPRINGV" or relational="MATEV" is specified, a signed vector is calculated rather than a distance. When x is a factor, an indicator variable is set up indicating whether parent and offspring, or mate, factor levels match. Often, each offspring will have a variable number of candidate parents as some parents may be excluded a priori. When x is a factor and both relational="OFFSPRING" and restrict="==" are specified, only those potential parents that have factor levels matching the offspring factor level are retained. When relational=FALSE, restrict can take on factor levels which exclude parents that have non-matching factor levels.
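The relational transformations described above can be illustrated in base R. This is only a sketch of the arithmetic for a univariate x, not MasterBayes code: all object names (x_dam, x_off, f_dam and so on) are hypothetical, and for a single numeric variable the Euclidean distance is taken to be the absolute difference.

```r
# Hypothetical phenotypes; none of these objects come from MasterBayes.
x_dam <- c(2.0, 3.5, 5.0)   # x for three candidate dams
x_off <- 3.0                # x for one offspring

# Analogue of relational="OFFSPRING": unsigned distance
dist_offspring <- abs(x_dam - x_off)

# Analogue of relational="OFFSPRINGV": parent minus offspring, sign retained
signed_offspring <- x_dam - x_off

# Factor case: 1 where parent and offspring levels match, 0 otherwise
f_dam <- factor(c("A", "B", "A"), levels = c("A", "B"))
f_off <- factor("A", levels = c("A", "B"))
match_offspring <- as.integer(as.character(f_dam) == as.character(f_off))

# Analogue of restrict="==" with a factor: keep only matching candidates
retained <- which(match_offspring == 1L)
```

Here retained holds the indices of the first and third dams, the only candidates whose factor level matches that of the offspring.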

If a time variable (timevar) is not passed to PdataPed the data are assumed to be cross-sectional, with each individual represented only once. If a time variable (timevar) is passed to PdataPed then lag and lag_relational can be set so that time-specific covariates are used. lag designates time units relative to the offspring record when relational=FALSE; for example, if lag=c(0,0) the value of x is taken for that parent during the same time period as the offspring record. If relational="OFFSPRING" or relational="MATE" then lag determines the time units relative to the record of the offspring or mate to which the focal individual is being compared. This record can be specified by using lag_relational, which is always relative to the offspring record. Negative lags refer to previous time intervals (e.g. lag=c(-1,-1) takes x from the previous time step), and if the elements of lag or lag_relational differ then the average value of x during this period is taken (e.g. lag=c(-1,0) averages x in the record matching and preceding the offspring record). This is not applicable when x is a factor unless restrict takes one of the logical values (e.g. "==") in which case parents are retained when the logical value is TRUE at least once in the specified interval.
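The lag window can be pictured with a small base R sketch. It assumes integer time steps and complete records, and the helper lagged_x and the records data frame are hypothetical illustrations rather than part of MasterBayes; how the package treats missing time points is not addressed here.

```r
# Hypothetical longitudinal records for one candidate parent
records <- data.frame(time = 1:4, x = c(10, 12, 16, 20))

# Mean of x over the window [t + lag[1], t + lag[2]],
# where t is the time of the offspring record
lagged_x <- function(records, t, lag = c(0, 0)) {
  window <- seq(t + lag[1], t + lag[2])
  mean(records$x[records$time %in% window])
}

lagged_x(records, t = 3)                    # 16: same period as the offspring record
lagged_x(records, t = 3, lag = c(-1, -1))   # 12: previous time step
lagged_x(records, t = 3, lag = c(-1, 0))    # 14: average of previous and current step
```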

Below are models that can be fitted using varPed, where x is a univariate continuous variable:

varPed(x, gender="Female") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}x_{i}...)$$

varPed(x, gender="Male") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}x_{j}...)$$

varPed(x) $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(x_{i}+x_{j})...)$$

varPed(x, gender="Female", relational="OFFSPRING") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(|x_{i}-x_{o}|)...)$$

varPed(x, gender="Female", relational="OFFSPRINGV") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(x_{i}-x_{o})...)$$

varPed(x, gender="Female", relational="MATE") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(|x_{i}-x_{j}|)...)$$

varPed(x, gender="Female", relational="MATEV") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(x_{i}-x_{j})...)$$

varPed(x, gender="Female", lag=c(-1,-1)) $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}x_{i,t-1}...)$$

varPed(x, gender="Female", lag=c(-1,-1), relational="OFFSPRING") $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(|x_{i,t-1}-x_{o,t}|)...)$$

varPed(x, gender="Female", lag=c(-2,-2), relational="MATE", lag_relational=c(-1,-1)) $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(|x_{i,t-2}-x_{j,t-1}|)...)$$

varPed(x, gender="Male", lag=c(-2,-2), relational="OFFSPRING", lag_relational=c(-1,-1)) $$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}(|x_{j,t-2}-x_{o,t-1}|)...)$$

Where \(p^{(o)}_{i,j}\) is the probability that dam \(i\) and sire \(j\) are the parents of an offspring \(o\). \(x\) and \(\beta\) are the variable of interest and the associated parameter, and \(t\) is the time period to which the offspring record belongs.
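As a rough numerical illustration of the proportionality statements above, the following base R sketch normalises the exponentiated linear predictor over all dam/sire combinations for one offspring. It includes only the single term in \(\beta_{1}\) (the "..." terms are omitted), and the values of beta1, x_dam and x_sire are made up for the example.

```r
beta1  <- 0.5                # hypothetical parameter value
x_dam  <- c(1.0, 2.0)        # x_i for two candidate dams
x_sire <- c(0.0, 1.0, 3.0)   # x_j for three candidate sires

# Analogue of varPed(x): linear predictor beta1 * (x_i + x_j),
# rows index dams and columns index sires
eta <- beta1 * outer(x_dam, x_sire, "+")
p   <- exp(eta) / sum(exp(eta))   # probabilities over all combinations

# Analogue of varPed(x, gender="Female"): only x_i enters
eta_f <- beta1 * outer(x_dam, rep(0, length(x_sire)), "+")
p_f   <- exp(eta_f) / sum(exp(eta_f))
p_dam <- rowSums(p_f)             # marginal probability of each dam
```

With only the dam term in the model, the ratio p_dam[2] / p_dam[1] equals exp(beta1 * (x_dam[2] - x_dam[1])), and the sires are equiprobable.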

For a categorical variable with two levels (A and B) the model specified by varPed(x, gender="Female") takes on the form

$$p^{(o)}_{i,j} \propto \textrm{exp}(\beta_{1}\delta_{i}...)$$

where \(\delta_{i}\) is an indicator variable taking the value 1 if \(x_{i}\) is equal to the first level of x and zero otherwise. \(\beta_{1}\) is then the log odds ratio of the two levels of x with respect to maternity. If merge=TRUE is specified then \(\beta_{1}\) may vary across offspring, and \(\beta_{m}\) is estimated. \(\beta_{m}\) is related to \(\beta_{1}\):

$$\beta_{m} = \textrm{logit}\left[\frac{\theta N_{A}}{\theta N_{A} + (1-\theta)N_{B}}\right]$$

where \(\theta\) is the inverse logit transformation of \(\beta_{1}\), and \(N_{A}\) and \(N_{B}\) are the number of potential mothers that have level A and B for x. If \(N_{A}\) and \(N_B\) are invariant over offspring the models are functionally equivalent.
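The relationship between \(\beta_{m}\) and \(\beta_{1}\) can be checked numerically. Algebraically, the logit in the expression above simplifies to \(\beta_{1} + \textrm{log}(N_{A}/N_{B})\), which the base R sketch below reproduces; the function name beta_m and the input values are hypothetical.

```r
# beta_m from beta1 and the numbers of candidate mothers in each category
beta_m <- function(beta1, n_a, n_b) {
  theta <- plogis(beta1)   # inverse logit of beta1
  qlogis(theta * n_a / (theta * n_a + (1 - theta) * n_b))
}

b_equal   <- beta_m(0.8, n_a = 10, n_b = 10)   # equal counts: beta_m matches beta1
b_unequal <- beta_m(0.8, n_a = 30, n_b = 10)   # shifted by log(30 / 10)
```

When the two categories are equally common, beta_m coincides with beta1 up to floating-point error; otherwise it is offset by the log ratio of the category counts.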

The denominator of the multinomial likelihood is the summed linear predictors of all possible parents (after setting up a contrast with the baseline parents). Designating the first set of parents as baseline, the contrast for each set of parents is simply:

$$\eta^{(o)}_{i,j} = \textrm{log}\left[\frac{p^{(o)}_{i,j}}{p^{(o)}_{1,1}}\right]$$

and the likelihood of \(\beta\) is

$$Pr(x| \bm{\beta}) = \prod^{n_{o}}_{o}\left[\frac{\textrm{exp}(\eta^{(o)}_{d,s})}{\sum^{n^{(o)}_{i}}_{i=1}\sum^{n^{(o)}_{j}}_{j=1}\textrm{exp}(\eta^{(o)}_{i,j})}\right]$$

where \(n_{o}\), \(n^{(o)}_{i}\) and \(n^{(o)}_{j}\) are the number of offspring, the number of potential mothers for offspring \(o\), and the number of potential fathers for offspring \(o\), respectively. \(d\) and \(s\) are the actual parents of offspring \(o\). The set of possible parents in the denominator of the multinomial likelihood is made up of those that are not excluded using the argument restrict. However, if the argument keep=TRUE is used then the denominator of the likelihood will include excluded parents despite the fact that \(d\neq i\) and \(s\neq j\).
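A minimal base R sketch of this likelihood, on the log scale, is given below. It assumes the parents of each offspring are known and that each offspring has its own matrix of contrasts \(\eta^{(o)}_{i,j}\) (rows for dams, columns for sires); the function loglik and the example matrices are hypothetical and are not how MasterBayes evaluates the likelihood internally.

```r
# eta_list: one matrix of linear predictors per offspring (dams x sires)
# dam, sire: row and column indices of the true parents of each offspring
loglik <- function(eta_list, dam, sire) {
  sum(mapply(
    function(eta, d, s) eta[d, s] - log(sum(exp(eta))),
    eta_list, dam, sire
  ))
}

# Offspring 1: four equally likely parental combinations
eta1 <- matrix(0, nrow = 2, ncol = 2)
# Offspring 2: the second dam is twice as likely as the first, sires are equal
eta2 <- matrix(c(0, log(2), 0, log(2)), nrow = 2)

ll <- loglik(list(eta1, eta2), dam = c(1, 2), sire = c(1, 1))
```

In this toy case the two offspring contribute probabilities of 1/4 and 1/3, so ll equals log(1/12).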

In versions 2.31-2.42, DSapprox=TRUE can be passed to MCMCped, which approximates the likelihood of \(\beta\) when a variable specifies the distance between mates (i.e. relational="MATE"). This approximation reduces the computational burden by fixing \(i=d\) or \(j=s\) in the denominator of the multinomial likelihood. The parent defined as the "MATE" is fixed, so that a varPed expression with gender="Male" has the approximated likelihood:

$$Pr(x | \bm{\beta}) \approx \prod^{n_{o}}_{o}\left[\frac{\textrm{exp}(\eta^{(o)}_{d,s})}{\sum^{n^{(o)}_{j}}_{j=1}\textrm{exp}(\eta^{(o)}_{d,j})}\right]$$

For certain types of problem this approximation does not work well. In version 2.43 and after, another approximation is used which seems to work better:

$$Pr(x | \bm{\beta}) \approx \prod^{n_{o}}_{o}\left[\frac{\textrm{exp}(\eta^{(o)}_{d,s})}{\sum^{n^{(o)}_{i}}_{i=1}\textrm{exp}(\eta^{(o)}_{i,s})+\sum^{n^{(o)}_{j}}_{j=1}\textrm{exp}(\eta^{(o)}_{d,j})-\textrm{exp}(\eta^{(o)}_{d,s})}\right]$$
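To make the difference between the denominators concrete, the base R sketch below evaluates all three for a single offspring, using the gender="Male" case in which the dam \(d\) is held fixed in the first approximation. The matrix eta and the parent indices d and s are made-up values for illustration only.

```r
# Hypothetical linear predictors for one offspring: 2 dams (rows) x 3 sires (columns)
eta <- matrix(c(0, 0.2, 0.5, 1.0, 0.3, 0.7), nrow = 2)
d <- 2   # row index of the true dam
s <- 3   # column index of the true sire

# Exact denominator: all dam/sire combinations
denom_full <- sum(exp(eta))

# First approximation: dam fixed at d, sum over sires only
denom_v1 <- sum(exp(eta[d, ]))

# Second approximation: sum over dams with sire fixed at s, plus sum over sires
# with dam fixed at d, minus the (d, s) term that would otherwise be counted twice
denom_v2 <- sum(exp(eta[, s])) + sum(exp(eta[d, ])) - exp(eta[d, s])

# Contribution of this offspring to the likelihood under each denominator
contrib <- exp(eta[d, s]) / c(full = denom_full, v1 = denom_v1, v2 = denom_v2)
```

The second approximation sums over every combination that shares either the true dam or the true sire, so its denominator lies between the first approximation and the exact value.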

References

Hadfield J.D. et al. (2006) Molecular Ecology 15, 3715-31

See Also

MCMCped