mhChain: Execution of a Single Chain in Metropolis-Hastings for Cancer Risk Estimation

Description

Performs a single chain execution in the Metropolis-Hastings algorithm for Bayesian inference, specifically tailored for cancer risk estimation. This function can handle both sex-specific and non-sex-specific scenarios.

Usage

mhChain(
  seed,
  n_iter,
  burn_in,
  chain_id,
  ncores,
  data,
  twins,
  max_age,
  baseline_data,
  prior_distributions,
  prev,
  median_max,
  BaselineNC,
  var,
  age_imputation,
  imp_interval,
  remove_proband,
  sex_specific
)

Value

A list containing samples, log likelihoods, log-acceptance ratio, and rejection rate for each iteration.

Arguments

seed: Integer, the seed for the random number generator to ensure reproducibility.
n_iter: Integer, the number of iterations to perform in the Metropolis-Hastings algorithm.
burn_in: Integer, the number of initial iterations to discard (burn-in period).
chain_id: Integer, the identifier for the chain being executed.
ncores: Integer, the number of cores to use for parallel computation.
data: Data frame, containing family and genetic information used in the analysis.
twins: Information on monozygous twins or triplets in the pedigrees.
max_age: Integer, the maximum age considered in the analysis.
baseline_data: Numeric matrix or vector, containing baseline risk estimates for different ages and sexes.
prior_distributions: List, containing prior distributions for the parameters being estimated.
prev: Numeric, the prevalence of the risk allele in the population.
median_max: Logical, indicates if the maximum median age should be used for the Weibull distribution.
BaselineNC: Logical, indicates if non-carrier penetrance should be based on SEER data.
var: Numeric, the variance for the proposal distribution in the Metropolis-Hastings algorithm.
age_imputation: Logical, indicates if age imputation should be performed.
imp_interval: Integer, the interval at which age imputation should be performed when age_imputation = TRUE.
remove_proband: Logical, indicates if the proband should be removed from the analysis.
sex_specific: Logical, indicates if the analysis should differentiate by sex.

Examples

Run this code

# Create sample data in PanelPRO format
data <- data.frame(
  ID = 1:10,
  PedigreeID = rep(1, 10),
  Sex = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),  # 0=female, 1=male
  MotherID = c(NA, NA, 1, 1, 3, 3, 5, 5, 7, 7),
  FatherID = c(NA, NA, 2, 2, 4, 4, 6, 6, 8, 8),
  isProband = c(1, rep(0, 9)),
  CurAge = c(45, 35, 55, 40, 50, 45, 60, 38, 52, 42),
  isAff = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  Age = c(40, NA, 50, NA, 45, NA, 55, NA, 48, NA),
  Geno = c(1, NA, 1, 0, 1, 0, NA, NA, 1, NA)
)

# Transform data into required format
data <- transformDF(data)

# Set parameters for the chain
seed <- 123
n_iter <- 10
burn_in <- 0.1  # 10% burn-in
chain_id <- 1
ncores <- 1
max_age <- 100

# Create baseline data (simplified example)
baseline_data <- matrix(
  c(rep(0.005, max_age), rep(0.008, max_age)),  # Increased baseline risks
  ncol = 2,
  dimnames = list(NULL, c("Male", "Female"))
)

# Set prior distributions with carefully chosen bounds
prior_distributions <- list(
  prior_params = list(
    asymptote = list(g1 = 2, g2 = 3),  # Mode around 0.4
    threshold = list(min = 20, max = 30),  # Narrower range for threshold
    median = list(m1 = 3, m2 = 2),  # Mode around 0.6
    first_quartile = list(q1 = 2, q2 = 3)  # Mode around 0.4
  )
)

# Create variance vector for all 8 parameters in sex-specific case
# Using very small variances for initial stability
var <- c(0.005, 0.005,  # asymptotes (smaller variance since between 0-1)
         1, 1,          # thresholds
         1, 1,          # medians
         1, 1)          # first quartiles

# Run the chain
results <- mhChain(
  seed = seed,
  n_iter = n_iter,
  burn_in = burn_in,
  chain_id = chain_id,
  ncores = ncores,
  data = data,
  twins = NULL,
  max_age = max_age,
  baseline_data = baseline_data,
  prior_distributions = prior_distributions,
  prev = 0.05,  # Increased prevalence
  median_max = FALSE,  # Changed to FALSE for simpler median constraints
  BaselineNC = TRUE,
  var = var,
  age_imputation = FALSE,
  imp_interval = 10,
  remove_proband = TRUE,
  sex_specific = TRUE
)

Run the code above in your browser using DataLab