fake: Flexible Data Simulation Using The Multivariate Normal Distribution

Description

This R package generates artificial data conditionally on pre-specified (simulated or user-defined) relationships between the variables and/or observations. Each observation is drawn from a multivariate Normal distribution whose mean vector and covariance matrix reflect the desired relationships. The outputs can be used to evaluate the performance of variable selection, graphical modelling, or clustering approaches by comparing the true and estimated structures.
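
As a rough illustration of this idea, and not the package's internal code, the base R sketch below draws observations from a multivariate Normal distribution with a chosen mean vector and covariance matrix, using a Cholesky factor. The names `mu`, `sigma`, `z`, `u` and `x` and all numeric values are made up for this example.

```r
# Minimal sketch in base R (an assumption, not the package's implementation):
# draw n observations from N(mu, sigma) using a Cholesky factor.
set.seed(1)
n <- 1000
mu <- c(0, 1, -1) # hypothetical mean vector
sigma <- matrix(c(
  1.0, 0.5, 0.0,
  0.5, 1.0, 0.3,
  0.0, 0.3, 1.0
), nrow = 3, byrow = TRUE) # hypothetical covariance matrix

z <- matrix(rnorm(n * length(mu)), nrow = n) # independent standard Normal draws
u <- chol(sigma) # upper-triangular factor with t(u) %*% u equal to sigma
x <- sweep(z %*% u, 2, mu, "+") # each row of x follows N(mu, sigma)

round(colMeans(x), 2) # should be close to mu
round(cov(x), 2) # should be close to sigma
```

The zero entry in `sigma` encodes the absence of a marginal relationship between the first and third variables, which is the kind of structure an estimation method would then be asked to recover.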

Installation

The released version of the package can be installed from CRAN with:

install.packages("fake")

The development version can be installed from GitHub (this requires the remotes package):

remotes::install_github("barbarabodinier/fake")

Main functions

Linear model

library(fake)

set.seed(1)
simul <- SimulateRegression(n = 100, pk = 20)
head(simul$xdata)
head(simul$ydata)

Logistic model

set.seed(1)
simul <- SimulateRegression(n = 100, pk = 20, family = "binomial")
head(simul$ydata)

Structural causal model

set.seed(1)
simul <- SimulateStructural(n = 100, pk = c(3, 2, 3))
head(simul$data)

Gaussian graphical model

set.seed(1)
simul <- SimulateGraphical(n = 100, pk = 20)
head(simul$data)

Gaussian mixture model

set.seed(1)
simul <- SimulateClustering(n = c(10, 10, 10), pk = 20)
head(simul$data)

Extraction and visualisation of the results

The true model structure is returned in the output of any of the main functions in:

simul$theta

The functions print(), summary() and plot() can be used on the outputs from the main functions.
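
To show how a true structure can be compared with an estimated one, here is a small base R sketch that computes true and false positive rates from two binary vectors. Both `theta_true` and `theta_hat` are hypothetical: in practice the former would come from `simul$theta` and the latter from the method under evaluation. The package also lists a `Rates` function for true and false positive rates, which this sketch deliberately does not use.

```r
# Hypothetical true and estimated selection status of 10 variables
theta_true <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
theta_hat <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)

tp <- sum(theta_hat == 1 & theta_true == 1) # correctly selected
fp <- sum(theta_hat == 1 & theta_true == 0) # wrongly selected
fn <- sum(theta_hat == 0 & theta_true == 1) # missed
tn <- sum(theta_hat == 0 & theta_true == 0) # correctly excluded

tpr <- tp / (tp + fn) # true positive rate (sensitivity)
fpr <- fp / (fp + tn) # false positive rate
c(TPR = tpr, FPR = fpr)
```

The same counts apply to an adjacency matrix if only its upper triangle is compared, so that each possible edge is counted once.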

Reference

  • Barbara Bodinier, Sarah Filippi, Therese Haugdahl Nost, Julien Chiquet and Marc Chadeau-Hyam (2021). Automated calibration for stability selection in penalised regression and graphical models: a multi-OMICs network application exploring the molecular response to tobacco smoking. arXiv.

Other resources

  • R scripts to reproduce the simulation study (Bodinier et al. 2021) conducted using the functions in fake

  • R package sharp for stability selection and consensus clustering

Monthly Downloads

362

Version

1.4.0

License

GPL (>= 3)

Last Published

April 13th, 2023

Functions in fake (1.4.0)

  • ROC: Receiver Operating Characteristic (ROC)
  • SimulateComponents: Data simulation for sparse Principal Component Analysis
  • MinWithinProba: Within-group probabilities for communities
  • SimulateAdjacency: Simulation of undirected graph with block structure
  • TuneExplainedVarianceCor: Tuning function (correlation)
  • TuneCStatisticLogit: Tuning function (logistic regression)
  • SimulateGraphical: Data simulation for Gaussian Graphical Modelling
  • SimulatePrecision: Simulation of precision matrix
  • SimulateCorrelation: Simulation of a correlation matrix
  • plot.roc_curve: Receiver Operating Characteristic (ROC) curve
  • SimulateRegression: Data simulation for multivariate regression
  • SimulateStructural: Data simulation for Structural Causal Modelling
  • TuneExplainedVarianceCov: Tuning function (covariance)
  • SimulateSymmetricMatrix: Simulation of symmetric matrix with block structure
  • BlockMatrix: Block matrix
  • BlockDiagonal: Block diagonal matrix
  • HugeAdjacency: Simulation of undirected graph
  • ExpectedConcordance: Expected concordance statistic
  • BlockStructure: Block structure
  • Contrast: Matrix contrast
  • ExpectedCommunities: Expected community structure
  • Concordance: Concordance statistic
  • LayeredDAG: Layered Directed Acyclic Graph
  • Heatmap: Heatmap visualisation
  • MatchingArguments: Matching arguments
  • MakePositiveDefinite: Making positive definite matrix
  • SamplePredictors: Simulation of binary contribution status
  • SimulateClustering: Simulation of data with underlying clusters
  • MaxContrast: Maximising matrix contrast
  • Rates: True and False Positive Rates