mdir (version 0.9.0)

generateSimulationDataset: Generate simulation dataset

Description

Generates a dataset based upon a mixture of $K$ Gaussian distributions with $P$ independent, relevant features and $P_n$ irrelevant features. Irrelevant features contain no signal for underlying structure and all measurements for an irrelevant feature are drawn from a common standard Gaussian distribution.
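The generative model described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `simulate_mixture_sketch` is hypothetical, and the choice to centre component k at `k * delta_mu` in every relevant feature is an assumption made only so that adjacent component means differ by `delta_mu`.

```r
# Hypothetical helper illustrating the described model; NOT the mdir source code.
simulate_mixture_sketch <- function(K, N, P, delta_mu = 1, cluster_sd = 1,
                                    pi = rep(1 / K, K), P_n = 0) {
  # Draw a component label for each of the N items with probabilities pi.
  cluster_IDs <- sample(K, N, replace = TRUE, prob = pi)

  # Assumption: component k has mean k * delta_mu in every relevant feature.
  # The label vector is recycled down each of the P columns.
  means <- matrix(cluster_IDs * delta_mu, nrow = N, ncol = P)

  # Relevant features: independent Gaussians around the component means.
  relevant <- matrix(rnorm(N * P, mean = means, sd = cluster_sd),
                     nrow = N, ncol = P)

  # Irrelevant features: standard Gaussian noise shared by all items.
  irrelevant <- matrix(rnorm(N * P_n), nrow = N, ncol = P_n)

  list(data = cbind(relevant, irrelevant), cluster_IDs = cluster_IDs)
}

set.seed(1)
sketch <- simulate_mixture_sketch(K = 3, N = 50, P = 2, P_n = 1)
dim(sketch$data)
```

The first `P` columns carry the cluster signal; the remaining `P_n` columns are the same distribution for every item and so say nothing about cluster membership.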

Usage

generateSimulationDataset(
  K,
  N,
  P,
  delta_mu = 1,
  cluster_sd = 1,
  pi = rep(1/K, K),
  P_n = 0
)

Value

A named list containing `data` (the generated Gaussian data) and `cluster_IDs` (a vector giving the cluster membership of each item, i.e. the true generating structure).

Arguments

K

The number of components to sample from.

N

The number of samples to draw.

P

The number of relevant (i.e. signal-bearing) features.

delta_mu

The difference between the means defining each component within each feature (defaults to 1).

cluster_sd

The standard deviation of the Gaussian distributions (defaults to 1).

pi

A vector of length K giving the population proportion of each component (defaults to equal proportions, `rep(1/K, K)`).

P_n

The number of irrelevant features (defaults to 0).

Examples

K <- 4
N <- 100
P <- 4
generateSimulationDataset(K, N, P)
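A further call with non-default arguments might look like the following sketch. It assumes the `mdir` package is installed and that `generateSimulationDataset` is exported from it; the argument values are arbitrary illustrations, and the call is guarded so the snippet does nothing when the package is unavailable.

```r
# Sketch only: unequal component proportions plus irrelevant (noise) features.
sim <- NULL
if (requireNamespace("mdir", quietly = TRUE)) {
  set.seed(1)  # make the random draw reproducible
  sim <- mdir::generateSimulationDataset(
    K = 3,
    N = 200,
    P = 5,
    delta_mu = 2,            # wider separation between component means
    pi = c(0.5, 0.3, 0.2),   # unequal population proportions
    P_n = 10                 # ten features carrying no cluster signal
  )
  str(sim$data)              # inspect the generated measurements
  table(sim$cluster_IDs)     # observed size of each component
}
```

Setting a seed first keeps the simulated dataset and its cluster labels identical across runs, which is useful when comparing clustering methods on the same data.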
