bootSVD (version 1.1)

simEEG: Simulate functional EEG data

Description

Our data from (Fisher et al. 2014) consists of EEG measurements from the Sleep Heart Health Study (SHHS) (Quan et al. 1997). Since we cannot publish the EEG recordings from the individuals in the SHHS, we instead include the summary statistics of the PCs from our subsample of the processed SHHS EEG data. These summary statistics are used by simEEG to simulate functional data similar to the data used in our work. The simulated vectors are always of length 900, and are generated from 5 basis vectors (see EEG_leadingV).

Usage

simEEG(n = 100, centered = TRUE, propVarNoise = 0.45, wide = TRUE)

Arguments

n

the desired sample size

centered

if TRUE, the sample will be centered to have mean zero for each dimension. If FALSE, measurements will be simulated from a population where the mean is equal to that observed in the sample used in (Fisher et al. 2014) (see EEG_mu).

propVarNoise

the approximate proportion of total sample variance attributable to random noise.

wide

if TRUE, the resulting data is returned as an n by 900 matrix, with each row corresponding to a different subject. If FALSE, the resulting data is returned as a 900 by n matrix, with each column corresponding to a different subject.
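
As a rough illustration of the propVarNoise and wide arguments, the base R sketch below works through the arithmetic. It assumes, without confirmation from this page, that the noise variance is set to the signal variance times p / (1 - p), so that noise makes up a fraction p of the total variance. The score variances are made-up placeholders, not the values stored in EEG_score_var.

```r
# Hypothetical score variances (placeholders, not the values in EEG_score_var)
score_var <- c(5, 3, 2, 1, 0.5)
p <- 0.45                      # plays the role of propVarNoise

signal_var <- sum(score_var)   # total variance carried by the 5 basis vectors
# Assumed calibration: noise_var / (signal_var + noise_var) equals p
noise_var <- signal_var * p / (1 - p)
prop_noise <- noise_var / (signal_var + noise_var)
print(prop_noise)

# The wide = TRUE and wide = FALSE layouts differ only by a transpose
n <- 4
Y_tall <- matrix(rnorm(900 * n), nrow = 900, ncol = n)  # 900 by n, one column per subject
Y_wide <- t(Y_tall)                                     # n by 900, one row per subject
print(dim(Y_wide))
```

Under this assumed calibration, a larger propVarNoise leaves the signal unchanged and only inflates the noise term.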

Value

A matrix containing n simulated measurement vectors of Normalized Delta Power, for the first 7.5 hours of sleep. These vectors are generated according to the equation:

\(y = \sum_{j=1}^{5} B_j * s_j + e\)

where \(y\) is the simulated measurement for a subject, \(B_j\) is the \(j^{th}\) basis vector, \(s_j\) is a random normal variable with mean zero, and \(e\) is a vector of random normal noise. The specific values for \(B_j\) and \(var(s_j)\) are determined from the EEG data sample studied in (Fisher et al. 2014), and are respectively equal to the \(j^{th}\) empirical principal component vector (see EEG_leadingV), and the empirical variance of the \(j^{th}\) score variable (see EEG_score_var).
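
The generating equation can be sketched in base R as follows. This is not the source of simEEG: the basis B, the score variances, and the noise standard deviation are hypothetical stand-ins chosen for illustration, in place of EEG_leadingV, EEG_score_var, and whatever noise level the package derives from propVarNoise.

```r
set.seed(1)
p_dim <- 900   # length of each simulated vector
k <- 5         # number of basis vectors
n <- 200       # number of simulated subjects

# Hypothetical orthonormal basis (stand-in for EEG_leadingV)
B <- qr.Q(qr(matrix(rnorm(p_dim * k), nrow = p_dim, ncol = k)))
# Hypothetical score variances (stand-in for EEG_score_var)
score_var <- c(5, 3, 2, 1, 0.5)
# Hypothetical per-coordinate noise standard deviation
noise_sd <- 0.05

# s[j, i] ~ N(0, score_var[j]): one column of scores per subject
s <- sqrt(score_var) * matrix(rnorm(k * n), nrow = k, ncol = n)
# e: independent normal noise, one column per subject
e <- matrix(rnorm(p_dim * n, sd = noise_sd), nrow = p_dim, ncol = n)

# y_i = sum_j B_j * s_ji + e_i, computed for all subjects at once
Y <- B %*% s + e   # 900 by n, i.e. the wide = FALSE layout
print(dim(Y))
```

Because each column of Y is one subject, this matches the wide = FALSE layout; t(Y) would give the wide = TRUE layout.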

References

Aaron Fisher, Brian Caffo, and Vadim Zipunnikov. Fast, Exact Bootstrap Principal Component Analysis for p>1 million. 2014. http://arxiv.org/abs/1405.0922

Stuart F Quan, Barbara V Howard, Conrad Iber, James P Kiley, F Javier Nieto, George T O'Connor, David M Rapoport, Susan Redline, John Robbins, JM Samet, et al. The sleep heart health study: design, rationale, and methods. Sleep, 20(12):1077-1085, 1997.

Examples

# NOT RUN {
library(bootSVD) #provides simEEG, fastSVD, and EEG_leadingV
set.seed(0)

#Low noise example, for an illustration of smoother functions
Y<-simEEG(n=20,centered=FALSE,propVarNoise=.02,wide=FALSE)
matplot(Y,type='l',lty=1)

#Higher noise example, for PCA
Y<-simEEG(n=100,centered=TRUE,propVarNoise=.5,wide=TRUE)
svdY<-fastSVD(Y)
V<-svdY$v #since Y is wide, the PCs are the right singular vectors (svd(Y)$v)
d<-svdY$d
head(cumsum(d^2)/sum(d^2),5) #first 5 PCs explain about half the variation

# Compare fitted PCs to true, generating basis vectors
# Since PCs have arbitrary sign, we match the sign of 
# the fitted sample PCs to the population PCs first
V_sign_adj<- array(NA,dim=dim(V))
for(i in 1:5){
	V_sign_adj[,i]<-V[,i] * sign(crossprod(V[,i],EEG_leadingV[,i]))
}
par(mfrow=c(1,2))
matplot(V_sign_adj[,1:5],type='l',lty=1,
		main='PCs from simulated data,\n sign adjusted')
matplot(EEG_leadingV,type='l',lty=1,main='Population PCs')
# }