This function estimates the standard deviation sigma of the noise in a model where the data are generated from a signal of rank k corrupted by homoscedastic Gaussian noise. Two estimators are implemented. The first, named LN, is asymptotically unbiased for sigma in the framework where both the number of rows and the number of columns are fixed while the noise variance tends to zero (Low Noise). It is computed as the square root of the residual sum of squares (using the truncated SVD of order k as the estimator of the signal) divided by the number of observations minus the number of estimated parameters. It therefore requires the rank k as input. The second, MAD (mean absolute deviation), is a robust estimator defined as the ratio of the median of the singular values of X to the square root of the median of the Marcenko-Pastur distribution. It can be useful when the signal can be considered low-rank (the rank is very small compared with the matrix size).
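The MAD-type estimator described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `mp_median` and `mad_sigma_sketch` are hypothetical, the median of the Marcenko-Pastur distribution is obtained here by numerical integration and root finding, the singular values are assumed to be rescaled by the square root of the larger matrix dimension, and no centering is applied.

```r
# Hypothetical helper: median of the Marcenko-Pastur law with ratio beta in (0, 1].
mp_median <- function(beta) {
  a <- (1 - sqrt(beta))^2          # lower edge of the support
  b <- (1 + sqrt(beta))^2          # upper edge of the support
  dens <- function(t) sqrt(pmax((b - t) * (t - a), 0)) / (2 * pi * beta * t)
  cdf <- function(x) {
    if (x <= a) return(0)
    integrate(dens, lower = a, upper = x)$value
  }
  # Solve cdf(x) = 1/2 on the support
  uniroot(function(x) cdf(x) - 0.5, lower = a, upper = b, tol = 1e-8)$root
}

# Hypothetical helper: median singular value divided by the square root of
# (larger dimension * Marcenko-Pastur median). Assumes the signal rank is
# small relative to the matrix size; no centering is done in this sketch.
mad_sigma_sketch <- function(X) {
  X <- as.matrix(X)
  m <- min(dim(X))
  n <- max(dim(X))
  median(svd(X)$d) / sqrt(n * mp_median(m / n))
}

# Simulated check: rank-2 signal plus Gaussian noise with known sigma
set.seed(2)
n_row <- 100; n_col <- 30; k <- 2; sigma_true <- 0.5
signal <- matrix(rnorm(n_row * k), n_row, k) %*% matrix(rnorm(k * n_col), k, n_col)
X <- signal + matrix(rnorm(n_row * n_col, sd = sigma_true), n_row, n_col)
sigma_mad <- mad_sigma_sketch(X)
print(sigma_mad)
```

Because the estimate depends only on the median singular value, the two large singular values carried by the signal have little influence on it, which is why the rank does not need to be supplied.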
estim_sigma(X, k = NA, method = c("LN", "MAD"), center = "TRUE")
X: a data frame or a matrix with numeric entries
k: integer specifying the rank of the signal; used only if method = "LN". By default, k is estimated using the estim_ncp function of the FactoMineR package
method: LN for the low noise asymptotic estimate (which requires the rank k) or MAD for mean absolute deviation
center: boolean, whether to center the data. By default "TRUE".
sigma: the estimated value
In the low noise (LN) asymptotic framework, the estimator requires the rank k. Several methods for choosing it are available in the literature; if the user does not provide a value, the function estim_ncp of the FactoMineR package is used with the GCV option (see ?estim_ncp).
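The LN computation can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper name `ln_sigma_sketch` is hypothetical, a rank-k fit is assumed to use k * (N + p - k) parameters, and centering is assumed to remove one row's worth of degrees of freedom (N = n - 1).

```r
# Hypothetical helper: LN-style estimate of the noise standard deviation.
ln_sigma_sketch <- function(X, k, center = TRUE) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  if (center) {
    X <- scale(X, center = TRUE, scale = FALSE)
    N <- n - 1   # assumption: column centering costs one degree of freedom per column
  } else {
    N <- n
  }
  stopifnot(k >= 0, k < min(N, p))
  d <- svd(X)$d
  # Residual sum of squares of the rank-k truncated SVD:
  # the sum of the squared singular values beyond the k-th
  rss <- if (k == 0) sum(d^2) else sum(d[-seq_len(k)]^2)
  # Assumption: residual degrees of freedom are
  # N * p - k * (N + p - k) = (N - k) * (p - k)
  sqrt(rss / ((N - k) * (p - k)))
}

# Simulated check: rank-2 signal plus Gaussian noise with known sigma
set.seed(1)
n <- 100; p <- 30; k <- 2; sigma_true <- 0.5
signal <- matrix(rnorm(n * k), n, k) %*% matrix(rnorm(k * p), k, p)
X <- signal + matrix(rnorm(n * p, sd = sigma_true), n, p)
sigma_hat <- ln_sigma_sketch(X, k = k)
print(sigma_hat)
```

Supplying a rank that is too small leaves signal in the residuals and inflates the estimate, which is why the choice of k matters for this estimator.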
Josse, J. & Husson, F. (2012). Selecting the number of components in principal component analysis using cross-validation approximations. Computational Statistics & Data Analysis, 56 (6), 1869-1879.
Gavish, M. & Donoho, D. L. Optimal Shrinkage of Singular Values.
Gavish, M. & Donoho, D. L. (2014). The Optimal Hard Threshold for Singular Values is 4/sqrt(3). IEEE Transactions on Information Theory, 60 (8), 5040-5053.
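# NOT RUN {
# Simulate a 100 x 30 data matrix with a rank-2 signal (last argument: signal-to-noise ratio)
Xsim <- LRsim(100, 30, 2, 4)
# Estimate sigma with the default LN method, supplying the rank of the signal
res.sig <- estim_sigma(Xsim$X, k = 2)
# }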