This function finds an interval in the sequence whose underlying distribution differs from the rest of the sequence when the data contain repeated observations. It provides four graph-based test statistics.
gseg2_discrete(n, E, id, statistics=c("all","o","w","g","m"), l0=0.05*n, l1=0.95*n,
pval.appr=TRUE, skew.corr=TRUE, pval.perm=FALSE, B=100)
The number of observations in the sequence.
The edge matrix (a "number of edges" by 2 matrix) for the similarity graph. Each row contains the node indices of an edge.
The index of observations (order of observations).
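As a minimal sketch of the shapes expected for E and id, the base R snippet below builds both for a toy data set with repeated rows. The nearest-neighbour construction is only an illustrative stand-in (an assumption, not the package's own graph construction); any similarity graph on the distinct observations would use the same two-column layout.

```r
# Toy data: 5 observations in 2 dimensions, two of which are repeats
y <- rbind(c(0, 0), c(1, 0), c(0, 0), c(3, 1), c(1, 0))

# id: for each observation, the index of its distinct value
key <- do.call(paste, as.data.frame(y))
id <- match(key, unique(key))

# Distinct observations, in order of first appearance (matches id)
y_uni <- unique(y)

# Illustrative graph (assumption): link each distinct value to its nearest neighbour
D <- as.matrix(dist(y_uni))
diag(D) <- Inf
nn <- apply(D, 1, which.min)
E <- cbind(seq_len(nrow(y_uni)), nn)

# Treat edges as undirected and drop duplicates
E <- unique(t(apply(E, 1, sort)))

print(id)
print(E)
```

Each row of E refers to rows of y_uni, not rows of y, while id maps the n original observations onto those distinct values.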
The scan statistic to be computed. A character string indicating the type of scan statistic desired. The default is "all".
"all": computes all of the scan statistics: original, weighted, generalized, and max-type;
"o", "ori" or "original": the original edge-count scan statistic;
"w" or "weighted": the weighted edge-count scan statistic;
"g" or "generalized": the generalized edge-count scan statistic; and
"m" or "max": the max-type edge-count scan statistic.
The minimum length of the interval to be considered as a changed interval.
The maximum length of the interval to be considered as a changed interval.
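To illustrate how l0 and l1 restrict the search, the sketch below enumerates candidate intervals for a small toy sequence. It assumes an interval is described by a pair t1 < t2 with length t2 - t1; the package's exact endpoint convention is not stated here, so treat the indexing as an assumption.

```r
# Toy settings (hypothetical values chosen for readability)
n  <- 10
l0 <- 2   # minimum interval length
l1 <- 4   # maximum interval length

# All ordered pairs of positions, then keep those with an admissible length
grid <- expand.grid(t1 = 1:n, t2 = 1:n)
len  <- grid$t2 - grid$t1
cand <- grid[len >= l0 & len <= l1, ]

print(nrow(cand))   # number of candidate intervals
print(head(cand))
```

With the defaults l0 = 0.05*n and l1 = 0.95*n, a sequence of n = 200 observations gives admissible lengths between 10 and 190.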
If TRUE, the function outputs the p-value approximation based on asymptotic properties.
This argument is used only when pval.appr=TRUE. If skew.corr is TRUE, the p-value approximation incorporates skewness correction.
If TRUE, the function outputs the p-value from B permutations, where B is another argument that you can specify. Permutation can be time consuming, so use this argument with caution.
This argument is useful only when pval.perm=TRUE. The default value for B is 100.
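The general idea behind a permutation p-value can be sketched in base R as follows. This is a generic illustration with a hypothetical toy statistic (a difference in means), not the package's internal procedure: the observed statistic is compared with the statistics obtained after shuffling the order of the observations B times.

```r
set.seed(1)

# Toy sequence with a shift in the second half
x <- c(rnorm(30), rnorm(30, mean = 1))

# Hypothetical test statistic: absolute difference in means between the two halves
stat <- function(v) abs(mean(v[31:60]) - mean(v[1:30]))

obs <- stat(x)
B <- 100

# Recompute the statistic on B random reorderings of the sequence
perm_stats <- replicate(B, stat(sample(x)))

# Proportion of permuted statistics at least as large as the observed one
pval <- mean(perm_stats >= obs)
print(pval)
```

With this formula the p-value can only take values in steps of 1/B, so B = 100 gives a resolution of 0.01; a larger B gives a finer p-value at a proportionally higher cost.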
Returns a list scanZ with tauhat, Zmax, and a vector of the scan statistics for each type of scan statistic specified. See below for more details.
An estimate of the two ends of the changed interval for averaging approach.
An estimate of the two ends of the changed interval for union approach.
The test statistic (maximum of the scan statistics) for averaging approach.
The test statistic (maximum of the scan statistics) for union approach.
A matrix of the original scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "o".
A matrix of the original scan statistics (standardized counts) for union approach if statistic specified is "all" or "o".
A matrix of the weighted scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "w".
A matrix of the weighted scan statistics (standardized counts) for union approach if statistic specified is "all" or "w".
A matrix of the generalized scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "g".
A matrix of the generalized scan statistics (standardized counts) for union approach if statistic specified is "all" or "g".
A matrix of the max-type scan statistics (standardized counts) for averaging approach if statistic specified is "all" or "m".
A matrix of the max-type scan statistics (standardized counts) for union approach if statistic specified is "all" or "m".
A matrix of raw counts of the original scan statistic for averaging approach. This output only exists if the statistic specified is "all" or "o".
A matrix of raw counts of the original scan statistic for union approach. This output only exists if the statistic specified is "all" or "o".
A matrix of raw counts of the weighted scan statistic for averaging approach. This output only exists if statistic specified is "all" or "w".
A matrix of raw counts of the weighted scan statistic for union approach. This output only exists if statistic specified is "all" or "w".
The approximated p-value based on asymptotic theory for each type of statistic specified.
This output exists only when the argument pval.perm is TRUE. It is the permutation p-value from B permutations and appears for each type of statistic specified (the same applies to perm.curve, perm.maxZs, and perm.Z).
A B by 2 matrix with the first column being critical values corresponding to the p-values in the second column.
A sorted vector recording the test statistics in the B permutations.
A B by n-squared matrix with each row being the vectorized scan statistics from each permutation run.
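# NOT RUN {
library(gSeg)  # provides nnl() and gseg2_discrete()

d = 50     # dimension of each observation
mu = 2     # size of the mean shift
tau = 100  # length of the first segment
n = 200    # total sequence length
set.seed(500)
# First segment: resample rows with replacement to create repeated observations
y1_temp = matrix(rnorm(d*tau), tau)
sam1 = sample(1:tau, replace = TRUE)
y1 = y1_temp[sam1,]
# Second segment: mean shifted by mu/sqrt(d) in every coordinate
y2_temp = matrix(rnorm(d*(n-tau), mu/sqrt(d)), n-tau)
sam2 = sample(1:(n-tau), replace = TRUE)  # index the n-tau rows of y2_temp
y2 = y2_temp[sam2,]
y = rbind(y1, y2)
# This data y has repeated observations
y_uni = unique(y)
E = nnl(dist(y_uni), 1)  # similarity graph on the distinct observations
cha = do.call(paste, as.data.frame(y))
id = match(cha, unique(cha))  # maps each observation to its distinct value
r1 = gseg2_discrete(n, E, id, statistics="all")
# }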