OutlierDC (version 0.3-0)

odc: Outlier detection using quantile regression for censored data

Description

Outlier detection algorithms using quantile regression for censored data.

Usage

odc(formula, data,
    method = c("score", "boxplot", "residual"),
    rq.model = c("Wang", "PengHuang", "Portnoy"),
    k_r = 1.5, k_b = 1.5, h = .05)

Arguments

formula
a type of Formula object with a survival object on the left-hand side of the ~ operator and covariate terms on the right-hand side. The survival object with survival time and its censoring status is constructed by the
data
a data frame with variables used in the formula. It needs at least three variables, including survival time, censoring status, and covariates.
method
the outlier detection method to be used. The options "score", "boxplot", and "residual" run the scoring, boxplot, and residual-based algorithms, respectively. The default algorithm is "score".
rq.model
the type of censored quantile regression to be used for fitting. The options "Wang", "Portnoy", and "PengHuang" conduct Wang and Wang's, Portnoy's, and Peng and Huang's censored quantile regression approaches, respectively. The d
k_r
a value to control the tightness of cut-offs for the residual algorithm with a default value of 1.5.
k_b
a value to control the tightness of cut-offs for the boxplot algorithm with a default value of 1.5.
h
bandwidth for locally weighted censored quantile regression with a default value of 0.05.

Value

  • an object of the S4 class "OutlierDC" with the following slots:
      call: evaluated function call
      formula: formula to be used
      raw.data: data to be used for model fitting
      refined.data: the data set after removing outliers
      outlier.data: the data set containing outliers
      coefficients: the estimated censored quantile regression coefficient matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
      fitted.mat: the censored quantile regression fitted value matrix consisting of 10th, 25th, 50th, 75th, and 90th quantiles
      score: outlying scores (scoring algorithm) or residuals (residual-based algorithm)
      cutoff: estimated scale parameter for the residual-based algorithm
      lower: lower fence vector used for the boxplot and scoring algorithms
      upper: upper fence vector used for the boxplot and scoring algorithms
      outliers: logical vector to determine which observations are outliers
      n.outliers: number of outliers detected
      method: outlier detection method to be used
      rq.model: censored quantile regression to be used
      k_r: a value to be used for the tightness of cut-offs in the residual algorithm
      k_b: a value to be used for the tightness of cut-offs in the boxplot algorithm
      k_s: a value to be used for the tightness of upper fence cut-offs used for the scoring algorithm with the update function
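
Because the return value is an S4 object, its slots are read with `@` or `slot()` rather than `$`. The sketch below illustrates this on a hypothetical stand-in class, `ToyResult`, which only mimics two of the slots named above; it is not the real "OutlierDC" class and is used here so the snippet runs without fitting a model.

```r
library(methods)

# Hypothetical stand-in class (NOT the package's "OutlierDC" class):
# it mimics only the 'outliers' and 'n.outliers' slots for illustration.
setClass("ToyResult",
         representation(outliers = "logical", n.outliers = "numeric"))

flags <- c(FALSE, TRUE, FALSE, TRUE)
obj <- new("ToyResult",
           outliers = flags,
           n.outliers = as.numeric(sum(flags)))

obj@n.outliers           # direct slot access with @
slot(obj, "outliers")    # equivalent access by slot name
which(obj@outliers)      # indices of the flagged observations
slotNames(obj)           # list the slots an object provides
```

The same access pattern should apply to a fitted object, as in the `fit2@outlier.data` call in the Examples.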

Source

Eo S-H, Hong S-M, Cho H (2014). Identification of outlying observations with quantile regression for censored data. Submitted.
Wang HJ, Wang L (2009). Locally Weighted Censored Quantile Regression. JASA 104:1117-1128. doi: 10.1198/jasa.2009.tm08230

Details

The odc function implements three outlier detection algorithms based on censored quantile regression: residual-based, boxplot, and scoring. The residual-based algorithm detects outlying observations using constant scale estimates, so it does not account for heterogeneity in variability. When the data are extremely heterogeneous, the boxplot algorithm with censored quantile regression is more effective. The residual-based and boxplot algorithms produce cut-offs that determine whether observations are outliers. In contrast, the scoring algorithm provides the outlying magnitude, or deviation, of each point from the distribution of observations; outliers are then identified by visualising the scores.
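
The contrast between a constant-scale rule and covariate-dependent fences can be sketched in base R. This is a simplified illustration under loud assumptions, not the package's implementation: the data are simulated and uncensored, group-wise sample quantiles stand in for censored quantile regression fits, the fences use the familiar Tukey form (quartile plus or minus `k_b` times the interquartile range), and the residual rule uses a single pooled MAD as its scale. The exact formulas used by `odc` are not given on this page.

```r
# Illustration only: simulated, uncensored data; group-wise quantiles
# stand in for censored quantile regression fits (an assumption).
set.seed(1)
x <- rep(1:4, each = 50)                        # hypothetical discrete covariate
y <- rnorm(length(x), mean = x, sd = 0.2 * x)   # spread grows with x

k_b <- 1.5   # tightness of the boxplot-style fences
k_r <- 1.5   # tightness of the residual-style cut-off

# Conditional quartiles and median, repeated for each observation
q25 <- ave(y, x, FUN = function(v) quantile(v, 0.25))
q50 <- ave(y, x, FUN = median)
q75 <- ave(y, x, FUN = function(v) quantile(v, 0.75))

# Boxplot-style rule (assumed Tukey form): fences adapt to local spread
iqr     <- q75 - q25
upper   <- q75 + k_b * iqr
lower   <- q25 - k_b * iqr
out_box <- y > upper | y < lower

# Residual-style rule (assumed form): one constant scale for all x
res       <- y - q50
scale_hat <- mad(res)
out_res   <- abs(res) > k_r * scale_hat

# Compare how many points each rule flags per covariate level
table(x, out_box)
table(x, out_res)
```

In this simulated setting the constant-scale rule flags far more points where the spread is large, whereas the fences widen with the local interquartile range, which is the motivation the paragraph above gives for preferring the boxplot algorithm on heterogeneous data.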

See Also

OutlierDC-package, coef, plot, show, update

Examples

library(OutlierDC)
    # Toy example 
    data(ebd)
    # The data consists of 402 observations with 6 variables. 
    dim(ebd)
    # To show the first six observations of the dataset,
    head(ebd)
    
    # scoring algorithm
    fit <- odc(Surv(log(time), status) ~ meta, data = ebd)
    fit
    coef(fit)
    plot(fit)

    # Add an upper bound for the selection of outliers
    fit1 <- update(fit, k_s = 4)
    fit1
    plot(fit1)

    # residual-based algorithm
    fit2 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "residual", k_r = 1.5)
    fit2
    plot(fit2)
    
    # To display all outlying observations in the fitted object
    fit2@outlier.data
    
    # boxplot algorithm
    fit3 <- odc(Surv(log(time), status) ~ meta, data = ebd, method = "boxplot", k_b = 1.5)
    fit3
    plot(fit3, ylab = "log survival times", xlab = "metastasis lymph nodes")
