Usage
dataProcess(raw,logTrans=2,
normalization="equalizeMedians",nameStandards=NULL,
betweenRunInterferenceScore=FALSE, address="",
fillIncompleteRows=TRUE,
featureSubset="all",
remove_proteins_with_interference=FALSE, n_top_feature=3,
summaryMethod="TMP",
equalFeatureVar=TRUE,
filterLogOfSum=TRUE,
censoredInt="NA",
cutoffCensored="minFeature",
MBimpute=TRUE,
original_scale=FALSE,
logsum=FALSE,
remove50missing=FALSE,
skylineReport=FALSE)
Arguments
raw
name of the raw (input) data set.
logTrans
Base of the logarithm transformation: 2 (default) or 10.
normalization
Normalization to remove systematic bias between MS runs. Three normalization methods are supported. 'equalizeMedians' (default) performs constant normalization (equalizing the medians) based on reference signals. 'quantile' performs quantile normalization based on reference signals. 'globalStandards' performs normalization with global standards (see nameStandards). FALSE performs no normalization.
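The idea behind constant ("equalize medians") normalization can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy data, the column names, and the choice of the median of run medians as the common target are all assumptions made for the example.

```r
# Toy long-format data: log2 intensities for two runs (hypothetical values).
toy <- data.frame(
  Run = rep(c("run1", "run2"), each = 3),
  log2Intensity = c(10, 11, 12, 12, 13, 14)
)

# Median per run, and a common target (assumption: the median of run medians).
run_median <- tapply(toy$log2Intensity, toy$Run, median)
target <- median(run_median)

# Shift every run by a constant so that its median equals the target.
shift <- target - run_median[as.character(toy$Run)]
toy$normalized <- toy$log2Intensity + unname(shift)

# After normalization, all runs share the same median.
tapply(toy$normalized, toy$Run, median)
```

Because each run is shifted by a single constant on the log scale, the differences between features within a run are unchanged.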
nameStandards
Vector of global standard peptide names. Used only for normalization with global standard peptides.
betweenRunInterferenceScore
Interference is detected by a between-run-interference score. TRUE generates the scores automatically and stores them in a .csv file. FALSE (default) generates no scores.
fillIncompleteRows
If the input data set has incomplete rows, TRUE (default) adds rows with intensity value NA for the missing peaks. FALSE reports an error message listing the features that have incomplete rows.
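What "adding rows with intensity value NA" means can be illustrated with a small base R sketch. The column names Feature, Run and Intensity and the toy values are assumptions for the example; the real input has more columns and the package performs its own checks.

```r
# Toy data: feature "f2" was not recorded in run "r2" (hypothetical values).
obs <- data.frame(
  Feature = c("f1", "f2", "f1"),
  Run = c("r1", "r1", "r2"),
  Intensity = c(1000, 2000, 1500)
)

# Every Feature x Run combination that a complete data set would contain.
full <- expand.grid(
  Feature = unique(obs$Feature),
  Run = unique(obs$Run),
  stringsAsFactors = FALSE
)

# Left join: combinations without a measurement get Intensity = NA.
filled <- merge(full, obs, by = c("Feature", "Run"), all.x = TRUE)
filled
```

The result has one row per feature and run, with NA marking the peak that was absent from the input.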
featureSubset
"all" (default) uses all features in the data set. "top3" uses the 3 features with the highest average log2(intensity) across runs. "topN" uses the N features with the highest average log2(intensity) across runs; N is set with the n_top_feature option. "highQuality" selects the most informative features, those that agree with the average pattern of the features across runs.
remove_proteins_with_interference
TRUE allows the algorithm to remove proteins that are deemed to be interfered. FALSE (default) does not allow the removal of proteins in which all features are interfered. In that case, a protein that would otherwise lose all of its features keeps its most abundant peptide.
n_top_feature
The number of top features for featureSubset="topN". Default is 3, which uses the top 3 features.
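The "topN" selection described for featureSubset and n_top_feature can be sketched as follows. The matrix layout (features in rows, runs in columns) and the decision to ignore missing values when averaging are assumptions for this illustration, not a statement about the package internals.

```r
# Rows are features, columns are runs; values are log2 intensities (hypothetical).
log2_int <- matrix(
  c(20, 21, 19,
    15, 16, 14,
    18, NA, 17,
    12, 13, 11),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3", "f4"), c("r1", "r2", "r3"))
)

n_top_feature <- 3

# Average log2 intensity of each feature across runs (assumption: NAs ignored).
feature_mean <- rowMeans(log2_int, na.rm = TRUE)

# Keep the n_top_feature features with the highest average.
ranked <- names(sort(feature_mean, decreasing = TRUE))
top_features <- ranked[seq_len(n_top_feature)]
top_features
```

Here the averages are 20, 15, 17.5 and 12, so f1, f3 and f2 are retained and f4 is dropped.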
summaryMethod
"TMP" (default) means Tukey's median polish, which is a robust estimation method. "linear" uses a linear mixed model. "logOfSum" computes log2 (sum of intensities) per run.
equalFeatureVar
Only for summaryMethod="linear". Logical variable for whether the model assumes equal variance among intensities from different features. TRUE (default) assumes equal variance among intensities from features. FALSE does not assume equal variance and accounts for heterogeneous variation among intensities from different features.
filterLogOfSum
For summaryMethod="logOfSum" only. TRUE (default) filters out the runs that have any missing value. FALSE does not remove any runs or features.
censoredInt
Specifies whether missing values are censored or missing at random. 'NA' (default) assumes that all NAs in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. Output from Skyline should use '0'. NULL assumes that all NA intensities are missing at random.
cutoffCensored
Cutoff value for censoring. Only with censoredInt='NA' or '0'. 'minFeature' (default) uses the minimum value for each feature. 'minFeatureNRun' uses the smaller of the minimum value of the corresponding feature and the minimum value of the corresponding run. 'minRun' uses the minimum value for each run.
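The three cutoff definitions can be compared on a small matrix. This is only a sketch: the toy values, the use of log2 intensities, and the matrix layout (features in rows, runs in columns) are assumptions, and the sketch does not show how the package applies the cutoff afterwards.

```r
# Rows are features, columns are runs; NA marks a censored value (hypothetical).
log2_int <- matrix(
  c(10, 12, NA,
    14,  9, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("f1", "f2"), c("r1", "r2", "r3"))
)

min_feature <- apply(log2_int, 1, min, na.rm = TRUE)  # one value per feature
min_run     <- apply(log2_int, 2, min, na.rm = TRUE)  # one value per run

# "minFeatureNRun": the smaller of the feature minimum and the run minimum,
# evaluated for every feature/run cell.
min_feature_n_run <- outer(min_feature, min_run, pmin)

# Candidate cutoffs for the censored cell (feature f1, run r3).
cutoffs <- c(
  minFeature = min_feature[["f1"]],
  minRun = min_run[["r3"]],
  minFeatureNRun = min_feature_n_run["f1", "r3"]
)
cutoffs
```

For the censored cell, the feature minimum is 10 and the run minimum is 11, so 'minFeature' and 'minFeatureNRun' both give 10 while 'minRun' gives 11.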
MBimpute
Only for summaryMethod="TMP" and censoredInt='NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) with an accelerated failure time model. FALSE uses the values assigned by cutoffCensored.
original_scale
Default is FALSE, which uses log-transformed intensities for TMP. TRUE uses intensities on the original scale for TMP.
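Tukey's median polish on log-transformed intensities, the technique named by summaryMethod="TMP", can be sketched with medpolish() from the stats package. The toy matrix and the way the run-level summary is assembled (overall effect plus run effect) are assumptions for this illustration; the package works on its own data structures and also has to deal with missing values.

```r
# Rows are features, columns are runs; values are log2 intensities (hypothetical).
log2_int <- matrix(
  c(10, 11, 12,
    12, 13, 14,
    11, 12, 20),   # the last value is an outlier
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("r1", "r2", "r3"))
)

# Median polish: value = overall + feature effect + run effect + residual.
fit <- medpolish(log2_int, trace.iter = FALSE)

# Run-level summary (assumption): overall effect plus the run (column) effect.
tmp_summary <- fit$overall + fit$col

# For comparison, the plain column means are pulled up by the outlier.
mean_summary <- colMeans(log2_int)

tmp_summary
mean_summary
```

In this toy case the median polish summary for run r3 is 13, whereas the column mean is about 15.3, which shows why the method is described as robust.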
logsum
Default is FALSE, which uses log of sum intensities after residuals from TMP.
remove50missing
Only for summaryMethod="TMP". TRUE removes the runs that have more than 50% missing values. FALSE is the default.
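The 50% rule can be sketched on a feature-by-run matrix. The toy values and the assumption that the fraction is computed over the features within each run are illustrative only.

```r
# Rows are features, columns are runs; NA marks a missing value (hypothetical).
log2_int <- matrix(
  c(10, NA, 12,
    11, NA, 13,
    12, 14, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("r1", "r2", "r3"))
)

# Fraction of missing feature intensities in each run.
frac_missing <- colMeans(is.na(log2_int))

# Drop runs with more than 50% missing values.
kept <- log2_int[, frac_missing <= 0.5, drop = FALSE]
colnames(kept)
```

Run r2 has two of three values missing and is removed; r1 and r3 are kept.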
skylineReport
Default is FALSE. TRUE means the raw (input) data set is in the Skyline MSstats input format, which includes a 'Truncated' column and distinguishes zero values from NA (missing values). Zero values in the 'Intensity' column are kept for the 'skyline' summary method. Otherwise, they are replaced with one so that they can be log transformed.
address
Name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output csv file is automatically created with the default name "BetweenRunInterferenceFile.csv". The address argument specifies where the file is stored and can also modify the beginning of the file name.
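One way to read the description of address is that its value is placed in front of the default file name. The helper below is hypothetical and only illustrates that reading; it is an assumption, not the package's code.

```r
default_name <- "BetweenRunInterferenceFile.csv"

# Hypothetical helper: builds the output path by prepending `address`.
interference_file <- function(address = "") {
  paste0(address, default_name)
}

interference_file()                 # file in the current working directory
interference_file("results/")       # file in the existing sub-folder "results"
interference_file("results/exp1_")  # sub-folder plus a file-name prefix
```

Under this reading, a trailing slash selects a folder, and any text after the last slash becomes a prefix of the file name.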