Boruta (version 1.2)

Boruta: Important attribute search using Boruta algorithm

Description

Boruta is an algorithm for finding important attributes in information systems by iteratively training the randomForest classifier.

Usage

## S3 method for class 'formula':
Boruta(formula,data=.GlobalEnv,...)
## S3 method for class 'default':
Boruta(x,y,confidence=0.999,maxRuns=100,light=TRUE,doTrace=0,...)
## S3 method for class 'Boruta':
print(x,...)

Arguments

x, formula
data frame of predictors, or a formula describing the model to be analysed.
data
data frame containing the model variables. Defaults to the global environment.
y
response vector. Must be a factor.
confidence
confidence level. The default value should normally be used; a lower value may reduce the computation time of test runs.
maxRuns
maximum number of randomForest runs in the final round. Increase it to resolve attributes left Tentative.
doTrace
0 means no tracing; 1 prints a "." after each randomForest run; 2 does the same as 1 and also reports test results consecutively.
light
if TRUE, Boruta runs in the standard light mode; if FALSE, it runs in the more restrictive force mode.
...
additional parameters that will be passed to randomForest function.

Value

An object of class Boruta, which is a list with the following components:

  • finalDecision: a factor with three values (Confirmed, Rejected or Tentative) containing the final result of feature selection.
  • decisionHistory: a data frame containing the evolution of the decision register during the Boruta run. Each row corresponds to the situation after one consecutive test.
  • ZScoreHistory: a data frame of the ZScores of attributes gathered in each randomForest run. Besides the predictors' ZScores, it contains the maximal, mean and minimal ZScore of the shadow attributes in each run. Rejected attributes are assigned a ZScore of -Inf.
  • timeTaken: the time taken by the computation.
  • call: the original call of the Boruta function.
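
As a rough illustration of how the finalDecision component might be read, the sketch below works on a hand-built stand-in factor rather than on a real Boruta result. The attribute names (attrA to attrD), the order of the levels, and the assumption that the factor is named by attribute are all hypothetical and are not stated on this page.

```r
# Hypothetical stand-in for the finalDecision component of a Boruta object.
# Attribute names, level order and the presence of names are assumptions.
finalDecision <- factor(
  c("Confirmed", "Confirmed", "Rejected", "Tentative"),
  levels = c("Tentative", "Confirmed", "Rejected")
)
names(finalDecision) <- c("attrA", "attrB", "attrC", "attrD")

# Count how many attributes ended up with each decision
print(table(finalDecision))

# Pick out attribute names by decision
confirmed <- names(finalDecision)[finalDecision == "Confirmed"]
undecided <- names(finalDecision)[finalDecision == "Tentative"]
```

With a real result, the same indexing would be applied to the finalDecision element of the returned list, provided it carries attribute names.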

Details

Boruta iteratively compares the ZScores of attributes with the ZScores of shadow attributes, which are created by shuffling the original ones. Attributes whose importance is significantly worse than that of the shadows are consecutively dropped, while attributes that are significantly better than the shadows are marked Confirmed. In the default light mode, unimportant attributes are dropped along with their random shadows; in force mode, all shadow attributes are preserved during the whole Boruta run.

The algorithm stops when only Confirmed attributes are left, or when it reaches maxRuns randomForest runs in the last round. In the latter case, some attributes may be left without a decision; these are marked Tentative. You may try to increase maxRuns or lower confidence to resolve them, but in some cases their ZScores fluctuate too much for Boruta to converge. Alternatively, you can use the TentativeRoughFix function, which performs a different, weaker test to make a final decision, or simply treat them as undecided in further analysis.
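
The comparison against shadow attributes can be sketched for a single randomForest run as follows. This is a minimal illustration and not the package's internal code: it assumes that the scaled permutation importance returned by importance(rf, type = 1, scale = TRUE) is a reasonable stand-in for the ZScore, and the names shadows, is.shadow, max.shadow and hits are made up for the example. Boruta itself repeats such runs and decides only when the difference is significant.

```r
library(randomForest)

set.seed(777)
x <- iris[, -5]    # predictors
y <- iris$Species  # response, a factor

# Shadow attributes: shuffle every predictor column independently, which
# keeps each column's distribution but breaks its link with the response
shadows <- as.data.frame(lapply(x, sample))
names(shadows) <- paste("shadow", names(x), sep = ".")

# One random forest run on the original attributes plus their shadows
rf <- randomForest(cbind(x, shadows), y, importance = TRUE)

# Mean decrease in accuracy divided by its standard error
# (assumed here to play the role of the ZScore)
z <- importance(rf, type = 1, scale = TRUE)[, 1]

is.shadow <- names(z) %in% names(shadows)
max.shadow <- max(z[is.shadow])

# In this single run, which original attributes beat the best shadow?
hits <- z[!is.shadow] > max.shadow
print(hits)
```

A single run like this is noisy, which is why the decision is based on many runs rather than on one comparison.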

Examples

library(Boruta);
set.seed(777);
#Add some nonsense attributes to iris dataset by shuffling original attributes
iris.extended<-data.frame(iris,apply(iris[,-5],2,sample));
names(iris.extended)[6:9]<-paste("Nonsense",1:4,sep="");
#Run Boruta on this data
Boruta(Species~.,data=iris.extended,doTrace=2)->Boruta.iris.extended
#Nonsense attributes should be rejected
print(Boruta.iris.extended);

#Boruta on the Ozone data from mlbench 
library(mlbench); data(Ozone);
na.omit(Ozone)->ozo;
#Takes some time, so be patient
Boruta(V4~.,data=ozo,doTrace=2)->Bor.ozo;
cat('Random forest run on all attributes:\n');
print(randomForest(V4~.,data=ozo));
cat('Random forest run only on confirmed attributes:\n');
print(randomForest(getConfirmedFormula(Bor.ozo),data=ozo));

#Boruta on the HouseVotes84 data from mlbench 
library(mlbench); data(HouseVotes84);
na.omit(HouseVotes84)->hvo;
#Takes some time, so be patient
Boruta(Class~.,data=hvo,doTrace=2)->Bor.hvo;
print(Bor.hvo);
plot(Bor.hvo);

#Boruta on the Sonar data from mlbench 
library(mlbench); data(Sonar);
#Takes some time, so be patient
Boruta(Class~.,data=Sonar,doTrace=2)->Bor.son;
print(Bor.son);
#Shows important bands
plot(Bor.son,sort=FALSE);