Creates a valid simple dataset object.
new()
Method for initializing the object arguments during runtime.
Dataset$new(
filepath,
header = TRUE,
sep = ",",
skip = 0,
normalize.names = FALSE,
string.as.factor = FALSE,
ignore.columns = NULL
)
filepath
The name of the file which the data are to be read from.
Each row of the table appears as one line of the file. If it does not
contain an _absolute_ path, the file name is _relative_ to the current
working directory, 'getwd()'.
header
A logical value indicating whether the file contains
the names of the variables as its first line. If missing, the value is
determined from the file format: 'header' is set to 'TRUE'
if and only if the first row contains one fewer field than the number of
columns.
sep
The field separator character. Values on each line of the file are separated by this character.
skip
Defines the number of header lines that should be skipped.
normalize.names
A logical value indicating whether the column names should be automatically renamed to ensure R compatibility.
string.as.factor
A logical value indicating whether character columns should be converted to factors (default = FALSE).
ignore.columns
Specifies the columns from the input file that should be ignored.
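The reading arguments above can be illustrated with base R alone. The following is a minimal sketch, assuming that header, sep and skip behave as in read.table, that string.as.factor corresponds to stringsAsFactors, and that normalize.names renames columns in the manner of make.names. None of these mappings is confirmed by this page, and the sample file is invented for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,my label,score",
  "1,yes,0.5",
  "2,no,0.7"
), path)

# Base R reader using the same header/sep/skip values as the defaults above.
# check.names = FALSE keeps the raw column names (assumed to match
# normalize.names = FALSE); stringsAsFactors = FALSE keeps strings as
# character (assumed to match string.as.factor = FALSE).
df <- read.table(path, header = TRUE, sep = ",", skip = 0,
                 check.names = FALSE, stringsAsFactors = FALSE)

# What an R-compatible renaming could look like (assumption: similar in
# spirit to normalize.names = TRUE).
normalized <- make.names(names(df))

# With the package that provides Dataset attached, the matching call would be:
# data <- Dataset$new(filepath = path, header = TRUE, sep = ",", skip = 0,
#                     normalize.names = FALSE, string.as.factor = FALSE,
#                     ignore.columns = NULL)
```

The commented call only restates the signature documented above; the base R lines are what actually run here.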
getColumnNames()
Gets the names of the columns comprising the dataset.
Dataset$getColumnNames()
A character vector with the name of each column.
getDataset()
Gets the full dataset.
Dataset$getDataset()
A data.frame with all the loaded information.
getNcol()
Obtains the number of columns present in the dataset.
Dataset$getNcol()
getNrow()
Obtains the number of rows present in the dataset.
Dataset$getNrow()
getRemovedColumns()
Gets the columns that were removed or ignored.
Dataset$getRemovedColumns()
A list containing the names of the removed columns.
cleanData()
Removes data.frame columns matching some criterion.
Dataset$cleanData(remove.funcs = NULL, remove.na = TRUE, remove.const = FALSE)
remove.funcs
A vector of functions used to define which columns must be removed.
remove.na
A logical value indicating whether columns containing NA values should be removed.
remove.const
A logical value indicating whether columns with constant values should be removed.
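The column-removal criteria can be sketched in base R as follows. This is an illustration only, not the package's implementation: clean_columns is a hypothetical helper, and it assumes that each function in remove.funcs receives one column and returns TRUE when that column should be dropped.

```r
# Toy data: b contains an NA, c is constant, d has a negative value.
df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),
  c = c(7, 7, 7),
  d = c(-1, 0, 1)
)

# Hypothetical helper mirroring the documented arguments.
clean_columns <- function(df, remove.funcs = NULL, remove.na = TRUE,
                          remove.const = FALSE) {
  # Accept either a single function or a list/vector of functions.
  funcs <- if (is.function(remove.funcs)) list(remove.funcs) else as.list(remove.funcs)
  if (remove.na) {
    funcs <- c(funcs, list(function(col) anyNA(col)))
  }
  if (remove.const) {
    funcs <- c(funcs, list(function(col) length(unique(col)) == 1))
  }
  # A column is dropped as soon as any criterion returns TRUE for it.
  drop <- vapply(df, function(col) {
    any(vapply(funcs, function(f) isTRUE(f(col)), logical(1)))
  }, logical(1))
  list(data = df[, !drop, drop = FALSE], removed = names(df)[drop])
}

res <- clean_columns(
  df,
  remove.funcs = list(function(col) any(col < 0, na.rm = TRUE)),
  remove.na = TRUE,
  remove.const = TRUE
)
```

Returning the removed names alongside the cleaned data mirrors the idea behind getRemovedColumns(), although how the class stores them internally is not described here.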
removeColumns()
Applies the cleanData function over a specific set of columns.
Dataset$removeColumns(
columns,
remove.funcs = NULL,
remove.na = FALSE,
remove.const = FALSE
)
remove.funcs
A vector of functions used to define which columns must be removed.
remove.na
A logical value indicating whether columns containing NA values should be removed.
remove.const
A logical value indicating whether columns with constant values should be removed.
createPartitions()
Creates a k-folds partition from the initial dataset.
Dataset$createPartitions(
num.folds = NULL,
percent.folds = NULL,
class.balance = NULL
)
num.folds
A numeric for the number of folds (partitions)
percent.folds
A numeric vector with the percentage of instances contained in each fold.
class.balance
A logical value indicating if class balance should be kept.
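To make the idea of a class-balanced k-fold partition concrete, here is a minimal base R sketch. stratified_folds is a hypothetical helper and the class labels are invented; how createPartitions() actually assigns instances, and how percent.folds interacts with num.folds, is not specified above.

```r
set.seed(42)  # only to make the shuffle reproducible

# Invented class labels: 8 instances of one class, 12 of the other.
classes <- factor(c(rep("spam", 8), rep("ham", 12)))
num.folds <- 4

# Hypothetical helper: assign a fold id to every instance, class by class,
# so that each fold keeps roughly the original class proportions.
stratified_folds <- function(classes, k) {
  fold <- integer(length(classes))
  for (lvl in levels(classes)) {
    idx <- which(classes == lvl)
    idx <- idx[sample.int(length(idx))]            # shuffle within the class
    fold[idx] <- rep_len(seq_len(k), length(idx))  # deal out fold ids in turn
  }
  fold
}

folds <- stratified_folds(classes, num.folds)

# Cross-tabulate folds against classes to inspect the balance.
balance <- table(folds, classes)
```

Dealing fold ids out in turn within each class is one simple way to keep the balance; other stratification schemes would serve equally well.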
createSubset()
Creates a Subset for testing or classification purposes. A target class should be provided for testing purposes.
Dataset$createSubset(
num.folds = NULL,
opts = list(remove.na = TRUE, remove.const = FALSE),
class.index = NULL,
positive.class = NULL
)
num.folds
A numeric defining the number of folds that should be used to build the Subset.
opts
A list with optional parameters. Valid arguments are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).
class.index
A numeric value identifying the column representing the target class.
positive.class
Defines the positive class value.
A Subset object.
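The roles of num.folds, class.index and positive.class can be sketched in base R as shown below. build_subset is a hypothetical helper and the data are invented. The sketch assumes that num.folds selects which existing folds contribute rows and that the positive class is placed first among the class levels; neither behaviour is confirmed by this page.

```r
# Invented data: two feature columns and a target class in column 3.
df <- data.frame(
  x1 = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7),
  x2 = c(10, 20, 30, 40, 50, 60),
  target = c("yes", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)
fold <- c(1, 1, 2, 2, 3, 3)  # fold id of each row (illustrative)

# Hypothetical helper: keep the rows of the requested folds and separate
# the class column from the feature columns.
build_subset <- function(df, fold, num.folds, class.index, positive.class) {
  rows <- fold %in% num.folds
  other <- setdiff(unique(df[[class.index]]), positive.class)
  list(
    features = df[rows, -class.index, drop = FALSE],
    class = factor(df[rows, class.index], levels = c(positive.class, other)),
    positive.class = positive.class
  )
}

test <- build_subset(df, fold, num.folds = c(1, 2),
                     class.index = 3, positive.class = "yes")

# With a partitioned Dataset object named data (hypothetical name), the
# documented call would take the form:
# test <- data$createSubset(num.folds = c(1, 2), class.index = 3,
#                           positive.class = "yes")
```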
createTrain()
Creates a set for training purposes. A class should be defined to guarantee full compatibility with supervised models.
Dataset$createTrain(
class.index,
positive.class,
num.folds = NULL,
opts = list(remove.na = TRUE, remove.const = FALSE)
)
class.index
A numeric value identifying the column representing the target class.
positive.class
Defines the positive class value.
num.folds
A numeric defining the number of folds that should be used to build the Subset.
opts
A list with optional parameters. Valid arguments are remove.na (removes columns with NA values) and remove.const (ignores columns with constant values).
A Trainset object.
HDDataset