# discretize continuous data into factors.
discretize(data, method, breaks = 3, ordered = FALSE, ..., debug = FALSE)
# screen continuous data for highly correlated pairs of variables.
dedup(data, threshold, debug = FALSE)
data: a data frame containing numeric columns (for dedup) or a combination of numeric or factor columns (for discretize).
threshold: the correlation threshold dedup uses when screening for highly correlated pairs of variables.
method: a character string, either interval for interval discretization, quantile for quantile discretization (the default) or hartemink for Hartemink's pairwise mutual information method.
breaks: if method is set to hartemink, an integer number, the number of levels the variables are to be discretized into. Otherwise, a vector of integer numbers, one for each column of the data set, specifying the number of levels for each variable.
ordered: a boolean value. If TRUE the discretized variables are returned as ordered factors instead of unordered ones.
...: additional tuning parameters.
debug: a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

discretize returns a data frame with the same structure (number of columns, column names, etc.) as data, containing the discretized variables. dedup returns a data frame with a subset of the columns of data.
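The difference between the interval and quantile methods can be illustrated with a small base R sketch. It does not call bnlearn and is not its implementation; the variable names (`x`, `k`, `by_interval`, `by_quantile`) are made up for the illustration, and `k` plays the role of a single entry of breaks.

```r
# Base R sketch of interval vs. quantile discretization (not bnlearn's code).
set.seed(42)
x <- rexp(1000)   # a skewed continuous variable
k <- 3            # number of levels, mirroring breaks = 3

# Interval discretization: k bins of equal width over the range of x.
by_interval <- cut(x, breaks = k)

# Quantile discretization: k bins holding roughly the same number of points.
cutpoints <- quantile(x, probs = seq(0, 1, length.out = k + 1))
by_quantile <- cut(x, breaks = cutpoints, include.lowest = TRUE)

table(by_interval)   # counts are very uneven for skewed data
table(by_quantile)   # counts are close to equal
```

Both results are unordered factors with `k` levels; passing `ordered_result = TRUE` to `cut` would give ordered factors, which is the base R analogue of ordered = TRUE.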
discretize takes a data frame of continuous variables as its first argument and returns a second data frame of discrete variables, transformed using one of three methods: interval, quantile or hartemink. dedup screens the data for pairs of highly correlated variables, and discards one in each pair.
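The screening idea behind dedup can be sketched in base R as follows. This is an assumption-laden illustration, not bnlearn's implementation: the function name `screen_correlated`, the default threshold of 0.9, and the choice to drop the later column of each pair are all hypothetical.

```r
# Base R sketch of correlation screening (not bnlearn's implementation).
screen_correlated <- function(data, threshold = 0.9) {
  r <- abs(cor(data))          # absolute pairwise correlations
  vars <- colnames(data)
  discard <- character(0)
  for (i in seq_along(vars)) {
    if (vars[i] %in% discard) next
    for (j in seq_along(vars)) {
      if (j <= i || vars[j] %in% discard) next
      # Arbitrary choice for this sketch: drop the later column of the pair.
      if (r[i, j] > threshold) discard <- c(discard, vars[j])
    }
  }
  data[, setdiff(vars, discard), drop = FALSE]
}

set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.01), c = rnorm(200))
names(screen_correlated(toy))   # keeps a and c, drops the near-duplicate b
```

As with dedup, the result is a data frame containing a subset of the original columns.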
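library(bnlearn)
data(gaussian.test)
# Hartemink's method: 20 initial levels per variable (ibreaks), reduced to 4 (breaks).
d = discretize(gaussian.test, method = 'hartemink', breaks = 4, ibreaks = 20)
# learn a network structure from the discretized data by hill-climbing and plot it.
plot(hc(d))
# drop one variable from each highly correlated pair.
d2 = dedup(gaussian.test)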