# discretize continuous data into factors.
discretize(data, method, breaks = 3, ordered = FALSE, ..., debug = FALSE)
# screen continuous data for highly correlated pairs of variables.
dedup(data, threshold, debug = FALSE)
data: a data frame containing numeric columns (for dedup) or a combination of numeric or factor columns (for discretize).
method: a character string, either interval for interval discretization, quantile for quantile discretization (the default) or hartemink for Hartemink's pairwise mutual information method.
breaks: if method is set to hartemink, an integer number, the number of levels the variables are to be discretized into. Otherwise, a vector of integer numbers, one for each column of the data set, specifying the number of levels for each variable.
ordered: if TRUE, the discretized variables are returned as ordered factors instead of unordered ones.
debug: if TRUE, a lot of debugging output is printed; otherwise the function is completely silent.

discretize
returns a data frame with the same structure (number of columns, column names, etc.) as data, containing the discretized variables. dedup returns a data frame with a subset of the columns of data.

discretize
takes a data frame of continuous variables as its first argument and returns a second data frame of discrete variables, transformed using one of three methods: interval, quantile or hartemink. dedup screens the data for pairs of highly correlated variables, and discards one in each pair.

data(gaussian.test)
d = discretize(gaussian.test, method = 'hartemink', breaks = 4, ibreaks = 20)
plot(hc(d))
d2 = dedup(gaussian.test)
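
To make the difference between the interval and quantile methods concrete, the following is a minimal base-R sketch of the two ideas on a single simulated variable. It is an illustration only and is not the package's own implementation; the variable names (x, k, by_interval, by_quantile) are made up for the example, and k plays the role of breaks.

```r
# Illustrative only: base-R analogues of interval and quantile discretization.
# This is NOT the code behind discretize(); all names here are hypothetical.
set.seed(42)
x <- rnorm(200)   # one simulated continuous variable
k <- 3            # number of levels, playing the role of `breaks`

# Interval discretization: k bins of equal width over the range of x.
by_interval <- cut(x, breaks = k)

# Quantile discretization: cut points at the empirical quantiles, so each
# bin holds roughly the same number of observations.
cuts <- quantile(x, probs = seq(0, 1, length.out = k + 1))
by_quantile <- cut(x, breaks = cuts, include.lowest = TRUE)

# Compare how the observations are spread over the levels.
print(table(by_interval))
print(table(by_quantile))
```

With equal-width bins the counts per level follow the shape of the distribution, while quantile-based cut points balance the counts across levels.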
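
The screening step performed by dedup can be pictured with a small base-R sketch. This is a hedged illustration, not the package's code: screen_correlated is a hypothetical helper, it assumes that threshold is a cutoff on the absolute Pearson correlation, and it assumes that the later column of each highly correlated pair is the one discarded. The text above does not specify either point.

```r
# Illustrative only: a base-R correlation screen in the spirit of dedup().
# `screen_correlated` is a hypothetical helper, not part of any package.
# Assumptions: `threshold` is a cutoff on absolute Pearson correlation, and
# the later column of each highly correlated pair is the one dropped.
screen_correlated <- function(data, threshold) {
  r <- abs(cor(data))
  keep <- rep(TRUE, ncol(data))
  for (j in seq_len(ncol(data))[-1]) {
    # Drop column j if it is highly correlated with an earlier column
    # that is still kept.
    earlier <- which(keep[seq_len(j - 1)])
    if (any(r[earlier, j] > threshold)) keep[j] <- FALSE
  }
  data[, keep, drop = FALSE]
}

# Toy data: b is a noisy copy of a, c is independent of both.
set.seed(1)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.01), c = rnorm(100))
screened <- screen_correlated(toy, threshold = 0.9)
print(names(screened))
```

As with dedup, the result is a data frame holding a subset of the original columns.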