Screen and transform the data to make them more suitable for structure and parameter learning.
# discretize continuous data into factors.
discretize(data, method, breaks = 3, ordered = FALSE, ..., debug = FALSE)
# screen continuous data for highly correlated pairs of variables.
dedup(data, threshold, debug = FALSE)
data: a data frame containing numeric columns (for dedup) or a combination of numeric or factor columns (for discretize).
threshold: a numeric value between zero and one, the absolute correlation used as a threshold in screening highly correlated pairs.
method: a character string, either interval for interval discretization, quantile for quantile discretization (the default) or hartemink for Hartemink's pairwise mutual information method.
breaks: if method is set to hartemink, an integer number, the number of levels the variables are to be discretized into. Otherwise, a vector of integer numbers, one for each column of the data set, specifying the number of levels for each variable.
ordered: a boolean value. If TRUE, the discretized variables are returned as ordered factors instead of unordered ones.
...: additional tuning parameters, such as the ibreaks argument passed together with method = 'hartemink' in the example below.
debug: a boolean value. If TRUE, a lot of debugging output is printed; otherwise the function is completely silent.
discretize returns a data frame with the same structure (number of columns, column names, etc.) as data, containing the discretized variables.
dedup returns a data frame with a subset of the columns of data.
discretize takes a data frame of continuous variables as its first argument and returns a second data frame of discrete variables, transformed using one of three methods: interval, quantile or hartemink.
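The three strategies can be illustrated with plain base R. The sketch below is not bnlearn's code: the helpers discretize_vector(), mutual_information() and collapse_levels() are hypothetical, they work on one variable at a time, and the Hartemink step is deliberately simplified to greedily merging adjacent levels of a single variable (starting from a fine quantile discretization) while the other variables are held fixed.

```r
# Minimal base-R sketch; NOT bnlearn's implementation.
# All function names below are hypothetical.

# Discretize one numeric vector into `breaks` levels.
discretize_vector = function(x, method = c("quantile", "interval"),
                             breaks = 3, ordered = FALSE) {
  method = match.arg(method)
  if (method == "interval") {
    # equal-width bins: cut() splits the range of x into `breaks` pieces
    return(cut(x, breaks = breaks, include.lowest = TRUE,
               ordered_result = ordered))
  }
  # equal-frequency bins: cut points at the empirical quantiles of x
  cuts = quantile(x, probs = seq(0, 1, length.out = breaks + 1), names = FALSE)
  cut(x, breaks = unique(cuts), include.lowest = TRUE, ordered_result = ordered)
}

# Empirical mutual information (in nats) between two factors.
mutual_information = function(x, y) {
  joint = table(x, y) / length(x)
  expected = outer(rowSums(joint), colSums(joint))
  nz = joint > 0
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

# Simplified Hartemink-style step (assumption: greedy, one variable at a time).
# Repeatedly merge the pair of adjacent levels of x whose merge keeps the
# largest total mutual information with the other discrete variables.
collapse_levels = function(x, others, breaks) {
  x = droplevels(x)
  while (nlevels(x) > breaks) {
    best_score = -Inf
    best = NULL
    for (k in seq_len(nlevels(x) - 1)) {
      merged = x
      lv = levels(merged)
      # giving two adjacent levels the same label merges them
      levels(merged)[k:(k + 1)] = paste(lv[k], lv[k + 1], sep = "+")
      score = sum(vapply(others, function(o) mutual_information(merged, o),
                         numeric(1)))
      if (score > best_score) {
        best_score = score
        best = merged
      }
    }
    x = best
  }
  x
}
```

Quantile discretization puts roughly the same number of observations in each level, while interval discretization uses levels of equal width; the Hartemink-style step instead chooses which adjacent levels to merge by how much dependence on the other variables they preserve.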
dedup screens the data for pairs of highly correlated variables, as determined by threshold, and discards one variable from each pair.
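The screening idea can be sketched in base R as follows. This is a hypothetical illustration, not bnlearn's code: the name dedup_sketch(), the default threshold of 0.9, the use of >= for the comparison, and the rule of keeping the earlier column of each pair are all assumptions.

```r
# Minimal sketch of correlation screening; NOT bnlearn's implementation.
# Assumptions: pairs with |cor| >= threshold are redundant, and the
# earlier column (in column order) of each such pair is kept.
dedup_sketch = function(data, threshold = 0.9) {
  rho = abs(cor(data))          # absolute pairwise Pearson correlations
  keep = rep(TRUE, ncol(data))
  for (i in seq_len(ncol(data) - 1)) {
    if (!keep[i]) next          # columns already dropped cannot drop others
    for (j in (i + 1):ncol(data)) {
      # drop the later column of any pair at or above the threshold
      if (keep[j] && rho[i, j] >= threshold) keep[j] = FALSE
    }
  }
  data[, keep, drop = FALSE]
}
```

Because the scan is greedy, the columns that survive can depend on column order when several variables are mutually correlated.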
Hartemink A (2001). Principled Computational Methods for the Validation and Discovery of Genetic Regulatory Networks. Ph.D. thesis, School of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
library(bnlearn)
data(gaussian.test)
d = discretize(gaussian.test, method = 'hartemink', breaks = 4, ibreaks = 20)
plot(hc(d))
d2 = dedup(gaussian.test)