
Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.
h2o.createFrame(
  rows = 10000,
  cols = 10,
  randomize = TRUE,
  value = 0,
  real_range = 100,
  categorical_fraction = 0.2,
  factors = 100,
  integer_fraction = 0.2,
  integer_range = 100,
  binary_fraction = 0.1,
  binary_ones_fraction = 0.02,
  time_fraction = 0,
  string_fraction = 0,
  missing_fraction = 0.01,
  response_factors = 2,
  has_response = FALSE,
  seed,
  seed_for_column_types
)
Returns an H2OFrame object.
rows: The number of rows of data to generate.
cols: The number of columns of data to generate. Excludes the response column if has_response = TRUE.
randomize: A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.
value: If randomize = FALSE, then all real-valued entries will be set to this value.
real_range: The range of randomly generated real values.
categorical_fraction: The fraction of total columns that are categorical.
factors: The number of unique factor levels in each categorical column.
integer_fraction: The fraction of total columns that are integer-valued.
integer_range: The range of randomly generated integer values.
binary_fraction: The fraction of total columns that are binary-valued.
binary_ones_fraction: The fraction of values in a binary column that are set to 1.
time_fraction: The fraction of total columns that are randomly generated date/time columns.
string_fraction: The fraction of total columns that are randomly generated string columns.
missing_fraction: The fraction of total entries in the data frame that are set to NA.
response_factors: If has_response = TRUE, then this is the number of factor levels in the response column.
has_response: A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.
seed: A seed used to generate random values when randomize = TRUE.
seed_for_column_types: A seed used to generate random column types when randomize = TRUE.
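As a rough illustration of how the fraction arguments divide up cols, the following sketch estimates the number of columns of each type in plain R, without calling H2O. It rests on three assumptions that the text above does not confirm: columns not claimed by any fraction argument are real-valued, the fractions may not sum to more than 1, and counts are obtained by simple rounding. expected_columns is a hypothetical helper for illustration only and is not part of the h2o package.

```r
# Hypothetical helper (not part of h2o): estimate how many columns of each
# type a call to h2o.createFrame() might produce for the given fractions.
expected_columns <- function(cols,
                             categorical_fraction = 0.2,
                             integer_fraction = 0.2,
                             binary_fraction = 0.1,
                             time_fraction = 0,
                             string_fraction = 0) {
  fractions <- c(categorical = categorical_fraction,
                 integer = integer_fraction,
                 binary = binary_fraction,
                 time = time_fraction,
                 string = string_fraction)
  # Assumption: the typed fractions cannot claim more than all columns
  # (small tolerance for floating-point sums).
  stopifnot(all(fractions >= 0), sum(fractions) <= 1 + 1e-8)
  # Assumption: whatever is left over is real-valued.
  fractions["real"] <- 1 - sum(fractions)
  # Assumption: simple rounding; H2O may round differently at the margins.
  round(cols * fractions)
}

# Same settings as the first documented example: 100 columns, 10% categorical,
# 50% integer, and the default 10% binary.
counts <- expected_columns(cols = 100, categorical_fraction = 0.1,
                           integer_fraction = 0.5)
print(counts)
```

The response column added by has_response = TRUE is not included in these counts, since cols excludes it.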
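if (FALSE) {
  library(h2o)
  # Start or connect to a local H2O cluster.
  h2o.init()

  # Random frame: 10% categorical columns with 5 levels each, 50% integer
  # columns, plus a response column.
  hf <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
                        factors = 5, integer_fraction = 0.5, integer_range = 1,
                        has_response = TRUE)
  head(hf)
  summary(hf)

  # Non-random frame: all real-valued entries are set to value = 5.
  hf <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
                        categorical_fraction = 0, integer_fraction = 0)
  summary(hf)
}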