Helper function to create pipeline stage objects with common parameter setters.
spark_pipeline_stage(
sc,
class,
uid,
features_col = NULL,
label_col = NULL,
prediction_col = NULL,
probability_col = NULL,
raw_prediction_col = NULL,
k = NULL,
max_iter = NULL,
seed = NULL,
input_col = NULL,
input_cols = NULL,
output_col = NULL,
output_cols = NULL
)
`sc`: A `spark_connection` object.
`class`: Class name for the pipeline stage.
`uid`: A character string used to uniquely identify the ML estimator.
`features_col`: Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by `ft_r_formula`.
`label_col`: Label column name. The column should be a numeric column. Usually this column is output by `ft_r_formula`.
`prediction_col`: Prediction column name.
`probability_col`: Column name for predicted class-conditional probabilities.
`raw_prediction_col`: Raw prediction (a.k.a. confidence) column name.
`k`: The number of clusters to create.
`max_iter`: The maximum number of iterations to use.
`seed`: A random seed. Set this value if you need your results to be reproducible across repeated calls.
`input_col`: The name of the input column.
`input_cols`: Names of input columns.
`output_col`: The name of the output column.
`output_cols`: Names of output columns.
`thresholds`: Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value of p/t is predicted, where p is the original probability of that class and t is the class's threshold.
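The `thresholds` prediction rule can be sketched in base R. `predict_with_thresholds()` is a hypothetical helper for illustration only, not part of sparklyr. It assumes the probabilities and thresholds are plain numeric vectors listed in class order, and it assumes a zero threshold makes that class win outright (treated as an infinite ratio), which is how the "at most one value may be 0" case is read here.

```r
# Hypothetical helper (not a sparklyr function): pick the class with the
# largest p / t, as described for `thresholds`.
predict_with_thresholds <- function(probability, thresholds) {
  # Validate the documented constraints: same length as the number of
  # classes, no negative values, and at most one zero.
  stopifnot(
    length(probability) == length(thresholds),
    all(thresholds >= 0),
    sum(thresholds == 0) <= 1
  )
  # Assumption: a zero threshold is treated as an infinite ratio,
  # so that class is always predicted.
  scaled <- ifelse(thresholds == 0, Inf, probability / thresholds)
  # which.max() returns the (1-based) index of the first maximum.
  which.max(scaled)
}

p <- c(0.5, 0.3, 0.2)
predict_with_thresholds(p, c(1, 1, 1))    # returns 1 (ratios 0.5, 0.3, 0.2)
predict_with_thresholds(p, c(1, 0.5, 1))  # returns 2 (ratios 0.5, 0.6, 0.2)
```

Lowering a class's threshold raises its ratio, so that class is predicted more often; with all thresholds equal, the rule reduces to picking the most probable class.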