Basic wrappers around h2o model functions that handle data conversion, seed configuration, and similar setup.
Usage
h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)
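Each model-specific wrapper is a thin layer over the generic h2o_train() interface; as the shared arguments suggest, h2o_train_gbm(x, y, ntrees = 100) corresponds to h2o_train(x, y, model = "gbm", ntrees = 100). A minimal sketch, assuming these functions are loaded (the agua package, which is assumed here to be the one exporting them) and a live h2o cluster:

library(agua)      # assumed: the package exporting h2o_train() and friends
h2o::h2o.init()    # start (or connect to) a local h2o cluster

# Generic interface: choose the engine via the `model` argument ...
fit_generic <- h2o_train(mtcars[, -1], mtcars$mpg, model = "gbm", ntrees = 100)

# ... or call the model-specific wrapper, which forwards to the same backend
fit_wrapper <- h2o_train_gbm(mtcars[, -1], mtcars$mpg, ntrees = 100)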
Arguments
- x
A data frame of predictors.
- y
A vector of outcomes.
- model
A character string for the model. Current selections are "automl", "randomForest", "xgboost", "gbm", "glm", "deeplearning", "rulefit", and "naiveBayes". Use h2o_xgboost_available() to see if xgboost can be used on your OS/h2o server.
- weights
A numeric vector of case weights.
- validation
A numeric between 0 and 1 specifying the proportion of the data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping (see the sketch after this argument list). Defaults to 0.
- save_data
A logical for whether the training data should be saved on the h2o server; set this to TRUE for AutoML models that need to be re-fitted.
- ...
Other options to pass to the h2o model functions (e.g., h2o::h2o.randomForest()).
- ntrees
Number of trees. Defaults to 50.
- mtries
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression, where p is the number of predictors. Defaults to -1.
- min_rows
Fewest allowed (weighted) observations in a leaf. Defaults to 1.
- max_depth
Maximum tree depth (0 for unlimited). Defaults to 6.
- learn_rate
(same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.
- sample_rate
Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.
- col_sample_rate
(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.
- min_split_improvement
Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.
- stopping_rounds
Early stopping based on convergence of the stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
- lambda
Regularization strength.
- alpha
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.
- laplace
Laplace smoothing parameter. Defaults to 0.
- hidden
Hidden layer sizes (e.g., c(100, 100)). Defaults to 200.
- l2
L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.
- hidden_dropout_ratios
Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.
- epochs
The number of times the dataset should be iterated (streamed); can be fractional. Defaults to 10.
- activation
Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to "Rectifier".
- rule_generation_ntrees
Specifies the number of trees to build in the tree model. Defaults to 50.
- max_rule_length
Maximum length of rules. Defaults to 5.
- verbosity
Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", or "error". Defaults to NULL.
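As noted for the validation and stopping_rounds arguments above, the two combine to give early stopping. A minimal sketch, assuming a running h2o cluster (started with h2o::h2o.init()):

if (h2o_running()) {
  # Reserve 10% of the rows for scoring; stop once 5 consecutive
  # scoring events show no improvement in the stopping metric
  fit <- h2o_train_gbm(
    mtcars[, -1],
    mtcars$mpg,
    ntrees = 500,         # an upper bound; early stopping may use fewer
    validation = 0.1,     # proportion held out as the validation set
    stopping_rounds = 5   # 0 disables early stopping
  )
}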
Examples
# start with h2o::h2o.init()
if (h2o_running()) {
  # -----------------------------------------------------------------------
  # Using the model wrappers:
  h2o_train_glm(mtcars[, -1], mtcars$mpg)

  # -----------------------------------------------------------------------
  # Using parsnip:
  spec <-
    rand_forest(mtry = 3, trees = 500) %>%
    set_engine("h2o") %>%
    set_mode("regression")

  set.seed(1)
  mod <- fit(spec, mpg ~ ., data = mtcars)
  mod

  predict(mod, head(mtcars))
}
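Two further sketches under the same assumption of a running cluster: guarding the xgboost wrapper with the availability check mentioned above, and keeping the training data server-side for an AutoML fit.

if (h2o_running()) {
  # xgboost is not supported on every OS/h2o server, so check first
  if (h2o_xgboost_available()) {
    h2o_train_xgboost(mtcars[, -1], mtcars$mpg, ntrees = 100)
  }

  # save_data = TRUE keeps the training frame on the server so the
  # AutoML model can be re-fitted later; max_runtime_secs is assumed
  # to pass through `...` to h2o's AutoML backend
  auto_mod <- h2o_train_auto(
    mtcars[, -1],
    mtcars$mpg,
    max_runtime_secs = 10,
    save_data = TRUE
  )
}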