Basic model wrappers for h2o model functions that include data conversion, seed configuration, and so on.

Usage

h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)

Arguments

x

A data frame of predictors.

y

A vector of outcomes.

model

A character string for the model. Current selections are "automl", "randomForest", "xgboost", "gbm", "glm", "deeplearning", "rulefit" and "naiveBayes". Use h2o_xgboost_available() to see if xgboost can be used on your OS/h2o server.

weights

A numeric vector of case weights.

validation

A number between 0 and 1 specifying the proportion of the data reserved as a validation set. h2o uses this set for performance assessment and potential early stopping. Defaults to 0.

save_data

A logical for whether training data should be saved on the h2o server. Set this to TRUE for AutoML models that need to be re-fitted.

...

Other options to pass to the h2o model functions (e.g., h2o::h2o.randomForest()).

ntrees

Number of trees. Defaults to 50.

mtries

Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). Defaults to -1.

min_rows

Fewest allowed (weighted) observations in a leaf. Defaults to 1.

max_depth

Maximum tree depth (0 for unlimited). Defaults to 6 in the wrappers shown under Usage.

learn_rate

(same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.

sample_rate

Row sample rate per tree (from 0.0 to 1.0). Defaults to 1 in the wrappers shown under Usage.

col_sample_rate

(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.

min_split_improvement

Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0 in the wrappers shown under Usage.

stopping_rounds

Early stopping based on convergence of stopping_metric. Training stops if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.

lambda

Regularization strength. Defaults to NULL.

alpha

Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.

laplace

Laplace smoothing parameter. Defaults to 0.

hidden

Hidden layer sizes (e.g., c(100, 100)). Defaults to 200 in the wrapper shown under Usage.

l2

L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.

hidden_dropout_ratios

Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0 in the wrapper shown under Usage.

epochs

How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10.

activation

Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.

rule_generation_ntrees

Specifies the number of trees to build in the tree model. Defaults to 50.

max_rule_length

Maximum length of rules. Defaults to 5 in the wrapper shown under Usage.

verbosity

Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", or "error". Defaults to NULL.
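Two of the descriptions above (mtries and alpha/lambda) are easier to read as arithmetic. The sketch below is plain base R and does not call h2o. Both helper names are hypothetical. The penalty uses the common elastic-net form with a factor of 1/2 on the ridge term, which is an assumption here, and any rounding h2o applies to the mtries default is not shown.

```r
# Hypothetical helper: the documented default when mtries = -1.
# p is the number of predictors; h2o's own rounding is not reproduced.
default_mtries <- function(p, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  if (mode == "classification") sqrt(p) else p / 3
}

# Hypothetical helper: elastic-net penalty for coefficients `beta`.
# alpha = 1 gives the pure L1 (Lasso) penalty, alpha = 0 the pure L2 (Ridge)
# penalty, and values in between mix the two. The 1/2 on the L2 term is the
# usual convention and is assumed, not taken from this page.
elastic_net_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(0.5, -2, 0)
lasso_pen <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)
ridge_pen <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0)
mixed_pen <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)

# mtcars[, -1] has 10 predictors, so the regression default is 10 / 3.
mtries_reg <- default_mtries(ncol(mtcars[, -1]), "regression")
```

With alpha = 0.5 the penalty is the average of the Lasso and Ridge values, which is what "mixing between the two" means in the alpha description.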

Value

An h2o model object.

Examples

# start with h2o::h2o.init()
# the parsnip part below assumes parsnip and the package providing the
# "h2o" engine are attached

if (h2o_running()) {
  # -------------------------------------------------------------------------
  # Using the model wrappers:
  h2o_train_glm(mtcars[, -1], mtcars$mpg)

  # -------------------------------------------------------------------------
  # using parsnip:

  spec <-
    rand_forest(mtry = 3, trees = 1000) %>%
    set_engine("h2o") %>%
    set_mode("regression")

  set.seed(1)
  mod <- fit(spec, mpg ~ ., data = mtcars)
  mod

  predict(mod, head(mtcars))
}
#> # A tibble: 6 × 1
#>   .pred
#>   <dbl>
#> 1  20.9
#> 2  20.8
#> 3  23.3
#> 4  20.4
#> 5  17.9
#> 6  18.7
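The model argument notes that xgboost may not be available on every OS or h2o server. A minimal sketch of falling back to the gbm wrapper in that case is shown below. It assumes the wrappers are exported by the agua package and that an h2o server has already been started. The tuning values are illustrative only, and no output is shown because the code has not been run here.

```r
# Arguments shared by both boosting wrappers (illustrative values).
boost_args <- list(
  x = mtcars[, -1],
  y = mtcars$mpg,
  ntrees = 100,
  learn_rate = 0.1,
  stopping_rounds = 3
)

# Assumption: the wrappers live in the agua package.
if (requireNamespace("agua", quietly = TRUE) && agua::h2o_running()) {
  if (agua::h2o_xgboost_available()) {
    # Reserve 20% of the rows as a validation set for early stopping.
    mod <- do.call(
      agua::h2o_train_xgboost,
      c(boost_args, list(validation = 0.2))
    )
  } else {
    # Same arguments, passed to the gbm wrapper instead.
    mod <- do.call(agua::h2o_train_gbm, boost_args)
  }
  print(mod)
}
```

Building the argument list once keeps the two branches comparable, since h2o_train_xgboost() and h2o_train_gbm() share most of their parameters.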