Model wrappers for h2o — h2o

Basic wrappers around the h2o model functions that handle data conversion, seed configuration, and similar setup.

Usage

h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)

Arguments

x: A data frame of predictors.
y: A vector of outcomes.
model: A character string for the model. Current selections are "automl", "randomForest", "xgboost", "gbm", "glm", "deeplearning", "rulefit" and "naiveBayes". Use h2o_xgboost_available() to see if xgboost can be used on your OS/h2o server.
weights: A numeric vector of case weights.
validation: A number between 0 and 1 specifying the proportion of the data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping. Defaults to 0.
save_data: A logical for whether the training data should be saved on the h2o server. Set this to TRUE for AutoML models that need to be re-fitted.
...: Other options to pass to the h2o model functions (e.g., h2o::h2o.randomForest()).
ntrees: Number of trees. Defaults to 50.
mtries: Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). Defaults to -1.
min_rows: Fewest allowed (weighted) observations in a leaf. Defaults to 1.
max_depth: Maximum tree depth (0 for unlimited). Defaults to 6.
learn_rate: (same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.
sample_rate: Row sample rate per tree (from 0.0 to 1.0). Defaults to 1.
col_sample_rate: (same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.
min_split_improvement: Minimum relative improvement in squared error reduction for a split to happen. Defaults to 0.
stopping_rounds: Early stopping based on convergence of stopping_metric. Training stops if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
lambda: Regularization strength. Defaults to NULL.
alpha: Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.
laplace: Laplace smoothing parameter. Defaults to 0.
hidden: Hidden layer sizes (e.g. c(100, 100)). Defaults to 200.
l2: L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.
hidden_dropout_ratios: Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.
epochs: How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10.
activation: Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to "Rectifier".
rule_generation_ntrees: Specifies the number of trees to build in the tree model. Defaults to 50.
max_rule_length: Maximum length of rules. Defaults to 5.
verbosity: Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", "error". Defaults to NULL.
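
Two of the argument descriptions above encode small formulas: the mtries = -1 rule and the alpha mixing between the L1 and L2 penalties. The base R sketch below illustrates both. The helper names default_mtries() and elastic_net_penalty() are hypothetical and not part of any package. Rounding down with floor() is an assumption, and the penalty uses the common elastic-net convention lambda * (alpha * L1 + (1 - alpha) / 2 * L2), which may differ in detail from what h2o computes internally.

```r
# Hypothetical helper: the documented rule for mtries = -1.
# p is the number of predictors; rounding down is an assumption.
default_mtries <- function(p, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  if (mode == "classification") {
    max(floor(sqrt(p)), 1)
  } else {
    max(floor(p / 3), 1)
  }
}

# Hypothetical helper: elastic-net penalty for a coefficient vector.
# alpha = 1 gives a pure Lasso (L1) penalty, alpha = 0 a pure Ridge (L2)
# penalty, and values in between mix the two.
elastic_net_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  l1 <- sum(abs(beta))
  l2 <- sum(beta^2)
  lambda * (alpha * l1 + (1 - alpha) / 2 * l2)
}

# mtcars has 10 predictors once mpg is removed.
p <- ncol(mtcars) - 1
default_mtries(p, "classification")
default_mtries(p, "regression")

# The same coefficients under Lasso, Ridge, and an even mix.
beta <- c(1, -2)
elastic_net_penalty(beta, lambda = 1, alpha = 1)
elastic_net_penalty(beta, lambda = 1, alpha = 0)
elastic_net_penalty(beta, lambda = 1, alpha = 0.5)
```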

Value

An h2o model object.

Examples

# start with h2o::h2o.init()
if (h2o_running()) {
 # -------------------------------------------------------------------------
 # Using the model wrappers:
 h2o_train_glm(mtcars[, -1], mtcars$mpg)

 # -------------------------------------------------------------------------
 # using parsnip:

 spec <-
   rand_forest(mtry = 3, trees = 500) %>%
   set_engine("h2o") %>%
   set_mode("regression")

 set.seed(1)
 mod <- fit(spec, mpg ~ ., data = mtcars)
 mod

 predict(mod, head(mtcars))
}
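
The general h2o_train() wrapper can be exercised in the same way. The sketch below is an illustration rather than a tested recipe: it assumes the package providing these wrappers is attached and that h2o::h2o.init() has been called, and it skips the fit otherwise. The variable names wrappers_ready and engine are made up for the example, and passing ntrees and stopping_rounds through ... relies on the underlying h2o model function accepting them.

```r
# Only run when the wrappers are available and an h2o server is up.
wrappers_ready <- exists("h2o_running", mode = "function") && h2o_running()

if (wrappers_ready) {
  x <- mtcars[, -1]
  y <- mtcars$mpg

  # Fall back to gbm when xgboost is not supported on this OS/h2o server.
  engine <- if (h2o_xgboost_available()) "xgboost" else "gbm"

  # Hold out 20% of the rows as a validation set; h2o can use it for
  # performance assessment and early stopping.
  fit <- h2o_train(
    x,
    y,
    model = engine,
    validation = 0.2,
    ntrees = 20,
    stopping_rounds = 3
  )
  fit
}
```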