Basic model wrappers for h2o model functions that include data conversion, seed configuration, and so on.

## Usage

```
h2o_train(
  x,
  y,
  model,
  weights = NULL,
  validation = NULL,
  save_data = FALSE,
  ...
)

h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)

h2o_train_xgboost(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  validation = NULL,
  ...
)

h2o_train_gbm(
  x,
  y,
  ntrees = 50,
  max_depth = 6,
  min_rows = 1,
  learn_rate = 0.3,
  sample_rate = 1,
  col_sample_rate = 1,
  min_split_improvement = 0,
  stopping_rounds = 0,
  ...
)

h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)

h2o_train_nb(x, y, laplace = 0, ...)

h2o_train_mlp(
  x,
  y,
  hidden = 200,
  l2 = 0,
  hidden_dropout_ratios = 0,
  epochs = 10,
  activation = "Rectifier",
  validation = NULL,
  ...
)

h2o_train_rule(
  x,
  y,
  rule_generation_ntrees = 50,
  max_rule_length = 5,
  lambda = NULL,
  ...
)

h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)
```

## Arguments

- x
A data frame of predictors.

- y
A vector of outcomes.

- model
A character string for the model. Current selections are `"automl"`, `"randomForest"`, `"xgboost"`, `"gbm"`, `"glm"`, `"deeplearning"`, `"rulefit"`, and `"naiveBayes"`. Use `h2o_xgboost_available()` to see if xgboost can be used on your OS/h2o server.

- weights
A numeric vector of case weights.
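
As a sketch of how the `model` string selects a backend (assuming a running h2o cluster started with `h2o::h2o.init()`), the generic `h2o_train()` and the model-specific wrappers are interchangeable:

```r
library(agua)

# Generic interface: pick the engine via the `model` argument.
fit_1 <- h2o_train(mtcars[, -1], mtcars$mpg, model = "gbm", ntrees = 100)

# Model-specific wrapper: same fit, arguments passed through.
fit_2 <- h2o_train_gbm(mtcars[, -1], mtcars$mpg, ntrees = 100)

# Check xgboost support before requesting it:
if (h2o_xgboost_available()) {
  fit_3 <- h2o_train(mtcars[, -1], mtcars$mpg, model = "xgboost")
}
```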

- validation
A number between 0 and 1 specifying the *proportion* of the data reserved as a validation set. This is used by h2o for performance assessment and potential early stopping. Defaults to 0.

- save_data
A logical for whether the training data should be saved on the h2o server; set this to `TRUE` for AutoML models that need to be re-fitted.

- ...
Other options to pass to the h2o model functions (e.g., `h2o::h2o.randomForest()`).

- ntrees
Number of trees. Defaults to 50.

- mtries
Number of variables randomly sampled as candidates at each split. If set to -1, this defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). Defaults to -1.

- min_rows
Fewest allowed (weighted) observations in a leaf. Defaults to 1.

- max_depth
Maximum tree depth (0 for unlimited). Defaults to 20.

- learn_rate
(same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.

- sample_rate
Row sample rate per tree (from 0.0 to 1.0). Defaults to 0.632.

- col_sample_rate
(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.

- min_split_improvement
Minimum relative improvement in squared error reduction for a split to happen. Defaults to 1e-05.

- stopping_rounds
Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k (= stopping_rounds) scoring events (0 to disable). Defaults to 0.
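
For example, early stopping needs both a held-out `validation` proportion (so there is something to score against) and a nonzero `stopping_rounds`. A minimal sketch, assuming a running h2o cluster:

```r
# Reserve 10% of the rows for scoring; stop once 3 consecutive scoring
# events show no improvement, rather than always building all 500 trees.
mod <- h2o_train_xgboost(
  mtcars[, -1], mtcars$mpg,
  ntrees = 500,
  validation = 0.1,
  stopping_rounds = 3
)
```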

- lambda
Regularization strength.

- alpha
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.
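
A short sketch of the elastic net mixing described above (assuming a running h2o cluster):

```r
# alpha = 1 gives a pure lasso (L1) penalty; alpha = 0 gives pure ridge (L2).
lasso_fit <- h2o_train_glm(mtcars[, -1], mtcars$mpg, alpha = 1, lambda = 0.1)
ridge_fit <- h2o_train_glm(mtcars[, -1], mtcars$mpg, alpha = 0, lambda = 0.1)
```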

- laplace
Laplace smoothing parameter. Defaults to 0.

- hidden
Hidden layer sizes (e.g., c(100, 100)). Defaults to c(200, 200).

- l2
L2 regularization (can add stability and improve generalization; causes many weights to be small). Defaults to 0.

- hidden_dropout_ratios
Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer. Defaults to 0.5.

- epochs
How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10.

- activation
Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.
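
Tying the network arguments together, a hedged sketch of a two-layer network with dropout (assuming a running h2o cluster); note that `hidden_dropout_ratios` takes effect only with a "...WithDropout" activation:

```r
# Two hidden layers of 100 units, 20% dropout on each.
mod <- h2o_train_mlp(
  mtcars[, -1], mtcars$mpg,
  hidden = c(100, 100),
  hidden_dropout_ratios = c(0.2, 0.2),
  activation = "RectifierWithDropout",
  epochs = 50
)
```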

- rule_generation_ntrees
Specifies the number of trees to build in the tree model. Defaults to 50.

- max_rule_length
Maximum length of rules. Defaults to 3.

- verbosity
Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", or "error". Defaults to NULL.
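
A sketch of an AutoML run with live logging (assuming a running h2o cluster; `max_runtime_secs` here is an `h2o::h2o.automl()` option passed through `...`):

```r
# save_data = TRUE keeps the training frame on the h2o server so that
# AutoML member models can be re-fitted later.
auto_mod <- h2o_train_auto(
  mtcars[, -1], mtcars$mpg,
  verbosity = "info",
  save_data = TRUE,
  max_runtime_secs = 30
)
```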

## Examples

```
# start with h2o::h2o.init()
if (h2o_running()) {
  # -------------------------------------------------------------------------
  # Using the model wrappers:
  h2o_train_glm(mtcars[, -1], mtcars$mpg)

  # -------------------------------------------------------------------------
  # Using parsnip:
  spec <-
    rand_forest(mtry = 3, trees = 1000) %>%
    set_engine("h2o") %>%
    set_mode("regression")

  set.seed(1)
  mod <- fit(spec, mpg ~ ., data = mtcars)
  mod
  predict(mod, head(mtcars))
}
#> # A tibble: 6 × 1
#> .pred
#> <dbl>
#> 1 20.9
#> 2 20.8
#> 3 23.3
#> 4 20.4
#> 5 17.9
#> 6 18.7
```