Question

Below is a simplified example of an H2O gradient boosting machine (GBM) model trained on R's built-in iris dataset. The model predicts sepal length (Sepal.Length).

The example yields an R² of 0.93, which seems unrealistically high. How can I tell whether this is a realistic result or simply a sign that the model is overfitting?

library(datasets)
library(h2o)

# Get the iris dataset
df <- iris

# Start (or connect to) a local H2O cluster; this must run before any data is sent to H2O
h2o.init()

# Convert the data frame to an H2OFrame
df.hex <- as.h2o(df)

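# Train GBM model: predictors are columns 2-5 (Sepal.Width, Petal.Length,
# Petal.Width, Species); the response is column 1 (Sepal.Length)
gbm_model <- h2o.gbm(x = 2:5, y = 1, training_frame = df.hex,
                     ntrees = 100, max_depth = 4, learn_rate = 0.1)

# Check performance (R-squared); with no newdata argument,
# h2o.performance() reports metrics on the training data
perf_gbm <- h2o.performance(gbm_model)
rsq_gbm <- h2o.r2(perf_gbm)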

Output:

> rsq_gbm
[1] 0.9312635
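One common way to check this is to score the model on rows it was not trained on. The sketch below is an illustration under stated assumptions, not a verified answer: it holds out roughly 20% of the rows with h2o.splitFrame(), adds 5-fold cross-validation through the nfolds argument of h2o.gbm(), and compares training, cross-validated, and held-out test R². The 0.8 split ratio, the seed of 42, and the choice of 5 folds are arbitrary values picked for illustration.

```r
library(h2o)

# Start (or connect to) a local H2O cluster before sending data to it
h2o.init()
iris.hex <- as.h2o(iris)

# Randomly split rows: about 80% for training, about 20% held out for testing.
# The ratio and seed are illustrative assumptions.
splits <- h2o.splitFrame(iris.hex, ratios = 0.8, seed = 42)
train <- splits[[1]]
test  <- splits[[2]]

# Same GBM settings as in the question, plus 5-fold cross-validation
# on the training split (nfolds = 5 is an illustrative choice)
gbm_cv <- h2o.gbm(x = 2:5, y = 1,
                  training_frame = train,
                  ntrees = 100, max_depth = 4, learn_rate = 0.1,
                  nfolds = 5, seed = 42)

# R-squared on the rows the model was fit to (optimistic by construction)
rsq_train <- h2o.r2(gbm_cv, train = TRUE)

# R-squared from cross-validation: each fold is scored by a model
# that did not see that fold during training
rsq_cv <- h2o.r2(gbm_cv, xval = TRUE)

# R-squared on the held-out test rows, which played no part in training
perf_test <- h2o.performance(gbm_cv, newdata = test)
rsq_test <- h2o.r2(perf_test)

cat(sprintf("Train R2: %.3f\nCV R2:    %.3f\nTest R2:  %.3f\n",
            rsq_train, rsq_cv, rsq_test))
```

If the cross-validated and test R² land close to the training R², the 0.93 is probably not just an artifact of overfitting; a large gap between them would point toward overfitting. With only about 30 test rows, the test R² is noisy, so the cross-validated value is likely the more stable of the two estimates.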


Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange