Head of data frame (put report here)
Data summary (in Console)
Variance Inflation Factor report
Correlation of the data (table)
Histograms of each numeric column
Boxplots of the numeric data
Each feature vs target (by percent)
Each feature vs target (by number)
Correlation plot of the numeric data (as circles and colors)
Correlation plot of the numeric data (as numbers and colors)
Correlation of the data (report)
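The Variance Inflation Factor report above can be understood from first principles: the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. A minimal base-R sketch (not taken from the package source; mtcars stands in for the user's data):

```r
# Sketch: VIF by hand. For each numeric predictor, regress it on the
# other predictors and compute 1 / (1 - R^2). Values above the package's
# remove_VIF_above threshold (5.00 in the call below) flag collinearity.
num <- mtcars[, c("mpg", "disp", "hp", "wt")]

vif_one <- function(j) {
  # Regress column j on all remaining columns
  r2 <- summary(lm(num[[j]] ~ ., data = num[-j]))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(seq_along(num), vif_one)
names(vifs) <- names(num)
vifs
```

VIF is always at least 1; a value of exactly 1 means the predictor is uncorrelated with the rest.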
library(ClassificationEnsembles)
Classification(data = ISLR::Carseats,
colnum = 7,
numresamples = 25,
predict_on_new_data = "N",
save_all_plots = "N",
set_seed = "N",
how_to_handle_strings = 1,
remove_VIF_above = 5.00,
save_all_trained_models = "N",
scale_all_numeric_predictors_in_data = "N",
use_parallel = "N",
train_amount = 0.50,
test_amount = 0.25,
validation_amount = 0.25)
Discussion of the function call here. For example, the code above randomly resamples the data 25 times and splits it into train = 0.50, test = 0.25, and validation = 0.25. You may also want to discuss other aspects of the call — for instance, it does not set a seed, so the results will differ from run to run.
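The train/test/validation split described above can be sketched in base R. This is illustrative only, not the package's internal code; iris stands in for the user's data, and the names train, test, and validation are assumptions:

```r
# Sketch: one resample with train = 0.50, test = 0.25, validation = 0.25.
# Note the seed is set here for reproducibility; the function call above
# (set_seed = "N") deliberately does not do this.
set.seed(123)
n   <- nrow(iris)
idx <- sample(c("train", "test", "validation"), size = n,
              replace = TRUE, prob = c(0.50, 0.25, 0.25))

train      <- iris[idx == "train", ]
test       <- iris[idx == "test", ]
validation <- iris[idx == "validation", ]
```

Every row lands in exactly one of the three sets, so their row counts always sum to the original number of rows.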
C50:
C50_train_fit <- C50::C5.0(as.factor(y_train) ~ ., data = train)
Linear:
linear_train_fit <- MachineShop::fit(y ~ ., data = train01, model = "LMModel")
Partial Least Squares:
pls_train_fit <- MachineShop::fit(y ~ ., data = train01, model = "PLSModel")
Penalized Discriminant Analysis:
pda_train_fit <- MachineShop::fit(y ~ ., data = train01, model = "PDAModel")
RPart:
rpart_train_fit <- MachineShop::fit(y ~ ., data = train01, model = "RPartModel")
Trees:
tree_train_fit <- tree::tree(y_train ~ ., data = train)
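Each of the fits above is then used to generate class predictions on the holdout rows. A minimal sketch of that step (not the package's code) using rpart, a recommended package that ships with R, in place of the libraries listed above:

```r
# Sketch: fit one base learner on the training rows and predict classes
# on the held-out rows. The accuracy is the share of correct predictions.
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- rpart::rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

acc <- mean(pred == test$Species)  # holdout accuracy
acc
```

The same fit/predict pattern applies to every model in the list, only the fitting function changes.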
How the ensemble is made:
ensemble1 <- data.frame(
"C50" = c(C50_test_pred, C50_validation_pred),
"Linear" = c(linear_test_pred, linear_validation_pred),
"Partial_Least_Squares" = c(pls_test_pred, pls_validation_pred),
"Penalized_Discriminant_Analysis" = c(pda_test_pred, pda_validation_pred),
"RPart" = c(rpart_test_pred, rpart_validation_pred),
"Trees" = c(tree_test_pred, tree_validation_pred)
)
ensemble_row_numbers <- as.numeric(row.names(ensemble1))
ensemble1$y <- df[ensemble_row_numbers, "y"]
ensemble1 <- ensemble1[complete.cases(ensemble1), ]
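The idea behind the ensemble1 data frame above is that each column holds one base learner's holdout predictions, and the true labels are looked up by row name. A self-contained base-R sketch of that idea (illustrative only — two logistic models on mtcars stand in for the six learners, and all names are assumptions):

```r
# Sketch: build an ensemble data frame from two base learners'
# holdout predictions, then attach the true labels by row name.
set.seed(1)
df <- mtcars
df$am <- factor(df$am)

holdout_rows <- sample(row.names(df), 12)
train   <- df[setdiff(row.names(df), holdout_rows), ]
holdout <- df[holdout_rows, ]

# Two stand-in base learners with different predictors
fit1 <- glm(am ~ hp, data = train, family = binomial)
fit2 <- glm(am ~ wt, data = train, family = binomial)

pred1 <- ifelse(predict(fit1, holdout, type = "response") > 0.5, 1, 0)
pred2 <- ifelse(predict(fit2, holdout, type = "response") > 0.5, 1, 0)

# One column per learner; row names carry over from the predictions
ensemble1 <- data.frame(Model_1 = pred1, Model_2 = pred2)
ensemble1$y <- df[row.names(ensemble1), "am"]
ensemble1 <- ensemble1[complete.cases(ensemble1), ]
```

The resulting data frame — predictions as features, true class as y — is what the ensemble meta-learners below are trained on.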
Ensemble Bagged Cart:
ensemble_bag_cart_train_fit <- ipred::bagging(y ~ ., data = ensemble_train)
Ensemble Bagged Random Forest:
ensemble_bag_train_rf <- randomForest::randomForest(ensemble_y_train ~ ., data = ensemble_train, mtry = ncol(ensemble_train) - 1)
Ensemble C50:
ensemble_C50_train_fit <- C50::C5.0(ensemble_y_train ~ ., data = ensemble_train)
Ensemble Naive Bayes:
ensemble_n_bayes_train_fit <- e1071::naiveBayes(ensemble_y_train ~ ., data = ensemble_train)
Ensemble Support Vector Machines:
ensemble_svm_train_fit <- e1071::svm(ensemble_y_train ~ ., data = ensemble_train, kernel = "radial", gamma = 1, cost = 1)
Ensemble Trees:
ensemble_tree_train_fit <- tree::tree(y ~ ., data = ensemble_train)
Model accuracy (put model accuracy barchart here)
All confusion matrices (in console)
Over or underfitting barchart
True positive rate by model and resample (choose fixed or free scales)
True negative rate by model and resample (choose fixed or free scales)
False positive rate by model and resample (choose fixed or free scales)
False negative rate by model and resample (choose fixed or free scales)
Duration barchart
Accuracy by model and resample (choose fixed or free scales)
Accuracy data, including train and holdout (choose fixed or free scales)
Classification error by model and resample (choose fixed or free scales)
Residuals by model and resample (choose fixed or free scales)
Holdout accuracy / train accuracy by model and resample (choose fixed or free scales)
Head of ensemble (report)
Variance Inflation Factor report
Most accurate model:
Mean holdout accuracy
Standard deviation of the mean holdout accuracy
Classification error mean
Duration (mean)
True positive rate (mean)
True negative rate (mean)
False positive rate (mean)
False negative rate (mean)
Positive predictive value (mean)
Negative predictive value (mean)
Prevalence (mean)
Detection rate (mean)
Detection prevalence (mean)
F1 Score
Train accuracy (mean)
Test accuracy (mean)
Validation accuracy (mean)
Holdout vs train (mean)
Holdout vs train standard deviation
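All of the rates reported above can be read off a 2x2 confusion matrix. A base-R sketch with toy data (illustrative only; the package computes these per model and per resample):

```r
# Sketch: true/false positive and negative counts from a confusion
# matrix, and the rates built from them. "1" is the positive class.
actual    <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0), levels = c(0, 1))

cm <- table(Predicted = predicted, Actual = actual)

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

c(TPR = TP / (TP + FN),          # true positive rate (sensitivity)
  TNR = TN / (TN + FP),          # true negative rate (specificity)
  PPV = TP / (TP + FP),          # positive predictive value
  F1  = 2 * TP / (2 * TP + FP + FN))
# each rate is 0.8 for this toy example
```

The F1 score is the harmonic mean of PPV and TPR, which is why it appears alongside them in the summary.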