AutoML (Automated Machine Learning) streamlines machine learning workflows, making them more accessible and
efficient
for users of all experience levels. This crate extends the smartcore machine learning
framework, providing utilities to quickly train, compare, and deploy models.
Add automl to your Cargo.toml to get started:
Stable Version
automl = "0.3.0"Latest Development Version
automl = { git = "https://github.com/cmccomb/rust-automl" }Hereβs a quick example to illustrate how AutoML can simplify model training and comparison:
let dataset = smartcore::dataset::breast_cancer::load_dataset();
let settings = automl::Settings::default_classification();
let mut classifier = automl::SupervisedModel::new(dataset, settings);
classifier.train();will perform a comparison of classifier models using cross-validation. Printing the classifier object will yield:
ββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββ¬ββββββββββββββββββββ¬βββββββββββββββββββ
β Model β Time β Training Accuracy β Testing Accuracy β
ββββββββββββββββββββββββββββββββββͺββββββββββββββββββββββͺββββββββββββββββββββͺβββββββββββββββββββ‘
β Random Forest Classifier β 835ms 393us 583ns β 1.00 β 0.96 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β Logistic Regression Classifier β 620ms 714us 583ns β 0.97 β 0.95 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β Gaussian Naive Bayes β 6ms 529us β 0.94 β 0.93 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β Categorical Naive Bayes β 2ms 922us 250ns β 0.96 β 0.93 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β Decision Tree Classifier β 15ms 404us 750ns β 1.00 β 0.93 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β KNN Classifier β 28ms 874us 208ns β 0.96 β 0.92 β
ββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββΌβββββββββββββββββββ€
β Support Vector Classifier β 4s 187ms 61us 708ns β 0.57 β 0.57 β
ββββββββββββββββββββββββββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββββ΄βββββββββββββββββββ
You can then perform inference using the best model with the predict method.
Explore the automl::cookbook module for copy-pastable examples that mirror
real-world workflows:
cargo run --example breast_cancer_csvβ load the Wisconsin Diagnostic Breast Cancer dataset from CSV, standardize features, and compare tuned classifiers.cargo run --example diabetes_regressionβ impute, scale, and train regression models for the diabetes progression dataset.
automl now supports composable preprocessing pipelines so you can build
feature engineering recipes similar to AutoGluon or caret. Pipelines are
defined with the PreprocessingStep
enum and attached via either the add_step builder or by passing a full
PreprocessingPipeline.
use automl::settings::{
ClassificationSettings, PreprocessingPipeline, PreprocessingStep, RegressionSettings,
StandardizeParams,
};
use automl::DenseMatrix;
let regression = RegressionSettings::<f64, f64, DenseMatrix<f64>, Vec<f64>>::default()
.add_step(PreprocessingStep::Standardize(StandardizeParams::default()))
.add_step(PreprocessingStep::ReplaceWithPCA {
number_of_components: 5,
});
let classification = ClassificationSettings::default().with_preprocessing(
PreprocessingPipeline::new()
.add_step(PreprocessingStep::AddInteractions)
.add_step(PreprocessingStep::ReplaceWithSVD {
number_of_components: 4,
}),
);Pipelines preserve the order of steps. Stateful steps such as PCA, SVD, or
standardization automatically fit during training and reuse the same fitted
state when you call predict.
The pipeline is intentionally modular so you can mix and match steps to
mirror popular AutoML defaults:
- Scaling β standard, minβmax, or robust scaling via
ScaleParams. - Imputation β mean, median, or most-frequent replacement with
ImputeParams. - Categorical encoders β ordinal or one-hot encoding with optional dummy drop.
- Power transforms β per-column log or Box-Cox transforms with automatic shifting for strictly positive domains.
- Column filters β select or exclude features using
ColumnSelectorhelpers.
Each state stores the fitted statistics (e.g., medians, category mappings) so that the same transformation can be applied consistently during inference.
use automl::settings::{
CategoricalEncoderParams, CategoricalEncoding, ColumnFilterParams, ColumnSelector,
ImputeParams, ImputeStrategy, PowerTransform, PowerTransformParams, PreprocessingPipeline,
PreprocessingStep, RobustScaleParams, ScaleParams, ScaleStrategy,
};
let pipeline = PreprocessingPipeline::new()
// Fill numeric columns with the median before scaling.
.add_step(PreprocessingStep::Impute(ImputeParams {
strategy: ImputeStrategy::Median,
selector: ColumnSelector::Include(vec![0, 1, 2]),
}))
// Apply a robust scaler to guard against outliers (similar to AutoGluon).
.add_step(PreprocessingStep::Scale(ScaleParams {
strategy: ScaleStrategy::Robust(RobustScaleParams::default()),
selector: ColumnSelector::Include(vec![0, 1, 2]),
}))
// One-hot encode categorical columns and drop the reference level.
.add_step(PreprocessingStep::EncodeCategorical(CategoricalEncoderParams {
selector: ColumnSelector::Include(vec![3, 4]),
encoding: CategoricalEncoding::one_hot(true),
}))
// Optionally keep only the engineered features.
.add_step(PreprocessingStep::FilterColumns(ColumnFilterParams {
selector: ColumnSelector::Include(vec![0, 1, 2, 5, 6]),
retain_selected: true,
}));use automl::settings::{
ColumnSelector, PowerTransform, PowerTransformParams, PreprocessingPipeline,
PreprocessingStep, ScaleParams, ScaleStrategy, StandardizeParams,
};
let caret_like = PreprocessingPipeline::new()
.add_step(PreprocessingStep::PowerTransform(PowerTransformParams {
selector: ColumnSelector::Include(vec![0]),
transform: PowerTransform::Log { offset: 0.0 },
}))
.add_step(PreprocessingStep::Scale(ScaleParams {
strategy: ScaleStrategy::Standard(StandardizeParams::default()),
selector: ColumnSelector::All,
}));This crate has several features that add some additional methods.
| Feature | Description |
|---|---|
nd |
Adds methods for predicting/reading data using ndarray. |
csv |
Adds methods for predicting/reading data from a .csv using polars. |
- Feature Engineering
- PCA
- SVD
- Interaction terms
- Polynomial terms
- Regression
- Decision Tree Regression
- KNN Regression
- Random Forest Regression
- Linear Regression
- Ridge Regression
- LASSO
- Elastic Net
- Support Vector Regression
- Classification
- Random Forest Classification
- Decision Tree Classification
- Support Vector Classification
- Logistic Regression
- KNN Classification
- Gaussian Naive Bayes
- Meta-learning
- Blending
- Save and load settings
- Save and load models