|
813 | 813 | "source": [ |
814 | 814 | "Although we tried to choose default model parameters that work well in a wide range of scenarios, hyperparameter search will often find an emulator model with a better fit. Internally, `AutoEmulate` compares the performance of different models and hyperparameters using cross-validation on the training data, which can be computationally expensive and time-consuming for larger datasets. To speed it up, we can parallelise the process with `n_jobs`.\n", |
815 | 815 | "\n", |
816 | | - "For each model, we've pre-defined a search space for hyperparameters. When setting up `AutoEmulate` with `param_search=True`, we default to using random search with `param_search_iters = 20` iterations. We plan to add other hyperparameter search methods in the future. \n", |
| 816 | + "For each model, we've pre-defined a search space for hyperparameters. When setting up `AutoEmulate` with `param_search=True`, we default to using random search with `param_search_iters = 20` iterations. This means that 20 hyperparameter combinations from the search space are sampled and evaluated. We plan to add other hyperparameter search methods in the future. \n", |
817 | 817 | "\n", |
818 | | - "Let's do a hyperparameter search for the Gaussian Process and Random Forest models." |
| 818 | + "Let's do a hyperparameter search for the Support Vector Machines and Random Forest models." |
819 | 819 | ] |
820 | 820 | }, |
821 | 821 | { |
|
1352 | 1352 | ], |
1353 | 1353 | "source": [ |
1354 | 1354 | "em = AutoEmulate()\n", |
1355 | | - "em.setup(X, y, param_search=True, param_search_type=\"random\", param_search_iters=20, models=[\"GaussianProcess\", \"RandomForest\"], n_jobs=-2) # n_jobs=-2 uses all cores but one\n", |
| 1355 | + "em.setup(X, y, param_search=True, param_search_type=\"random\", param_search_iters=10, models=[\"SupportVectorMachines\", \"RandomForest\"], n_jobs=-2) # n_jobs=-2 uses all cores but one\n", |
1356 | 1356 | "em.compare()" |
1357 | 1357 | ] |
1358 | 1358 | }, |
|
1427 | 1427 | "metadata": {}, |
1428 | 1428 | "source": [ |
1429 | 1429 | "**Notes**: \n", |
1430 | | - "* Some models, such as `GaussianProcess` can be slow to run hyperparameter search on larger datasets (say n > 1500). \n", |
| 1430 | + "* Some models, such as `GaussianProcess`, can be slow when conducting hyperparameter search on larger datasets (say n > 1000). \n", |
1431 | 1431 | "* Use the `models` argument to only run hyperparameter search on a subset of models to speed up the process.\n", |
1432 | 1432 | "* When possible, use `n_jobs` to parallelise the hyperparameter search. With larger datasets, we recommend setting `param_search_iters` to a lower number, such as 5, to see how long it takes to run and then increase it if necessary.\n", |
1433 | 1433 | "* All models can be specified with short names too, such as `rf` for `RandomForest`, `gp` for `GaussianProcess`, `svm` for `SupportVectorMachines`, etc."
|
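The notebook text in this diff says that random search with `param_search_iters = 20` samples and evaluates 20 hyperparameter combinations from the search space. The sampling step can be sketched roughly as follows. This is a standard-library illustration only, not `AutoEmulate`'s implementation: the search space, the `sample_combinations` helper, and the parameter values are all hypothetical, and the draw here is with replacement.

```python
import random

# Hypothetical search space for illustration; not AutoEmulate's actual space.
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 500],
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def sample_combinations(space, n_iter, seed=0):
    """Draw n_iter hyperparameter combinations uniformly at random.

    Each combination picks one value per hyperparameter. Sampling is with
    replacement, so the same combination can appear more than once.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]


# 20 iterations -> 20 candidate combinations, each of which would then be
# scored (for example with cross-validation) to pick the best one.
candidates = sample_combinations(SEARCH_SPACE, n_iter=20)
print(len(candidates))  # 20
```

Lowering `n_iter` reduces the number of candidates to evaluate, which is why the notes suggest starting with a small `param_search_iters` on larger datasets.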