-
Notifications
You must be signed in to change notification settings - Fork 12
44 Improve WhittakerSmooth, AirPls, and ArPls performance
#120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
44 Improve WhittakerSmooth, AirPls, and ArPls performance
#120
Conversation
…d dev-requirements
…APACK wrappers for Whittaker smoothers
…nhancement; extended tests; fixed type hints; black formatted; started lint fixes
…se-matrix-operations
…iguring out 100% fix
…ixed Tikhonov regularisation; shotgun refactor of all respective modules; extended and parametrized tests; added more FIXMEs and TODOs; adapted docstrings
…e difference matrix in favour of speed; added explanation of Whittaker smoothing
…ion and solve; added preliminary tests
… differences matrix version that does not rely on `scipy.sparse` anymore to avoid overhead and redundant computations
…eader; added sections; renamed "LU-" to "LU "
…oother names enum
… Cholesky; adapted models; used iterators; used more efficient numerical operations; incorporated fast pre-computations; got rid off sparse matrix representations via SciPy
…nary to remove one indirection with one branched logic
…and comments; improved docstring
This reverts commit e6e8405.
|
@MothNik FANTASTIC - I have been waiting for this day with a lot of enthusiasm 🤓🤓!! I am starting to review it right now, and it is a long review, but I hope I can have it done in about a month from now! It is a very exciting contribution. During the review process, we could also start considering how to add the different improvements to the documentation pages 😄 The restructure of the package is a good idea, it goes perfectly in line with #53, and I need to get done during the summer, Having the dev dependencies separated is a great starting point! Also nice to hear you have been using I think that now it is my turn, and I have some work to do 🥳🥳 |
|
@paucablop You are highly welcome 😸 Yes, it's a lot of files. I'm sorry it turned so big 😅 I usually would not do package restructuring in a feature branch, but the branch required some setup for the development environment, especially for the tests ✅❌ As I said, take your time 😸 |
|
I want to give special credits and thanks for the support by Guillaume Biessy, the author of Revisiting Whittaker-Henderson Smoothing which is - as far as I'm aware - the best review of the Whittaker-Henderson Smoothing out there because it is illustrated very well and focuses on the key points 🙏 |
- renamed all variables in `utils.banded_linalg` and `utils.finite_differences` and the respective tests to be more concise
- renamed all variables and function of the `_whittaker_base` and all related tests to make them more concise - removed `combination` in `pytest.mark.parametrize` in favour of individual variables which are more concise - made error checks explicit for error messages and not only the Exception type to avoid unintended error behaviour
- added error message pattern matching to the tests for models and input check utility functions
- renamed `test_for_utils` to `tests_for_utils`
added a guideline for proper use of `pytest`
added line on why tests for error handling are mandatory
|
@paucablop |
This pull request primariliy tackles issue #44, but it does not fully close it (see the second point of 🚶 Next Steps).
It should be squashed before merging because it's more than 100 commits.
🏗️ Main Feature Changes
🧑💻 Implementations
WhittakerSmooth,AirPls, andArPlsand transitioning to the more appropriate LAPACK banded storage format. Why? BecauseD.T @ Dthan what can be achieved via a sparse matrix, even though the logic becomes a bit more elaborate because symmetry has to be exploited while the redundant computations that make up most of the computation time have to be avoided.WhittakerSmooth,AirPls, andArPlssince they all rely on the same base algorithm and only differ in weighting strategies. Now, they all inherit from the classchemotools.utils._whittaker_base.main.WhittakerLikeSolverthat handles all the underlying math once in a centralized place. The only thing the 3 transformer classes add now is a customized weighting strategy. Each of the classes uses different access points to the solver depending on the checks/preprocessing they need to conduct.pentapy-support that will check for availability ofpentapyat runtime and use its high-performance solver for all scenarios where the differences order is 2 (see ⏱️ Timings).dtype=int#87) .⏱️ Timings
In summary, the speedup with the minimum set of dependencies is ~5x for all algorithms. However, when
pentapyis used, the speedup can be up to 15x. Since it is used for difference order 2 and this is the standard use case, this is quite some gain.Yet, rust-based implementations seem to be even faster, so we definitely did not reach the limit here.
🌊
WhittakerSmoothwith difference order 1Speedup of ~5 to 6 times

🌊
WhittakerSmoothwith difference order 2Without

pentapy- Speedup of ~5 timesWith

pentapy- Speedup of ~5 to 15 times🏴☠️
ArPlsWithout

pentapy- Speedup of ~4 timesWith

pentapy- Speedup of ~5 to 15 times🛩️
AirPlswith polynomial order 1Speedup of ~12 to 5 times

🛩️
AirPlswith polynomial order 2Speedup of ~12 to 5 times

With

pentapy- Speedup of ~10 to 15 times🚶 Next Steps
lam-values are no longer possible. Many Whittaker smoother implementations out there suffer from this, but this is something that should be tackled by, e.g., also invoking banded QR-decomposition.🎁 Additional features
Given that this was a lot of refactoring, the chance was used to enrich the
WhittakerSmoothbysample_weightkeyword argument totransformandfit_transformto allow for locally weighting datapoints depending on their noise level. Basically, this makes theWhittakerSmoothen par withArPlsandAirPlswhich were already able to pass weights.lamvia maximization of the log marginal likelihood (same approach as for sklearn's GaussianProcessRegressor).chemotools.smooth.estimate_noise_stddevorchemotools.utils.estimate_noise_stddev).lamviachemotools.smooth.WhittakerSmoothLambdasimilar to SciPy's Bounds for specifying the bounds of parameters during optimizationschemotools.utils._banded_linalg).📦📂 Package structure
settings.jsonin the.vscode-folder was removed from the GIT version control. Having a file that can overwrite the user's local settings (which might contain much more than just formatting and linting settings) can be quite destructive. It was replaced by asettings_template.jsonthat can provide the basic setup for the user.requirements.txtwere split into arequirements.txtfor the main package capabilities and arequirements-dev.txtfor the dependencies needed during development. Migrate to pyptoject.toml #53 will profit from this, since then one can simply point to therequirements.txtfrompyproject.tomlwithout having to worry about the user accidently installingpytest,matplotlib, etc.Ruffwas configured in thepyproject.toml(requiressettings.jsonto haveRuffconfigured). It reveals some unused imports, wrong type hints, and non-pythonic statements that should be tackled in the future.✅❌ Tests
pytest-xdist, the tests can now be run in parallel which saves quite some time. The command I always used for running the tests ispytest --cov=chemotools .\tests -n=auto --cov-report html -xpytest.mark.parametrize, the tests were extended by running the same test on multiple input combinations. This was especially useful for transformer functionality and numerics tests.🪤 Miscellaneous