Releases: llm-d/llm-d-kv-cache-manager
v0.4.0
Highlights
- Unified chat-template rendering interface across the different tokenizers (fixes a BOS token duplication bug), plus performance and code improvements on the templating path, by @hyeongyun0916
- Logger fix by @hyeongyun0916
- Tiered prefix-cache scoring by @Jay-Pd
- UDS-tokenizer service by @delavet and @osswangxining
- Tokenizer from local disk enablement by @pierDipi
- Valkey kvblock backend by @rishi-jat
- General enhancements by: @zhengkezhou1 @Frapschen @my-git9 @samzong @samber @kyanokashi
What's Changed
- Fix examples precalculated hashes by @vMaroon in #141
- feat: make kvcache.Indexer as gRPC Service by @zhengkezhou1 in #109
- Fix: division by zero in metrics logging by @yankay in #145
- feat: Add Valkey and RDMA support for KV-cache indexing by @rishi-jat in #139
- feat: Add UDS-based external tokenizer service by @delavet in #137
- Fix LookupHits metrics not work by @Frapschen in #146
- [feat] Add tolerations for chart by @my-git9 in #149
- [misc] valkey doc repositioning by @vMaroon in #154
- Update README.md to enhance clarity of flowchart labels and descriptions by @samzong in #157
- perf(prefixstore): sync.Map are much faster in read-intensive applications by @samber in #156
- Add support for local tokenizer files by @pierDipi in #142
- Add @delavet as /services/uds_tokenizer owner by @vMaroon in #164
- Add @osswangxining as /services/uds_tokenizer owner by @vMaroon in #168
- Implementation for Tiering in KV-Cache-Manager by @Jay-Pd in #150
- fix: rename LookupHits metric to MaxHitsPerPod to better reflect what's tracked by @kyanokashi in #160
- fix online example chart format by @delavet in #171
- Minor fix for KV Device Tier by @Jay-Pd in #172
- [Fix] Ensure Correct Logger Usage by Replacing klog.FromContext with log.FromContext by @hyeongyun0916 in #169
- refactor(tokenizer): Unify interface for RenderChatTemplate and eliminate object creation overhead by @hyeongyun0916 in #163
- Minor logger fix by @vMaroon in #173
- General refactoring for v0.4.0 by @vMaroon in #174
New Contributors
- @rishi-jat made their first contribution in #139
- @delavet made their first contribution in #137
- @Frapschen made their first contribution in #146
- @samzong made their first contribution in #157
- @samber made their first contribution in #156
- @pierDipi made their first contribution in #142
- @Jay-Pd made their first contribution in #150
- @kyanokashi made their first contribution in #160
- @hyeongyun0916 made their first contribution in #169
Full Changelog: v0.3.2...v0.4.0
v0.3.2
What's Changed
- Bump helm-chart Image by @vMaroon in #66
- Doc Enhancements by @vMaroon in #73
- Update LICENSE by @vMaroon in #74
- Fix README Diagram by @vMaroon in #75
- Enhance README Diagram Clarity by @vMaroon in #78
- Fix kv_events offline example by @irar2 in #82
- fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
- fix: correct shell command substitution syntax in Makefile by @yankay in #81
- Optimized chat completions library, build support and testing infrastructure by @guygir in #79
- Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
- KVEvents/others minor refactoring by @vMaroon in #88
- Add InMemoryIndex unit tests by @sagiahrac in #86
- Add instrumentedIndex basic unit tests by @sagiahrac in #87
- docs: fix mermaid chart arrow syntax by @Zerohertz in #93
- Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
- Tokenization unit tests by @sagiahrac in #90
- feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
- [CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
- [docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106
- Add Prow GitHub Actions by @Jooho in #117
- fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
- Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
- CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
- fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
- Implement auto-assign for reviewers without write permissions by @vMaroon in #123
- feat: Add a SliceMapE function for handle errors and add unit tests by @WillardHu in #119
- add benchmark data by @vMaroon in #129
- [feat]support specifying imagePullSecrets for chart by @my-git9 in #130
- Support new KVEvents format by @vMaroon in #132
- Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118
- add liu-cong as reviewer by @vMaroon in #135
- chore: Fix outdated golangci-lint installation URL by @zhengkezhou1 in #136
- Align with recent vLLM kv-block hashing changes by @vMaroon in #138
New Contributors
- @irar2 made their first contribution in #82
- @yankay made their first contribution in #80
- @guygir made their first contribution in #79
- @sagiahrac made their first contribution in #84
- @Zerohertz made their first contribution in #93
- @Jooho made their first contribution in #117
- @WillardHu made their first contribution in #110
- @my-git9 made their first contribution in #130
Full Changelog: v0.2.1...v0.3.2-rc1
v0.3.1
What's Changed
- Add Prow GitHub Actions by @Jooho in #117
- fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
- Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
- CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
- fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
- Implement auto-assign for reviewers without write permissions by @vMaroon in #123
- feat: Add a SliceMapE function for handle errors and add unit tests by @WillardHu in #119
- add benchmark data by @vMaroon in #129
- [feat]support specifying imagePullSecrets for chart by @my-git9 in #130
- Support new KVEvents format by @vMaroon in #132
- Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118
New Contributors
- @Jooho made their first contribution in #117
- @WillardHu made their first contribution in #110
- @my-git9 made their first contribution in #130
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Summary
- Production-ready, OpenAI-compatible Chat-Completions preprocessing library
- Synchronous tokenization with caching
- Expanded benchmarking and stronger test coverage
- General code and documentation improvements
What's Changed
- Bump helm-chart Image by @vMaroon in #66
- Doc Enhancements by @vMaroon in #73
- Update LICENSE by @vMaroon in #74
- Fix README Diagram by @vMaroon in #75
- Enhance README Diagram Clarity by @vMaroon in #78
- Fix kv_events offline example by @irar2 in #82
- fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
- fix: correct shell command substitution syntax in Makefile by @yankay in #81
- Optimized chat completions library, build support and testing infrastructure by @guygir in #79
- Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
- KVEvents/others minor refactoring by @vMaroon in #88
- Add InMemoryIndex unit tests by @sagiahrac in #86
- Add instrumentedIndex basic unit tests by @sagiahrac in #87
- docs: fix mermaid chart arrow syntax by @Zerohertz in #93
- Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
- Tokenization unit tests by @sagiahrac in #90
- feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
- [CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
- [docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106
New Contributors
- @irar2 made their first contribution in #82
- @yankay made their first contribution in #80
- @guygir made their first contribution in #79
- @sagiahrac made their first contribution in #84
- @Zerohertz made their first contribution in #93
Full Changelog: v0.2.1...v0.3.0-rc1
v0.2.1
v0.2.0
What's Changed
- Introduced vLLM-Native KV-Events processing and new indexing backends
- In-Memory index (default): KV-Events are digested and stored in memory
- Redis index
- Added observability and real-time Prometheus metrics
- Tracks KV-Block admissions, evictions, lookups and hit-rates
- Enhanced configurability
- Updated integration in llm-d-inference-scheduler (accurate prefix-cache aware scorer)
- Initial support for OpenAI-compatible Chat Completions templating (library)
- Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
- General documentation improvements
PRs
- (chore): typo in tokenizer file by @buraksekili in #39
- [KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
- fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
- docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
- [KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
- Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
- [KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
- Enhance Configurability by @vMaroon in #55
- Update configuration.md by @vMaroon in #56
- Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
- Completions-Support (#50) Extension by @guygir in #58
New Contributors
- @buraksekili made their first contribution in #39
- @guygir made their first contribution in #58
Full Changelog: v0.1.1...v0.2.0-RC1
v0.1.1
What's Changed
- Update OWNERS by @vMaroon in #25
- Update CONTRIBUTING.md by @clubanderson in #27
- Update README.md by @clubanderson in #28
- Update CONTRIBUTING.md by @clubanderson in #35
- Refactor Redis config to use redis.Options struct by @relyt0925 in #37
New Contributors
- @clubanderson made their first contribution in #27
- @relyt0925 made their first contribution in #37
Full Changelog: v0.1.0...v0.1.1