@yannham yannham commented Jul 23, 2025

Migration to the Compact Value Representation

This PR refactors the Nickel codebase to use the new compact value representation introduced in #2282 and improved in subsequent PRs. Unfortunately, this representation is used everywhere, so the migration is huge. I tried to split independent parts into separate PRs, but I quickly reached a point where things were too coupled to do that meaningfully. I'll first describe the main changes, then give a review guide.

Impact

On two stages/sizes of the same real-life code base, I measured a consistent runtime improvement of between 20% and 25%. This is not as drastic as I had hoped, but the compact values are an enabling milestone for other memory optimizations; see Follow-ups.

On the memory side, a run with Valgrind on the small version of the codebase (which takes a few seconds to run in release mode) shows that the total number of allocated bytes drops by around 33%. Those are very preliminary results, just to give an order of magnitude; more serious benchmarking is due.

Content

NickelValue

Using NickelValue in many places led to changes and improvements to its interface as I used it more and more.

ValueContent & Lenses

A new module value::lens, as well as the ValueContent types, make it possible to conditionally take ownership of the content of a NickelValue. Although the usefulness is probably limited at runtime, where we tend to duplicate Rcs a lot and thus don't really benefit from avoiding clones, this is useful at transformation time, where all Rcs are 1-counted: taking ownership avoids duplicating the whole AST. This interface might also provide more gains in the future, once we have a proper bytecode compiler, where environments are handled quite differently and are much less duplicated.
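To illustrate the idea of conditionally taking ownership, here is a minimal sketch built on the standard library's Rc::try_unwrap. It is not the value::lens or ValueContent API: Node, Taken and take_content are hypothetical names made up for this example.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a node stored behind an `Rc`; not a Nickel type.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    label: String,
    children: Vec<i64>,
}

/// Outcome of trying to take the content out of a reference-counted pointer.
#[derive(Debug, PartialEq)]
enum Taken {
    /// The `Rc` was 1-counted: the content was moved out and nothing was copied.
    Moved(Node),
    /// The `Rc` was shared: the content had to be cloned.
    Cloned(Node),
}

/// Moves the node out of the `Rc` when it is uniquely owned, and falls back to
/// a clone otherwise.
fn take_content(rc: Rc<Node>) -> Taken {
    match Rc::try_unwrap(rc) {
        Ok(node) => Taken::Moved(node),
        Err(shared) => Taken::Cloned((*shared).clone()),
    }
}
```

With a freshly built tree, where every pointer is 1-counted, each call takes the Moved path, which is why this pattern pays off during program transformations rather than during evaluation.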

Position indices

The original design had two different variants for position indices, PosIdx and InlinePosIdx. The idea was that inline values don't have a lot of available space (if we want to keep them pointer-sized), so the position index is encoded on 32 bits. In a block there's plenty of space, so we can use a proper usize, which is usually 64 bits. However, this caused a lot of trouble when inheriting positions during evaluation: if you need to set the position of an inline value that is inherited from a block, you need mutable access to the position table to allocate a new inline index with the same content, just so that it fits in 32 bits. We decided internally to encode everything on 32 bits, and to extend it (there's still space in inline values) if needed.
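A rough sketch of why a single 32-bit index simplifies position inheritance. The names PosIdx and PosTable come from the description above, but the layout, the Span type and the methods below are assumptions made for illustration, not the actual Nickel definitions.

```rust
/// Hypothetical source span; the real table entries may look different.
#[derive(Clone, Debug, PartialEq)]
struct Span {
    start: u32,
    end: u32,
}

/// A position index that is always 32 bits wide (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PosIdx(u32);

/// Table mapping indices to spans (assumed layout).
struct PosTable {
    spans: Vec<Span>,
}

impl PosTable {
    fn new() -> Self {
        PosTable { spans: Vec::new() }
    }

    /// Allocating a new entry is the only operation that needs `&mut self`.
    fn push(&mut self, span: Span) -> PosIdx {
        let idx = u32::try_from(self.spans.len()).expect("position table overflow");
        self.spans.push(span);
        PosIdx(idx)
    }

    /// Looking up an entry only needs shared access.
    fn get(&self, idx: PosIdx) -> Option<&Span> {
        self.spans.get(idx.0 as usize)
    }
}
```

Because there is only one index type, inheriting a position is a plain copy of a PosIdx, whatever the representation of the source and target values, and it never requires mutable access to the table.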

XXXBody wrappers

Any data that goes in ValueBlockRc is wrapped as a struct XXXBody, which is just a wrapper around the actual data (RecordData, ForeignId, etc.). Those were introduced initially because I wanted to control the size and alignment of what goes into a block, using #[repr(_)]. This plan changed for different reasons, and it was annoying to match on those structs, or to use .0 everywhere. I got rid of them and replaced them with simple type aliases (which are named XXXData now instead).
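The ergonomic difference can be sketched as follows. Record, RecordBody and the two helper functions are hypothetical and only mirror the naming scheme described above; the real types carry much more data.

```rust
/// Hypothetical payload type standing in for the actual record data.
struct Record {
    fields: Vec<(String, i64)>,
}

/// Old style: a newtype wrapper, so every access goes through `.0` or a
/// destructuring pattern.
struct RecordBody(Record);

/// New style: a plain alias, so the payload is used directly.
type RecordData = Record;

fn field_count_old(body: &RecordBody) -> usize {
    body.0.fields.len()
}

fn field_count_new(data: &RecordData) -> usize {
    data.fields.len()
}
```

The alias adds no type-level distinction, which is acceptable once the wrapper is no longer needed to attach a #[repr(_)] attribute.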

PosTable ownership

The position table needs to be threaded through many stages of the pipeline now, and included in errors. Most of this state has been moved into VmContext, that VM instances borrow from. Implemented separately in #2381.
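As a simplified picture of that ownership structure (the actual design is in #2381), a context can own the long-lived state while each VM instance only borrows it. The fields and methods below are assumptions for the sake of the example.

```rust
/// Assumed minimal position table: a list of (start, end) offsets.
struct PosTable {
    spans: Vec<(u32, u32)>,
}

/// Owns the state that must outlive any single VM instance.
struct VmContext {
    pos_table: PosTable,
}

/// A VM instance mutably borrows the context for its whole lifetime.
struct Vm<'ctx> {
    ctx: &'ctx mut VmContext,
}

impl<'ctx> Vm<'ctx> {
    fn new(ctx: &'ctx mut VmContext) -> Self {
        Vm { ctx }
    }

    /// Records a span in the shared table and returns its index.
    fn record_span(&mut self, start: u32, end: u32) -> u32 {
        let idx = self.ctx.pos_table.spans.len() as u32;
        self.ctx.pos_table.spans.push((start, end));
        idx
    }
}
```

Once a VM is dropped, the context still holds every position it allocated, so a later VM instance or an error report can keep resolving the same indices.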

In general, the presence of the pos table forces us to better separate the pure AST phase from the NickelValue phase, if we don't want to require pos tables where they should not be needed (typically during AST import resolution). This was achieved for cache and typechecking errors in #2359 and #2361, and is continued in this PR with a leftover, namely wildcards, which were still RichTerm-only.

Migration to edition 2024

I think this was a mistake, but by the time I realized it, it was too late to disentangle the changes. I moved to edition 2024 to get let-chains, and this unfortunately pollutes the diff, because edition 2024 is stricter about a number of things, resulting in unrelated diffs (around unsafe and ref patterns in particular). Formatting has also changed. I'll format the remaining files with cargo fmt after the review, to avoid additional unrelated diff.

Review

The diff is huge. I think a lot of the changes are almost mechanical, and don't necessarily deserve deep attention - typically in eval::operation or eval::merge, although the refactoring is not entirely trivial either. The changes to nickel-lang-cli, nickel-lang-lsp, the various test modules and the tests crate should be really mechanical.

Modules with more substantial modifications, which should be reviewed first:

  • eval::value (originally bytecode::value)
  • nickel-lang facade and the C API
  • cache
  • serialize
  • term

Follow-ups

NickelValue

  • use a pointer instead of usize for the data field to preserve provenance.
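For context on the provenance point, here is a small sketch of a tagged word that keeps its data as a raw pointer and only ever does pointer arithmetic on it, so the link to the original allocation is never lost through a pointer-to-integer-to-pointer round trip. The Tagged type, the 2-bit tag and the u64 payload are all assumptions for illustration and do not describe NickelValue's actual layout.

```rust
/// Number of low bits used for the tag (assumed for this sketch).
const TAG_MASK: usize = 0b11;

/// Hypothetical tagged word: a pointer to a heap-allocated `u64` whose low
/// two bits carry a tag. The field is a pointer rather than a `usize`.
struct Tagged {
    data: *mut u8,
}

impl Tagged {
    fn new(value: u64, tag: usize) -> Self {
        assert!(tag <= TAG_MASK, "tag must fit in two bits");
        let raw = Box::into_raw(Box::new(value)) as *mut u8;
        // The allocation is aligned for `u64`, so the low bits start at zero.
        assert_eq!((raw as usize) & TAG_MASK, 0);
        // `wrapping_add` offsets the pointer while keeping its provenance.
        Tagged { data: raw.wrapping_add(tag) }
    }

    /// Reads the tag from the address bits.
    fn tag(&self) -> usize {
        (self.data as usize) & TAG_MASK
    }

    /// Recovers the original pointer by undoing the offset.
    fn untagged(&self) -> *mut u64 {
        self.data.wrapping_sub(self.tag()) as *mut u64
    }

    fn get(&self) -> u64 {
        // Safety: `untagged` is the pointer returned by `Box::into_raw`.
        unsafe { *self.untagged() }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        // Safety: the pointer came from `Box::into_raw` and is freed only once.
        unsafe { drop(Box::from_raw(self.untagged())) }
    }
}
```

Storing the same bits in a usize would force an integer-to-pointer cast before every dereference, which is exactly the step that makes provenance harder to reason about for tools such as Miri.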

primops

When migrating eval::operation, there were a lot of micro decisions to make around when to take something by reference or to try to extract it "by value" (i.e. like Rc::try_unwrap). At some point I tended to maintain the old behavior, but I think a new pass is warranted, to decide when it is worth trying to avoid copying. The same goes for the main eval module, where we currently take by reference and clone when needed.
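The trade-off can be made concrete with a toy "append" operation on a shared array. This is only a sketch using Rc::make_mut from the standard library; push_by_value and push_by_ref are hypothetical helpers, not Nickel primops.

```rust
use std::rc::Rc;

/// By value: the operation owns its argument, so it can mutate in place when
/// the `Rc` is uniquely owned. `Rc::make_mut` clones only if it is shared.
fn push_by_value(mut array: Rc<Vec<i64>>, elem: i64) -> Rc<Vec<i64>> {
    Rc::make_mut(&mut array).push(elem);
    array
}

/// By reference: the operation cannot reuse the allocation, so it always
/// copies the array before extending it.
fn push_by_ref(array: &Rc<Vec<i64>>, elem: i64) -> Rc<Vec<i64>> {
    let mut copy: Vec<i64> = (**array).clone();
    copy.push(elem);
    Rc::new(copy)
}
```

Taking the argument by value only helps when the caller can actually give up its handle; if the caller has to clone the Rc first to keep its own copy, the count is at least two and the by-value version ends up copying as well.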

Term

  • move term::record to value::record. I think it doesn't make much sense now to have record data in the term module.
  • move CustomContract to Term. I kept function-like stuff in Term, because in the future call-by-push-value VM, functions are computations, and not values. But for some reason CustomContract is a NickelValue.
  • get rid of Term::Value. I introduced it because I thought I would need both directions of the map Term <-> NickelValue, but since NickelValue is the entry point everywhere, we only ever need to wrap a term as a value, and not the other way round.

Other memory savings

  • reduce the size of Term by boxing large arguments
  • reduce the size of stack elements by reworking the layout of stack::Marker
  • reduce the size of errors by boxing the actual variant
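The boxing items above rely on a general property of Rust enums: an enum is at least as large as its largest variant, so moving a big payload behind a Box shrinks every value of the type. A minimal sketch with made-up types (Unboxed and Boxed are not Nickel types):

```rust
use std::mem::size_of;

/// The large payload is stored inline, so every value of the enum pays for
/// it, even the small variant.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Large([u64; 16]),
}

/// The large payload lives on the heap, so the enum only stores a pointer.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u64; 16]>),
}

/// Returns the in-memory sizes of both layouts, in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}
```

The cost is one extra allocation and one pointer indirection when the large variant is actually built or read, which is usually a good trade for rarely used variants such as errors.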

@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from b9ebd1b to 01c0f94 Compare July 25, 2025 10:02
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from 01c0f94 to 488881e Compare August 26, 2025 09:21
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from 9f948c7 to 8102183 Compare September 8, 2025 08:30
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from d8edff9 to 03c6a19 Compare September 17, 2025 10:31
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch 4 times, most recently from 65a5b91 to 5952c73 Compare September 24, 2025 08:09
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from 7e40ccc to f65039f Compare October 3, 2025 16:53
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from cafc083 to 833b5ec Compare October 14, 2025 16:21
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch 2 times, most recently from ce6badd to 6805e21 Compare October 17, 2025 14:54
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from 5920bf7 to 3fa44f7 Compare October 23, 2025 17:24
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch 2 times, most recently from 3b7ae11 to 66b2be4 Compare October 30, 2025 17:51
@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from 5999b59 to c4d37a8 Compare November 3, 2025 13:21

@jneem jneem left a comment

Looks good! A bunch of little comments and questions, but nothing blocking.

@yannham yannham force-pushed the rfc007/migrate-term-to-value branch from ba9e448 to 58f4bf9 Compare November 4, 2025 16:41
@yannham yannham added this pull request to the merge queue Nov 4, 2025
github-actions bot commented Nov 4, 2025

🐰 Bencher Report

Branch: rfc007/migrate-term-to-value
Testbed: ubuntu-latest

Benchmark | Latency (µs)
diagnostics-benches/inputs/goto-perf.ncl | 10,997.00
diagnostics-benches/inputs/large-record-tree.ncl | 187,100.00
diagnostics-benches/inputs/redis-replication-controller.ncl | 301.12
diagnostics-benches/inputs/small-record-tree.ncl | 429.69
fibonacci 10 | 212.77
foldl arrays 50 | 636.78
foldl arrays 500 | 14,536.00
foldr strings 50 | 3,710.60
foldr strings 500 | 33,640.00
generate normal 250 | 44,406.00
generate normal 50 | 1,237.40
generate normal unchecked 1000 | 41,643.00
generate normal unchecked 200 | 1,844.60
init-diagnostics-benches/inputs/goto-perf.ncl | 52,154.00
init-diagnostics-benches/inputs/large-record-tree.ncl | 205,490.00
init-diagnostics-benches/inputs/redis-replication-controller.ncl | 50,058.00
init-diagnostics-benches/inputs/small-record-tree.ncl | 49,886.00
pidigits 100 | 1,936.60
pipe normal 20 | 706.29
pipe normal 200 | 5,968.50
product 30 | 347.95
requests-benches/inputs/goto-perf.ncl-000 | 3,014.00
requests-benches/inputs/large-record-tree.ncl-000 | 575,950.00
requests-benches/inputs/large-record-tree.ncl-001 | 87.32
scalar 10 | 592.63
sum 30 | 349.64

Merged via the queue into master with commit 59095e0 Nov 4, 2025
5 checks passed
@yannham yannham deleted the rfc007/migrate-term-to-value branch November 4, 2025 17:08