
Performance improvement ideas: tracking issue #1484

@yannham

Description

Performance is a tricky matter. We want to avoid spending time implementing changes that we merely hope will improve performance without:

  1. A concrete use case showing that a performance issue is either an actual problem in practice, or subpar when compared to other languages/tools
  2. Principled measurements pointing to the functions to improve and, after implementing a performance-related change, confirming the gain

Still, with time, we've spotted many potential bottlenecks and candidates for performance improvements. This issue serves to record those insights, and can be used as a starting point when investigating a performance issue or looking for performance improvements in specific parts of the Nickel pipeline.

This issue isn't concerned with big paradigm shifts such as incremental evaluation or moving towards a bytecode compiler + virtual machine approach. Such topics deserve their own issue/notes/RFC/paper. The ideas here are closely tied to the current implementation, and we want to avoid losing or ignoring them while they could still be relevant.

Feel free to add your own ideas; I'll try to edit them into this description.

Interpreter

  • Get rid of most generated variables #1647
  • Cache lazy contract application. Since RFC005, and previous work on lazy array contracts, Nickel's basic data structures (arrays and records) apply contracts lazily. That is, applying an array contract or a record contract just attaches data to those structures, but doesn't evaluate anything. When merging records, contracts are still accumulated lazily, without being applied. Only when a field or an array element is requested (either directly, or via transformations such as mapping a function over it) are the contracts actually applied. However, we currently have no place to cache this contract application. This means that extracting a field or an array element several times will apply pending contracts as many times, while we could cache the result in a thunk and guarantee that a contract application is only evaluated once. A simple solution would be, for arrays and records, to store a pair of closures (original, cached) for each element, where original corresponds to the initial value without contracts applied, and cached is a fresh thunk (not aliased by anyone else when allocating the array or record) that can serve to store the field with contracts applied.
  • Use contract equality to avoid double contract application. We implemented contract equality (limited, of course; it's undecidable in general) for typechecking purposes. It wouldn't be very hard to re-use this code to decide contract equality at runtime, and thus avoid double contract application, e.g. at record fields (which lazily record contracts in a vector) or arrays. It might even be adapted to handle stand-alone, inline contract applications, but for that we would need to not evaluate annotated terms away to a simple contract application, as we currently do (Contract deduplication for records #1646).
  • Use hash-consing to compute contract equality. We currently use an AST-based comparison to deduplicate contracts at runtime. This won't be possible anymore when RFC007 is fully implemented (as all we'll have is bytecode). A more performant and bytecode-compatible approach would be to hash-cons all the contract ASTs appearing in the code, and perform a simple hash comparison at runtime.
  • Superseded by RFC007 (stack-based environment) Use De Bruijn indices for the environment. The environment is a linked list of hashmaps, where the linked list more or less mirrors scopes and sharing points. We can make it a bit better (in practice, not asymptotically) by using a linked list of arrays and pre-computing the index of each allocated variable, as in De Bruijn indexing (Try using De Bruijn indices and persistent array for the environment #844)
  • Superseded by RFC007 (stack-based environment) Use a persistent HashMap for the environment. Alternative to De Bruijn indices for faster lookups in the environment (Retry using HAMT as Environment if the closurization model changes. #837)

Typechecker

  • Make types shareable. Currently, the AST for typ::Type uses owned pointers (Box). This means that subtypes aren't shared, and unifying, say, a -> b with T -> U needs to perform a deep copy of both T and U. Possible solutions are to use Rc instead of Box (as we do for terms), or to use an arena, see below.
  • Use an arena to allocate types. Related to the previous item: instead of using Rc, we could also allocate all types in an arena. Types (or, rather, unification types) currently have a well-delimited lifetime: they are allocated as part of a statically typed block, and can't escape this block. Thus, an arena would be created when entering a top-level statically typed block (not included in another one), and dropped when exiting this block.
  • Implement a proper union-find data structure, and in particular use direct pointers for unification variables. Currently, unification variables are indices inside a vector. But they could be direct pointers, eliminating an additional indirection, and eliminating the need to thread the unification table everywhere as well.
  • Implement path compression. When resolving a unification variable, remember all the variables encountered while following the links in the unification table/union-find structure and set them all to the final result. Doing so, the next resolution will succeed in only one step.

Program transformation

  • Store direct pointers to nodes to transform. Some transformations, like import resolution, concern statistically only very few nodes. Still, we currently traverse the whole term to look for e.g. an import. We could instead record pointers to imports at parsing time, and process them directly later. The premise isn't entirely true: we have to perform a traversal anyway (in fact, at least two, one bottom-up and one top-down) for other transformations (such as the free-var transformation or the share normal form, which are probably better done as a normal traversal). But it might be the case that at least one full traversal could be saved, if we find out that all of the steps of this traversal can instead be performed by recording the nodes to process during parsing, which is attractive when, as with import resolution, the set of nodes to process is expected to be small.
