Skip to content

frequency is inefficient and wrong #337

@treeowl

Description

@treeowl

Correctness

frequency is a little bit wrong because it relies on the sum of the frequencies being representable by an Int, but does not document this. So frequency [(maxBound `quot` 2, x), ((maxBound `quot` 4) * 3, y), (2034, z)] will do something hard to predict. The best fix is likely to use double-word integers in the calculations:

data DW = DW {hi :: !Word, lo :: !Word}

This is almost certainly available in a library somewhere, but it wouldn't be too hard to roll our own if necessary.

Efficiency

The argument list is traversed for every value generated. When the list is long, that's really lousy. The solution is to build a tree representing the input list, making lookup much faster. A simple hack using DW above would use an IntMap of IntMaps and some lookup functions like lookupLE, lookupGE, etc. There are almost certainly better specialized data structures available for the purpose, but I don't know if they'd be worth the trouble.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions