
coord2ind is memory hungry #521

@jefferis

@AlexFragniere reported that she could not use coord2ind with a data frame of about 100M synaptic locations on her machine because it ran out of memory. Looking at the code, it is memory hungry in several quite unnecessary ways:

  • It produces numeric (double) output for the indices; integers should be fine most of the time and would occupy half the memory.
  • It always copies the input coordinates into a new matrix, when it could perfectly well operate on each input column.
  • It twice carries out double transpose operations on the coordinate matrix to simplify calculations such as subtracting an origin (which can use simple arithmetic when operating on a 3 x N matrix rather than an N x 3 matrix). Unfortunately this results in 4 memory copies!
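The first and third points can be illustrated with a small sketch. It is only an illustration in base R, not the coord2ind code itself; the names `xyz`, `origin`, `via_transpose` and `shifted` are made up for this example.

```r
# Integer vs double storage: an integer vector needs about half the memory.
n <- 1e6
size_ratio <- as.numeric(object.size(numeric(n))) /
  as.numeric(object.size(integer(n)))

# Toy N x 3 coordinate matrix and an origin (illustrative values only).
xyz <- cbind(X = c(1.5, 2.5, 3.5), Y = c(10, 20, 30), Z = c(0.1, 0.2, 0.3))
origin <- c(1, 10, 0.1)

# Double-transpose idiom: each t() allocates a full copy of the data.
via_transpose <- t(t(xyz) - origin)

# Column-wise alternative: subtract one scalar per column, no transposes.
shifted <- lapply(seq_along(origin), function(j) xyz[, j] - origin[j])
```

Both routes give the same shifted coordinates; the column-wise one only ever holds one extra column at a time.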

A reasonable approach appears to be to use the findInterval function on each column of the incoming data (thereby avoiding the 5 copies of the data as doubles) to produce an N x 3 integer array. That integer array could then be converted to linear (1D) indices when requested; for this operation it might be necessary to check whether the indices could exceed .Machine$integer.max (i.e. 2147483647 on current R) and use a double if necessary.
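A minimal sketch of that idea follows. The helper names `coords_to_ijk` and `ijk_to_linear` are hypothetical and not part of the package, and the voxel convention (voxel centres at `origin + (i - 1) * voxdims`, with cell edges halfway between centres) is an assumption made for the example. Out-of-range coordinates are simply left as 0 or `n + 1`, so this does not attempt to reproduce the Clamp or CheckRanges behaviour.

```r
# Hypothetical helper: per-axis voxel indices via findInterval.
# Assumes voxel centres at origin + (i - 1) * voxdims (an assumption).
coords_to_ijk <- function(x, y, z, dims, origin, voxdims) {
  axis_index <- function(v, n, o, d) {
    edges <- o + (seq(0, n) - 0.5) * d  # n + 1 cell edges for n voxels
    findInterval(v, edges)              # integer; 0 or n + 1 if out of range
  }
  cbind(i = axis_index(x, dims[1], origin[1], voxdims[1]),
        j = axis_index(y, dims[2], origin[2], voxdims[2]),
        k = axis_index(z, dims[3], origin[3], voxdims[3]))
}

# Hypothetical helper: N x 3 indices to 1-based column-major linear indices.
# Assumes all indices are in range. Falls back to doubles when the volume
# has more voxels than .Machine$integer.max.
ijk_to_linear <- function(ijk, dims) {
  if (prod(dims) > .Machine$integer.max) {
    dims <- as.numeric(dims)
    ijk[, 1] + (ijk[, 2] - 1) * dims[1] + (ijk[, 3] - 1) * dims[1] * dims[2]
  } else {
    dims <- as.integer(dims)
    ijk[, 1] + (ijk[, 2] - 1L) * dims[1] + (ijk[, 3] - 1L) * (dims[1] * dims[2])
  }
}

dims <- c(4, 3, 2)
ijk <- coords_to_ijk(x = c(0, 1, 3), y = c(0, 0, 2), z = c(0, 0, 1),
                     dims = dims, origin = c(0, 0, 0), voxdims = c(1, 1, 1))
lin <- ijk_to_linear(ijk, dims)
```

For a small volume like this one the result stays integer; only volumes with more than 2147483647 voxels take the double branch.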

This change would actually be fairly straightforward were it not for the need to maintain backwards compatibility with the Clamp and CheckRanges arguments, which are designed to deal with coordinates that might map outside the image volume. I'm not sure if they are actually that well implemented currently, but some code somewhere might depend on the current implementation ... It may be necessary to add some additional tests before proceeding.

Finally, for input data with many coordinates where the desired output is linear (1D) indices, it might make sense to operate in chunks, since generating the intermediate N x 3 integer index array requires a big chunk of memory that is released straight afterwards. But I think that might have to wait for another day.
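For what it's worth, a chunked version could look roughly like the sketch below. The function name `coords_to_linear_chunked` and its `chunk_size` argument are hypothetical, the voxel convention (centres at `origin + (i - 1) * voxdims`) is an assumption, and coordinates are assumed to lie inside the volume, so Clamp and CheckRanges are not handled.

```r
# Hypothetical sketch: convert coordinates straight to linear indices in
# chunks, so that only chunk_size rows of intermediate indices exist at once.
coords_to_linear_chunked <- function(x, y, z, dims, origin, voxdims,
                                     chunk_size = 1e6) {
  n <- length(x)
  # Use doubles only when the volume is too big for integer indices.
  use_double <- prod(dims) > .Machine$integer.max
  out <- if (use_double) numeric(n) else integer(n)
  if (n == 0) return(out)
  dims <- if (use_double) as.numeric(dims) else as.integer(dims)
  # Cell edges per axis, assuming voxel centres at origin + (i - 1) * voxdims.
  edges <- lapply(1:3, function(a) {
    origin[a] + (seq(0, dims[a]) - 0.5) * voxdims[a]
  })
  for (start in seq(1, n, by = chunk_size)) {
    sel <- start:min(start + chunk_size - 1, n)
    i <- findInterval(x[sel], edges[[1]])
    j <- findInterval(y[sel], edges[[2]])
    k <- findInterval(z[sel], edges[[3]])
    # 1-based column-major linear index for this chunk only.
    out[sel] <- i + (j - 1L) * dims[1] + (k - 1L) * (dims[1] * dims[2])
  }
  out
}

res_small_chunks <- coords_to_linear_chunked(
  x = c(0, 1, 3, 2), y = c(0, 0, 2, 1), z = c(0, 0, 1, 1),
  dims = c(4, 3, 2), origin = c(0, 0, 0), voxdims = c(1, 1, 1),
  chunk_size = 2)
```

The output vector is still allocated in full, but the three per-axis index vectors never exceed `chunk_size` elements.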
