Skip to content

Optimization: Connected Components Algorithm #20

@nguy8tri

Description

@nguy8tri

Description

The function src/distance/edge.hpp::found::ConnectedComponentsAlgorithm takes a long time to run on a standard image. This is partly due to:

  1. Use of standard template objects (std::map, std::unordered_map, etc.) that both allocate heap data AND also do complex things with it.
  2. The nature of the return type, which is an std::vector whose size does not need to be added to after.
  3. Each element of the return type is a group of connected pixels. It is a struct that contains:
    a) The "top-left" and "bottom-right", or the pixels that define a box that contains all the pixels in the group. These are 2 Vec2, which is inefficient because all pixel locations can be defined via uint64_t as the x and y coordinates, and not decimal (Floating Point types take a lot more math to store and do calculations around).
    b) An unordered_set (hash set) of all the pixels as Vec2. Same inefficiency.

Note

For context, componentPoints used to be a std::map that mapped from pixel index to its group number (L). Of course, not every pixel will have a value (because not all pixels belong to a group), but by instead using a size-constant heap array as the map (much bigger than the internal memory used by std::map, the runtime decreased by half, even though memory usage went up.

Potential Solutions

1) Standard Template Objects

At the beginning of the algorithm we define 3 types:

Components ConnectedComponentsAlgorithm(const Image &image, std::function<bool(uint64_t, const Image &)> Criteria) {
    // Step 0: Setup the Problem
    std::unordered_map<int, Component> components;
    std::unordered_map<int, int> equivalencies;
    std::unique_ptr<int[]> componentPoints(new int[image.width * image. Height]{});
    ...
}
  1. components: Maps from group number to group (from return type)
  2. equivalencies: Maps group numbers to equivalent group numbers.
  3. componentPoints: Maps from pixel index to group number (already optimized).

The first two fields can be optimized. You will notice that their maximum capacity would be the number of possible groups, and the key is always the group number (at the moment, the group number starts at 1, but that can always be changed to be zero based. I don't recommend changing this so you take advantage of zero initialization). Thus, you can convert both fields to arrays + size trackers if you know their maximum capacity. You may conduct a proof on this, but the maximum size would be the ceiling of image area / 4, or

$$\left\lceil \frac{wh}{4} \right\rceil$$

where $w$ and $h$ are the image width and height.

Tip

You don't need to invoke std::ceil for this. Think of how you can natively use / and % operators together.

2) Returning an std::vector with non-increasing size

This eliminates need for a vector really, and if the user doesn't change the size of the array, we should just return a struct with the array and its size (still has to be heap-allocated). This could be integrated with the previous step!

3) Return Type's Element Type

A few optimizations can be performed here:

  1. Make an internal struct definition for a Vec2 (do not call it that) that uses uint32_t as the underlying type and substitute it throughout the return type. Also provide the maximum size to std::unordered_set (the image's area) via std::unordered_set::reserve(size_t count)
  2. Eliminate the std::unordered_set from the return type and just return the std::moveed componentPoints along with the metadata on each group. You will need to modify the return type in this case to be a struct, and properly set equivalent groups in a parallel manner that's being done in the algorithm within the componentPoints.

Metadata

Metadata

Assignees

Labels

PerformanceImproves code by optimizing memory usage or speedgood first issueGood for newcomers

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions