Optimization: Connected Components Algorithm

# Description

The function `src/distance/edge.hpp::found::ConnectedComponentsAlgorithm` takes a long time to run on a standard image. This is partly due to:
1. Use of standard template objects (`std::map`, `std::unordered_map`, etc.) that both allocate heap data AND also do complex things with it.
2. The nature of the return type, which is an `std::vector` whose size does not need to be added to after.
3. Each element of the return type is a group of connected pixels. It is a `struct` that contains:
    a) The "top-left" and "bottom-right", or the pixels that define a box that contains all the pixels in the group. These are 2 Vec2, which is inefficient because all pixel locations can be defined via `uint64_t` as the x and y coordinates, and not `decimal` (Floating Point types take a lot more math to store and do calculations around).
    b) An `unordered_set` (hash set) of all the pixels as `Vec2`. Same inefficiency.

> [!NOTE]
> For context, `componentPoints` used to be a `std::map` that mapped from pixel index to its group number (L). Of course, not every pixel will have a value (because not all pixels belong to a group), but by instead using a size-constant heap array as the map (much bigger than the internal memory used by `std::map`, the runtime decreased by half, even though memory usage went up.

# Potential Solutions

## 1) Standard Template Objects

At the beginning of the algorithm we define 3 types:
```C++
Components ConnectedComponentsAlgorithm(const Image &image, std::function<bool(uint64_t, const Image &)> Criteria) {
    // Step 0: Setup the Problem
    std::unordered_map<int, Component> components;
    std::unordered_map<int, int> equivalencies;
    std::unique_ptr<int[]> componentPoints(new int[image.width * image. Height]{});
    ...
}
```
1. `components`: Maps from group number to group (from return type)
2. `equivalencies`: Maps group numbers to equivalent group numbers.
3. `componentPoints`: Maps from pixel index to group number (already optimized).

The first two fields can be optimized. You will notice that their maximum capacity would be the number of possible groups, and the key is always the group number (at the moment, the group number starts at 1, but that can always be changed to be zero based. I don't recommend changing this so you take advantage of zero initialization). Thus, you can convert both fields to arrays + size trackers if you know their maximum capacity. You may conduct a proof on this, but the maximum size would be the ceiling of image area / 4, or

$$\left\lceil \frac{wh}{4} \right\rceil$$

where $w$ and $h$ are the image width and height.

> [!TIP]
> You don't need to invoke `std::ceil` for this. Think of how you can natively use `/` and `%` operators together.

## 2) Returning an `std::vector` with non-increasing size

This eliminates need for a vector really, and if the user doesn't change the size of the array, we should just return a struct with the array and its size (still has to be heap-allocated). This could be integrated with the previous step!

## 3) Return Type's Element Type

A few optimizations can be performed here:
1. Make an internal `struct` definition for a `Vec2` (do not call it that) that uses `uint32_t` as the underlying type and substitute it throughout the return type. Also provide the maximum size to `std::unordered_set` (the image's area) via `std::unordered_set::reserve(size_t count)`
2. Eliminate the `std::unordered_set` from the return type and just return the `std::move`ed `componentPoints` along with the metadata on each group. You will need to modify the return type in this case to be a `struct`, and properly set equivalent groups in a parallel manner that's being done in the algorithm within the `componentPoints`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimization: Connected Components Algorithm #20

Description

Potential Solutions

1) Standard Template Objects

2) Returning an `std::vector` with non-increasing size

3) Return Type's Element Type

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimization: Connected Components Algorithm #20

Description

Description

Potential Solutions

1) Standard Template Objects

2) Returning an std::vector with non-increasing size

3) Return Type's Element Type

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

2) Returning an `std::vector` with non-increasing size