Skip to content

Fix thread safety and collision resolution in Finding fingerprint generation#188

Draft
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-spelling-error-in-warnings
Draft

Fix thread safety and collision resolution in Finding fingerprint generation#188
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-spelling-error-in-warnings

Conversation

Copy link

Copilot AI commented Feb 13, 2026

The Finding class had thread-unsafe fingerprint generation and a collision resolution algorithm that could loop indefinitely. Also fixes a typo in error messaging and documents the magic string prefix.

Changes

  • Thread safety: Wrap fingerprint generation in threading.Lock to protect shared class variable from concurrent access
  • Collision resolution: Replace hash-chaining approach with counter-based iteration (max 1000 attempts) to prevent infinite loops and improve determinism
  • Magic string documentation: Convert "/?" to named constant SPECIAL_QUERY_PREFIX with explanation
  • Spelling fix: Correct 'abolute' → 'absolute' in path conversion error message

Before/After

# Old: Could loop forever, not thread-safe
new_hash = self._md5_hash(str(hashable_string).encode("utf-8"))
while new_hash in self.fingerprints and self.fingerprints[new_hash] != self:
    new_hash = self._md5_hash(f"{hashable_string}{new_hash}".encode())
type(self).fingerprints[new_hash] = self

# New: Bounded iterations, thread-safe
with type(self)._fingerprints_lock:
    new_hash = self._md5_hash(hashable_string.encode("utf-8"))
    attempt = 1
    max_attempts = 1000
    while new_hash in self.fingerprints and self.fingerprints[new_hash] != self:
        if attempt > max_attempts:
            raise RuntimeError(f"Unable to generate unique fingerprint after {max_attempts} attempts")
        new_hash = self._md5_hash(f"{hashable_string}{attempt}".encode("utf-8"))
        attempt += 1
    type(self).fingerprints[new_hash] = self

Note: Fingerprint values will change due to algorithm improvements. Test fixtures updated accordingly.

Original prompt
Please apply the following diffs and create a pull request.
Once the PR is ready, give it a title based on the messages of the fixes being applied.

[{"message":"Spelling error: 'abolute' should be 'absolute'.","fixFiles":[{"filePath":"src/mlx/warnings/code_quality.py","diff":"diff --git a/src/mlx/warnings/code_quality.py b/src/mlx/warnings/code_quality.py\n--- a/src/mlx/warnings/code_quality.py\n+++ b/src/mlx/warnings/code_quality.py\n@@ -74,7 +74,7 @@\n                 try:\n                     path = path.relative_to(Path.cwd())\n                 except ValueError as err:\n-                    raise ValueError(\"Failed to convert abolute path to relative path for Code Quality report: \"\n+                    raise ValueError(\"Failed to convert absolute path to relative path for Code Quality report: \"\n                                      f\"{err}\") from err\n             self._path = str(path)\n         else:\n"}]},{"message":"Class variable `fingerprints` is shared across all instances and not thread-safe. In multi-threaded environments, concurrent access to this dictionary could lead to race conditions during fingerprint generation and storage.","fixFiles":[{"filePath":"src/mlx/warnings/code_quality.py","diff":"diff --git a/src/mlx/warnings/code_quality.py b/src/mlx/warnings/code_quality.py\n--- a/src/mlx/warnings/code_quality.py\n+++ b/src/mlx/warnings/code_quality.py\n@@ -1,6 +1,7 @@\n import hashlib\n from pathlib import Path\n import sys\n+import threading\n \n \n class Finding:\n@@ -8,6 +9,7 @@\n \n     \"\"\"A dictionary mapping fingerprints (unique hashes) to instances of Finding\"\"\"\n     fingerprints = {}\n+    _fingerprints_lock = threading.Lock()\n \n     _severity = \"info\"\n     _path = \"\"\n@@ -33,10 +35,11 @@\n         4. Step 3 is repeated until a unique hash is obtained.\n         \"\"\"\n         hashable_string = f\"{self.severity}{self.path}{self.description}\"\n-        new_hash = self._md5_hash(str(hashable_string).encode(\"utf-8\"))\n-        while new_hash in self.fingerprints and self.fingerprints[new_hash] != self:\n-            new_hash = self._md5_hash(f\"{hashable_string}{new_hash}\".encode())\n-        type(self).fingerprints[new_hash] = self\n+        with type(self)._fingerprints_lock:\n+            new_hash = self._md5_hash(str(hashable_string).encode(\"utf-8\"))\n+            while new_hash in self.fingerprints and self.fingerprints[new_hash] != self:\n+                new_hash = self._md5_hash(f\"{hashable_string}{new_hash}\".encode())\n+            type(self).fingerprints[new_hash] = self\n         return new_hash\n \n     @property\n"}]},{"message":"The collision resolution algorithm has O(n) time complexity in worst case and could create infinite loops if the hash space is exhausted. Consider using a more robust approach like including a counter or timestamp in the hash generation.","fixFiles":[{"filePath":"src/mlx/warnings/code_quality.py","diff":"diff --git a/src/mlx/warnings/code_quality.py b/src/mlx/warnings/code_quality.py\n--- a/src/mlx/warnings/code_quality.py\n+++ b/src/mlx/warnings/code_quality.py\n@@ -29,13 +29,23 @@\n         1. A base string is created by concatenating the severity, path, and description of the finding.\n         2. The base string is hashed to produce an initial hash.\n         3. If the hash already exists (hash collision), a new hash is generated by concatenating the base string\n-           with the current hash and hashing the result.\n-        4. Step 3 is repeated until a unique hash is obtained.\n+           with an incrementing counter and hashing the result.\n+        4. Step 3 is repeated until a unique hash is obtained or a maximum number of attempts is reached.\n         \"\"\"\n         hashable_string = f\"{self.severity}{self.path}{self.description}\"\n-        new_hash = self._md5_hash(str(hashable_string).encode(\"utf-8\"))\n+        # Initial hash based solely on the base string.\n+        new_hash = self._md5_hash(hashable_string.encode(\"utf-8\"))\n+        # Resolve collisions deterministically by appending a counter to the base string.\n+        attempt = 1\n+        max_attempts = 1000\n         while new_hash in self.fingerprints and self.fingerprints[new_hash] != self:\n-            new_hash = self._md5_hash(f\"{hashable_string}{new_hash}\".encode())\n+            if attempt > max_attempts:\n+                raise RuntimeError(\n+                    f\"Unable to generate a unique fingerprint after {max_attempts} attempts \"\n+                    f\"for finding with description {self.description!r}.\"\n+                )\n+            new_hash = self._md5_hash(f\"{hashable_string}{attempt}\".encode(\"utf-8\"))\n+            attempt += 1\n         type(self).fingerprints[new_hash] = self\n         return new_hash\n \n"}]},{"message":"The magic string '/?' is unexplained and unclear. This appears to be a special prefix but lacks documentation explaining its purpose and meaning.","fixFiles"...

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits February 13, 2026 15:29
…on, and documentation

Co-authored-by: Letme <3614407+Letme@users.noreply.github.com>
…algorithm

Co-authored-by: Letme <3614407+Letme@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix spelling error in code_quality.py Fix thread safety and collision resolution in Finding fingerprint generation Feb 13, 2026
Copilot AI requested a review from Letme February 13, 2026 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants