Conversation


@fridrik01 fridrik01 commented Oct 16, 2025

Description

The PrometheusSink had a race condition when multiple goroutines concurrently created the same metric for the first time. Each goroutine would create its own separate Prometheus counter/gauge/summary instance, increment it, and then store it using Store(). The last write would win, and all other metric instances, along with their accumulated values, would be lost.

This fix uses LoadOrStore() instead of Store() to ensure only one metric instance exists per key, then updates that single instance regardless of which goroutine won the creation race.

Related Issue

N/A (discovered during testing/usage)

How Has This Been Tested?

  • Added TestPrometheusRaceCondition, which spawns multiple goroutines that concurrently increment the same counter and verifies that the final value matches the expected total
  • The test fails on the original code (demonstrating the race) and passes with the fix

Before the fix:

go test -v ./prometheus -run TestPrometheusRaceCondition$ -count=1
=== RUN   TestPrometheusRaceCondition
    prometheus_race_test.go:63: Race condition detected: got 152 want 200
--- FAIL: TestPrometheusRaceCondition (0.00s)
FAIL
FAIL    github.com/hashicorp/go-metrics/prometheus      0.253s
FAIL

With this fix applied, the test always passes.

This test demonstrates a race condition that occurs when PrometheusSink
is used from multiple goroutines concurrently, resulting in missed
counter updates.

The test spawns multiple goroutines that increment the same counter
simultaneously and verifies that the final value matches the expected
total.

Fixes a race condition in PrometheusSink where concurrent
updates to the same metric from multiple goroutines could
result in missed updates or incorrect values.

The previous copy-on-write pattern was racy because multiple
goroutines could read the old value, create separate copies
with their updates, and overwrite each other when storing
back to the sync.Map.

This change adds a sync.RWMutex to each gauge, summary, and
counter struct to protect concurrent access to the underlying
Prometheus metric and the updatedAt timestamp.

The fix also uses LoadOrStore instead of Load+Store to
properly handle creation races when multiple goroutines
attempt to create the same metric simultaneously.
@fridrik01 fridrik01 requested a review from a team as a code owner October 16, 2025 14:46

hashicorp-cla-app bot commented Oct 16, 2025

CLA assistant check
All committers have signed the CLA.

Also updated the prometheus_race_test.go to be in line with other
tests.
