@encodeous encodeous commented Nov 11, 2025

This PR reduces the number of heap allocations made in the histogram.Add and histogram.trim functions.

In this library, maxBins is a small constant, so it's better to spend some extra compute in exchange for fewer heap allocations.
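To illustrate the trade-off, here is a minimal sketch of an in-place sorted insert and merge over a slice whose capacity is fixed up front. It is not the code from this PR: the bin layout, the value of maxBins, and the merge rule (combine the two closest adjacent bins, weighted by count) are assumptions made for the example.

```go
package main

import "fmt"

// maxBins is a hypothetical small cap; the library's real value may differ.
const maxBins = 4

// bin is an assumed layout: a representative value and how many samples it holds.
type bin struct {
	value float64
	count float64
}

type histogram struct {
	bins []bin // kept sorted by value; capacity is fixed at maxBins+1
}

func newHistogram() *histogram {
	// One allocation up front; Add and trim never grow the slice past this.
	return &histogram{bins: make([]bin, 0, maxBins+1)}
}

// Add inserts v at its sorted position by shifting elements in place.
// The linear scan and shift cost O(maxBins) work but allocate nothing.
func (h *histogram) Add(v float64) {
	i := 0
	for i < len(h.bins) && h.bins[i].value < v {
		i++
	}
	if i < len(h.bins) && h.bins[i].value == v {
		h.bins[i].count++
		return
	}
	h.bins = h.bins[:len(h.bins)+1] // stays within capacity
	copy(h.bins[i+1:], h.bins[i:])  // shift the tail right by one
	h.bins[i] = bin{value: v, count: 1}
	if len(h.bins) > maxBins {
		h.trim()
	}
}

// trim merges the two closest adjacent bins in place and shrinks the slice by one.
func (h *histogram) trim() {
	minIdx := 1
	minDelta := h.bins[1].value - h.bins[0].value
	for i := 2; i < len(h.bins); i++ {
		if d := h.bins[i].value - h.bins[i-1].value; d < minDelta {
			minDelta, minIdx = d, i
		}
	}
	a, b := h.bins[minIdx-1], h.bins[minIdx]
	total := a.count + b.count
	h.bins[minIdx-1] = bin{value: (a.value*a.count + b.value*b.count) / total, count: total}
	copy(h.bins[minIdx:], h.bins[minIdx+1:]) // shift the tail left by one
	h.bins = h.bins[:len(h.bins)-1]
}

func main() {
	h := newHistogram()
	for _, v := range []float64{3, 1, 10, 2, 4} {
		h.Add(v)
	}
	fmt.Println(h.bins)
}
```

Because the backing array is sized once, the shift via copy is the only extra work per call, which is cheap when maxBins is small.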

I ran the following test:

go test -bench="BenchmarkMetrics/timeline/histogram" -benchmem -cpuprofile prof.cpu -memprofile prof.mem -blockprofile prof.block -benchtime 5s

Original (bench):

goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor
BenchmarkMetrics/timeline/histogram-12           6446358               932.0 ns/op          1702 B/op          1 allocs/op
PASS
ok      github.com/zserge/metric        7.124s

Original (heap):

go tool pprof -top prof.mem
File: metric.test
Build ID: 852f07a4a74371cd52c1b5b7abf96a86eaa46d4c
Type: alloc_space
Time: 2025-11-11 17:23:48 UTC
Showing nodes accounting for 11.77GB, 100% of 11.77GB total
Dropped 28 nodes (cum <= 0.06GB)
      flat  flat%   sum%        cum   cum%
   11.77GB   100%   100%    11.77GB   100%  github.com/zserge/metric.(*histogram).Add
         0     0%   100%    11.76GB 99.89%  github.com/zserge/metric.(*timeseries).Add
         0     0%   100%    11.76GB 99.89%  github.com/zserge/metric.BenchmarkMetrics.func6
         0     0%   100%    11.76GB 99.89%  testing.(*B).launch
         0     0%   100%    11.76GB 99.89%  testing.(*B).runN

Improved (bench):

goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor            
BenchmarkMetrics/timeline/histogram-12          12500581               488.9 ns/op             0 B/op          0 allocs/op
PASS
ok      github.com/zserge/metric        6.722s

Improved (heap):

go tool pprof -top prof.mem
File: metric.test
Build ID: 4163c2636b8c890969f4262c30deec95abf6f7e1
Type: alloc_space
Time: 2025-11-11 17:27:18 UTC
Showing nodes accounting for 6198.37kB, 100% of 6198.37kB total
      flat  flat%   sum%        cum   cum%
    1539kB 24.83% 24.83%     1539kB 24.83%  runtime.allocm
 1184.27kB 19.11% 43.94%  1184.27kB 19.11%  runtime/pprof.StartCPUProfile
  902.59kB 14.56% 58.50%   902.59kB 14.56%  compress/flate.NewWriter (inline)
  521.37kB  8.41% 66.91%   521.37kB  8.41%  runtime/pprof.(*profileBuilder).emitLocation
  513.50kB  8.28% 75.19%   513.50kB  8.28%  runtime/pprof.(*protobuf).varint (inline)
  512.88kB  8.27% 83.47%   512.88kB  8.27%  sync.(*Pool).pinSlow
  512.69kB  8.27% 91.74%   512.69kB  8.27%  regexp/syntax.(*compiler).inst (inline)
  512.08kB  8.26%   100%   512.08kB  8.26%  compress/gzip.NewWriterLevel
         0     0%   100%   902.59kB 14.56%  compress/gzip.(*Writer).Write
         0     0%   100%   512.88kB  8.27%  fmt.Fprintf
         0     0%   100%   512.88kB  8.27%  fmt.newPrinter
         0     0%   100%   512.88kB  8.27%  github.com/zserge/metric.BenchmarkMetrics
         0     0%   100%  1696.96kB 27.38%  main.main
         0     0%   100%   512.69kB  8.27%  regexp.Compile (inline)
         0     0%   100%   512.69kB  8.27%  regexp.compile
         0     0%   100%   512.69kB  8.27%  regexp/syntax.Compile
         0     0%   100%  1696.96kB 27.38%  runtime.main
         0     0%   100%     1026kB 16.55%  runtime.mcall
         0     0%   100%      513kB  8.28%  runtime.mstart
         0     0%   100%      513kB  8.28%  runtime.mstart0
         0     0%   100%      513kB  8.28%  runtime.mstart1
         0     0%   100%     1539kB 24.83%  runtime.newm
         0     0%   100%     1026kB 16.55%  runtime.park_m
         0     0%   100%     1539kB 24.83%  runtime.resetspinning
         0     0%   100%     1539kB 24.83%  runtime.schedule
         0     0%   100%     1539kB 24.83%  runtime.startm
         0     0%   100%     1539kB 24.83%  runtime.wakep
         0     0%   100%   521.37kB  8.41%  runtime/pprof.(*profileBuilder).appendLocsForStack
         0     0%   100%  1937.46kB 31.26%  runtime/pprof.(*profileBuilder).build
         0     0%   100%   902.59kB 14.56%  runtime/pprof.(*profileBuilder).flush
         0     0%   100%  1416.09kB 22.85%  runtime/pprof.(*profileBuilder).pbSample
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).int64 (inline)
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).int64s
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).uint64 (inline)
         0     0%   100%   512.08kB  8.26%  runtime/pprof.newProfileBuilder
         0     0%   100%  2449.53kB 39.52%  runtime/pprof.profileWriter
         0     0%   100%   512.88kB  8.27%  sync.(*Pool).Get
         0     0%   100%   512.88kB  8.27%  sync.(*Pool).pin
         0     0%   100%   512.88kB  8.27%  testing.(*B).Run
         0     0%   100%   512.88kB  8.27%  testing.(*B).run
         0     0%   100%   512.88kB  8.27%  testing.(*B).run1.func1
         0     0%   100%   512.88kB  8.27%  testing.(*B).runN
         0     0%   100%  1696.96kB 27.38%  testing.(*M).Run
         0     0%   100%  1184.27kB 19.11%  testing.(*M).before
         0     0%   100%   512.88kB  8.27%  testing.(*benchState).processBench
         0     0%   100%   512.69kB  8.27%  testing.(*matcher).fullName
         0     0%   100%   512.88kB  8.27%  testing.BenchmarkResult.String
         0     0%   100%   512.69kB  8.27%  testing.runBenchmarks
         0     0%   100%   512.69kB  8.27%  testing.simpleMatch.matches
         0     0%   100%   512.69kB  8.27%  testing/internal/testdeps.TestDeps.MatchString
         0     0%   100%  1184.27kB 19.11%  testing/internal/testdeps.TestDeps.StartCPUProfile
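As a rough way to check the allocation difference outside of a full benchmark run, testing.AllocsPerRun from the standard library can compare an allocating insert against an in-place one. The two helper functions and the maxBins value below are hypothetical stand-ins written for this sketch, not functions from the library.

```go
package main

import (
	"fmt"
	"testing"
)

// maxBins is a hypothetical cap used only for this sketch.
const maxBins = 8

// sink keeps results reachable so the compiler cannot drop the work.
var sink []float64

// insertCopy builds a fresh slice on every call, so it allocates each time.
func insertCopy(s []float64, i int, v float64) []float64 {
	out := make([]float64, 0, len(s)+1)
	out = append(out, s[:i]...)
	out = append(out, v)
	return append(out, s[i:]...)
}

// insertInPlace shifts elements within the existing capacity.
// It assumes cap(s) > len(s), so no allocation is needed.
func insertInPlace(s []float64, i int, v float64) []float64 {
	s = s[:len(s)+1]
	copy(s[i+1:], s[i:])
	s[i] = v
	return s
}

func main() {
	base := []float64{1, 2, 4, 5}
	buf := make([]float64, 0, maxBins+1) // reused backing array

	copyAllocs := testing.AllocsPerRun(100, func() {
		sink = insertCopy(base, 2, 3)
	})
	inPlaceAllocs := testing.AllocsPerRun(100, func() {
		s := append(buf[:0], base...) // refill the reused buffer, within capacity
		sink = insertInPlace(s, 2, 3)
	})

	fmt.Println("copy allocs/run:", copyAllocs)
	fmt.Println("in-place allocs/run:", inPlaceAllocs)
}
```

A check like this could live in a regular test to flag a regression if Add or trim starts allocating again.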
