PromQL: Need to Correlate windows_netframework_clrmemory_gc_time_percent to Process ID (PID) for IIS Worker Process Troubleshooting #2234

@sanjanarampurkottur01

Description

Current Behavior

We are monitoring the garbage collection (GC) performance of our IIS Application Pools using the windows_netframework_clrmemory_gc_time_percent metric. When an issue occurs, this metric correctly shows the problematic process instance via the process label (e.g., process=AgentService or process=w3wp#1). However, we cannot directly obtain the actual Windows Process ID (PID) of this specific instance from this metric's label set. This prevents quick correlation with OS-level tools (like Task Manager) or targeted process debugging (like taking a memory dump with procdump).

To be more specific:

The problem is that this instance name (w3wp#1) does not correspond to the actual Process ID (PID) visible in Windows Task Manager. It is an index assigned by the operating system for performance counters.

Example of the problem: If the metric fires an alert for process=w3wp#1, an operator viewing the Windows Task Manager will see multiple w3wp.exe processes, but has no quick way to know which PID (e.g., 12345, 23456, or 34567) belongs to the w3wp#1 instance. This prevents immediate troubleshooting, such as:

  • Targeting the correct process for a memory dump (e.g., procdump -accepteula )
  • Isolating CPU/Memory usage in Task Manager or Resource Monitor.
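
A possible interim workaround, sketched under assumptions: the Windows ".NET CLR Memory" performance counter category exposes a "Process ID" counter for each instance, so one sample from typeperf (for example, typeperf "\.NET CLR Memory(*)\Process ID" -sc 1) can be turned into an instance-name-to-PID lookup. The function name instance_to_pid and the SAMPLE text are hypothetical, the CSV layout is assumed to match typical typeperf output, and none of this is something windows_exporter does itself.

```python
import csv
import io
import re

# Matches a counter path such as \\HOST\.NET CLR Memory(w3wp#1)\Process ID
# and captures the instance name between the parentheses.
_INSTANCE_RE = re.compile(r"\\\.NET CLR Memory\((?P<instance>[^)]+)\)\\Process ID$")


def instance_to_pid(typeperf_csv: str) -> dict[str, int]:
    """Map perf-counter instance names (e.g. 'w3wp#1') to PIDs.

    Assumes the first non-empty CSV row is the header of counter paths
    and the second is a single sample of "Process ID" values.
    """
    rows = [row for row in csv.reader(io.StringIO(typeperf_csv)) if row]
    if len(rows) < 2:
        return {}
    header, sample = rows[0], rows[1]

    mapping: dict[str, int] = {}
    # Column 0 is the timestamp, so skip it in both rows.
    for column, value in zip(header[1:], sample[1:]):
        match = _INSTANCE_RE.search(column)
        if match is None:
            continue
        try:
            pid = int(float(value))
        except ValueError:
            continue  # blank or non-numeric sample
        if pid > 0:  # aggregate instances report 0 and are skipped
            mapping[match.group("instance")] = pid
    return mapping


# Hypothetical sample in the assumed typeperf CSV layout.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0)","\\HOST\.NET CLR Memory(w3wp)\Process ID",'
    r'"\\HOST\.NET CLR Memory(w3wp#1)\Process ID"',
    r'"10/16/2025 17:20:40.000","12345.000000","23456.000000"',
])

if __name__ == "__main__":
    for instance, pid in instance_to_pid(SAMPLE).items():
        print(f"{instance} -> PID {pid}")
```

Because the #N index can be reassigned when worker processes start or stop, such a lookup is only meaningful if it is taken at roughly the same time as the alert.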

Expected Behavior

We expect the monitoring system to seamlessly provide the actual Windows Process ID (PID) of the IIS worker process that is reporting high GC activity.

When a performance issue is identified, the problematic process is labeled with the Windows Performance Counter instance name (e.g., process=w3wp#1, process=w3wp#2). We cannot reliably or immediately determine which live w3wp.exe process (PID) in Task Manager corresponds to the indexed instance name w3wp#X, and this ambiguity blocks fast troubleshooting. Therefore, when querying the windows_netframework_clrmemory_gc_time_percent metric, the resulting time series must include a label (e.g., pid) containing the corresponding Process ID.

Steps To Reproduce

1. Enable the netframework collector in windows_exporter.

2. Run an application in an IIS Application Pool (e.g., named "MyAppPool").

3. Load the application to intentionally cause high garbage collection activity (e.g., allocating and discarding many large objects rapidly).

4. Query the problematic metric in Prometheus/Grafana:

windows_netframework_clrmemory_gc_time_percent{process="w3wp#1"}

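To illustrate what the query in step 4 returns, here is a minimal sketch that reads a Prometheus instant-query response (the JSON returned by /api/v1/query) and lists the process label values whose GC time is above a threshold. The function name instances_over_threshold, the threshold, and the sample values are hypothetical. The only label it yields is the perf-counter instance name, which is the limitation this issue describes.

```python
import json


def instances_over_threshold(response_body: str, threshold: float) -> list[tuple[str, float]]:
    """Return (process label, value) pairs at or above threshold, highest first."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    hits = []
    for series in payload["data"]["result"]:
        # Instant vectors carry one [timestamp, "value"] pair per series.
        value = float(series["value"][1])
        if value >= threshold:
            hits.append((series["metric"].get("process", ""), value))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical response body; the values are made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "windows_netframework_clrmemory_gc_time_percent",
                    "process": "w3wp#1",
                },
                "value": [1760635240, "62.5"],
            },
            {
                "metric": {
                    "__name__": "windows_netframework_clrmemory_gc_time_percent",
                    "process": "w3wp",
                },
                "value": [1760635240, "3.1"],
            },
        ],
    },
})

if __name__ == "__main__":
    for process, value in instances_over_threshold(SAMPLE_RESPONSE, 50.0):
        print(f"{process}: {value}")
```
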
Environment

  • windows_exporter Version: 0.20.0
  • Windows Server Version: Microsoft Windows Server 2019 Datacenter (Build 17763)
  • Target Application: IIS (ASP.NET/CLR)

windows_exporter logs

10/16/2025 5:20:40 PM Warning          Collection timed out, still waiting for [netframework_clrexceptions
                                       netframework_clrmemory service netframework_clrlocksandthreads]

Anything else?

No response
