
Want comprehensive framework for observing/reporting event/thread statistics #609

@taspelund


I would like to see mgd get away from completely unmanaged threads that expose nothing for reporting, insight or control.
I don't think all threads need explicit management/joining since the Arc<AtomicBool> signaling we have today suffices for most situations, but I think having a reusable coding pattern would enable consistency (no bespoke threading logic), control (ability to join where appropriate w/o needing one-off types), and insight (at its most basic, reporting which state a thread is currently in).

I would like to see mgd take a more structured/deliberate approach to thread lifecycle management.
In particular, I would like us to start supporting:

  1. Explicit joining of threads on shutdown where it makes sense
  2. Explicit state reporting for threads (ready, running, shutdown/panicked)
  3. Panic reporting
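To make the three items above concrete, here is a minimal sketch of what a reusable wrapper could look like, using only the standard library. `ManagedThread`, `ThreadState`, and `ExitGuard` are hypothetical names for illustration, not existing mgd types. The sketch keeps the existing Arc<AtomicBool> shutdown signal, and relies on a drop guard plus `std::thread::panicking()` so the final state is recorded even when the thread body unwinds.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Lifecycle states a managed thread can report (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum ThreadState {
    Ready = 0,
    Running = 1,
    Shutdown = 2,
    Panicked = 3,
}

impl ThreadState {
    fn from_u8(v: u8) -> Self {
        match v {
            0 => Self::Ready,
            1 => Self::Running,
            2 => Self::Shutdown,
            _ => Self::Panicked,
        }
    }
}

/// Records the final state when the thread body exits, including by panic.
struct ExitGuard(Arc<AtomicU8>);

impl Drop for ExitGuard {
    fn drop(&mut self) {
        let state = if thread::panicking() {
            ThreadState::Panicked
        } else {
            ThreadState::Shutdown
        };
        self.0.store(state as u8, Ordering::SeqCst);
    }
}

/// A thread with a name, an observable state, and an optional join.
pub struct ManagedThread {
    name: String,
    state: Arc<AtomicU8>,
    shutdown: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl ManagedThread {
    /// Spawn `body`, handing it the shared shutdown flag to poll.
    pub fn spawn<F>(name: &str, body: F) -> std::io::Result<Self>
    where
        F: FnOnce(Arc<AtomicBool>) + Send + 'static,
    {
        let state = Arc::new(AtomicU8::new(ThreadState::Ready as u8));
        let shutdown = Arc::new(AtomicBool::new(false));
        let (st, sd) = (state.clone(), shutdown.clone());
        let handle = thread::Builder::new().name(name.to_string()).spawn(move || {
            let _guard = ExitGuard(st.clone());
            st.store(ThreadState::Running as u8, Ordering::SeqCst);
            body(sd);
        })?;
        Ok(Self { name: name.to_string(), state, shutdown, handle: Some(handle) })
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// Current state, readable at any time without joining.
    pub fn state(&self) -> ThreadState {
        ThreadState::from_u8(self.state.load(Ordering::SeqCst))
    }

    pub fn request_shutdown(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }

    /// Join where it makes sense; returns the final state.
    pub fn join(&mut self) -> ThreadState {
        if let Some(handle) = self.handle.take() {
            let _ = handle.join(); // a panic is already captured in `state`
        }
        self.state()
    }
}
```

Threads that do not need joining can simply drop the `ManagedThread` after calling `request_shutdown()`; the state remains observable for as long as a handle is kept around.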

What I'm envisioning is something akin to FRR's show thread cpu (recently renamed to show event cpu):

lima-ubuntu-22-04# show thread cpu
  <cr>
  FILTER  Display filter (rwtexb)
lima-ubuntu-22-04# show thread cpu zebra
Thread statistics for zebra:

Showing statistics for pthread default
--------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn  Type   Thread
    1          0.095         1       95        95       96        96         0         0  R      zserv_accept
    1          0.060         3       20        37       21        39         0         0  R      vtysh_accept
    1          1.616         9      179       496      182       500         0         0  R      kernel_read
    0          0.008         2        4         8        5         9         0         0     E   rib_process_dplane_results
    0          0.076         1       76        76       76        76         0         0     E   zserv_process_messages
    1         12.645        75      168       873      170       873         0         0  R      vtysh_read
    0          0.009         1        9         9       10        10         0         0     E   frr_config_read_in


Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn  Type   Thread
    1          1.092         9      121       300      124       300         0         0  R      dplane_incoming_read
    0          0.114         2       57        66      225       402         0         0     E   dplane_thread_loop


Showing statistics for pthread Zebra Opaque thread
--------------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn  Type   Thread
    0          0.002         1        2         2        3         3         0         0     E   process_messages


Showing statistics for pthread Zebra API client thread
------------------------------------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn  Type   Thread
    1          0.014         1       14        14       14        14         0         0  R      zserv_read


Total thread statistics
-------------------------
                               CPU (user+system): Real (wall-clock):
Active   Runtime(ms)   Invoked Avg uSec Max uSecs Avg uSec Max uSecs  CPU_Warn Wall_Warn  Type  Thread
    6         15.731       105      149       873      154       873         0         0  R  E   TOTAL

FRR's implementation here collects stats/usage in each pthread's event loop as each event is handled, then displays it when the CLI command is run.
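The same collect-as-you-go idea could be sketched in Rust roughly as follows. `EventStats` and `LoopStats` are hypothetical names, and the sketch only tracks wall-clock time because the standard library has no portable per-thread CPU clock; the CPU columns in FRR's output would need a platform-specific source.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Counters for one named event type (hypothetical).
#[derive(Debug, Default, Clone, Copy)]
pub struct EventStats {
    pub invoked: u64,
    pub total: Duration,
    pub max: Duration,
}

impl EventStats {
    /// Fold one handler invocation into the running totals.
    pub fn record(&mut self, elapsed: Duration) {
        self.invoked += 1;
        self.total += elapsed;
        self.max = self.max.max(elapsed);
    }

    /// Average wall-clock time per invocation, in microseconds.
    pub fn avg_usec(&self) -> u128 {
        if self.invoked == 0 {
            0
        } else {
            self.total.as_micros() / self.invoked as u128
        }
    }
}

/// Per-thread statistics, owned by that thread's event loop.
#[derive(Debug, Default)]
pub struct LoopStats {
    events: BTreeMap<&'static str, EventStats>,
}

impl LoopStats {
    /// Run `handler`, charging its wall-clock time to the event `name`.
    pub fn timed<T>(&mut self, name: &'static str, handler: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = handler();
        self.events.entry(name).or_default().record(start.elapsed());
        out
    }

    pub fn get(&self, name: &str) -> Option<&EventStats> {
        self.events.get(name)
    }

    /// Render a small table loosely modeled on the FRR output above.
    pub fn report(&self) -> String {
        let mut out = format!(
            "{:>8} {:>12} {:>9} {:>9}  {}\n",
            "Invoked", "Runtime(ms)", "Avg uSec", "Max uSec", "Event"
        );
        for (name, st) in &self.events {
            out.push_str(&format!(
                "{:>8} {:>12.3} {:>9} {:>9}  {}\n",
                st.invoked,
                st.total.as_secs_f64() * 1e3,
                st.avg_usec(),
                st.max.as_micros(),
                name
            ));
        }
        out
    }
}
```

An event loop would wrap each handler call in `stats.timed("event_name", || handle(event))`, so the bookkeeping cost is one `Instant::now()` pair and a map lookup per event.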

I'd like to have something similar integrated into mgd so we can simply and easily query the thread status/stats from mgadm/API, as a quick way to get an idea of what's going on without needing to immediately jump into mdb or DTrace.
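One possible shape for the query side, sketched under assumptions: a shared registry that each thread updates and that an API handler can snapshot on demand. `ThreadRegistry` and `ThreadReport` are hypothetical names, the thread names in the usage note are made up, and nothing here reflects the actual mgd or mgadm API.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// One row of the status report (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ThreadReport {
    pub name: String,
    pub state: String,
    pub events_handled: u64,
}

/// Cheaply cloneable handle to the shared report table.
#[derive(Clone, Default)]
pub struct ThreadRegistry {
    inner: Arc<Mutex<BTreeMap<String, ThreadReport>>>,
}

impl ThreadRegistry {
    /// Called by a thread to publish its latest status.
    pub fn update(&self, name: &str, state: &str, events_handled: u64) {
        // Recover from a poisoned lock so that one panicked thread
        // cannot take the reporting path down with it.
        let mut map = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        map.insert(
            name.to_string(),
            ThreadReport {
                name: name.to_string(),
                state: state.to_string(),
                events_handled,
            },
        );
    }

    /// Called by the API handler; returns rows sorted by thread name.
    pub fn snapshot(&self) -> Vec<ThreadReport> {
        let map = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        map.values().cloned().collect()
    }
}
```

For example, a thread holding a clone of the registry might call `registry.update("bfd", "running", 12)` after each batch of events, and the API endpoint would serialize whatever `registry.snapshot()` returns.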

Labels: Idea, mgd, want
