10 changes: 8 additions & 2 deletions Doc/library/gc.rst
@@ -110,13 +110,16 @@ The :mod:`gc` module provides the following functions:
to be uncollectable (and were therefore moved to the :data:`garbage`
list) inside this generation;

+ * ``visited`` is the total number of unique objects visited during each
+ collection of this generation;

* ``duration`` is the total time in seconds spent in collections for this
generation.

.. versionadded:: 3.4

.. versionchanged:: next
- Add ``duration``.
+ Add ``duration`` and ``visited``.
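
As a rough illustration of how the keys documented above might be read, here is a minimal sketch that summarizes `gc.get_stats()` per generation. It reads `visited` and `duration` with `dict.get`, on the assumption that they only exist on interpreters that include this change. The helper name `summarize_gc_stats` is hypothetical.

```python
import gc


def summarize_gc_stats():
    """Return one summary dict per generation from gc.get_stats()."""
    summaries = []
    for generation, stats in enumerate(gc.get_stats()):
        summaries.append({
            "generation": generation,
            "collections": stats["collections"],
            "collected": stats["collected"],
            "uncollectable": stats["uncollectable"],
            # These two keys are absent on interpreters without this
            # change, so fall back to None instead of raising KeyError.
            "visited": stats.get("visited"),
            "duration": stats.get("duration"),
        })
    return summaries


if __name__ == "__main__":
    gc.collect()  # force at least one full collection
    for summary in summarize_gc_stats():
        print(summary)
```

The values are running totals per generation, so comparing two snapshots taken at different times gives the work done in between.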


.. function:: set_threshold(threshold0, [threshold1, [threshold2]])
@@ -319,6 +322,9 @@ values but should not rebind them):
"uncollectable": When *phase* is "stop", the number of objects
that could not be collected and were put in :data:`garbage`.

+ "visited": When *phase* is "stop", the number of unique objects visited
Member

I may be missing something, but if I understand this correctly, this counts the objects in the collection set, not all objects visited during traversal (which would include objects referenced by containers). Consider either renaming it to "objects_in_collection" or "initial_objects", or updating the implementation to count all visited objects (including the referenced ones seen during subtract_refs).

Member Author

Yeah, @nascheme pointed this out too on the issue. I think naming is hard... this can really be any of:

  • Total visits performed (edges in the graph).
  • Total unique objects visited (nodes in the graph, optionally minus immortals, marked-alive objects, untracked objects, etc.).
  • Total objects in the current generation (nodes in the generation subgraph, optionally minus immortals, marked-alive objects, untracked objects, etc.).
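
To make the three candidate meanings concrete, here is a small sketch on a hand-built object graph. It is a toy model, not CPython's collector: the graph, the `generation` set, and the helper `count_candidates` are all made up for illustration.

```python
def count_candidates(graph, generation):
    """Count the three candidate meanings of "visited" on a toy graph.

    graph maps each object name to the names it references;
    generation is the set of objects being collected.
    """
    # 1. Total visits performed: one per outgoing reference (edge)
    #    followed from an object in the generation.
    edge_visits = sum(len(graph[obj]) for obj in generation)

    # 2. Unique objects touched: the generation's objects plus
    #    everything they reference directly.
    touched = set(generation)
    for obj in generation:
        touched.update(graph[obj])

    # 3. Objects in the generation itself.
    in_generation = len(generation)

    return edge_visits, len(touched), in_generation


# "a" and "b" form a cycle in the young generation; both also point at
# "old", which lives outside the generation being collected.
graph = {
    "a": ["b", "old"],
    "b": ["a", "old"],
    "old": [],
}
print(count_candidates(graph, {"a", "b"}))  # (4, 3, 2)
```

In this PR's diff, `update_refs` increments the counter once per object in the `containers` list, so the reported value appears to correspond to the third number.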

I'm really trying not to overthink it... I think the most intuitive answer is probably something along the lines of "How many unique objects did the GC consider before eventually freeing <collected> of them?" Like I said above, collected / visited should be a meaningful efficiency metric to help see at a glance whether you're running at the right times (in addition to memory metrics, of course).

But definitely open to suggestions. I naturally lean towards a simpler API name and more nuance in the docs for it, rather than trying to convey all of the nuance in one long name alone.
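
The collected / visited ratio described in this comment could be tracked with a callback along these lines. This is a sketch under the assumption that the interpreter exposes "visited" in the callback info dict, as this PR proposes; on other versions the key is missing and the ratio is recorded as None. The names `efficiency_log`, `collection_ratio`, and `record_efficiency` are hypothetical.

```python
import gc

efficiency_log = []  # (generation, collected, visited, ratio) tuples


def collection_ratio(collected, visited):
    """Return collected / visited, or None if visited is missing or zero."""
    if not visited:
        return None
    return collected / visited


def record_efficiency(phase, info):
    """gc callback: log how much of the considered set was freed."""
    if phase != "stop":
        return
    visited = info.get("visited")  # only present with this change applied
    ratio = collection_ratio(info["collected"], visited)
    efficiency_log.append(
        (info["generation"], info["collected"], visited, ratio)
    )


gc.callbacks.append(record_efficiency)
try:
    # Create some cyclic garbage, then collect it.
    for _ in range(1000):
        cycle = []
        cycle.append(cycle)
    del cycle
    gc.collect()
finally:
    gc.callbacks.remove(record_efficiency)

print(efficiency_log[-1])
```

A ratio near zero over many collections would suggest the collector is running often and finding little to free.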

Member Author

Put another way, it seems like objects that can't count towards collected also shouldn't count towards visited. So I'm pretty sure that would exclude anything reachable from, but not part of, the current generation?

Member

@pablogsal, Nov 21, 2025

I agree, but even keeping the "naming is hard" optics in mind, I would say that anything other than visited is good, because at least we can agree that this is surely not the "number of visited objects", no? Maybe candidates?

+ during the collection.

"duration": When *phase* is "stop", the time in seconds spent in the
collection.

@@ -335,7 +341,7 @@ values but should not rebind them):
.. versionadded:: 3.3

.. versionchanged:: next
- Add "duration".
+ Add "duration" and "visited".


The following constants are provided for use with :func:`set_debug`:
4 changes: 4 additions & 0 deletions Include/internal/pycore_interp_structs.h
@@ -179,6 +179,8 @@ struct gc_collection_stats {
Py_ssize_t collected;
/* total number of uncollectable objects (put into gc.garbage) */
Py_ssize_t uncollectable;
+ // Total number of objects visited:
+ Py_ssize_t visited;
// Duration of the collection in seconds:
double duration;
};
@@ -191,6 +193,8 @@ struct gc_generation_stats {
Py_ssize_t collected;
/* total number of uncollectable objects (put into gc.garbage) */
Py_ssize_t uncollectable;
+ // Total number of objects visited:
+ Py_ssize_t visited;
// Duration of the collection in seconds:
double duration;
};
12 changes: 8 additions & 4 deletions Lib/test/test_gc.py
@@ -846,11 +846,14 @@ def test_get_stats(self):
self.assertEqual(len(stats), 3)
for st in stats:
self.assertIsInstance(st, dict)
- self.assertEqual(set(st),
- {"collected", "collections", "uncollectable", "duration"})
+ self.assertEqual(
+ set(st),
+ {"collected", "collections", "uncollectable", "visited", "duration"}
+ )
self.assertGreaterEqual(st["collected"], 0)
self.assertGreaterEqual(st["collections"], 0)
self.assertGreaterEqual(st["uncollectable"], 0)
+ self.assertGreaterEqual(st["visited"], 0)
self.assertGreaterEqual(st["duration"], 0)
# Check that collection counts are incremented correctly
if gc.isenabled():
@@ -865,7 +868,7 @@ def test_get_stats(self):
self.assertGreater(new[0]["duration"], old[0]["duration"])
self.assertEqual(new[1]["duration"], old[1]["duration"])
self.assertEqual(new[2]["duration"], old[2]["duration"])
- for stat in ["collected", "uncollectable"]:
+ for stat in ["collected", "uncollectable", "visited"]:
self.assertGreaterEqual(new[0][stat], old[0][stat])
self.assertEqual(new[1][stat], old[1][stat])
self.assertEqual(new[2][stat], old[2][stat])
@@ -877,7 +880,7 @@ def test_get_stats(self):
self.assertEqual(new[0]["duration"], old[0]["duration"])
self.assertEqual(new[1]["duration"], old[1]["duration"])
self.assertGreater(new[2]["duration"], old[2]["duration"])
- for stat in ["collected", "uncollectable"]:
+ for stat in ["collected", "uncollectable", "visited"]:
self.assertEqual(new[0][stat], old[0][stat])
self.assertEqual(new[1][stat], old[1][stat])
self.assertGreaterEqual(new[2][stat], old[2][stat])
@@ -1316,6 +1319,7 @@ def test_collect(self):
self.assertIn("generation", info)
self.assertIn("collected", info)
self.assertIn("uncollectable", info)
+ self.assertIn("visited", info)
self.assertIn("duration", info)

def test_collect_generation(self):
@@ -0,0 +1,2 @@
+ Expose a ``"visited"`` stat in :func:`gc.get_stats` and
+ :data:`gc.callbacks`.
3 changes: 2 additions & 1 deletion Modules/gcmodule.c
@@ -358,10 +358,11 @@ gc_get_stats_impl(PyObject *module)
for (i = 0; i < NUM_GENERATIONS; i++) {
PyObject *dict;
st = &stats[i];
- dict = Py_BuildValue("{snsnsnsd}",
+ dict = Py_BuildValue("{snsnsnsnsd}",
"collections", st->collections,
"collected", st->collected,
"uncollectable", st->uncollectable,
+ "visited", st->visited,
"duration", st->duration
);
if (dict == NULL)
16 changes: 11 additions & 5 deletions Python/gc.c
@@ -483,11 +483,12 @@ validate_consistent_old_space(PyGC_Head *head)
/* Set all gc_refs = ob_refcnt. After this, gc_refs is > 0 and
* PREV_MASK_COLLECTING bit is set for all objects in containers.
*/
- static void
+ static Py_ssize_t
update_refs(PyGC_Head *containers)
{
PyGC_Head *next;
PyGC_Head *gc = GC_NEXT(containers);
+ Py_ssize_t visited = 0;

while (gc != containers) {
next = GC_NEXT(gc);
@@ -519,7 +520,9 @@ update_refs(PyGC_Head *containers)
*/
_PyObject_ASSERT(op, gc_get_refs(gc) != 0);
gc = next;
+ visited++;
}
+ return visited;
}

/* A traversal callback for subtract_refs. */
@@ -1240,15 +1243,15 @@ flag set but it does not clear it to skip unnecessary iteration. Before the
flag is cleared (for example, by using 'clear_unreachable_mask' function or
by a call to 'move_legacy_finalizers'), the 'unreachable' list is not a normal
list and we can not use most gc_list_* functions for it. */
- static inline void
+ static inline Py_ssize_t
deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) {
validate_list(base, collecting_clear_unreachable_clear);
/* Using ob_refcnt and gc_refs, calculate which objects in the
* container set are reachable from outside the set (i.e., have a
* refcount greater than 0 when all the references within the
* set are taken into account).
*/
- update_refs(base); // gc_prev is used for gc_refs
+ Py_ssize_t visited = update_refs(base); // gc_prev is used for gc_refs
subtract_refs(base);

/* Leave everything reachable from outside base in base, and move
@@ -1289,6 +1292,7 @@ deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) {
move_unreachable(base, unreachable); // gc_prev is pointer again
validate_list(base, collecting_clear_unreachable_clear);
validate_list(unreachable, collecting_set_unreachable_set);
+ return visited;
}
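
The comments in `deduce_unreachable` compress the core idea: copy each object's refcount into `gc_refs`, subtract the references that come from inside the set, and treat whatever cannot be reached from an externally referenced object as unreachable. A toy Python model of that idea, which also returns the count this diff threads through as `visited`, might look like the following. It is an illustration only: the dict-based graph and the name `deduce_unreachable_model` are invented here, and the real C code works on linked lists and also deals with finalizers, resurrection, and more.

```python
def deduce_unreachable_model(refcounts, references):
    """Toy model of refcount-based cycle detection.

    refcounts:  {name: total reference count of the object}
    references: {name: [names this object refers to]}
    Only objects present in refcounts belong to the collection set.
    Returns (unreachable_names, visited_count).
    """
    # update_refs: copy refcounts, counting each object once.
    gc_refs = {}
    visited = 0
    for name, count in refcounts.items():
        gc_refs[name] = count
        visited += 1

    # subtract_refs: remove references that originate inside the set.
    for name in refcounts:
        for target in references.get(name, []):
            if target in gc_refs:
                gc_refs[target] -= 1

    # move_unreachable: anything with gc_refs > 0 is referenced from
    # outside; everything reachable from those objects is also alive.
    alive = set()
    stack = [name for name, refs in gc_refs.items() if refs > 0]
    while stack:
        name = stack.pop()
        if name in alive:
            continue
        alive.add(name)
        stack.extend(t for t in references.get(name, []) if t in gc_refs)

    unreachable = sorted(set(refcounts) - alive)
    return unreachable, visited


# "a" <-> "b" is an isolated cycle; "c" is held by an outside reference
# and keeps "d" alive.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
references = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
print(deduce_unreachable_model(refcounts, references))
# (['a', 'b'], 4)
```

Note that `visited` here counts every object in the set, whether or not it ends up unreachable, which mirrors where the counter is incremented in the diff.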

/* Handle objects that may have resurrected after a call to 'finalize_garbage', moving
@@ -1364,6 +1368,7 @@ static void
add_stats(GCState *gcstate, int gen, struct gc_collection_stats *stats)
{
gcstate->generation_stats[gen].duration += stats->duration;
+ gcstate->generation_stats[gen].visited += stats->visited;
Member

I don't think this matters but what would happen if this overflows?

Member Author

Are we generally worried about overflowing Py_ssize_t?

If we are, I can switch this to size_t so that at least the overflow isn't UB. A Python int seems sort of heavy for this, but we could go that route too if needed.

Member

> Are we generally worried about overflowing Py_ssize_t?

Nah, I think this has the same problem as the other fields; I just wanted to raise it here in case we both believe it is worth fixing. I don't think it is important, but it is possible. WDYT? Just to be clear, I am fine with ignoring it.
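
For a rough sense of scale, the following back-of-the-envelope sketch estimates how long a running total would take to exceed the largest signed value of a given width. The rate of one billion visited objects per second is an arbitrary, deliberately generous assumption rather than a measurement, and `years_until_overflow` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_overflow(bits, objects_per_second):
    """Years until a signed counter of the given width exceeds its maximum."""
    max_value = 2 ** (bits - 1) - 1
    return max_value / objects_per_second / SECONDS_PER_YEAR


# Assumed rate: 1e9 objects visited per second (hypothetical).
for bits in (32, 64):
    print(bits, years_until_overflow(bits, 1e9))
```

Under that assumption a 64-bit counter lasts roughly 292 years, while a 32-bit counter would be exceeded after about two seconds' worth of visits, so the concern is mostly relevant where Py_ssize_t is 32 bits wide.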

gcstate->generation_stats[gen].collected += stats->collected;
gcstate->generation_stats[gen].uncollectable += stats->uncollectable;
gcstate->generation_stats[gen].collections += 1;
@@ -1754,7 +1759,7 @@ gc_collect_region(PyThreadState *tstate,
assert(!_PyErr_Occurred(tstate));

gc_list_init(&unreachable);
- deduce_unreachable(from, &unreachable);
+ stats->visited = deduce_unreachable(from, &unreachable);
validate_consistent_old_space(from);
untrack_tuples(from);

@@ -1844,10 +1849,11 @@ do_gc_callback(GCState *gcstate, const char *phase,
assert(PyList_CheckExact(gcstate->callbacks));
PyObject *info = NULL;
if (PyList_GET_SIZE(gcstate->callbacks) != 0) {
- info = Py_BuildValue("{sisnsnsd}",
+ info = Py_BuildValue("{sisnsnsnsd}",
"generation", generation,
"collected", stats->collected,
"uncollectable", stats->uncollectable,
+ "visited", stats->visited,
"duration", stats->duration);
if (info == NULL) {
PyErr_FormatUnraisable("Exception ignored while invoking gc callbacks");
13 changes: 9 additions & 4 deletions Python/gc_free_threading.c
@@ -98,6 +98,7 @@ struct collection_state {
// we can't collect objects with deferred references because we may not
// see all references.
int skip_deferred_objects;
+ Py_ssize_t visited;
Py_ssize_t collected;
Py_ssize_t uncollectable;
Py_ssize_t long_lived_total;
@@ -975,6 +976,7 @@ static bool
update_refs(const mi_heap_t *heap, const mi_heap_area_t *area,
void *block, size_t block_size, void *args)
{
+ struct collection_state *state = (struct collection_state *)args;
PyObject *op = op_from_block(block, args, false);
if (op == NULL) {
return true;
@@ -991,6 +993,7 @@ update_refs(const mi_heap_t *heap, const mi_heap_area_t *area,
gc_clear_unreachable(op);
return true;
}
+ state->visited++;

Py_ssize_t refcount = Py_REFCNT(op);
if (_PyObject_HasDeferredRefcount(op)) {
@@ -1911,7 +1914,7 @@ handle_resurrected_objects(struct collection_state *state)
static void
invoke_gc_callback(PyThreadState *tstate, const char *phase,
int generation, Py_ssize_t collected,
- Py_ssize_t uncollectable, double duration)
+ Py_ssize_t uncollectable, Py_ssize_t visited, double duration)
{
assert(!_PyErr_Occurred(tstate));

@@ -1925,10 +1928,11 @@ invoke_gc_callback(PyThreadState *tstate, const char *phase,
assert(PyList_CheckExact(gcstate->callbacks));
PyObject *info = NULL;
if (PyList_GET_SIZE(gcstate->callbacks) != 0) {
- info = Py_BuildValue("{sisnsnsd}",
+ info = Py_BuildValue("{sisnsnsnsd}",
"generation", generation,
"collected", collected,
"uncollectable", uncollectable,
+ "visited", visited,
"duration", duration);
if (info == NULL) {
PyErr_FormatUnraisable("Exception ignored while "
@@ -2372,7 +2376,7 @@ gc_collect_main(PyThreadState *tstate, int generation, _PyGC_Reason reason)
GC_STAT_ADD(generation, collections, 1);

if (reason != _Py_GC_REASON_SHUTDOWN) {
- invoke_gc_callback(tstate, "start", generation, 0, 0, 0);
+ invoke_gc_callback(tstate, "start", generation, 0, 0, 0, 0.0);
}

if (gcstate->debug & _PyGC_DEBUG_STATS) {
@@ -2427,6 +2431,7 @@ gc_collect_main(PyThreadState *tstate, int generation, _PyGC_Reason reason)
stats->collected += m;
stats->uncollectable += n;
stats->duration += duration;
+ stats->visited += state.visited;

GC_STAT_ADD(generation, objects_collected, m);
#ifdef Py_STATS
@@ -2445,7 +2450,7 @@ gc_collect_main(PyThreadState *tstate, int generation, _PyGC_Reason reason)
}

if (reason != _Py_GC_REASON_SHUTDOWN) {
- invoke_gc_callback(tstate, "stop", generation, m, n, duration);
+ invoke_gc_callback(tstate, "stop", generation, m, n, state.visited, duration);
}

assert(!_PyErr_Occurred(tstate));