-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
GH-140638: Add a GC "visited" stat #141814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
bb3fc94
d89ecdc
3ee8db9
41814f3
8f4ce45
13a97f3
bdfe660
2d25167
9be8100
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Expose a ``"visited"`` stat in :func:`gc.get_stats` and | ||
| :data:`gc.callbacks`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -483,11 +483,12 @@ validate_consistent_old_space(PyGC_Head *head) | |
| /* Set all gc_refs = ob_refcnt. After this, gc_refs is > 0 and | ||
| * PREV_MASK_COLLECTING bit is set for all objects in containers. | ||
| */ | ||
| static void | ||
| static Py_ssize_t | ||
| update_refs(PyGC_Head *containers) | ||
| { | ||
| PyGC_Head *next; | ||
| PyGC_Head *gc = GC_NEXT(containers); | ||
| Py_ssize_t visited = 0; | ||
|
|
||
| while (gc != containers) { | ||
| next = GC_NEXT(gc); | ||
|
|
@@ -519,7 +520,9 @@ update_refs(PyGC_Head *containers) | |
| */ | ||
| _PyObject_ASSERT(op, gc_get_refs(gc) != 0); | ||
| gc = next; | ||
| visited++; | ||
| } | ||
| return visited; | ||
| } | ||
|
|
||
| /* A traversal callback for subtract_refs. */ | ||
|
|
@@ -1240,15 +1243,15 @@ flag set but it does not clear it to skip unnecessary iteration. Before the | |
| flag is cleared (for example, by using 'clear_unreachable_mask' function or | ||
| by a call to 'move_legacy_finalizers'), the 'unreachable' list is not a normal | ||
| list and we can not use most gc_list_* functions for it. */ | ||
| static inline void | ||
| static inline Py_ssize_t | ||
| deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) { | ||
| validate_list(base, collecting_clear_unreachable_clear); | ||
| /* Using ob_refcnt and gc_refs, calculate which objects in the | ||
| * container set are reachable from outside the set (i.e., have a | ||
| * refcount greater than 0 when all the references within the | ||
| * set are taken into account). | ||
| */ | ||
| update_refs(base); // gc_prev is used for gc_refs | ||
| Py_ssize_t visited = update_refs(base); // gc_prev is used for gc_refs | ||
| subtract_refs(base); | ||
|
|
||
| /* Leave everything reachable from outside base in base, and move | ||
|
|
@@ -1289,6 +1292,7 @@ deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) { | |
| move_unreachable(base, unreachable); // gc_prev is pointer again | ||
| validate_list(base, collecting_clear_unreachable_clear); | ||
| validate_list(unreachable, collecting_set_unreachable_set); | ||
| return visited; | ||
| } | ||
|
|
||
| /* Handle objects that may have resurrected after a call to 'finalize_garbage', moving | ||
|
|
@@ -1364,6 +1368,7 @@ static void | |
| add_stats(GCState *gcstate, int gen, struct gc_collection_stats *stats) | ||
| { | ||
| gcstate->generation_stats[gen].duration += stats->duration; | ||
| gcstate->generation_stats[gen].visited += stats->visited; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this matters but what would happen if this overflows?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we generally worried about overflowing I can switch this to
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Nah, I think this has the same problem as the other fields just wanted to raise it here in case we both believe is worth fixing. I don't think is important, but it is possible. WDYT? I am fine to ignore just to be clear |
||
| gcstate->generation_stats[gen].collected += stats->collected; | ||
| gcstate->generation_stats[gen].uncollectable += stats->uncollectable; | ||
| gcstate->generation_stats[gen].collections += 1; | ||
|
|
@@ -1754,7 +1759,7 @@ gc_collect_region(PyThreadState *tstate, | |
| assert(!_PyErr_Occurred(tstate)); | ||
|
|
||
| gc_list_init(&unreachable); | ||
| deduce_unreachable(from, &unreachable); | ||
| stats->visited = deduce_unreachable(from, &unreachable); | ||
| validate_consistent_old_space(from); | ||
| untrack_tuples(from); | ||
|
|
||
|
|
@@ -1844,10 +1849,11 @@ do_gc_callback(GCState *gcstate, const char *phase, | |
| assert(PyList_CheckExact(gcstate->callbacks)); | ||
| PyObject *info = NULL; | ||
| if (PyList_GET_SIZE(gcstate->callbacks) != 0) { | ||
| info = Py_BuildValue("{sisnsnsd}", | ||
| info = Py_BuildValue("{sisnsnsnsd}", | ||
| "generation", generation, | ||
| "collected", stats->collected, | ||
| "uncollectable", stats->uncollectable, | ||
| "visited", stats->visited, | ||
| "duration", stats->duration); | ||
| if (info == NULL) { | ||
| PyErr_FormatUnraisable("Exception ignored while invoking gc callbacks"); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something but if I understand this correctly this counts objects in the collection set, not all objects visited during traversal (which would include objects referenced by containers). Consider maybe renaming to "objects_in_collection" or "initial_objects" or updating implementation to count all visited objects (including referenced ones during subtract_refs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @nascheme pointed this out too on the issue. I think naming is hard... this can really be any of:
I'm really trying not to overthink it... I think the probably most intuitive answer is something along the lines of "How many unique objects did the GC consider before eventually freeing
<collected>number of them"? Like I said above,collected / visitedshould be a meaningful efficiency metric to help see at a glance if you're running at the right times (in addition to memory metrics, of course).But definitely open to suggestions. I naturally lean towards a simpler API name and more nuance in the docs for it, rather than trying to convey all of the nuance in one long name alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put another way, it seems like objects that can't count towards
collectedalso shouldn't count towardsvisited. So I'm pretty sure that would exclude anything reachable from, but not part of, the current generation?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but even keeping the "naming is hard" optics, I would say that anything other than
visitedis good because at least we can agree that surely this is not "number of visited objects" no? Maybecandidates?