Pickleable Database and weak references #443

andlaus · 2025-09-11T14:12:11Z

Besides a few minor cleanups, this PR introduces two larger features:

Database objects now play nicely with the pickle module. That said, the speedup of using pickle compared to loading a PDX file from scratch is surprisingly modest (about a factor of 2).
Weak references can now be used (enabled by default). This hopefully avoids cycles in the reference graph, so that odxtools objects play nicely with reference counting. As a consequence, when a database object is deleted using del odx_db_obj, the memory used by the object seems to be freed immediately (rather than only after calling gc.collect()), and object deletion followed by gc.collect() is about a factor of 4 faster. Also, unexpected pauses in program execution caused by Python garbage-collecting orphaned Database objects whenever it feels like it should be a thing of the past.
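
The effect of replacing a back-reference by a weak one can be illustrated without odxtools. The following is a minimal sketch using only the standard library; `Node`, `build_tree()` and `freed_by_del_alone()` are hypothetical stand-ins and not part of odxtools, and the behavior shown relies on CPython's reference counting.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that points back to its owner."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.owner = None  # back-reference, either strong or weak
        self.children = []


def build_tree(use_weakrefs: bool) -> Node:
    root = Node("root")
    child = Node("child")
    root.children.append(child)
    # A strong back-reference creates a cycle root -> child -> root;
    # a weak proxy does not count as a reference to root.
    child.owner = weakref.proxy(root) if use_weakrefs else root
    return root


def freed_by_del_alone(use_weakrefs: bool) -> bool:
    """Return True if `del` alone releases the tree (no gc run needed)."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure the cycle collector does not interfere
    try:
        root = build_tree(use_weakrefs)
        probe = weakref.ref(root)  # lets us observe whether root is alive
        del root
        return probe() is None
    finally:
        gc.collect()  # clean up the cycle left behind in the strong case
        if was_enabled:
            gc.enable()
```

With the weak back-reference, `freed_by_del_alone(True)` reports that the object is gone right after `del`; with the strong one, the cycle keeps it alive until the garbage collector runs.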

This PR fixes #421.

Andreas Lauser <[email protected]>, on behalf of MBition GmbH.

Using `pickle` is faster than loading PDX files directly, although the performance gain is surprisingly modest (for the large database which I used for testing, the speedup was less than factor 2). Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
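
One way to exploit this is a small on-disk cache next to the source file. The sketch below is an illustration only: `load_with_pickle_cache()` is a hypothetical helper that is not part of odxtools, and `loader` stands for any callable that returns a pickleable object (for instance a function that loads a PDX file).

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_with_pickle_cache(source: Path, loader: Callable[[Path], Any]) -> Any:
    """Load `source` via `loader`, reusing a pickle stored next to it."""
    cache = source.with_suffix(source.suffix + ".pickle")

    # Reuse the cache only if it is at least as new as the source file.
    if cache.exists() and cache.stat().st_mtime >= source.stat().st_mtime:
        with cache.open("rb") as f:
            return pickle.load(f)

    obj = loader(source)
    with cache.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

Only unpickle cache files that you created yourself, since `pickle.load()` can execute arbitrary code from untrusted input.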

what they were meant to do is done by the `memo` dictionary... Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

This has the advantage that odxtools objects now work properly with reference counting, i.e., memory is freed immediately after the respective object is deleted instead of the next time the garbage collector is exercised by Python. Also, deleting an object followed by invoking `gc.collect()` is now significantly faster than before: The speedup on my test dataset was about a factor of 4. On the flip side, the database object must now be kept alive in order for resolved references to stay valid, i.e.,

```python
import odxtools

db = odxtools.load_file("my_dataset.pdx")
service = db.ecus.my_ecu.services.my_service
del db  # the objects which `service` refers to weakly are gone after this
print(f"DOP of first request param: {service.request.parameters[0].dop}")
```

will not work anymore. Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
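
The failure mode can be reproduced with plain `weakref.proxy` objects. This is a sketch under the assumption that resolved references behave like standard weak proxies; `Dop`, `Parameter` and `FakeDatabase` are hypothetical stand-ins, not the odxtools classes.

```python
import weakref


class Dop:
    def __init__(self, short_name: str) -> None:
        self.short_name = short_name


class Parameter:
    def __init__(self, dop) -> None:
        self.dop = dop  # here: a weak proxy, not a strong reference


class FakeDatabase:
    def __init__(self) -> None:
        # The database is the only strong owner of the DOP.
        self.dop = Dop("my_dop")
        self.parameter = Parameter(weakref.proxy(self.dop))


def access_after_delete() -> str:
    db = FakeDatabase()
    parameter = db.parameter
    assert parameter.dop.short_name == "my_dop"  # fine while db is alive

    del db  # drops the last strong reference to the DOP (CPython)
    try:
        return parameter.dop.short_name
    except ReferenceError:
        # The proxy outlived the object it pointed to.
        return "ReferenceError"
```

Keeping a reference to the owning object for as long as any of its sub-objects is in use avoids the `ReferenceError`.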

Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

…ile()` to `Database.add_xml_file()` I think the new names are more concise. The old names still exist, but they are marked as deprecated. Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

andlaus · 2025-09-11T14:17:06Z

@nada-ben-ali: referenced objects might not be pickled anymore, and might thus point to the void when you unpickle individual sub-objects of the database. So depending on your use case, you might want to either call db.refresh(use_weakrefs=False) before pickling or load your database object using odxtools.load_file(filename, use_weakrefs=False) (or similar). Anyway, it would be great if you could give this PR a spin before it gets merged.

kayoub5 · 2025-09-11T19:53:29Z

odxtools/database.py

                self.add_auxiliary_file(zip_member, pdx_zip.open(zip_member))

-    def add_odx_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None:
+    def add_xml_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None:


I mainly wanted to get rid of the load_odx_d_file() function in loadfile.py because this can handle an XML file containing any ODX category. I figured that PDX files are also "ODX files", so I renamed it to load_xml_file(). In order to keep (/create) consistency, I figured that the corresponding method in Database ought to be renamed as well...

renaming load_odx_d_file() to load_odx_file() is something I can get behind, using load_xml_file() is misleading, since the index file in pdx is xml but not odx

yes, IMO load_odx_file() is equally misleading, though. I think that a comprehensive name would be something like load_odx_xml_file(), but that's clunky enough that I prefer both of the currently discussed alternatives. @nada-ben-ali: do you have an opinion about this matter?

Also, be aware that the method which Database calls is called add_xml_tree(), i.e., if we want to have consistent naming we need to root up a function name or two anyway...

Why not just have a simple load_file that auto checks what type of file is being passed?

That would be a lot more user friendly

we already have that ;)

on the module level, not on the database class

I think that the database class is not the right abstraction level for this kind of guesswork: If you go the low-level route of creating a database object manually, you should have a good reason for it, i.e., the automagic loading functions are not sufficient for you. For such a use case it is IMO explicitly harmful if it is not 100% clear what a given function does...

kayoub5 · 2025-09-11T19:55:52Z

odxtools/loadfile.py



-def load_file(file_name: str | Path) -> Database:
+@deprecated("use load_xml_file()")  # type: ignore[misc]


why deprecate the old name or why the # type: ignore statement? keeping the old name around for some time is IMO a good idea because it is not a huge effort, and the # type: ignore statement is necessary because without it mypy complains about the @deprecated decorator removing the type annotations of the function...
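
For illustration, keeping an old name around as a deprecated alias can look roughly like the sketch below. The `deprecated()` decorator here is hand-rolled on top of the standard `warnings` module and is not necessarily the one odxtools imports; `new_loader()` and `old_loader()` are hypothetical placeholder names.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(message: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hand-rolled decorator that emits a DeprecationWarning on each call."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)  # keeps the name and docstring of the old function
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__}() is deprecated: {message}",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def new_loader(file_name: str) -> str:
    # Placeholder body; a real loader would parse the file.
    return f"loaded {file_name}"


@deprecated("use new_loader()")
def old_loader(file_name: str) -> str:
    # The old name simply forwards to the new one.
    return new_loader(file_name)
```

A generic wrapper like this one erases the precise signature from the type checker's point of view, which is the kind of annotation loss that a `# type: ignore` comment on the decorator line works around.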

odxtools/odxlink.py

nada-ben-ali · 2025-09-12T14:50:52Z

> @nada-ben-ali: referenced objects might not be pickled anymore, and might thus point to the void when you unpickle individual sub-objects of the database. So depending on your use case, you might want to either call db.refresh(use_weakrefs=False) before pickling or load your database object using odxtools.load_file(filename, use_weakrefs=False) (or similar). Anyway, it would be great if you could give this PR a spin before it gets merged.

@andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:

I'm checking this....

For load_xml_file(), I think we need to add it to the __init__.py file.

thanks to [at]nada-ben-ali for noticing! Signed-off-by: Andreas Lauser <[email protected]>

andlaus · 2025-09-16T07:53:00Z

> @andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:

do you delete the database object before you call pickle.dump()? If yes, using use_weakrefs=False while loading will fix the issue (but no resources will be freed immediately when the database object goes out of scope).

> For load_xml_file(), I think we need to add it to the __init__.py file.

right. Fixed, thanks!

nada-ben-ali · 2025-09-17T23:00:36Z

> > @andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:
>
> do you delete the database object before you call pickle.dump()? If yes, using use_weakrefs=False while loading will fix the issue (but no resources will be freed immediately when the database object goes out of scope).
>
> > For load_xml_file(), I think we need to add it to the __init__.py file.
>
> right. Fixed, thanks!

I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.

Maybe we can update SnRefContext to inherit the use_weakrefs value from the database. We can add a __post_init__ method to override the default use_weakrefs value with the one from the database. What do you think?
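
The proposal could be sketched as follows. This is a hypothetical illustration only: the real `SnRefContext` and `Database` classes have other fields, and `FakeDatabase` and `FakeSnRefContext` are stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDatabase:
    # Stand-in for the setting chosen when the database was loaded.
    use_weakrefs: bool = True


@dataclass
class FakeSnRefContext:
    database: Optional[FakeDatabase] = None
    use_weakrefs: bool = True  # default, as described in the comment above

    def __post_init__(self) -> None:
        # Inherit the setting from the database if one is attached.
        if self.database is not None:
            self.use_weakrefs = self.database.use_weakrefs
```

Note that in this form the database's value always wins, so a different value passed explicitly at a later point (for example to a refresh call) would be ignored.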

kayoub5 · 2025-09-18T05:34:15Z

@andlaus there are three PRs in one here

renaming load file functions
weakref
deep copy changes

it would be better for changelog, git history, testing and review to split those three topics each into its own PR

andlaus · 2025-09-19T11:44:07Z

> I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.

good catch. I pushed a fix. There also was the issue that the SnRefContext object always used the value from the Database, i.e., it ignored the value passed to Database.refresh(). Can you re-test?

andlaus · 2025-09-19T11:45:06Z

> @andlaus there are three PRs in one here [...]

okay. I will split it up once @nada-ben-ali confirms that it works for her...

… snrefs also, properly honor `use_weakrefs` inside `_resolve_snrefs()` of diag layers. thanks to [at]nada-ben-ali for uncovering this! Signed-off-by: Andreas Lauser <[email protected]>

kayoub5 · 2025-09-19T12:26:39Z

odxtools/odxlink.py

    """

-    def __init__(self) -> None:
+    def __init__(self, *, use_weakrefs: bool = False) -> None:


remove default value, if there is a caller who is using the default value, we will end up with the value passed to loading function not being used

external code should IMO still use "strong" references unless it explicitly instructs otherwise, i.e., something like

    db = odxtools.load("my_file.pdx")
    dop_ref = db.ecus.my_ecu.services.my_service.request.parameters[0].dop_ref
    dop = db.odxlinks.resolve(dop_ref)
    print(type(dop).__name__)

should IMO print the name of the class of the DOP, not ProxyType. (note that isinstance(x, Foo) works even if x is a weak proxy object pointing to a Foo object.)
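
The difference between `type()` and `isinstance()` for weak proxies can be checked with the standard library alone. In this small sketch, `DataObjectProperty` is an empty stand-in class, not the odxtools one.

```python
import weakref


class DataObjectProperty:
    """Empty stand-in class used only for this demonstration."""


dop = DataObjectProperty()
proxy = weakref.proxy(dop)

# type() reveals the proxy wrapper rather than the wrapped class ...
is_proxy_type = type(proxy) is weakref.ProxyType
same_type = type(proxy) is DataObjectProperty

# ... while isinstance() consults __class__, which the proxy forwards.
is_instance = isinstance(proxy, DataObjectProperty)
```

Code that dispatches on `type(obj)` or on `type(obj).__name__` therefore behaves differently for proxies, whereas `isinstance()` checks keep working.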

nada-ben-ali · 2025-09-19T15:55:34Z

> > I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.
>
> good catch. I pushed a fix. There also was the issue that the SnRefContext object always used the value from the Database, i.e., it ignored the value passed to Database.refresh(). Can you re-test?

It’s working now, thank you!

andlaus · 2025-09-22T11:05:07Z

okay, I split up the PR into four almost independent spin-offs:

make 'Database' objects pickleable #444: pickleable database
remove unnecessary __deepcopy__() methods #445: deepcopy cleanup
Clean up the names of the load methods #446: load method name cleanup
Allow to use weak references #447: weak references (depends on Clean up the names of the load methods #446).

the present PR is thus obsolete. closing.

andlaus added 6 commits September 11, 2025 16:09

remove unnecessary __deepcopy__() methods

c8d24fc

what they were meant to do is done by the `memo` dictionary... Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

make the use of weak references optional

1f1b025

Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

add unit test for weakrefs

16bbcb2

Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>

andlaus requested review from kayoub5 and nada-ben-ali September 11, 2025 14:12

kayoub5 reviewed Sep 11, 2025


__init__.py: also import non-deprecated load_xml_file() function

ecfb478

thanks to [at]nada-ben-ali for noticing! Signed-off-by: Andreas Lauser <[email protected]>

andlaus force-pushed the pickleable_db_and_weakrefs branch 2 times, most recently from e946068 to 8bfabf2 Compare September 19, 2025 11:41

Database.refresh(): honor the use_weakrefs function parameter for…

b671e71

… snrefs also, properly honor `use_weakrefs` inside `_resolve_snrefs()` of diag layers. thanks to [at]nada-ben-ali for uncovering this! Signed-off-by: Andreas Lauser <[email protected]>

andlaus force-pushed the pickleable_db_and_weakrefs branch from 8bfabf2 to b671e71 Compare September 19, 2025 11:50

kayoub5 requested changes Sep 19, 2025


This was referenced Sep 22, 2025

make 'Database' objects pickleable #444

Merged

remove unnecessary __deepcopy__() methods #445

Merged

andlaus closed this Sep 22, 2025


