Skip to content

Conversation

@andlaus
Copy link
Member

@andlaus andlaus commented Sep 11, 2025

Besides a few minor cleanups, this PR features two bigger features:

  1. Database objects now play nicely with the pickle module. That said, the speedup of using pickle compared to loading a PDX file from scratch is surprising modest (about factor 2)
  2. weak references can now be used (enabled by default): This hopefully avoids cycles in the reference graph thus making odxtools objects play nicely with reference counting. As a consequence, when deleting a database object using del odx_db_obj, the memory used by the object seems to be freed immediately (in contrast to after calling gc.collect()) and object deletion followed by gc.collect() experiences a speed up of factor 4. Also, unexpected pauses in program execution due to python garbage collecting orphaned Database objects when it feels like it should be a thing of the past.

This PR fixes #421.

Andreas Lauser <[email protected]>, on behalf of MBition GmbH.
Provider Information

Using `pickle` is faster than loading PDX files directly, although the
performance gain is surprisingly modest (for the large database which
I used for testing, the speedup was less than factor 2).

Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
what they were meant to do is done by the `memo` dictionary...

Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
This has the advantage that odxtools objects now properly work with
reference counting, i.e., memory is freed immediately after the
respective object is deleted instead of the next time the garbage
collector is excercised by python. Also, deleting an object followed
by invoking `gc.collect()` is now significantly faster than before:
The speedup on my test dataset was about factor 4.

On the flipside, the database object must now be kept in order for
resolved references to be valid, i.e.,

```python
db = odxtools.load_file("my_dataset.pdx")
service = service.db.ecus.my_ecu.services.my_service
del db
print(f"DOP of first request param: {service.request.parameters[0].dop}")
```

will not work anymore.

Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
…ile()` to `Database.add_xml_file()`

I think the new names are more concise. The old names still exist, but
they are marked as deprecated.

Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
@andlaus
Copy link
Member Author

andlaus commented Sep 11, 2025

@nada-ben-ali: referenced objects might not be pickled anymore, and might thus point to the void when you unpickle individual sub-objects of the database. So depending on your use case, you might want to either call db.refresh(use_weakrefs=False) before pickling or load your database object using odxtools.load_file(filename, use_weakrefs=False) (or similar). Anyway, it would be great if you could give this PR a spin before it gets merged.

self.add_auxiliary_file(zip_member, pdx_zip.open(zip_member))

def add_odx_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None:
def add_xml_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly wanted to get rid of the load_odx_d_file() function in loadfile.py because this can handle an XML file containing any ODX category. I figured that PDX files are also "ODX files", so I renamed it to load_xml_file(). In order to keep (/create) consistency, I figured that the corresponding method in Database ought to be renamed as well...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming load_odx_d_file() to load_odx_file() is something I can get behind, using load_xml_file() is misleading, since the index file in pdx is xml but not odx

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, IMO load_odx_file() is equally misleading, though. I think that a comprehensive name would be something like load_odx_xml_file(), but that's clunky enough that I prefer both of the currently discussed alternatives. @nada-ben-ali: do you have an opinion about this matter?

Also, be aware that the method which Database calls is called add_xml_tree(), i.e., if we want to have consistent naming we need to root up a function name or two anyway...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have a simple load_file that auto checks what type of file is being passed?

That would be a lot more user friendly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have that ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the module level, not on the database class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the database class is not the right abstraction level for this kind of guesswork: If you go the low-level route of creating a database object manually, you should have a good reason for it, i.e., the automagic loading functions are not sufficient for you. For such a use case it is IMO explicitly harmful if it is not 100% clear what a given function does...



def load_file(file_name: str | Path) -> Database:
@deprecated("use load_xml_file()") # type: ignore[misc]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why deprecate the old name or why the # type:ignore statement? keeping the old name around for some time is a IMO good idea because it is not a huge effort, and the # type: ignore statement is necessary because without it mypy complains about the @deprecated decorator removing the type annotations of the function...

@nada-ben-ali
Copy link
Collaborator

@nada-ben-ali: referenced objects might not be pickled anymore, and might thus point to the void when you unpickle individual sub-objects of the database. So depending on your use case, you might want to either call db.refresh(use_weakrefs=False) before pickling or load your database object using odxtools.load_file(filename, use_weakrefs=False) (or similar). Anyway, it would be great if you could give this PR a spin before it gets merged.

@andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:

image

I'm checking this....

For load_xml_file(), I think we need to add it to the __init__.py file.

thanks to [at]nada-ben-ali for noticing!

Signed-off-by: Andreas Lauser <[email protected]>
@andlaus
Copy link
Member Author

andlaus commented Sep 16, 2025

@andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:

do you delete the database object before you call pickle.dump()? If yes, using use_weakrefs=False while loading will fix the issue (but no resources will be made available when the database object runs out of scope).

For load_xml_file(), I think we need to add it to the init.py file.

right. Fixed, thanks!

@nada-ben-ali
Copy link
Collaborator

@andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:

do you delete the database object before you call pickle.dump()? If yes, using use_weakrefs=False while loading will fix the issue (but no resources will be made available when the database object runs out of scope).

For load_xml_file(), I think we need to add it to the init.py file.

right. Fixed, thanks!

I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.

Maybe we can update SnRefContext to inherit the use_weakrefs value from the database. We can add a __post_init__ method to override the default use_weakrefs value with the one from the database. What do you think?

@kayoub5
Copy link
Collaborator

kayoub5 commented Sep 18, 2025

@andlaus there are three PRs in one here

  • renaming load file functions
  • weakref
  • deep copy changes

it would be better for changelog, git history, testing and review to split those three topics each into its own PR

@andlaus andlaus force-pushed the pickleable_db_and_weakrefs branch 2 times, most recently from e946068 to 8bfabf2 Compare September 19, 2025 11:41
@andlaus
Copy link
Member Author

andlaus commented Sep 19, 2025

I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.

good catch. I pushed a fix. There also was the issue that the SnRefContext object always used the value from the Database, i.e., it ignored the value passed to Database.refresh(). Can you re-test?

@andlaus
Copy link
Member Author

andlaus commented Sep 19, 2025

@andlaus there are three PRs in one here [...]

okay. I will split it up once @nada-ben-ali confirms that it works for her...

… snrefs

also, properly honor `use_weakrefs` inside `_resolve_snrefs()` of diag
layers.

thanks to [at]nada-ben-ali for uncovering this!

Signed-off-by: Andreas Lauser <[email protected]>
@andlaus andlaus force-pushed the pickleable_db_and_weakrefs branch from 8bfabf2 to b671e71 Compare September 19, 2025 11:50
"""

def __init__(self) -> None:
def __init__(self, *, use_weakrefs: bool = False) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove default value, if there is a caller who is using the default value, we will end up with the value passed to loading function not being used

Copy link
Member Author

@andlaus andlaus Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external code should IMO still use "strong" references unless it explicitly instructs otherwise, i.e., if you have something like

db = odxtools.load("my_file.pdx")
dop_ref = db.ecus.my_ecu.services.my_service.request.parameters[0].dop_ref
dop = db.odxlinks.resolve(dop_ref)
print(type(dop).__name__)

should IMO print the name of the class of the DOP, not ProxyType. (note that isinstance(x, Foo) works even if x is a weak proxy object pointing to a Foo object.)

@nada-ben-ali
Copy link
Collaborator

I’m already loading with use_weakrefs=False, but the issue persists. The root cause is that SnRefContext objects in HierarchyElement and EcuSharedData are still being created with use_weakrefs=True (the default), and this setting isn’t updated anywhere else in the code.

good catch. I pushed a fix. There also was the issue that the SnRefContext object always used the value from the Database, i.e., it ignored the value passed to Database.refresh(). Can you re-test?

It’s working now, thank you!

@andlaus
Copy link
Member Author

andlaus commented Sep 22, 2025

okay, I split up the PR into for almost independent spin-offs:

the present PR is thus obsolete. closing.

@andlaus andlaus closed this Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix object lifetime management

3 participants