-
Notifications
You must be signed in to change notification settings - Fork 98
Pickleable Database and weak references #443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Using `pickle` is faster than loading PDX files directly, although the performance gain is surprisingly modest (for the large database which I used for testing, the speedup was less than factor 2). Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
what they were meant to do is done by the `memo` dictionary... Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
This has the advantage that odxtools objects now properly work with
reference counting, i.e., memory is freed immediately after the
respective object is deleted instead of the next time the garbage
collector is excercised by python. Also, deleting an object followed
by invoking `gc.collect()` is now significantly faster than before:
The speedup on my test dataset was about factor 4.
On the flipside, the database object must now be kept in order for
resolved references to be valid, i.e.,
```python
db = odxtools.load_file("my_dataset.pdx")
service = service.db.ecus.my_ecu.services.my_service
del db
print(f"DOP of first request param: {service.request.parameters[0].dop}")
```
will not work anymore.
Signed-off-by: Andreas Lauser <[email protected]>
Approved-by: Michael Hahn <[email protected]>
Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
…ile()` to `Database.add_xml_file()` I think the new names are more concise. The old names still exist, but they are marked as deprecated. Signed-off-by: Andreas Lauser <[email protected]> Approved-by: Michael Hahn <[email protected]>
|
@nada-ben-ali: referenced objects might not be pickled anymore, and might thus point to the void when you unpickle individual sub-objects of the database. So depending on your use case, you might want to either call |
| self.add_auxiliary_file(zip_member, pdx_zip.open(zip_member)) | ||
|
|
||
| def add_odx_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None: | ||
| def add_xml_file(self, odx_file_name: Union[str, "PathLike[Any]"]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly wanted to get rid of the load_odx_d_file() function in loadfile.py because this can handle an XML file containing any ODX category. I figured that PDX files are also "ODX files", so I renamed it to load_xml_file(). In order to keep (/create) consistency, I figured that the corresponding method in Database ought to be renamed as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming load_odx_d_file() to load_odx_file() is something I can get behind, using load_xml_file() is misleading, since the index file in pdx is xml but not odx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, IMO load_odx_file() is equally misleading, though. I think that a comprehensive name would be something like load_odx_xml_file(), but that's clunky enough that I prefer both of the currently discussed alternatives. @nada-ben-ali: do you have an opinion about this matter?
Also, be aware that the method which Database calls is called add_xml_tree(), i.e., if we want to have consistent naming we need to root up a function name or two anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just have a simple load_file that auto checks what type of file is being passed?
That would be a lot more user friendly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have that ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the module level, not on the database class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the database class is not the right abstraction level for this kind of guesswork: If you go the low-level route of creating a database object manually, you should have a good reason for it, i.e., the automagic loading functions are not sufficient for you. For such a use case it is IMO explicitly harmful if it is not 100% clear what a given function does...
|
|
||
|
|
||
| def load_file(file_name: str | Path) -> Database: | ||
| @deprecated("use load_xml_file()") # type: ignore[misc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why deprecate the old name or why the # type:ignore statement? keeping the old name around for some time is a IMO good idea because it is not a huge effort, and the # type: ignore statement is necessary because without it mypy complains about the @deprecated decorator removing the type annotations of the function...
@andlaus it worked for only one PDX file. For other PDX files, it didn’t work. I always get the following exception:
I'm checking this.... For |
thanks to [at]nada-ben-ali for noticing! Signed-off-by: Andreas Lauser <[email protected]>
do you delete the database object before you call
right. Fixed, thanks! |
I’m already loading with Maybe we can update |
|
@andlaus there are three PRs in one here
it would be better for changelog, git history, testing and review to split those three topics each into its own PR |
e946068 to
8bfabf2
Compare
good catch. I pushed a fix. There also was the issue that the |
okay. I will split it up once @nada-ben-ali confirms that it works for her... |
… snrefs also, properly honor `use_weakrefs` inside `_resolve_snrefs()` of diag layers. thanks to [at]nada-ben-ali for uncovering this! Signed-off-by: Andreas Lauser <[email protected]>
8bfabf2 to
b671e71
Compare
| """ | ||
|
|
||
| def __init__(self) -> None: | ||
| def __init__(self, *, use_weakrefs: bool = False) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove default value, if there is a caller who is using the default value, we will end up with the value passed to loading function not being used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
external code should IMO still use "strong" references unless it explicitly instructs otherwise, i.e., if you have something like
db = odxtools.load("my_file.pdx")
dop_ref = db.ecus.my_ecu.services.my_service.request.parameters[0].dop_ref
dop = db.odxlinks.resolve(dop_ref)
print(type(dop).__name__)should IMO print the name of the class of the DOP, not ProxyType. (note that isinstance(x, Foo) works even if x is a weak proxy object pointing to a Foo object.)
It’s working now, thank you! |
|
okay, I split up the PR into for almost independent spin-offs:
the present PR is thus obsolete. closing. |

Besides a few minor cleanups, this PR features two bigger features:
Databaseobjects now play nicely with thepicklemodule. That said, the speedup of usingpicklecompared to loading a PDX file from scratch is surprising modest (about factor 2)del odx_db_obj, the memory used by the object seems to be freed immediately (in contrast to after callinggc.collect()) and object deletion followed bygc.collect()experiences a speed up of factor 4. Also, unexpected pauses in program execution due to python garbage collecting orphanedDatabaseobjects when it feels like it should be a thing of the past.This PR fixes #421.
Andreas Lauser <[email protected]>, on behalf of MBition GmbH.
Provider Information