-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Lazy load extractor resources & utilities #8463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
lazy-loads extractor utilities from extractor/utils/*.py modules
use 'basecategory' instead of plain 'category' as 'key'
remove 'utilsb()'
add 'utils' argument
|
Interesting, where did this idea come from? I never noticed gallery-dl being slow. I do use Do you have benchmarks before vs. after timings? Personally I would only consider doing something like this if there is significant measurable impact (and it actually matters for some use case) or if it actually improves code organization. Can't say I'm a fan of splitting extractors into 2 places but maybe that's just what I'm being used to. |
I was attempting to fix
Well, it is not really "slow" considering it's written in Python, but its startup could be faster and it is getting progressively slower as more and more extractor modules are added.
Exactly, but I assume most users run gallery-dl with only one URL at a time, and making the process of finding a matching
Well... not really. I can measure an insignificant reduction by ~10-20 ms (460ms -> 440ms) on my machine for loading all modules when inputting an unsupported "URL". I had hoped that there would be a more noticeable effect, but oh well. Better than nothing, I guess. It should at least reduce the amount of memory used by gallery-dl since a lot of static resources are no longer loaded by default unless needed, and the overhead is insignificant once loaded for the first time.
One could argue that #4504 was already the first step in this direction, and nobody "complained" then. Removing tests made the code a lot more readable than, for example, the code of yt-dlp extractors, which are usually 50% test data... |
|
Yeah based on that ~10-20 ms reduction I would say this change doesn't make sense as an optimization. On the other hand, if you prefer the code organized this way then sure go for it. Personally I wouldn't do it, but don't really have any good reasons beside subjective taste. And if we might have hundreds to thousands of lines of GraphQL stuff then at least separating that sounds like a good idea. |
|
Guess I'll revert most of the changes and apply this to only GraphQL queries and other larger utility functions like DA Tiptap-to-HTML, Twitter Transaction ID, and Tsumino JSURL, i.e. no API interface code. |
In an attempt at reducing resource usage when running gallery-dl and searching a suitable extractor for a URL, I've moved several static "resources" (mostly GraphQL queries) and functions & classes (mostly API interfaces) out of their main extractor modules into separate
/extractor/utilsmodules so they don't get needlessly loaded when importing extractor modules and matching their RegExp patterns.It is currently 7k lines of code out of 40k or 270 kB of Python bytecode that won't get loaded when searching an extractor class, but only when needed.
Let me know what you think of this idea, what else could/should be exported, and if this whole ordeal broke anything.