Lazy load extractor resources & utilities #8463

mikf · 2025-10-25T20:07:35Z

In an attempt to reduce resource usage when running gallery-dl and searching for a suitable extractor for a URL, I've moved several static "resources" (mostly GraphQL queries) and functions & classes (mostly API interfaces) out of their main extractor modules into separate /extractor/utils modules, so they don't get needlessly loaded when importing extractor modules and matching their RegExp patterns.
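The commit list below mentions `utils()` and `resource()` methods on the common extractor class, later with an optional `name` argument, but their implementation isn't shown here. The following is only a minimal sketch of how such an on-demand lookup could work, assuming helper modules live in a package named `gallery_dl.extractor.utils` and are named after the extractor's category. The `SketchExtractor` class and the `utils_package` attribute are hypothetical. To keep the example runnable anywhere, the subclass points at the stdlib `email` package instead.

```python
import importlib
from types import ModuleType
from typing import Optional


class SketchExtractor:
    """Hypothetical stand-in for an extractor base class."""

    # Assumed location of the helper modules; illustrative only.
    utils_package = "gallery_dl.extractor.utils"
    category = ""

    def utils(self, name: Optional[str] = None) -> ModuleType:
        """Import and return the helper module for this extractor.

        Nothing is imported until this method is called, so matching URL
        patterns never pays for queries or API classes it doesn't use.
        """
        # Default to the extractor's own category as the module name.
        module_name = f"{self.utils_package}.{name or self.category}"
        # import_module() checks sys.modules first, so every call after
        # the first one is a cheap cache lookup.
        return importlib.import_module(module_name)


class ExampleExtractor(SketchExtractor):
    # Point the sketch at the stdlib 'email' package so it runs anywhere.
    utils_package = "email"
    category = "utils"


extractor = ExampleExtractor()
helpers = extractor.utils()            # imports email.utils on demand
header = extractor.utils("header")     # explicit module name
print(helpers.__name__, header.__name__)  # email.utils email.header
```

Because `importlib.import_module()` returns the entry already stored in `sys.modules` on later calls, the import cost is paid once per process.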

This currently amounts to 7k out of 40k lines of code, or 270 kB of Python bytecode, that no longer get loaded when searching for an extractor class, only when they are actually needed.
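The post doesn't say how the 270 kB figure was obtained. One plausible way to estimate such a number is to compile each source file and measure its marshalled code object, which is what a `.pyc` file stores after its 16-byte header. This is a sketch; the directory in the usage comment is an assumption about the repository layout.

```python
import marshal
from pathlib import Path


def source_bytecode_size(source: str, filename: str = "<string>") -> int:
    """Size in bytes of the marshalled code object for 'source'.

    This matches the payload of a .pyc file, minus its 16-byte header.
    """
    code = compile(source, filename, "exec")
    return len(marshal.dumps(code))


def directory_bytecode_size(directory: Path) -> int:
    """Sum the estimated bytecode size of every .py file below 'directory'."""
    return sum(
        source_bytecode_size(path.read_text(encoding="utf-8"), str(path))
        for path in sorted(directory.rglob("*.py"))
    )


# Usage (assumed path):
#   directory_bytecode_size(Path("gallery_dl/extractor/utils"))
```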

Let me know what you think of this idea, what else could/should be exported, and if this whole ordeal broke anything.

thatfuckingbird · 2025-10-26T12:29:53Z

Interesting, where did this idea come from? I never noticed gallery-dl being slow. I do use extractor.find in hydownloader to provide some info about whether a URL is supported and by what extractor, but since that's done in a long-running daemon process on request, the load time reduction would only apply at the very first call if I understand correctly.
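To illustrate why the saving only applies to the first call in a long-running process: once a module has been imported, later imports are served from `sys.modules`. This sketch times a cold and a cached import of a stdlib module (`colorsys`, used only as a stand-in); the absolute numbers will vary by machine.

```python
import importlib
import sys
import time
from typing import Tuple


def import_times(name: str) -> Tuple[float, float]:
    """Return (first, repeat) wall-clock seconds for importing 'name'.

    The module is evicted from sys.modules first so the initial import
    actually executes it; the repeat import only hits the cache.
    """
    sys.modules.pop(name, None)

    start = time.perf_counter()
    importlib.import_module(name)
    first = time.perf_counter() - start

    start = time.perf_counter()
    importlib.import_module(name)
    repeat = time.perf_counter() - start
    return first, repeat


first, repeat = import_times("colorsys")
print(f"first: {first * 1e3:.3f} ms, repeat: {repeat * 1e3:.3f} ms")
```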

Do you have benchmarks before vs. after timings? Personally, I would only consider doing something like this if there is a significant, measurable impact (and it actually matters for some use case) or if it actually improves code organization. Can't say I'm a fan of splitting extractors into 2 places, but maybe that's just what I'm used to.

mikf · 2025-10-28T18:04:58Z

> where did this idea come from?

I was attempting to fix joyreactor extractors (#6642) and noticed that it would require several hundreds or thousands of lines of GraphQL queries. This is the idea I came up with, although it spiraled a bit out of control.

> I never noticed gallery-dl being slow

Well, it is not really "slow" considering it's written in Python, but its startup could be faster and it is getting progressively slower as more and more extractor modules are added.

> the load time reduction would only apply at the very first call if I understand correctly.

Exactly, but I assume most users run gallery-dl with only one URL at a time, so making the process of finding a matching pattern faster is somewhat important in that case. At least it's something I like to improve and work on, for what it's worth.

> Do you have benchmarks before vs. after timings?

Well... not really. I can measure an insignificant reduction of ~10-20 ms (460 ms -> 440 ms) on my machine for loading all modules when inputting an unsupported "URL". I had hoped that there would be a more noticeable effect, but oh well. Better than nothing, I guess.
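For before/after comparisons like the 460 ms -> 440 ms figure above, a simple approach is to time fresh interpreter processes and take the median of several runs. This is a sketch, not the method used for those numbers. The `extractor.find()` call in the comment is the lookup mentioned earlier in the thread; `import json` stands in for it so the snippet runs without gallery-dl installed. For a per-module breakdown, CPython's `python -X importtime -c "..."` prints cumulative import times to stderr.

```python
import statistics
import subprocess
import sys
import time


def startup_time(statement: str, runs: int = 5) -> float:
    """Median wall-clock seconds for a fresh interpreter to run 'statement'."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


baseline = startup_time("pass")
# Stand-in target; to measure gallery-dl, something like
# "from gallery_dl import extractor; extractor.find('https://example.org/')"
measured = startup_time("import json")
print(f"baseline {baseline * 1e3:.0f} ms, measured {measured * 1e3:.0f} ms")
```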

It should at least reduce the amount of memory used by gallery-dl, since a lot of static resources are no longer loaded unless needed, and the overhead of loading them is insignificant once they have been loaded for the first time.
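The memory claim can be checked roughly with `tracemalloc`, which counts bytes allocated through Python's allocators while tracing is active. The sketch below evicts a module from `sys.modules`, re-imports it, and reports how much traced memory is still held afterwards. `colorsys` is only a stand-in; to check this PR you would pass one of the new utils modules instead (exact module names assumed).

```python
import importlib
import sys
import tracemalloc


def import_memory(name: str) -> int:
    """Bytes of traced memory still held after (re-)importing 'name'."""
    sys.modules.pop(name, None)  # force the module to execute again

    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    importlib.import_module(name)
    after, _ = tracemalloc.get_traced_memory()
    if not was_tracing:
        # Only stop tracing if this function started it.
        tracemalloc.stop()
    return after - before


print(f"colorsys: {import_memory('colorsys')} bytes")
```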

> Can't say I'm a fan of splitting extractors into 2 places

One could argue that #4504 was already the first step in this direction, and nobody "complained" then. Removing tests made the code a lot more readable than, for example, the code of yt-dlp extractors, which are usually 50% test data...

thatfuckingbird · 2025-10-30T14:49:51Z

Yeah, based on that ~10-20 ms reduction I would say this change doesn't make sense as an optimization. On the other hand, if you prefer the code organized this way, then sure, go for it. Personally I wouldn't do it, but I don't really have any good reasons besides subjective taste. And if we might have hundreds to thousands of lines of GraphQL stuff, then at least separating that sounds like a good idea.

mikf · 2025-10-30T16:31:45Z

Guess I'll revert most of the changes and apply this to only GraphQL queries and other larger utility functions like DA Tiptap-to-HTML, Twitter Transaction ID, and Tsumino JSURL, i.e. no API interface code.

mikf added 30 commits October 22, 2025 17:52

create 'extractor/utils' directory

995bf18

[common] add 'utils()' & 'resource()' methods

a5004b5

lazy-loads extractor utilities from extractor/utils/*.py modules

[behance] move GraphQL queries

bb3b1b0

[500px] move GraphQL queries

bb6ca38

[mangapark] move GraphQL queries

fb5a75f

[luscious] move GraphQL queries

a5c9242

[scrolller] move GraphQL queries

b5152e1

[tsumino] move 'parse_jsurl()'

1def377

[pixiv] move API interface

3491b6d

[civitai] move API interface classes

809b877

[instagram] move API interfaces

c070edd

[wallhaven] move API interface

7e29470

[arcalive] move API interface

6ff35a1

[inkbunny] move API interface

c8a2d7b

[bluesky] move API interface

6cb8580

[kemono] move API interface

172a15a

[reddit] move API interface

5453982

[tumblr] move API interface

98a2efe

[common] fix Extractor.utils() for BaseExtractor instances

7930176

use 'basecategory' instead of plain 'category' as 'key'

[mastodon] move API interface

975b56d

[pinterest] move API interface

cca035d

[mangadex] move API interface

df9f5fd

[itaku] move API interface

7640991

[imgur] move API interface

658718a

[iwara] move API interface

f59d648

[blogger] move API interface

a1453af

[boosty] move API interface

08e10ab

[smugmug] move API interface

09a41b8

[common] use 'utilsb()' for basecategory utilities

0e9a5ba

[sankaku] move API interface

07ebbb6

mikf added 18 commits October 24, 2025 19:54

[bilibili] move API interface

d7ce824

[discord] move API interface

06da45c

[fansly] move API interface

5ed0e06

[lofter] move API interface

2d27d1e

[pexels] move API interface

4962236

[imagechest] move API interface

2d1a3ee

[misskey] move API interface

2973c9c

[philomena] move API interface

f0fce33

[twibooru] move API interface

1737c3a

[redgifs] move API interface

628a180

[twitter] move API interface and transaction_id.py

e63a312

[flickr] move API interface

49d2917

[deviantart] move API interfaces & Journal templates

2314960

[oauth] fix API imports

a4c7141

[common] add optional 'name' argument to 'utils()'

d8e0e9d

remove 'utilsb()'

[deviantart] export 'tiptap' functions

ddc326d

[cache] restore lookup paths

ddb66f8

add 'utils' argument

[build] update PyInstaller hiddenimports and py2exe modules

2cfadf2

mikf added the core:enhancement label Oct 25, 2025

mikf changed the title ~~Lazy load extractor resources & utility classes~~ Lazy load extractor resources & utilities Oct 30, 2025


3 participants
