Description
Currently, each pexcz shared library binary contains a copy of __main__.py for CZEX boot, interpreter.py for interpreter identification, venv_pex.py for venv boot, and virtualenv.py for seeding Python 2 venvs. Uncompressed, their sizes look like this:
```
:; la -1sh src/lib/*.py src/python/pexcz/__init__.py ../pex/pex/venv/virtualenv_16.7.12_py
108K ../pex/pex/venv/virtualenv_16.7.12_py
 32K src/lib/interpreter.py
 16K src/lib/venv_pex.py
 20K src/python/pexcz/__init__.py
```

In a multi-platform CZEX, there will be one pexcz shared library per targeted platform, currently up to 7:
```
:; zipinfo empty.czex
Archive:  empty.czex
Zip file size: 2377997 bytes, number of entries: 9
?rw-r--r--  6.3 unx      764 b- defN 80-Jan-01 00:00 PEX-INFO
-rw-rw-rw-  6.3 unx   783360 b- defX 80-Jan-01 00:00 .lib/x86_64-windows/pexcz.dll
-rw-rw-rw-  6.3 unx   711680 b- defX 80-Jan-01 00:00 .lib/aarch64-windows/pexcz.dll
-rw-rw-rw-  6.3 unx   677639 b- defX 80-Jan-01 00:00 .lib/x86_64-macos/libpexcz.dylib
-rw-rw-rw-  6.3 unx   769112 b- defX 80-Jan-01 00:00 .lib/x86_64-linux-gnu/libpexcz.so
-rw-rw-rw-  6.3 unx   730880 b- defX 80-Jan-01 00:00 .lib/aarch64-macos/libpexcz.dylib
-rw-rw-rw-  6.3 unx   720880 b- defX 80-Jan-01 00:00 .lib/aarch64-linux-gnu/libpexcz.so
-rw-rw-rw-  6.3 unx   844880 b- defX 80-Jan-01 00:00 .lib/powerpc64le-linux-gnu/libpexcz.so
-rw-rw-rw-  6.3 unx    18162 b- defX 80-Jan-01 00:00 __main__.py
9 files, 5257357 bytes uncompressed, 2376795 bytes compressed:  54.8%
```

This means the data in these files is duplicated 6x: every platform beyond the first carries its own redundant copy. Consider including these files only once, in the CZEX itself. Access will be less convenient and more expensive, but the extra cost is only incurred when identifying an interpreter for the first time or creating a venv for the first time. There is a space / speed tradeoff here that should be analyzed, experimented with, and decided on. It may make sense to dedup selectively, targeting just virtualenv.py, since it is >50% of the total duplicated .py data and will, presumably, be used rarely across all CZEXes in the wild.