-
Notifications
You must be signed in to change notification settings - Fork 54
Description
The CosmoFlow training benchmark fails during data generation when using TensorFlow 2.20.0 due to an ABI (Application Binary Interface) incompatibility between TensorFlow 2.20 and the latest tensorflow-io package (0.37.1).Since pyproject.toml doesn't pin TensorFlow versions, fresh installations pull TensorFlow 2.20.0, which breaks CosmoFlow and other TensorFlow-based workloads (ResNet50).
warnings.warn(f"file system plugins are not loaded: {e}")
/home/sped/perf/mlperfv2/venv/lib/python3.10/site-packages/tensorflow_io/python/ops/init.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/home/sped/perf/mlperfv2/venv/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/home/sped/perf/mlperfv2/venv/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl5mutex6unlockEv']
This is the fix that worked for me:
pip install tensorflow==2.17.1 tensorflow-io==0.37.1
pip install -e .
Proposed Fix
Pin TensorFlow and tensorflow-io versions in pyproject.toml to ensure compatibility:
dependencies = [
"dlio-benchmark @ git+https://github.com/argonne-lcf/dlio_benchmark.git@mlperf_storage_v2.0",
"psutil>=5.9",
"pyarrow",
"tensorflow>=2.16.0,<2.18.0",
"tensorflow-io>=0.37.0,<0.38.0"
]