This repository was archived by the owner on Jan 6, 2023. It is now read-only.

EtcdStore: AttributeError: can't set attribute #152

@vv-p

Hi,

I have the following error when I try to run my code with torchelastic:

Creating EtcdStore as the c10d::Store implementation

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/distributed/launch.py", line 531, in main
    run_result = elastic_agent.run(spec.role)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 680, in run
    result = self._invoke_run(role)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 802, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 654, in _initialize_workers
    self._rendezvous(worker_group)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 518, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 157, in next_rendezvous
    store = self._rdzv_impl.setup_kv_store(rdzv_version)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 975, in setup_kv_store
    return EtcdStore(etcd_client=self.client, etcd_store_prefix=store_path)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 997, in __init__
    self.timeout = (
AttributeError: can't set attribute

Steps to reproduce:

>>> from torch.distributed import Store
>>> class A(Store):
...     def __init__(self):
...             super().__init__()
...             self.timeout = 1
...
>>> a = A()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
AttributeError: can't set attribute

I've tried several versions of torch and torchelastic (including the latest stable releases), but the error persists. Could you help me understand what this error means and how I can fix it?

os: CentOS 7
python: 3.8.3
torch: 1.9.0
torchelastic: 0.2.2
python-etcd: 0.4.5
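
For anyone reading the reproduction above: in plain Python, "can't set attribute" is what you get when you assign to a property that has a getter but no setter. The sketch below reproduces that behaviour without torch. It assumes (this is not confirmed anywhere in this issue) that `Store` exposes `timeout` as a read-only property in this torch version. `BaseStore`, `BrokenStore`, `WorkingStore` and `try_create` are hypothetical names used only for illustration, and storing the value under a different attribute name is just one possible way to avoid the collision, not necessarily how torchelastic should be fixed.

```python
class BaseStore:
    """Hypothetical stand-in for torch.distributed.Store (not the real class)."""

    @property
    def timeout(self):
        # Read-only: the property defines a getter but no setter.
        return 300


class BrokenStore(BaseStore):
    def __init__(self):
        super().__init__()
        # Assigning to a setter-less property raises AttributeError.
        self.timeout = 1


class WorkingStore(BaseStore):
    def __init__(self):
        super().__init__()
        # A differently named attribute does not collide with the property.
        self._my_timeout = 1


def try_create(cls):
    """Return a new instance, or the AttributeError raised while building it."""
    try:
        return cls()
    except AttributeError as exc:
        return exc


broken = try_create(BrokenStore)
working = try_create(WorkingStore)

print(type(broken).__name__)  # AttributeError
print(working._my_timeout)    # 1
print(working.timeout)        # 300, still served by the base-class property
```

If that assumption holds, the subclass in the traceback fails for the same reason as `BrokenStore`: its `__init__` assigns to a name that the base class already defines as read-only.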
