Failed to load data from "Console", "Scalars", and "Debug Samples" sections #304

@fedyhajali

Hi, I'm having a problem with retrieving data for the Console, Scalars, and Debug Samples sections.

The error is the following:

Image

I upgraded clearml-server following the upgrade guide, but the error persists.

I tried to inspect the logs of all docker containers.

In the Elasticsearch container I found this error:
{"@timestamp":"2025-10-24T15:06:02.310Z", "log.level": "WARN", "message":"path: /events-log-d1bd92a3b039400cbafc60a7a5b1e52b/_search, params: {index=events-log-d1bd92a3b039400cbafc60a7a5b1e52b}, status: 503", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[clearml][transport_worker][T#7]","log.logger":"rest.suppressed","elasticsearch.cluster.uuid":"OrRFbiRWSouzPWdpjCwrQg","elasticsearch.node.id":"_aiygnWISgetdkhPNK2w9A","elasticsearch.node.name":"clearml","elasticsearch.cluster.name":"clearml","error.type":"org.elasticsearch.action.search.SearchPhaseExecutionException","error.message":"all shards failed","error.stack_trace":"Failed to execute phase [query], all shards failed; shardFailures {[_na_][events-log-d1bd92a3b039400cbafc60a7a5b1e52b][0]: org.elasticsearch.action.

In the apiserver container:

[2025-10-27 08:30:02,000] [9] [ERROR] [clearml.queue_metrics] Failed collecting queue metrics: ApiError(503, 'unavailable_shards_exception', '[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][0]] containing [index {[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][ZETIJJoBxdNb2Or0-MCv], source[{"timestamp":1761553561994,"queue":"9b1b026812214d69906dfb7c365d4b1c","average_waiting_time":0,"queue_length":0}]}]]')
Traceback (most recent call last):
  File "/opt/clearml/apiserver/bll/queue/queue_metrics.py", line 320, in start
    queue_metrics.log_queue_metrics_to_es(queue.company, [queue])
  File "/opt/clearml/apiserver/bll/queue/queue_metrics.py", line 83, in log_queue_metrics_to_es
    self.es.index(index=es_index, document=queue_doc)
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/_sync/client/utils.py", line 455, in wrapped
    return api(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/_sync/client/__init__.py", line 2470, in index
    return self.perform_request(  # type: ignore[return-value]
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 271, in perform_request
    response = self._perform_request(
  File "/usr/local/lib/python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 352, in _perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.ApiError: ApiError(503, 'unavailable_shards_exception', '[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][0]] containing [index {[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2025-10][ZETIJJoBxdNb2Or0-MCv], source[{"timestamp":1761553561994,"queue":"9b1b026812214d69906dfb7c365d4b1c","average_waiting_time":0,"queue_length":0}]}]]')
[2025-10-27 08:30:07,241] [9] [WARNING] [elastic_transport.node_pool] Node <Urllib3HttpNode(http://elasticsearch:9200)> has failed for 2 times in a row, putting on 2 second timeout
[2025-10-27 08:30:07,242] [9] [WARNING] [elastic_transport.transport] Retrying request after non-successful status 503 (attempt 1 of 3)
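
Both logs point at shards that are not active ("all shards failed", "primary shard is not active"), so it may help to see which shards Elasticsearch itself reports as unassigned. Below is a minimal diagnostic sketch, not a fix. It uses the standard Elasticsearch `_cluster/health` and `_cat/shards` REST endpoints. The base URL is an assumption taken from the apiserver log (`http://elasticsearch:9200` is only resolvable inside the Docker network), and the helper names `fetch_json`, `unassigned_shards`, and `report` are hypothetical.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable at this URL. The apiserver log shows
# http://elasticsearch:9200 inside the Docker network; from the host the
# address may differ depending on how the port is published.
ES_URL = "http://elasticsearch:9200"


def fetch_json(path, base_url=ES_URL, timeout=10):
    """GET a JSON document from the Elasticsearch REST API (hypothetical helper)."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def unassigned_shards(shards):
    """Keep the rows of a `_cat/shards?format=json` response that are not STARTED."""
    return [s for s in shards if s.get("state") != "STARTED"]


def report(fetch=fetch_json):
    """Summarize cluster status and list every shard that is not started."""
    health = fetch("/_cluster/health")
    shards = fetch(
        "/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason"
    )
    lines = [f"cluster status: {health.get('status')}"]
    for s in unassigned_shards(shards):
        lines.append(
            f"{s.get('index')} shard {s.get('shard')} ({s.get('prirep')}): "
            f"{s.get('state')}, reason={s.get('unassigned.reason')}"
        )
    return lines
```

Running `print("\n".join(report()))` from a container on the same Docker network should list the affected indices. The `_cluster/allocation/explain` endpoint can then give more detail on why a particular shard is not being assigned.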
