
Intermittent Template Download Failure (NOT_DOWNLOADED) in CloudStack 4.20.1.0 with Linstor Primary Storage #11905

@gw769


Hi all,

I’m encountering an intermittent issue when uploading templates to CloudStack 4.20.1.0 where Linstor is used as Primary Storage. Here’s the detailed breakdown:

CloudStack Version: 4.20.1.0
Primary Storage:
Linstor-controller 1.32.3-1ppa1-noble1
Linstor-satellite 1.32.3-1ppa1-noble1
DRBD_KERNEL_VERSION=9.2.14
System: Ubuntu 24.04.3 LTS
Hypervisor: KVM

When uploading a template (e.g., an Ubuntu 22.04 qcow2 image), the process intermittently fails with NOT_DOWNLOADED in Primary Storage. In my tests, 2 out of 3 upload attempts fail, while 1 succeeds randomly.

Two critical symptoms:

  1. qemu-img Binary Not Found: The CloudStack agent does not execute qemu-img convert (verified by attaching strace to the agent process):
    ps -ef | grep cloudstack-agent
    strace -f -s 256 -p 1318808 -o /tmp/agent_trace.log

In the trace of a failed attempt, I cannot find a qemu-img convert execution like this one:

execve("/usr/local/sbin/qemu-img", ["qemu-img", "convert", "-n", "--target-is-zero", "-W", "-S", "1M", "-O", "raw", "-t", "none", "-U", "--image-opts", "driver=qcow2,file.filename=/mnt/.../xxx.qcow2", "/dev/drbd1105"], ...) = -1 ENOENT (No such file or directory)
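To make it easier to tell whether qemu-img was actually started, the trace can be filtered for qemu-img execve calls that did not fail with ENOENT. An ENOENT on a single PATH entry such as /usr/local/sbin may simply be a normal lookup probe before the binary is found elsewhere, so it is worth separating those lines from the rest. This is a minimal sketch; the helper name `qemu_img_execs` is hypothetical, and it assumes the log was written with `strace -o` as in the command above.

```shell
# List qemu-img execve calls in an strace log, skipping failed PATH probes.
# qemu_img_execs is a hypothetical helper name, not part of CloudStack or strace.
qemu_img_execs() {
    # Keep execve lines whose program path ends in qemu-img and that did not
    # fail with ENOENT (binary missing at that particular PATH entry).
    awk '/execve\("[^"]*qemu-img"/ && !/ENOENT/' "$1"
}
```

Running `qemu_img_execs /tmp/agent_trace.log` after a failed upload and getting no output would suggest the agent never reached the convert step at all, as opposed to starting qemu-img and having it fail.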

  2. DRBD Devices Stay “Unused”: The Linstor volume listing shows every DRBD device as Unused (instead of one device being InUse):

root@NODE76:~# linstor v l |grep  80bc7788-638e-4211-9b31-d76d037b50b3
| cs-80bc7788-638e-4211-9b31-d76d037b50b3    | NODE158 | DfltDisklessStorPool |     0 |    1105 | /dev/drbd1105 |            | Unused |   Diskless | Established(2) |
| cs-80bc7788-638e-4211-9b31-d76d037b50b3    | NODE76  | x86_pool_zfs_ssd     |     0 |    1105 | /dev/drbd1105 |  20.00 GiB | Unused |   UpToDate | Established(2) |
| cs-80bc7788-638e-4211-9b31-d76d037b50b3    | NODE83  | x86_pool_zfs_ssd     |     0 |    1105 | /dev/drbd1105 |  20.00 GiB | Unused |   UpToDate | Established(2) |
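To watch the InUse column while an upload is running, the listing can be reduced to one "node state" pair per line. This is a rough sketch that assumes the column order shown in the output above (resource, node, storage pool, volume number, minor number, device, allocated, InUse, state); other Linstor versions may lay the table out differently. The helper name `inuse_by_node` is hypothetical.

```shell
# Print "<node> <InUse value>" for every volume row of one resource.
# Reads `linstor v l` table output on stdin; $1 is the resource name or a
# literal substring of it. inuse_by_node is a hypothetical helper name.
inuse_by_node() {
    awk -F'|' -v res="$1" '
        index($2, res) > 0 {
            node = $3; state = $9      # column positions assumed from the listing above
            gsub(/ /, "", node)
            gsub(/ /, "", state)
            print node, state
        }'
}
```

For example, `linstor v l | inuse_by_node 80bc7788-638e-4211-9b31-d76d037b50b3` could be run repeatedly during the template download; in a successful attempt one node would be expected to report InUse while the conversion writes to the device.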

Steps to Reproduce:

  1. Upload a qcow2 template (e.g., ubuntu-22.04.4-amd64-base-UEFI.qcow2) to CloudStack.
  2. Monitor the Primary Storage (Linstor) download status.
  3. Repeat the upload 3+ times and observe intermittent NOT_DOWNLOADED failures.

Expected Behavior:
The template should download to Primary Storage successfully every time, with qemu-img convert executing properly and DRBD devices showing InUse.

Actual Behavior:

  • Intermittent failures (2 out of 3 attempts fail).
  • DRBD resources remain Unused during failed attempts.

Questions:

  • What configuration or troubleshooting steps should I take to resolve this intermittent behavior?

Any insights or guidance would be greatly appreciated!

Thanks,
