Error: Peer-to-peer access is unsupported on this platform. #57

@hh001-cyber

Description

Running `python examples/offline.py` starts up normally, but the workers exit as soon as the KV caches are initialized:

```
python examples/offline.py
INFO 11:29:56 For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
INFO 11:29:56 Initializing placement group
2025-02-21 11:29:57,538 INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO 11:29:58 Initializing context stage LLM engine
INFO 11:29:58 For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
INFO 11:29:59 Initializing decoding stage LLM engine
INFO 11:29:59 For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
INFO 11:29:59 Initializing CONTEXT workers
INFO 11:29:59 Initializing workers
INFO 11:29:59 Initializing DECODING workers
INFO 11:29:59 Initializing workers
INFO 11:30:01 Initializing DECODING models
INFO 11:30:01 Initializing CONTEXT models
(ParaWorker pid=1410260) INFO 11:30:01 Worker decoding.#0 created on host iZ1pp01libkyqvxvj169tyZ and gpu #1
INFO 11:30:22 Initializing DECODING kvcaches
INFO 11:30:22 Profiling available blocks
INFO 11:30:22 Profiling result: num_gpu_blocks: 3492, num_cpu_blocks: 128
INFO 11:30:22 Allocating kv cache
(ParaWorker pid=1410260) INFO 11:30:22 (worker decoding.#0) model /data/zz/llama2 loaded
(ParaWorker pid=1410260) INFO 11:30:22 runtime peak memory: 12.721 GB
(ParaWorker pid=1410260) INFO 11:30:22 total GPU memory: 44.451 GB
(ParaWorker pid=1410260) INFO 11:30:22 kv cache size for one token: 0.50000 MB
(ParaWorker pid=1410260) INFO 11:30:22 num_gpu_blocks: 3492
(ParaWorker pid=1410260) INFO 11:30:22 num_cpu_blocks: 128
(ParaWorker pid=1410261) INFO 11:30:01 Worker context.#0 created on host iZ1pp01libkyqvxvj169tyZ and gpu #0
INFO 11:30:23 Scheduler: FCFS(max_batch_size=4, max_tokens_per_batch=16384)
INFO 11:30:23 Block manager: BlockManager(max_num_gpu_blocks=3492, max_num_cpu_blocks=128, blocksize=16)
INFO 11:30:23 Initializing CONTEXT kvcaches
INFO 11:30:23 Profiling available blocks
INFO 11:30:23 Profiling result: num_gpu_blocks: 3492, num_cpu_blocks: 128
INFO 11:30:23 The engine performs context stage, setting num_cpu_blocks to 1
INFO 11:30:23 Allocating kv cache
INFO 11:30:23 Scheduler: FCFS(max_batch_size=4, max_tokens_per_batch=16384)
INFO 11:30:23 Block manager: BlockManager(max_num_gpu_blocks=3492, max_num_cpu_blocks=1, blocksize=16)
(ParaWorker pid=1410260) Error: Peer-to-peer access is unsupported on this platform.
(ParaWorker pid=1410260) In the current version of distserve, it is necessary to use a platform that supports GPU P2P access.
(ParaWorker pid=1410260) Exiting...
```
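Not part of the original report, but for anyone debugging the same failure: here is a minimal sketch to check whether the GPUs on the host can actually reach each other over P2P. It assumes PyTorch with CUDA is installed, and the helper name `check_p2p` is mine, not a distserve API.

```python
# Diagnostic sketch (not distserve code): report P2P capability
# between every pair of visible CUDA devices using PyTorch.
import torch

def check_p2p() -> None:
    n = torch.cuda.device_count()
    if n < 2:
        print(f"Only {n} GPU(s) visible; P2P needs at least 2.")
        return
    for i in range(n):
        for j in range(n):
            if i != j:
                # torch.cuda.can_device_access_peer returns True if
                # device i can directly access memory on device j.
                ok = torch.cuda.can_device_access_peer(i, j)
                print(f"GPU {i} -> GPU {j}: {'P2P OK' if ok else 'no P2P'}")

if __name__ == "__main__":
    check_p2p()
```

`nvidia-smi topo -m` also shows the link topology between the GPUs. P2P generally requires NVLink or GPUs sharing the same PCIe root complex, and some virtualized or cloud platforms disable it entirely, which would trigger exactly this error.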
