Why is the KI result slower than the SA result? #3

As shown in Figure 8 of the article _Offloading communication control logic in GPU accelerated applications_, the KI model is faster than the SA model. However, when I run the libmp benchmark mp_pingpong_all on my Ubuntu machine with a P4 GPU and an mlx5 NIC, the KI latency is almost double the SA latency. So I wonder whether the article's results were obtained with something other than the libmp benchmarks. If so, which tests does the article use?
