
Address distributed non-ordered indexing #914

@ClaudiaComito

Description


Related
Many operations depend on distributed non-ordered indexing. Here is a sample of issues/PRs where the problem has come up in various forms:

#607 #760 #903 #703 #824 #902 #177 #621 #749 #271 #857

Feature functionality

We want to be able to index a distributed DNDarray with a distributed, non-ordered key and return the correct, stable result (examples below). An implementation of this functionality via Alltoallv is available in ht.sort(); it needs to be generalized (see the sketch after the process-count examples below).

>>> a = ht.arange(50, split=0)
>>> a
DNDarray([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
          27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], dtype=ht.int32, device=cpu:0, split=0)
>>> b = ht.random.randint(0,50,(20,), dtype=ht.int64, split=0)
>>> b
DNDarray([46,  5, 44,  8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16,  7,  9,  8], dtype=ht.int64, device=cpu:0, split=0)
>>> c = a[b]
>>> c
DNDarray([46,  5, 44,  8, 14, 10, 30, 15, 34, 30, 41, 44, 28, 26, 11, 20, 16,  7,  9,  8], dtype=ht.int32, device=cpu:0, split=0)

In the current implementation, c = a[b] returns a distributed DNDarray populated by whichever subset of the key happens to be process-local, so the global order of the result depends on the number of processes:

On 2 processes:

c =  DNDarray([ 5,  8, 14, 10, 15, 11, 20, 16,  7,  9,  8, 46, 44, 30, 34, 30, 41, 44, 28, 26], dtype=ht.int32, device=cpu:0, split=0)

On 3 processes:

c =  DNDarray([ 5,  8, 14, 10, 15, 11, 16,  7,  9,  8, 30, 30, 28, 26, 20, 46, 44, 34, 41, 44], dtype=ht.int32, device=cpu:0, split=0)
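For reference, below is a minimal sketch of how such a stable gather along split=0 might look, written with raw mpi4py and NumPy purely for illustration. The function name stable_gather, the offsets argument and the two-phase Alltoallv exchange are assumptions about one possible generalization of the ht.sort() approach, not Heat's actual internals.

# Hedged sketch (not Heat's implementation): stable distributed gather c = a[b]
# along split=0, using raw mpi4py + NumPy. stable_gather and offsets are assumed names.
from mpi4py import MPI
import numpy as np

def stable_gather(a_local, b_local, offsets, comm):
    # a_local : this process's chunk of the source array (split=0)
    # b_local : this process's chunk of the non-ordered key
    # offsets : global start index of every chunk, length nprocs + 1
    nprocs, rank = comm.Get_size(), comm.Get_rank()

    # 1. Which process owns each requested global index?
    owners = np.searchsorted(offsets[1:], b_local, side="right")

    # 2. Group the requests by owner; remember the permutation to undo later.
    order = np.argsort(owners, kind="stable")
    send_idx = b_local[order].astype(np.int64)
    send_counts = np.bincount(owners, minlength=nprocs).astype(np.int64)
    recv_counts = np.empty(nprocs, dtype=np.int64)
    comm.Alltoall(send_counts, recv_counts)
    send_displs = np.insert(np.cumsum(send_counts), 0, 0)[:-1]
    recv_displs = np.insert(np.cumsum(recv_counts), 0, 0)[:-1]

    # 3. First Alltoallv: ship the requested global indices to their owners.
    req_idx = np.empty(int(recv_counts.sum()), dtype=np.int64)
    comm.Alltoallv([send_idx, (send_counts, send_displs)],
                   [req_idx, (recv_counts, recv_displs)])

    # 4. Second Alltoallv (reversed): look up the values locally, ship them back.
    replies = a_local[req_idx - offsets[rank]]
    values = np.empty(len(b_local), dtype=a_local.dtype)
    comm.Alltoallv([replies, (recv_counts, recv_displs)],
                   [values, (send_counts, send_displs)])

    # 5. Undo the owner-grouping so the result follows the original key order.
    c_local = np.empty_like(values)
    c_local[order] = values
    return c_local

On each process, c_local could then be wrapped back into a DNDarray distributed like the key, e.g. with something like ht.array(c_local, is_split=0).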

Metadata

Labels

High priority / urgent, MPI, bug, enhancement, indexing, redistribution, stale

Status

Done
