Optimize algorithm for the best contraction permutation. #30

tyzhang1993 · 2018-09-10T20:20:35Z

This PR optimizes the algorithm to choose the best contraction permutation.

Summary

Corrected error in computation of cpu_cost.
Modify the algorithm to consider permutation cost.
Avoid avoidable permutation of intermediate tensors in contraction.
Avoid permuting PA, PB, PC in contraction if they are already the outer most indices.
Avoid double permutation of the same tensor in the batched algorithm.
Taking care of permutation symmetry.

Detail and Examples

Corrected error in computation of `cpu_cost`.

Example: C2["ijrs"] += B["gar"] * B["gbs"] * T2["ijab"]
Dimensions: i(4), j(8), r(60), s(8), g(280), a(60), b(60)
The two ways of contraction permutations are:

C2[ijrs]+={$[arbs]=B[gar]*B[gbs]}*T2[ijab]
Original code will give cpu_cost = g(ar+bs) + ab(ij+rs)= 2,985,600
Corrected code will give cpu_cost = garbs + abijrs= 539,136,000 FLOPs
C2[ijrs]+={$[ijags]=T2[ijab]*B[gbs]}*B[gar]
Original code will give cpu_cost = b(ija+gs) + ga(ijs+r)= 5,558,400
Corrected code will give cpu_cost = bijags + gaijsr= 516,096,000 FLOPs

In this situation, the original code will choose contraction permutation 1, but the corrected code will choose 2.
Note here that in the original code, cpu_cost actually give the memory cost of all the contracted tensors. (eg. in 1, the total number of elements in $[arbs], B[gar], B[gbs], T2[ijab] is 2,985,600)

Avoid avoidable permutation of intermediate tensors in contraction.

Example: D["efgh"] = A["abdc"] * B["cdef"] * C["bagh"]
Assuming the optimizer decide to contract A and B first, then the result with C.
According to the current implementation, the intermediated tensor will be decided to be $[abef], and the contraction will be performed in two steps:
(1) $[abef] = A["abdc"] * B["cdef"]
(2) D["efgh"] = $[abef] * C["bagh"]
In case the size of A["abdc"] < B["cdef"], $[abef] < C["bagh"]:
In (1), A["abdc"] will be permuted to A["abcd"], then in (2), $[abef] will be permuted to $[baef].
Actually, the latter can be avoided if the intermediate tensor is decided to be $[baef]:
(1) $[baef] = A["abdc"] * B["cdef"]
(2) D["efgh"] = $[baef] * C["bagh"]
The computational cost of the permutation from A["abdc"] to A["abcd"] is the same as permutation to A["bacd"].
The computational cost of the intermediate permutation can be avoided by a wiser decision on the index order of the intermediate tensor.

Avoid permuting `PA`, `PB`, `PC` in contraction if they are already the outer most indices.

Example: C["abij"] = A["baik"] * B["abkj"]
Since a and b are Hadamard indices and are in different orders in C["abij"] and A["baik"], the current implementation will permute A["baik"] to A["abik"] first.
However, this permutation can be avoided by locating the correct ik batch in A when looping over Hadamard indices rather than explicitly permute them.

NOTE: Currently, the decision of permuting regarding P is after the decision of i,j,k, but this should be moved before the decision of i,j,k permutation.

Avoid double permutation of the same tensor in the batched algorithm.

Example: C["abcd"] = batched("a", A["eadf"] * B["efbc"])
In the current implementation, A["eadf"] will be first permute to A["aedf"]. Then in the contraction, each batch A["edf"] need to be permuted again to A["def"].
The latter can actually be avoided if in the first permutation A["eadf"] is permuted to A["adef"].

Status

Ready to go.

jturney · 2018-09-11T14:29:27Z

Do you have any timings that support this? I'm just curious.

tyzhang1993 · 2018-09-11T14:47:30Z

@jturney I have a little. I will definitely add some benchmarks in the PR.
This PR is intended to be one investigating more ways to promote the performance. For example, some permutations may have the same CPU_cost and memory_cost by the current code, but they may experience different numbers of index permutations which is not taken into account in the current algorithm.

Corrected computation of result_size.

96f8891

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize algorithm for the best contraction permutation. #30

Optimize algorithm for the best contraction permutation. #30

Uh oh!

tyzhang1993 commented Sep 10, 2018 •

edited

Loading

Uh oh!

jturney commented Sep 11, 2018

Uh oh!

tyzhang1993 commented Sep 11, 2018 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Optimize algorithm for the best contraction permutation. #30

Are you sure you want to change the base?

Optimize algorithm for the best contraction permutation. #30

Uh oh!

Conversation

tyzhang1993 commented Sep 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Detail and Examples

Corrected error in computation of cpu_cost.

Avoid avoidable permutation of intermediate tensors in contraction.

Avoid permuting PA, PB, PC in contraction if they are already the outer most indices.

Avoid double permutation of the same tensor in the batched algorithm.

Status

Uh oh!

jturney commented Sep 11, 2018

Uh oh!

tyzhang1993 commented Sep 11, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tyzhang1993 commented Sep 10, 2018 •

edited

Loading

Corrected error in computation of `cpu_cost`.

Avoid permuting `PA`, `PB`, `PC` in contraction if they are already the outer most indices.

tyzhang1993 commented Sep 11, 2018 •

edited

Loading