Optimize algorithm for the best contraction permutation. #30
This PR optimizes the algorithm to choose the best contraction permutation.
Summary
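- Corrected error in computation of `cpu_cost`.
- Avoid unnecessary permutation of intermediate tensors in contraction.
- Avoid permuting `PA`, `PB`, `PC` in contraction if they are already the outermost indices.
- Avoid double permutation of the same tensor in the batched algorithm.

Detail and Examples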
Corrected error in computation of `cpu_cost`.

Example:

`C2["ijrs"] += B["gar"] * B["gbs"] * T2["ijab"]`

Dimensions: i(4), j(8), r(60), s(8), g(280), a(60), b(60)

The two possible contraction permutations are:

1. `C2[ijrs] += {$[arbs] = B[gar] * B[gbs]} * T2[ijab]`
   Original code gives `cpu_cost = g(ar+bs) + ab(ij+rs)` = 2,985,600
   Corrected code gives `cpu_cost = garbs + abijrs` = 539,136,000 FLOPs
2. `C2[ijrs] += {$[ijags] = T2[ijab] * B[gbs]} * B[gar]`
   Original code gives `cpu_cost = b(ija+gs) + ga(ijs+r)` = 5,558,400
   Corrected code gives `cpu_cost = bijags + gaijsr` = 516,096,000 FLOPs

In this situation, the original code chooses contraction permutation 1, but the corrected code chooses 2.
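
The figures above can be re-derived with a short Python sketch. It is only an illustration of the two cost models, not the code touched by this PR: the helper names (`size`, `original_cost`, `corrected_cost`, `ORDERINGS`) are made up here, and the old model is assumed to be the sum of the element counts of the two operands of each pairwise step.

```python
from math import prod

# Index extents from the example above.
DIMS = {"i": 4, "j": 8, "r": 60, "s": 8, "g": 280, "a": 60, "b": 60}


def size(indices):
    """Number of elements spanned by the distinct index labels in `indices`."""
    return prod(DIMS[x] for x in set(indices))


def original_cost(left, right):
    """Old model for one pairwise step: element count of both operands."""
    return size(left) + size(right)


def corrected_cost(left, right):
    """New model for one pairwise step: product over every index involved."""
    return size(left + right)


# Each ordering is a list of pairwise steps (left operand, right operand).
ORDERINGS = {
    1: [("gar", "gbs"), ("arbs", "ijab")],   # B*B first, then T2
    2: [("ijab", "gbs"), ("ijags", "gar")],  # T2*B first, then B
}


def total(cost, steps):
    """Sum a per-step cost model over all pairwise steps of an ordering."""
    return sum(cost(left, right) for left, right in steps)


def best(cost):
    """Label of the ordering with the lowest total cost under `cost`."""
    return min(ORDERINGS, key=lambda n: total(cost, ORDERINGS[n]))


for n, steps in ORDERINGS.items():
    print(n, total(original_cost, steps), total(corrected_cost, steps))
print("original picks", best(original_cost), "corrected picks", best(corrected_cost))
```

Under these assumptions the sketch reproduces the four totals listed above and the change of the selected ordering from 1 to 2.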
Note that in the original code, `cpu_cost` actually gives the memory cost of all the contracted tensors (e.g. in 1, the total number of elements in `$[arbs]`, `B[gar]`, `B[gbs]`, `T2[ijab]` is 2,985,600).

Avoid unnecessary permutation of intermediate tensors in contraction.
Example:

`D["efgh"] = A["abdc"] * B["cdef"] * C["bagh"]`

Assume the optimizer decides to contract `A` and `B` first, then the result with `C`.

In the current implementation, the intermediate tensor is chosen to be `$[abef]`, and the contraction is performed in two steps:

(1) `$[abef] = A["abdc"] * B["cdef"]`

(2) `D["efgh"] = $[abef] * C["bagh"]`

If `A["abdc"]` is smaller than `B["cdef"]` and `$[abef]` is smaller than `C["bagh"]`:

In (1), `A["abdc"]` will be permuted to `A["abcd"]`, then in (2), `$[abef]` will be permuted to `$[baef]`.

The latter can be avoided if the intermediate tensor is chosen to be `$[baef]`:

(1) `$[baef] = A["abdc"] * B["cdef"]`

(2) `D["efgh"] = $[baef] * C["bagh"]`

The computational cost of permuting `A["abdc"]` to `A["abcd"]` is the same as permuting it to `A["bacd"]`, so step (1) is no more expensive than before.

The cost of the intermediate permutation in (2) is therefore avoided entirely by a better choice of index order for the intermediate tensor.
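
A minimal Python sketch of this idea follows. It is a deliberately simplified model, not the implementation in this PR: the helper names are hypothetical, Hadamard indices are ignored, and "needs a permutation" is reduced to "the contracted indices appear in a different relative order in the two operands".

```python
def shared_in_order(tensor, other):
    """Indices of `tensor` that also occur in `other`, in `tensor`'s order."""
    return "".join(x for x in tensor if x in other)


def needs_permutation(left, right):
    """True if the shared indices are ordered differently in the two operands."""
    return shared_in_order(left, right) != shared_in_order(right, left)


def free_indices(a, b):
    """Indices that survive contracting `a` with `b` (present in only one of them)."""
    return [x for x in a + b if (x in a) != (x in b)]


def intermediate_order(a, b, nxt):
    """Order the intermediate so indices shared with `nxt` follow `nxt`'s order."""
    free = free_indices(a, b)
    shared = [x for x in nxt if x in free]     # taken in the order used by `nxt`
    rest = [x for x in free if x not in nxt]   # remaining indices keep their order
    return "".join(shared + rest)


A, B, C = "abdc", "cdef", "bagh"

print(needs_permutation("abef", C))        # intermediate order used today
chosen = intermediate_order(A, B, C)
print(chosen, needs_permutation(chosen, C))
```

For the example above the helper selects `baef`, for which the second step no longer reports a permutation, while `abef` does.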
Avoid permuting `PA`, `PB`, `PC` in contraction if they are already the outermost indices.

Example:

`C["abij"] = A["baik"] * B["abkj"]`

Since `a` and `b` are Hadamard indices and appear in different orders in `C["abij"]` and `A["baik"]`, the current implementation first permutes `A["baik"]` to `A["abik"]`.

However, this permutation can be avoided by locating the correct `ik` batch in `A` when looping over the Hadamard indices, rather than explicitly permuting them.

NOTE: Currently, the decision on permuting P is made after the decision on the i, j, k permutation; it should be moved before it.
Avoid double permutation of the same tensor in the batched algorithm.

Example:

`C["abcd"] = batched("a", A["eadf"] * B["efbc"])`

In the current implementation, `A["eadf"]` is first permuted to `A["aedf"]`. Then in the contraction, each batch `A["edf"]` needs to be permuted again to `A["def"]`.

The second permutation can be avoided if the first one takes `A["eadf"]` directly to `A["adef"]`.

Status