Positional Encoding Shape Mismatch in Clay Encoder #377
RuntimeError: The size of tensor a (1024) must match the size of tensor b (1020) at non-singleton dimension 2
Root Cause
The error came from the positional metadata encoding having the wrong width. Specifically:
The time and latlon tensors were shaped [B, 2] instead of [B, 4], so the combined metadata tensor came out as [B, L, 8] instead of the expected [B, L, 12].
The positional encoding was therefore 4 components short (1020 instead of 1024), and adding it to the patch embeddings, which are shaped [B, L, 1024], failed at dimension 2.
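The failure can be reproduced outside the model. The following is a minimal sketch rather than the encoder's actual code: it uses NumPy in place of PyTorch (so the exception is a ValueError instead of a RuntimeError), picks illustrative values for B and L, and assumes the metadata occupies the trailing columns of the positional encoding. The widths 8 and 12 are the ones quoted above.

```python
import numpy as np

B, L, D = 2, 4, 1024                 # illustrative batch size, patch count, embedding width
META_EXPECTED, META_ACTUAL = 12, 8   # metadata widths quoted in this PR description

patches = np.zeros((B, L, D))        # patch embeddings, [B, L, 1024]

# Assumption: the spatial part of the encoding is sized so that
# spatial + metadata == D when the metadata has its expected width.
spatial = np.zeros((B, L, D - META_EXPECTED))
bad_meta = np.zeros((B, L, META_ACTUAL))          # 4 components short
bad_encoding = np.concatenate([spatial, bad_meta], axis=-1)

print(bad_encoding.shape)            # (2, 4, 1020)

mismatch_raised = False
try:
    patches + bad_encoding           # last axis: 1024 vs 1020, not broadcastable
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```

The last axis here is dimension 2, the same dimension the RuntimeError reports.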
Fixes Applied
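Corrected normalize_timestamp() and normalize_latlon() to return 4 components each: normalize_timestamp() returns the sine and cosine of the week and of the hour, and normalize_latlon() returns the sine and cosine of the latitude and of the longitude.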
Ensured the metadata tensors are properly stacked and repeated to match the patch sequence length (L).
Verified that the final positional encoding tensor matches the expected shape [B, L, 1024] before addition.
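The three fixes above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: the week and hour periods (52 and 24), the flat 4-tuple return layout, and the helper names expand_over_patches and check_addable are all hypothetical, and NumPy again stands in for PyTorch.

```python
import math
from datetime import datetime

import numpy as np


def normalize_timestamp(date):
    """Encode a datetime as (sin week, cos week, sin hour, cos hour)."""
    week = date.isocalendar()[1] * 2 * math.pi / 52   # ISO week as an angle
    hour = date.hour * 2 * math.pi / 24               # hour of day as an angle
    return (math.sin(week), math.cos(week), math.sin(hour), math.cos(hour))


def normalize_latlon(lat, lon):
    """Encode degrees as (sin lat, cos lat, sin lon, cos lon)."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return (math.sin(lat_r), math.cos(lat_r), math.sin(lon_r), math.cos(lon_r))


def expand_over_patches(per_sample, seq_len):
    """Repeat a [B, C] array along a new patch axis, giving [B, L, C]."""
    return np.repeat(per_sample[:, None, :], seq_len, axis=1)


def check_addable(patches, encoding):
    """Fail early, reporting both shapes, if the element-wise add would not line up."""
    assert patches.shape == encoding.shape, (patches.shape, encoding.shape)


# Illustrative inputs: B = 2 samples, L = 4 patches.
dates = [datetime(2024, 1, 15, 10), datetime(2024, 7, 1, 18)]
coords = [(48.85, 2.35), (-33.87, 151.21)]

time = np.array([normalize_timestamp(d) for d in dates])             # [B, 4]
latlon = np.array([normalize_latlon(la, lo) for la, lo in coords])   # [B, 4]

time_l = expand_over_patches(time, 4)      # [B, L, 4]
latlon_l = expand_over_patches(latlon, 4)  # [B, L, 4]
print(time.shape, latlon.shape, time_l.shape, latlon_l.shape)
```

Calling check_addable(patch_embeddings, positional_encoding) just before the addition turns a late broadcasting error into an immediate assertion that names both shapes.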
Validation
The encoder now runs without the shape-mismatch error on wall-to-wall Sentinel-2 inputs.