@E-Rum E-Rum commented Aug 20, 2025

At some point, if we want to proceed with batching, we need to introduce padding to the inputs. This is the first commit that introduces initial padding only for the direct calculator and the Coulomb potential.
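
To make the padding idea concrete, here is a minimal sketch in plain Python (no torch, so it runs without dependencies). It pads two systems of different sizes to a common length, builds a mask, and evaluates a direct, non-periodic Coulomb sum that skips padded entries. The names `pad_batch` and `direct_coulomb` are hypothetical and are not the torch-pme API; the real implementation would operate on tensors.

```python
import math


def pad_batch(positions, charges):
    """Pad per-system lists to a common length and build a validity mask."""
    max_atoms = max(len(p) for p in positions)
    padded_pos, padded_q, mask = [], [], []
    for pos, q in zip(positions, charges):
        n_pad = max_atoms - len(pos)
        # Padded atoms get dummy coordinates and zero charge.
        padded_pos.append(list(pos) + [(0.0, 0.0, 0.0)] * n_pad)
        padded_q.append(list(q) + [0.0] * n_pad)
        mask.append([True] * len(pos) + [False] * n_pad)
    return padded_pos, padded_q, mask


def direct_coulomb(pos, q, mask):
    """Per-atom potential sum_j q_j / r_ij, ignoring padded entries."""
    n = len(pos)
    out = [0.0] * n
    for i in range(n):
        if not mask[i]:
            continue  # padded atom: potential stays zero
        for j in range(n):
            if i == j or not mask[j]:
                continue  # skip self-interaction and padded partners
            out[i] += q[j] / math.dist(pos[i], pos[j])
    return out


# Two systems with 2 and 3 atoms (illustrative values only).
positions = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)],
]
charges = [[1.0, -1.0], [1.0, 1.0, -2.0]]

pos_b, q_b, mask_b = pad_batch(positions, charges)
potentials = [direct_coulomb(p, q, m) for p, q, m in zip(pos_b, q_b, mask_b)]
```

The mask matters because padded atoms share dummy coordinates: without it, a real atom at the origin would sit at zero distance from a padded one.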

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?

📚 Documentation preview 📚: https://torch-pme--199.org.readthedocs.build/en/199/

@E-Rum E-Rum linked an issue Oct 10, 2025 that may be closed by this pull request

E-Rum commented Oct 10, 2025

I think this is it.
The main changes I’ve made so far, besides adding batching, are: I removed the tests that check for a non-zero cell. Honestly, I think we can get rid of these permanently, because an error will be raised anyway as soon as we take cell.inv.

Another thing I’ve removed for now is the check for the periodicity boolean tensor. Previously, we allowed only 2D and 3D tensors. To enable batching, I removed these checks, so now if a 1D or 0D tensor is passed to ewald, we just silently return a 0 contribution. I think this is fine for the purposes of training models, but of course it’s not ideal from a design or usability perspective.

Note: batching is now allowed only for systems of the same type: either all systems are PBC or all are non-PBC. Probably @sirmarcel implemented something fancier in the JAX PME version, but I think it’s impossible to do it here without creating a separate calculator. Therefore, training for mixed PBC systems would probably require two dataloaders, which I think is fine.
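
A rough sketch of what the "two dataloaders" workaround could look like, in plain Python. It assumes each system carries a single boolean `pbc` flag, which is a simplification for illustration; `split_by_periodicity` and `batches` are hypothetical helpers, not part of torch-pme.

```python
def split_by_periodicity(systems):
    """Partition systems so that each group is all-PBC or all non-PBC."""
    periodic = [s for s in systems if s["pbc"]]
    non_periodic = [s for s in systems if not s["pbc"]]
    return periodic, non_periodic


def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Illustrative dataset: names and flags are made up.
systems = [
    {"name": "bulk_a", "pbc": True},
    {"name": "molecule_a", "pbc": False},
    {"name": "bulk_b", "pbc": True},
    {"name": "molecule_b", "pbc": False},
    {"name": "bulk_c", "pbc": True},
]

periodic, non_periodic = split_by_periodicity(systems)
periodic_batches = list(batches(periodic, batch_size=2))
non_periodic_batches = list(batches(non_periodic, batch_size=2))
```

Every batch produced this way contains systems of one type only, so each can be passed to the matching calculator.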

@E-Rum E-Rum requested a review from PicoCentauri October 10, 2025 14:16

@PicoCentauri PicoCentauri left a comment

Very nice!

I think we should add a tutorial that shows how the batching works in a very simple training loop?

pair_mask,
kvectors,
)
batched_time = time.time() - start_batched

This kind of timing check is always prone to failing. Maybe we do it over 5 runs and average?
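
Averaging over several runs could look roughly like the sketch below. `average_runtime` is a hypothetical helper, and it uses `time.perf_counter` rather than the `time.time` call in the snippet above, since it is the clock intended for measuring short durations; `workload` is just a stand-in for the batched or looped computation.

```python
import time
from statistics import mean


def average_runtime(fn, repeats=5):
    """Call fn `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return mean(timings)


calls = []


def workload():
    # Placeholder computation; records each call so the count can be checked.
    calls.append(sum(i * i for i in range(1000)))


elapsed = average_runtime(workload, repeats=5)
```

Comparing two averages obtained this way is less sensitive to a single slow run than comparing two one-off measurements, although it can still be noisy on shared CI machines.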

I think we should also add tests for the potentials?

E-Rum commented Oct 14, 2025

Resolved most of the comments; in the next commit, I will add an example script.

@E-Rum E-Rum requested a review from PicoCentauri October 15, 2025 15:45

@PicoCentauri PicoCentauri left a comment

Amazing. I just have some comments regarding the example. Maybe also update the changelog?

This example demonstrates how to compute Ewald potentials for a batch of systems
with different numbers of atoms using padding. The idea is to pad atomic positions,
charges, and neighbor lists to the same length and use masks to ignore padded entries
during computation.

Maybe we should add that batching, which the padding makes possible, will speed up the training of a model. Should be known but doesn't hurt to say again.

We could also show a speed comparison for a batched one and a looped one...

Might be nice :-)

E-Rum commented Oct 20, 2025

I had to add 3 more systems to yield 5 in total in the example; otherwise, the loop implementation is faster :)

@E-Rum E-Rum requested a review from PicoCentauri October 20, 2025 11:51

@PicoCentauri PicoCentauri left a comment

Awesome. Can you update the changelog, and then we merge this beauty!

@PicoCentauri PicoCentauri merged commit 3f16e8b into main Oct 20, 2025
13 checks passed
@PicoCentauri PicoCentauri deleted the padding_inputs branch October 20, 2025 16:00

Development

Successfully merging this pull request may close these issues.

Batched calculator over structures

3 participants