Skip to content

Conversation

@HanatoK
Copy link
Member

@HanatoK HanatoK commented Nov 11, 2025

This PR depends on #826, and will be marked as ready once #826 is merged. (Done)

This PR implements the GPU-resident RMSD calculation. However, using the GPU-resident RMSD as a sub CV is not supported yet.

@HanatoK HanatoK marked this pull request as ready for review November 14, 2025 20:10
@HanatoK
Copy link
Member Author

HanatoK commented Nov 14, 2025

This commit also tries to fix #871. I am still waiting @jhenin's test about the Jacobian term with atomPermutation...

Comment on lines 987 to +1062
for (size_t ia = 0; ia < atoms->size(); ia++) {
const cvm::atom_pos pos_ia(
atoms->pos_x(ia), atoms->pos_y(ia), atoms->pos_z(ia));
value += (pos_ia - ref_pos[ref_pos_index++]).norm2();
}
if (value < x.real_value) {
x.real_value = value;
best_perm_index = ip;
const cvm::real diff_x = atoms->pos_x(ia) - ref_pos_x[ia];
const cvm::real diff_y = atoms->pos_y(ia) - ref_pos_y[ia];
const cvm::real diff_z = atoms->pos_z(ia) - ref_pos_z[ia];
permutation_msds[ip] += diff_x * diff_x + diff_y * diff_y + diff_z * diff_z;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant removing this loop here. Couldn't the difference of ref_pos_soa and atom_pos be computed in a single 3n-vector operation? This would require an accessor to the whole atom_pos vector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atom_pos is indeed continuous and in size of 3 * atoms->size(). You can access the beginning of the vector by using &(atom->pos_x(0)), but the problem is how to access ref_pos_soa by multiple permutations. The current memory layout of ref_pos_soa is xxx...xxxyyy...yyyzzz...zzz, and the lenght of x, y and z coordinates are all 3 * atoms->size() * n_permutations. I don't know if it is worthwhile changing it to a mixture of SOA and AOS like xxxyyyzzz...xxxyyyzzz where there are n_permutations xxxyyyzzz segements.

Copy link
Member

@jhenin jhenin Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of all permutations in a single vector is not a very important design choice. ref_pos_soa may be better as a vector of one xxxyyyzzz vector for each permutation, which would give them the same shape as atom_pos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhenin I'm trying what you suggested. While the CPU code is fine to use std::vector<cvm::ag_vector_real_t>, it's difficult to use the same layout for the GPU code when using CUDA graphs. When using CUDA graphs, it's difficult to select the reference position buffer that has the minimal RMSD. There are two solutions:

  1. Update the graph parameter every step. This would be slow;
  2. Use a double device pointer. But then the copy of reference positions from host to device would be cumbersome. The current GPU code circumvents this issue by just passing best_perm_index into the GPU kernel for indexing in calc_gradients_rmsd_kernel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants