-
Notifications
You must be signed in to change notification settings - Fork 60
Implement RMSD on GPU #881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| for (size_t ia = 0; ia < atoms->size(); ia++) { | ||
| const cvm::atom_pos pos_ia( | ||
| atoms->pos_x(ia), atoms->pos_y(ia), atoms->pos_z(ia)); | ||
| value += (pos_ia - ref_pos[ref_pos_index++]).norm2(); | ||
| } | ||
| if (value < x.real_value) { | ||
| x.real_value = value; | ||
| best_perm_index = ip; | ||
| const cvm::real diff_x = atoms->pos_x(ia) - ref_pos_x[ia]; | ||
| const cvm::real diff_y = atoms->pos_y(ia) - ref_pos_y[ia]; | ||
| const cvm::real diff_z = atoms->pos_z(ia) - ref_pos_z[ia]; | ||
| permutation_msds[ip] += diff_x * diff_x + diff_y * diff_y + diff_z * diff_z; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant removing this loop here. Couldn't the difference of ref_pos_soa and atom_pos be computed in a single 3n-vector operation? This would require an accessor to the whole atom_pos vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atom_pos is indeed continuous and in size of 3 * atoms->size(). You can access the beginning of the vector by using &(atom->pos_x(0)), but the problem is how to access ref_pos_soa by multiple permutations. The current memory layout of ref_pos_soa is xxx...xxxyyy...yyyzzz...zzz, and the lenght of x, y and z coordinates are all 3 * atoms->size() * n_permutations. I don't know if it is worthwhile changing it to a mixture of SOA and AOS like xxxyyyzzz...xxxyyyzzz where there are n_permutations xxxyyyzzz segements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure of all permutations in a single vector is not a very important design choice. ref_pos_soa may be better as a vector of one xxxyyyzzz vector for each permutation, which would give them the same shape as atom_pos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhenin I'm trying what you suggested. While the CPU code is fine to use std::vector<cvm::ag_vector_real_t>, it's difficult to use the same layout for the GPU code when using CUDA graphs. When using CUDA graphs, it's difficult to select the reference position buffer that has the minimal RMSD. There are two solutions:
- Update the graph parameter every step. This would be slow;
- Use a double device pointer. But then the copy of reference positions from host to device would be cumbersome. The current GPU code circumvents this issue by just passing
best_perm_indexinto the GPU kernel for indexing incalc_gradients_rmsd_kernel.
This PR depends on #826, and will be marked as ready once #826 is merged.(Done)This PR implements the GPU-resident RMSD calculation. However, using the GPU-resident RMSD as a sub CV is not supported yet.