
Conversation

@estrozi commented May 15, 2025

Dear Developers,

I have made some changes to the applyHelicalSymmetry function (in backprojector.cpp) to make it faster using threads.

In many helical projects, it's common to evaluate numerous (>50) different symmetry parameters through full 3D auto-refinement jobs. This scenario becomes particularly computationally demanding for structures characterized by very small axial rise values. A notable example is EMD-42436 (https://www.ebi.ac.uk/emdb/EMD-42436), which has an axial rise of just 0.09892 Å. Such structures typically have over 500 unique helical asymmetric units per segment box, making the symmetry imposition step a significant portion of the total processing time.

With these modifications, I was able to reduce the processing time for auto-refinement from approximately 7 hours to 4.5 hours (for the same number of iterations).

I hope this (kept in a separate branch of mine) can be integrated into the main project alongside the previously discussed (far from perfect but still useful) pull request #1185, which adds functionality for calculating the average helical power spectrum in the displayer.

Best wishes,

Leandro F. Estrozi.

@biochem-fan (Member)

> With these modifications, I was able to reduce the processing time for auto-refinement from approximately 7 hours to 4.5 hours (for the same number of iterations).

The effect is indeed impressive. How many threads did you use here?

My concern is that this duplicates the volume for every thread.
Is this function called by all MPI processes or only one per class (or half)?
People typically run 5 MPI (actually 4 working processes) x 8-12 threads.

Having a few copies of the volume might be fine, but 32 to 48 copies might cause out-of-memory crashes on 128 GB RAM machines. Could you use locks or atomic operations to avoid the duplication (as we do in the ALTCPU code path)? Users expect memory usage to scale with the number of MPI processes, but not with the number of threads. This is how most RELION jobs (MotionCorr, Polish, Refine3D, etc.) work.

P.S. I think the relion_display patch should remain on the separate PR.

@estrozi (Author) commented May 15, 2025

> > With these modifications, I was able to reduce the processing time for auto-refinement from approximately 7 hours to 4.5 hours (for the same number of iterations).
>
> The effect is indeed impressive. How many threads did you use here?

I tested with 5 MPI processes with 4 threads each.

> My concern is that this duplicates the volume for every thread. Is this function called by all MPI processes or only one per class (or half)? People typically run 5 MPI (actually 4 working processes) x 8-12 threads.

> Having a few copies of the volume might be fine but 32 to 48 copies might cause out-of-memory crashes on 128 GB RAM machines. Cannot you use locks or atomic operations to avoid duplications (as we do in the ALTCPU code path)? Users expect that the memory usage scales with the number of MPI processes but not threads. This is how most RELION jobs (MotionCorr, Polish, Refine3D etc) work.

I will run tests with another version that uses locks, and if the performance is still good I will switch to it.

> P.S. I think the relion_display patch should remain on the separate PR.

OK, I will keep them separate.

Thanks!

@estrozi (Author) commented May 22, 2025

Hi! Since the rotations are exclusively around Z, the threads can be made z-slice independent using 'small' 2D buffers. So now it runs faster without a significant impact on overall memory consumption. I hope this version is acceptable.

Best wishes,

Leandro.

@biochem-fan (Member)

@estrozi Thank you very much. Your 2D local array is a very clever trick :)

@scheres I think this is a nice contribution. Because this touches the core code, I suggest merging this into the private 5.1 branch under development to allow more testing at LMB. What do you think?

@scheres (Contributor) commented May 28, 2025

Hi Leandro @estrozi,
Why do you return if `data.getDim() != 3` in `applyHelicalSymmetry` in the backprojector? Now the function no longer works for 2D helical data. Is that intended?

@estrozi (Author) commented May 28, 2025

> Hi Leandro @estrozi, Why do you return if `data.getDim() != 3` in `applyHelicalSymmetry` in the backprojector? Now the function no longer works for 2D helical data. Is that intended?

Hi Sjors @scheres, I did that because when I was debugging with `data.printShape()` I was getting "Empty MultidimArray!" from the rank-0 (master) process. I was also getting `data.getDim() == 0` from that same process, so I added the early return. But I realize now that this is not necessary. We can revert to the original line:

`if ( (nr_helical_asu < 2) || (ref_dim != 3) ) return;`

Sorry about that. :-)

@scheres (Contributor) commented May 28, 2025 via email

@estrozi (Author) commented May 28, 2025

> ok, great! I am adding this code to the master branch of our private relion-devel. It will then become available to the public later this year, or the beginning of next. S

Thanks a lot! I would also really appreciate it if the other PR, about the PS average, could be incorporated. But if that's not possible, that's fine.
