RELION 3.0: Refine3D issue with Intel PSXE and CUDA · Issue #449 · 3dem/relion · GitHub


Closed
Fravadona opened this issue Mar 4, 2019 · 7 comments
@Fravadona
Fravadona commented Mar 4, 2019

Edit: After finding the bug and resolving the issue I changed the title of the post.

Hello,
I'm not sure whether my issue is a bug, but I'm stuck because the log doesn't say anything useful.

RELION was built on Linux CentOS 7 with Intel Compilers 2018.3 + Intel MPI + Intel MKL + CUDA 10.0

The pixel size of the micrographs is 1.2115 Å and the particle box size should be 1024 pixels, but I downscaled them to 256 pixels when extracting. All the prior processing (Class2D, InitialModel, Class3D) was done with the downscaled particles, and now I'm trying to launch a 3D refinement with the downscaled particles too.

The job was run locally using the GUI:

  *** The command is:
`which relion_refine_mpi` --o Refine3D/job133/run --auto_refine --split_random_halves --i Class3D/job113/run_it025_data.star --ref Class3D/job113/run_it025_class001.mrc --ini_high 60 --dont_combine_weights_via_disc --pool 3 --pad 2  --ctf --ctf_corrected_ref --particle_diameter 1240 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym I1 --low_resol_join_halves 40 --norm --scale  --j 1 --gpu "0:1"

As you can see, I use a 3D class generated with RELION, and the particles are the ones used for this model.

Here's the output log (the error log is empty):

RELION version: 3.0 
Precision: BASE=double, CUDA-ACC=single 

 === RELION MPI setup ===
 + Number of MPI processes             = 3
 + Master  (0) runs on host            = gpulab02
 + Slave     1 runs on host            = gpulab02
 + Slave     2 runs on host            = gpulab02
 =================
 uniqueHost gpulab02 has 2 ranks.
 Using explicit indexing on slave 1 to assign devices  0
 Thread 0 on slave 1 mapped to device 0
 Using explicit indexing on slave 2 to assign devices  1
 Thread 0 on slave 2 mapped to device 1

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 6321 RUNNING AT gpulab02
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================

My guess is that Refine3D doesn't work with rescaled particles. Is that right?

Cheers,
Rafael.

@biochem-fan
Member

This is strange. Class3D and Refine3D use very similar code. If Class3D worked fine on your down-sampled particles, Refine3D should also run fine.

@Fravadona
Author
Fravadona commented Mar 4, 2019

I agree, Refine3D should work...

Here's the Class3D job:

 *** The command is:
`which relion_refine_mpi` --o Class3D/job113/run --i Select/job084/particles.star --ref InitialModel/job109/run_it300_class001.mrc --ini_high 60 --dont_combine_weights_via_disc --pool 3 --pad 2  --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 1240 --K 1 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym I1 --norm --scale  --j 1 --gpu "0:1"
RELION version: 3.0 
Precision: BASE=double, CUDA-ACC=single 

 === RELION MPI setup ===
 + Number of MPI processes             = 3
 + Master  (0) runs on host            = gpulab02
 + Slave     1 runs on host            = gpulab02
 + Slave     2 runs on host            = gpulab02
 =================
 uniqueHost gpulab02 has 2 ranks.
 Using explicit indexing on slave 1 to assign devices  0
 Thread 0 on slave 1 mapped to device 0
 Using explicit indexing on slave 2 to assign devices  1
 Thread 0 on slave 2 mapped to device 1
 Running CPU instructions in double precision. 
 Estimating initial noise spectra 
   1/   1 sec ............................................................~~(,_,">
WARNING: There are only 3 particles in group 3
WARNING: There are only 3 particles in group 18
WARNING: You may want to consider joining some micrographs into larger groups to obtain more robust noise estimates. 
         You can do so by using the same rlnMicrographName for particles from multiple different micrographs in the input STAR file. 
         It is then best to join micrographs with similar defocus values and similar apparent signal-to-noise ratios. 
 Estimating accuracies in the orientational assignment ... 
   0/   0 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 0.984 degrees; offsets= 0.71 pixels
 CurrentResolution= 59.075 Angstroms, which requires orientationSampling of at least 5.45455 degrees for a particle of diameter 1240 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 1512
 OrientationalSampling= 15 NrOrientations= 72
 TranslationalSampling= 2 NrTranslations= 21
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 48384
 OrientationalSampling= 7.5 NrOrientations= 576
 TranslationalSampling= 1 NrTranslations= 84
=============================
 Expectation iteration 1 of 25
  11/  11 sec ............................................................~~(,_,">
 Maximization ...
   9/   9 sec ............................................................~~(,_,">
 Estimating accuracies in the orientational assignment ... 
   0/   0 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 1.631 degrees; offsets= 0.763 pixels
 CurrentResolution= 47.7145 Angstroms, which requires orientationSampling of at least 4.39024 degrees for a particle of diameter 1240 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 1512
 OrientationalSampling= 15 NrOrientations= 72
 TranslationalSampling= 2 NrTranslations= 21
=============================
... 
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 48384
 OrientationalSampling= 7.5 NrOrientations= 576
 TranslationalSampling= 1 NrTranslations= 84
=============================
 Expectation iteration 24 of 25
1.58/1.58 min ............................................................~~(,_,">
 Maximization ...
  53/  53 sec ............................................................~~(,_,">
 Estimating accuracies in the orientational assignment ... 
   1/   1 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 0.934 degrees; offsets= 0.338 pixels
 CurrentResolution= 18.7966 Angstroms, which requires orientationSampling of at least 1.73077 degrees for a particle of diameter 1240 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 1512
 OrientationalSampling= 15 NrOrientations= 72
 TranslationalSampling= 2 NrTranslations= 21
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 48384
 OrientationalSampling= 7.5 NrOrientations= 576
 TranslationalSampling= 1 NrTranslations= 84
=============================
 Expectation iteration 25 of 25
1.40/1.40 min ............................................................~~(,_,">
 Maximization ...
  53/  53 sec ............................................................~~(,_,">

Let me know if you need other job outputs, or want me to change some job parameters or turn on a few debug flags in the source code.

@Fravadona
Author
Fravadona commented Mar 6, 2019

I have some feedback:

  • The problem is not related to the downscaling of the particles, because I tried another project with the original box size (256) and it crashes too.
  • I recompiled RELION with Intel PSXE 2017.7 + Intel MPI + Intel MKL + CUDA 9.2; the job starts some calculations but then crashes. Here's the log:
RELION version: 3.0 
Precision: BASE=double, CUDA-ACC=single 

 === RELION MPI setup ===
 + Number of MPI processes             = 3
 + Master  (0) runs on host            = gpulab02
 + Slave     1 runs on host            = gpulab02
 + Slave     2 runs on host            = gpulab02
 =================
 uniqueHost gpulab02 has 2 ranks.
 Using explicit indexing on slave 1 to assign devices  0
 Thread 0 on slave 1 mapped to device 0
 Using explicit indexing on slave 2 to assign devices  1
 Thread 0 on slave 2 mapped to device 1
 Running CPU instructions in double precision. 
 Estimating initial noise spectra 
   4/   4 sec ............................................................~~(,_,">
WARNING: There are only 1 particles in group 3 of half-set 1
WARNING: There are only 3 particles in group 5 of half-set 1
WARNING: You may want to consider joining some micrographs into larger groups to obtain more robust noise estimates. 
         You can do so by using the same rlnMicrographName for particles from multiple different micrographs in the input STAR file. 
         It is then best to join micrographs with similar defocus values and similar apparent signal-to-noise ratios. 
 Auto-refine: Iteration= 1
 Auto-refine: Resolution= 59.075 (no gain for 0 iter) 
 Auto-refine: Changes in angles= 999 degrees; and in offsets= 999 pixels (no gain for 0 iter) 
 Estimating accuracies in the orientational assignment ... 
   0/   0 sec ............................................................~~(,_,">
 Auto-refine: Estimated accuracy angles= 1.778 degrees; offsets= 0.926 pixels
 CurrentResolution= 59.075 Angstroms, which requires orientationSampling of at least 5.53846 degrees for a particle of diameter 1220 Angstroms
 Oversampling= 0 NrHiddenVariableSamplingPoints= 1512
 OrientationalSampling= 15 NrOrientations= 72
 TranslationalSampling= 2 NrTranslations= 21
=============================
 Oversampling= 1 NrHiddenVariableSamplingPoints= 48384
 OrientationalSampling= 7.5 NrOrientations= 576
 TranslationalSampling= 1 NrTranslations= 84
=============================
 Expectation iteration 1
  12/  12 sec ............................................................~~(,_,">

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 6837 RUNNING AT gpulab02
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================
  • I also tried removing my optimisation flags (-O3 -ip -xCORE-AVX2 -restrict), but nothing changed.

Thanks for your help

Also: we don't have a CUDA 10.1-compatible driver installed, so I cannot compile RELION with Intel PSXE 2019 for now.

@bforsbe
Contributor
bforsbe commented Mar 7, 2019

Did you try compiling with gcc and OpenMPI? I'm not sure how well the Intel/CUDA mix works at the moment, since there's not much reason for it. Most locations where AVX would be used to great benefit run on the GPU in a CUDA build anyway, and it's really only the vectorization that icc does better than gcc.

@Fravadona
Author
Fravadona commented Mar 12, 2019

Indeed, I tested building RELION with GCC 7.3.1 / OpenMPI 3.1.3 / CUDA 10.0, and the GPU acceleration of Refine3D works fine with it.
That said, I finally fixed the issue with CUDA and Intel ICC/MKL/MPI. There was a bug in ml_optimiser_mpi.cpp that makes all MPICH variants crash. I'm not sure how OpenMPI handles it, because there's a buffer overflow in a call to MPI_Recv. Here's the patch (I'm not sure it's the best way to do it, but defining a macro MY_MPI_SIZE_T seems reasonable):

--- relion-3.0.1/src/ml_optimiser_mpi.cpp.orig	2019-03-11 16:48:22.000000000 +0100
+++ relion-3.0.1/src/ml_optimiser_mpi.cpp	2019-03-12 22:37:58.505551279 +0100
@@ -414,14 +414,14 @@
 					size_t t = b.checkFixedSizedObjects(cudaDeviceShares[i]);
 					boxLim = ((t < boxLim) ? t : boxLim );
 				}
-				node->relion_MPI_Send(&boxLim, sizeof(size_t), MPI_INT, 0, MPITAG_INT, MPI_COMM_WORLD);
+				node->relion_MPI_Send(&boxLim, 1, MY_MPI_SIZE_T, 0, MPITAG_INT, MPI_COMM_WORLD);
 			}
 			else
 			{
 				size_t boxLim, LowBoxLim(10000);
 				for(int slave = 1; slave < node->size; slave++)
 				{
-					node->relion_MPI_Recv(&boxLim, sizeof(size_t), MPI_INT, slave, MPITAG_INT, MPI_COMM_WORLD, status);
+					node->relion_MPI_Recv(&boxLim, 1, MY_MPI_SIZE_T, slave, MPITAG_INT, MPI_COMM_WORLD, status);
 					LowBoxLim = (boxLim < LowBoxLim ? boxLim : LowBoxLim );
 				}
 
--- relion-3.0.1/src/macros.h.orig	2019-03-11 16:48:22.000000000 +0100
+++ relion-3.0.1/src/macros.h	2019-03-12 23:24:23.457103049 +0100
@@ -47,6 +47,7 @@
 
 #define RELION_VERSION "3.0.1"
 
+#include <stdint.h>
 #include <math.h>
 #include <signal.h>
 #include "src/error.h"
@@ -78,6 +79,21 @@
 #define MY_MPI_COMPLEX MPI_DOUBLE_COMPLEX
 #endif
 
+//https://stackoverflow.com/a/40808411
+#if SIZE_MAX == UCHAR_MAX
+   #define MY_MPI_SIZE_T MPI_UNSIGNED_CHAR
+#elif SIZE_MAX == USHRT_MAX
+   #define MY_MPI_SIZE_T MPI_UNSIGNED_SHORT
+#elif SIZE_MAX == UINT_MAX
+   #define MY_MPI_SIZE_T MPI_UNSIGNED
+#elif SIZE_MAX == ULONG_MAX
+   #define MY_MPI_SIZE_T MPI_UNSIGNED_LONG
+#elif SIZE_MAX == ULLONG_MAX
+   #define MY_MPI_SIZE_T MPI_UNSIGNED_LONG_LONG
+#else
+   #error "<MPI_Datatype> not found for <size_t>"
+#endif
+
 #if defined CUDA and DEBUG_CUDA
 #define CRITICAL(string) raise(SIGSEGV);
 #else

Cheers,
Rafael.

@Fravadona Fravadona changed the title RELION 3.0: Refine3D issue with rescaled particles ? RELION 3.0: Refine3D issue with Intel PSXE and CUDA (rescaled particles ? Mar 13, 2019
@Fravadona Fravadona changed the title RELION 3.0: Refine3D issue with Intel PSXE and CUDA (rescaled particles ? RELION 3.0: Refine3D issue with Intel PSXE and CUDA Mar 13, 2019
@biochem-fan
Member

@Fravadona Thank you very much for finding this. I will merge your suggestion (I will change size_t to unsigned long long, which makes things easier).

@Fravadona
Author

@biochem-fan You're welcome, and you're right, casting size_t to unsigned long long should do the trick ;-)

@biochem-fan biochem-fan self-assigned this Mar 13, 2019
biochem-fan added a commit that referenced this issue Mar 13, 2019
…issue #449; thanks to @Fravadona for finding this and suggesting the fix)

Call this 3.0.2 because we touched relion_refine.