
Benchmarking: Alpha Multiprocessors

We have completed a series of tests on SPIDER running on a Compaq Alpha with four processors. Richard Readings of Compaq England provided access to an Alpha EV6.7 (21264A) with four 667 MHz processors running Tru64 UNIX 4.0.

The current SPIDER release compiled and linked cleanly for both single and multiple processors, and our usual test set ran correctly. All tests were done on idle machines. The parallel compilation on both SGI and Compaq machines did not use automatic parallelization, only our embedded parallel directives.

We ran several performance tests on both single-processor and multiprocessor SPIDER builds. In the results that follow, the reader should keep in mind that we compiled and linked SPIDER with the following sets of flags on the Alpha:

spider_dec : -automatic -c -fullwarn -cpp -col72
spidermp_dec : -DSP_MP -DSP_F90 -automatic -c -fullwarn -cpp -col72 -O3 -omp -mp
Significantly better performance may be possible with other flags, particularly -fast and -arch.

Also, efficient use of multiple processors under OpenMP (OMP) on shared-memory computers is still an art that requires experimentation to obtain good performance. The choice of how to parallelize the code (at which loop levels) interacts with the system scheduling algorithms and the number of competing processes, and it has a large effect on performance. It may be hard to manage a multi-user parallel machine so that competing users obtain any benefit from running parallel jobs.

Documentation on parallel operation on both SGI and Compaq is limited. One may need to tune the system to overcome problems. Two examples: On SGI systems under Irix 6.5, one may need to set the environment variable MPC_GANG to ON to prevent serious performance problems.

On Compaq Alphas we discovered that the fine-grained parallel requests that are used heavily in SPIDER (inner-loop parallelism) may cause two competing four-processor SPIDER jobs to take 20 times longer than a single job. After we set the environment variable MP_SPIN_COUNT to 5000 instead of the default (10000000?), performance returned to the rate for a single job. (Thanks to Richard Readings for solving this!)
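
The two settings above can be applied per job rather than for the whole system. The following is a minimal Python sketch, assuming SPIDER is started from a wrapper script. The helper name `tuned_environment`, the platform labels "sgi" and "alpha", and the command in the usage comment are hypothetical; only the variable names and values (MPC_GANG=ON, MP_SPIN_COUNT=5000) come from the text above.

```python
import os


def tuned_environment(platform, base_env=None):
    """Return a copy of the environment with the tuning variable for `platform`.

    platform: "sgi" (Irix 6.5) or "alpha" (Compaq Alpha); labels are hypothetical.
    base_env: mapping to copy; defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    if platform == "sgi":
        # Value taken from the SGI example in the text.
        env["MPC_GANG"] = "ON"
    elif platform == "alpha":
        # Value that restored single-job performance in the Alpha example.
        env["MP_SPIN_COUNT"] = "5000"
    else:
        raise ValueError("unknown platform: " + platform)
    return env


# Usage (hypothetical command line, not run here):
#   import subprocess
#   subprocess.run(["spidermp_dec"], env=tuned_environment("alpha"))
```

Passing the environment explicitly keeps the setting local to the SPIDER job, so other users' processes on a shared machine are unaffected.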

Following are some timing comparisons for Alpha and SGI using SPIDER operations that are often time critical in a reconstruction.


machine  processor  n*MHz  user   sys   elapsed  CPU%  spider executable
 
PJ 3q (10*400img)-----(quite a bit of disk IO) ------------------------
bali     R12000     1*300  46.9u  0.3s  0:50  94%   spider4.383 
                    2*300  54.9u  0.5s  0:30 183%   spider4mp.466 

alpha    EV6.7      1*667  56.30u 0.35s 0:56  99%   spider_dec
                    2*667  53.96u 0.39s 0:27 197%   spidermp_dec
                    4*667  58.16u 0.35s 0:14 391%   spidermp_dec


AP MQ ----------------( no disk IO ) ----------------------------------
bali     R12000     1*300  299.6u  0.9s  5:09  97%  spider4.383 
                    2*300  318.5u  1.7s  2:47 191%  spider4mp.466 

alpha    EV6.7      1*667  188.05u 0.16s 3:08  99%  spider_dec
                    2*667  170.37u 0.17s 1:25 199%  spidermp_dec
                    4*667  170.44u 0.19s 0:43 396%  spidermp_dec


fft  (20x 625 * 325 * 20 3d FFT forward & back using SPIDER & FFTW) ----------
     (note: FFTW is the "Fastest Fourier Transform in the West" library,
      which can be called from SPIDER instead of SPIDER's or
      SGI's Fourier library)
      
bali     R12000     1*300  68.5u  0.7s  1:13  94%   spider4.383  
                    2*300  88.4u  0.8s  0:47 186%   spider4mp.466 
 
                    1*300  63.4u  0.4s  1:06  95%   spider4fftw  
                    2*300  93.6u  0.8s  0:52 181%   spider4mpfftw  

alpha    EV6.7      1*667  51.24u 0.20s 0:51  99%   spider_dec
                    2*667  60.97u 0.21s 0:31 196%   spidermp_dec
                    4*667  76.06u 0.21s 0:19 382%   spidermp_dec

                    1*667  29.79u 0.21s 0:30  99%   spider_decfftw
                    2*667  39.94u 0.20s 0:20 192%   spidermp_decfftw
                    4*667  53.42u 0.23s 0:21 252%   spidermp_decfftw
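
The elapsed times in these tables can be turned into speedup and parallel efficiency figures. The following is a minimal Python sketch, assuming the third time column is wall-clock time in m:ss form; the function names are hypothetical and the input values are copied from the Alpha rows above. Note that the one-processor rows use a different executable (spider_dec) built with different flags than spidermp_dec, so the ratios compare builds as well as processor counts and can exceed the processor count.

```python
def to_seconds(elapsed):
    """Convert an 'm:ss' elapsed-time string to whole seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_table(runs):
    """Return (processors, speedup, efficiency) rows.

    runs: list of (processors, elapsed 'm:ss'); the first entry is the baseline.
    Speedup is baseline time / run time; efficiency is speedup / processors.
    """
    base = to_seconds(runs[0][1])
    rows = []
    for procs, elapsed in runs:
        speedup = base / to_seconds(elapsed)
        rows.append((procs, round(speedup, 2), round(speedup / procs, 2)))
    return rows


# Elapsed times copied from the Alpha rows of the tables above.
ALPHA_PJ_3Q = [(1, "0:56"), (2, "0:27"), (4, "0:14")]
ALPHA_AP_MQ = [(1, "3:08"), (2, "1:25"), (4, "0:43")]

if __name__ == "__main__":
    for name, runs in (("PJ 3q", ALPHA_PJ_3Q), ("AP MQ", ALPHA_AP_MQ)):
        for procs, speedup, efficiency in speedup_table(runs):
            print(name, procs, speedup, efficiency)
```

Because the elapsed times are reported to the nearest second, the ratios for the shorter runs are only approximate.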

Possible observations:

We are hesitant to draw overall conclusions based on such limited testing. With that caveat:

SGI may handle SPIDER operations with disk IO well: in one such SPIDER job it showed performance similar to the Compaq Alpha. If disk IO is limited (through use of SPIDER's "inline" files) or if the operation is compute-bound, then under the conditions we tested this Compaq Alpha may be 2-2.5 times faster than the SGI we used. The cost of the Alpha system may be less than that of a competitive SGI system. Parallel performance of the Compaq is good but may require tuning if multiple jobs are running.

Pawel Penczek and Dean Leith


Source file: alpha.html     Last update: 26 May 2000    


© Copyright Notice /       Enquiries: spider@wadsworth.org
