Using SPIDER on IBM SP Clusters with Job Scheduling
This page illustrates SPIDER operations that help create and control
multiple SPIDER jobs running in parallel on a loosely coupled cluster
that uses a complex job scheduling system to assign nodes to users and
to regulate processing time.
Example: Alignment of Single Particles
bjob
LoadLeveler script that schedules node use on IBM SP clusters.
To run: llsubmit bjob
#@ job_name = bjob
#@ output = bjob.out
#@ error = bjob.err
#@ job_type = parallel
#@ network.MPI = css0,not_shared,us
#@ node_usage = not_shared
#@ environment = COPY_ALL
#@ notification = complete
#@ class = regular
#@ tasks_per_node = 1
#@ node = 25
#
#@ wall_clock_limit = 7:50:00
#@ queue
date
#reserve one node for each SPIDER job
poe -nodes 25 -procs 25 -pgmmodel mpmd -cmdfile bjob.cmd
date
bjob.cmd
Command file for poe. Starts the master task on one node and an idle
task on each of the other nodes.
# command   project/data   initial     results       register
#           extension      procedure   file number   setting
./spider pam/acn @b_master 0 x11=11
./spider pam/acn @b_idle 1 x77=16
./spider pam/acn @b_idle 2 x77=17
./spider pam/acn @b_idle 3 x77=18
..
..
..
./spider pam/acn @b_idle 24 x77=40
b_master.pam
Master task. Started on one node only. Coordinates and
synchronizes all tasks.
; ArDean Leith Nov 2000
; INPUT:
; x12 (Starting micrograph number)
; x13 (Ending micrograph number)
; myinput/reference_volume (3-D input file)
; myinput/win_part@***** (2-D projections)
; myinput/ngood{***mcg} (Selection doc files)
; OUTPUT:
; out/prj**** (Projections)
; select (Doc file)
; refangles (Doc file)
; out/apmq{***x77} (Doc files)
x12 = 16 ; starting micrograph number
x13 = 40 ; ending micrograph number
MD
TR OFF ; decrease output to results file
MD
VB OFF ; decrease output to results file
MD
SET MP ; use SMP on 2 processors per node
2
VM ; dir: out{..} NEEDED
mkdir out
x11=1
; activate slave task for each micrograph
DO LB1 x77=x12,x13
- ; create sync document files with register settings for each slave
- @b_startslave[x11,x41,x42,x51,x52,x55,x66,x76,x77]
- LB1
VM
echo "b_master waiting for all alignments"
MY FL ; flush results
; wait for alignments to finish
@b_wait[x11,x12,x13,x47,x66,x76]
; alignments finished, signal slaves to end
x11=99
DO LB3 x77=x12,x13
@b_startslave[x11,x41,x42,x51,x52,x55,x66,x76,x77]
LB3
EN
b_startslave.pam
SPIDER procedure called by b_master. It creates a doc. file for
each group. The doc. file signals the b_idle task for that group to
start processing and also passes register settings to it.
[x11,x41,x42,x51,x52,x55,x66,x76,x77]
; ArDean Leith Nov 2000
; Creates doc files used to wake up and pass info to idle tasks
; INPUT
; reg: 11 (type-of-slave flag)
; reg: 41
; reg: 42
; reg: 51
; reg: 52
; reg: 55
; reg: 66
; reg: 76
; reg: 77
; remove any existing doc. file that holds the settings and sync signal
VM ; remove old sync doc. file for this group
\rm -f jnkdoc{***x77}.$DATEXT
; create document file with register settings
SD 11,x11 ; contains type-of-slave flag
jnkdoctmp{***x77}
SD 41,x41
jnkdoctmp{***x77}
SD 42,x42
jnkdoctmp{***x77}
SD 51,x51
jnkdoctmp{***x77}
SD 52,x52
jnkdoctmp{***x77}
SD 55,x55
jnkdoctmp{***x77}
SD 66,x66
jnkdoctmp{***x77}
SD 76,x76
jnkdoctmp{***x77}
SD E
jnkdoctmp{***x77}
VM
mv jnkdoctmp{***x77}.$DATEXT jnkdoc{***x77}.$DATEXT
RE
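The final mv in b_startslave matters because the idle task polls for jnkdoc{***x77}: the registers are written under the temporary name jnkdoctmp{***x77} and only renamed once the file is complete, so the idle task should never open a half-written file. A minimal Python sketch of the same write-then-rename pattern follows. It assumes a shared filesystem on which a rename within one directory is atomic. The function name publish_registers, the "dat" extension, and the plain "key value" line layout are illustrative assumptions and are not SPIDER's doc. file format.

```python
import os
import tempfile


def publish_registers(directory, group, registers, ext="dat"):
    """Write register values under a temporary name, then rename into place."""
    final_path = os.path.join(directory, f"jnkdoc{group:03d}.{ext}")
    tmp_path = os.path.join(directory, f"jnkdoctmp{group:03d}.{ext}")

    # Remove any stale signal file left over from an earlier step.
    if os.path.exists(final_path):
        os.remove(final_path)

    # Write every register under the temporary name first.
    with open(tmp_path, "w") as handle:
        for key, value in sorted(registers.items()):
            handle.write(f"{key} {value}\n")

    # The rename makes the complete file visible in a single step.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = publish_registers(workdir, 16, {11: 1, 76: 1})
        with open(path) as handle:
            print(handle.read())
```

A reader that polls only for the final name sees either no file or the whole file, which is the property the procedure relies on.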
b_idle.pam
Started on each node except the master node. This
task waits for a start-up file, jnkdoc{***x77},
created by b_master.
When the signal (file) arrives, this procedure
calls SPIDER procedure b12.pam, which carries out the
alignment for this group. When the alignment is finished, this
procedure creates a new doc. file, jnkdocparamout{***x77},
which signals b_master that it
can re-awaken.
; ArDean Leith Nov 2000
; INPUT:
; reg: 77 (group, on command line)
; jnkdoc{***x77} (doc file created by b_master & b_startslave)
; OUTPUT:
; jnkdocparamout{***x77} (signal file contains x11 & x47)
x77 ; group must be on command line!!!!!
MD
TR OFF ; decrease output to results file
MD
VB OFF ; decrease output to results file
MD
SET MP ; use SMP on 2 processors per node
2
; Awakens on signal from b_master ----------------------------
; Runs following operations for each awakening (100000=infinite)
DO LB1 i=1,100000
- IQ SYNC ; wait for wake-up signal (file: jnkdoc{***grp})
- jnkdoc{***x77}
- (10 36000)
- ; retrieve registers stored in doc file: jnkdoc{***x77}
- UD IC,11,X11
- jnkdoc{***x77}
- IF (x11.GE.99) THEN
- ; signal to kill this slave task
- EN
- ENDIF
- UD IC,41,X41
- jnkdoc{***x77}
- UD IC,42,X42
- jnkdoc{***x77}
- UD IC,51,X51
- jnkdoc{***x77}
- UD IC,52,X52
- jnkdoc{***x77}
- UD IC,55,X55
- jnkdoc{***x77}
- UD IC,66,X66
- jnkdoc{***x77}
- UD IC,76,X76
- jnkdoc{***x77}
- UD ICE
- jnkdoc{***x77}
- VM ; remove this sync. doc file
- \rm -f jnkdoc{***x77}*
- VM
- date
- VM
- echo "starting step: {**x76} group: {**x77}"
- X11
- MY FL ; flush results file
- IF (x11 .EQ. 1) THEN
- @b12[x77] ; runs alignment for this group.
- ENDIF
- ; Signal b_master to re-awaken now
- ; (b_master wakes when it sees jnkdocparamout{***x77})
- SD 11,X11 ; set sync file output
- jnkdocparamout{***x77}
- SD E
- jnkdocparamout{***x77}
- VM
- echo "ending iteration: {**x76} group: {**x77}"
LB1
EN
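The b_idle cycle (wait for the start file, read the flag, stop on 99, otherwise do the work and write a completion file) can be sketched in Python as shown here. This is a rough model, not the SPIDER implementation. It assumes that the two values (10 36000) given to IQ SYNC are a polling interval and a maximum wait, both in seconds. The names wait_for_file, read_registers and idle_loop, the "dat" extension, and the "key value" file layout are hypothetical, and run_alignment stands in for the call to b12.

```python
import os
import time

STOP_FLAG = 99  # register 11 value that b_master uses to end idle tasks


def wait_for_file(path, interval=10.0, max_wait=36000.0, sleep=time.sleep):
    """Poll for path every `interval` seconds; give up after `max_wait`."""
    waited = 0.0
    while not os.path.exists(path):
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
    return True


def read_registers(path):
    """Parse 'key value' lines (an illustrative format) into a dict."""
    registers = {}
    with open(path) as handle:
        for line in handle:
            key, value = line.split()
            registers[int(key)] = float(value)
    return registers


def idle_loop(directory, group, run_alignment, **wait_options):
    """Serve wake-up signals for one group until stopped or timed out."""
    start = os.path.join(directory, f"jnkdoc{group:03d}.dat")
    done = os.path.join(directory, f"jnkdocparamout{group:03d}.dat")
    completed = 0
    while wait_for_file(start, **wait_options):
        registers = read_registers(start)
        flag = registers.get(11, 0.0)
        if flag >= STOP_FLAG:
            break                       # stop signal: leave the loop
        os.remove(start)                # consume the wake-up signal
        if flag == 1:
            run_alignment(group)        # stands in for @b12[x77]
        with open(done, "w") as handle:
            handle.write(f"11 {flag:g}\n")  # completion signal for the master
        completed += 1
    return completed
```

Removing the start file before doing the work keeps a finished step from being mistaken for a new wake-up signal on the next pass through the loop.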
b12.pam
Started on each processor by b_idle.pam
Aligns particles to reference projections.
[x77]
; ArDean Leith Nov 2000
; Aligns particles to reference projections.
; Multireference alignment of an image series. For a
; project with multiple defocus settings, run this procedure
; separately for the particles from each individual micrograph.
; If the pixel size differs from 4.78, the expected size of the object and
; the first and last ring parameters should be changed.
; INPUT:
; out/prj**** (2-D ref. images)
; select (Selection doc file for refs. from b11.pam)
; /scratch/leith/win_part@***** (Windowed images)
; myinput/ngood{***grp} (Selection doc files for windowed images)
; OUTPUT:
; out/apmq{***x77} (Alignment doc files)
MD
TR OFF ; decrease output to results file
MD
VB OFF ; decrease output to results file
MD
SET MP ; use SMP on 2 processors per node
2
MY FL ; flush output
AP MQ ; Alignment - 3D, multi reference
out/prj**** ; Template for 2-D reference image names (input)
select ; Selection doc. file for reference imgs. (input)
(10,1) ; Accuracy of the search
(5,47) ; First and last ring
/scratch/leith/win_part@***** ; Windowed images (input)
myinput/ngood{***x77} ; Windowed images selection doc. file (input)
out/apmq{***x77} ; Angles output file (output)
MY FL ; Flush output
RE
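The header comment in b12 says the first and last ring values must change when the pixel size is not 4.78. One plausible way to choose new values, offered only as an assumption, is to keep the same physical radius, so that each ring scales by the ratio of the two pixel sizes. The helper rescale_ring below is hypothetical and is not part of SPIDER.

```python
def rescale_ring(ring, new_pixel_size, ref_pixel_size=4.78):
    """Scale a ring radius (in pixels) so it spans the same physical radius."""
    if new_pixel_size <= 0:
        raise ValueError("pixel size must be positive")
    # A smaller pixel means more pixels are needed for the same radius.
    return max(1, round(ring * ref_pixel_size / new_pixel_size))


# Rings (5, 47) from the AP MQ call, moved to a pixel half as large.
first_ring = rescale_ring(5, 2.39)
last_ring = rescale_ring(47, 2.39)
print(first_ring, last_ring)
```

Under this assumption, halving the pixel size from 4.78 to 2.39 turns (5,47) into (10,94). The rescaled last ring still has to fit inside the windowed image.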
b_wait.pam
b_master, running on the master node, calls this
procedure after awakening the b_idle tasks to
carry out the alignment. When an alignment is finished,
b_idle creates a new doc. file: jnkdocparamout{***x77}.
This procedure causes b_master
to wait until each of the b_idle tasks has created
its file.
[x11,x12,x13,x47,x66,x76]
; ArDean Leith Nov 2000
; Used in b_master. Waits for slaves to finish.
; For step id=2, accumulates register 47 contents from
; sync doc file.
; INPUT:
; reg: 11 (step id)
; reg: 12 (starting group)
; reg: 13 (ending group)
; reg: 66 (number of groups)
; reg: 76 (step number)
; jnkdocparamout{***grp}*
; OUTPUT:
; reg: 47 (accumulated reg #47)
x12 ; echo reg 12
x13 ; echo reg 13
x47=0 ; initialize return value
; wait for all micrograph groups -------------
DO LB3 x76=x12,x13
- X77=56-x76 ; count down (56 = 16+40) since group 16 takes so long
- x77
- MY FL ; flush results
- IQ SYNC
- jnkdocparamout{***x77}
- (10 36000)
- VM
- date
- VM
- echo "synced step: {**x76} group: {**x77} "
- ;
- IF (X11 .EQ. 2) THEN
- ; b_defloopa sets x47 in jnkdocparamout{***x77}
- UD 47,x12
- jnkdocparamout{***x77}
- x47=x47+x12
- UD E
- jnkdocparamout{***x77}
- ENDIF
- DE
- jnkdocparamout{***x77}
- ;
- MY FL ; flush results
LB3 ; end wait loop over groups -------
RE
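The wait loop in b_wait visits the groups in reverse order: with groups 16 to 40, the expression 56-x76 equals first + last - step, so group 40 is checked first and the long-running group 16 last. The Python sketch below models that ordering together with the optional accumulation of register 47. It is a sketch under stated assumptions: the general form first + last - step replaces the hard-coded 56, the "key value" file layout and the "dat" extension are illustrative, and the names countdown_groups, read_register and collect_results are hypothetical. The wait_for_file argument is any callable that returns True once the file exists.

```python
import os


def countdown_groups(first, last):
    """Return groups from last down to first (first + last - step)."""
    return [first + last - step for step in range(first, last + 1)]


def read_register(path, key):
    """Return the value stored under `key` in a 'key value' file."""
    with open(path) as handle:
        for line in handle:
            found, value = line.split()
            if int(found) == key:
                return float(value)
    raise KeyError(key)


def collect_results(directory, first, last, step_id, wait_for_file):
    """Wait for every group's completion file, summing register 47 on step 2."""
    total = 0.0
    for group in countdown_groups(first, last):
        path = os.path.join(directory, f"jnkdocparamout{group:03d}.dat")
        if not wait_for_file(path):
            raise TimeoutError(f"no completion signal for group {group}")
        if step_id == 2:
            total += read_register(path, 47)
        os.remove(path)  # clear the signal so the next step starts clean
    return total
```

Because the master blocks on one group at a time, the order only affects which file is examined first. The total wait is still set by the slowest group.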
Source: techs/parallel/parallel_ibm.html
Last update: 26 April 2001
ArDean Leith