(Updated 2009 May 20)
Particle verification for single-particle, reference-based reconstruction
using multivariate data analysis and classification
As described in J Struct Biol (2008) 161: 41-48.
General notes:
- Changes from the "normal" flow will be shown in teal.
- Data filenames will be in bold, and menu options will be in italic.
- Data extension is assumed to be .dat. Adjust accordingly.
- Console commands will be in Courier.
Recent modifications:
- 2009-05-14 -- histgoodccc.spi -- added output histogram of bad particles
- 2009-05-14 -- recheck.spi -- added optional step to add or subtract particles after re-screening
- 2009-05-14 -- selectbyviewall.spi -- uses procedure reversedoc_7col.spi
- 2009-05-13 -- combinegoodclasses.spi -- sorts good and bad particle lists
- 2009-04-22 -- filterbyview.spi -- added optional average for each view
- 2009-04-14 -- goodparticlesbydf.spi -- default fractional cutoff is 0
- 2009-04-14 -- dfgoodapsh.spi -- output now called sel_particles_***, works without defocus groups
- 2009-04-07 -- included backup script backup.sh
- 2009-02-26 -- added quick-start guide
- 2009-02-26 -- added viewaverage.spi (optional) -- analogous to average.spi
- 2008-11-12 -- classify.spi -- added flag to choose between CA, PCA, or iterative PCA
- 2008-10-08 -- listallparticles.spi -- added global particle number
- 2008-09-29 -- added settings file .montagefromdoc in Power_Spectra/
- 2008-09-29 -- added mkfilenums.py, shrink.spi, and .montagefromdoc in Micrographs/
- 2008-07-02 -- added to SPIDER Techniques page
- 2008-02-01 -- bps-by-df.spi -- added options for BP 32F and BP RP
- 2008-02-01 -- added doc-file formatting information to headers of some batch files
- 2008-02-01 -- selectbyviewall.spi -- added output stack2particle*** with global particle number,
to compensate for (another?) change to sel_particles. The following also changed accordingly:
- histgoodccc.spi
- big-goodparticlesbydf.spi
- goodparticlesbydf.spi
- dfgoodapsh.spi
- 2008-01-29 -- pnums.spi -- uses sel_micrograph as an input instead of filenums
- 2007-09-06 -- filtered particles are stacked for hopefully better portability.
filtershrinksh.spi gave way to filterbyview.spi, and classify.spi changed accordingly.
- 2007-08-22 -- archive of tarballs linked to projection-matching tarballs
- 2007-08-22 -- changed hard-to-read italic passages to less-hard-to-read teal.
- 2007-03-21 -- consecprepare.spi -- added to substitute for copyin.pam
- 2007-01-31 -- significant re-write of batch files in verification section.
Changes are summarized here.
- 2007-01-23 -- started archive of tarballs
- 2006-08-29 -- verifybyview.py -- remembers settings from text file
- 2006-08-28 -- montagefromdoc.py -- error-checking for selection file
- 2006-07-27 -- added SPIRE configuration file to tarball
- 2006-06-27 -- created tarball of batch files
- 2006-05-12 -- AP SH version is now the default
- 2005-03-03 -- posted documentation on Python/Tkinter interface
- 2005-01-21 -- changed extensions from .bat to .spi
Quick-start guide
Options are limited for the sake of simplicity. For more details, see below.
In toplevel directory (e.g., myproject/):
- Unpackage projection-matching batch files
- Unpackage verification batch files
- Copy params file or spider spi/dat @makeparams
- Copy reference volume and/or spider spi/dat @resizevol
In Micrographs/:
- mkfilenums.py ../filenums.dat mic*.dat
- spider spi/dat @shrink
- montagefromdoc.py ../sel_micrograph.dat sm-mic*
In Power_Spectra/:
- montagefromdoc.py ../sel_micrograph.dat power/pw_avg*
- spider spi/dat @defocus
- spider spi/dat @defsort
- ctfgroup.py def_sort.dat
- spider spi/dat @defavg
In Particles/:
- Copy noise file, or spider spi/dat @noise
- spider spi/dat @lfc_pick
- spider spi/dat @pnums
- Edit order_picked.dat using montagefromdoc.py
- spider spi/dat @listallparticles
In Alignment/:
- spider spi/dat @refproj
- spider spi/dat @sel_by_group
- spider spi/dat @win2stk
- spider spi/dat @apshgrp
In Reconstruction/:
- spider spi/dat @selectbyviewall
- spider spi/dat @filterbyview
- spider spi/dat @classify
- verifybyview.py
- spider spi/dat @combinegoodclasses
- (optional) montagefromdoc.py prj001/goodsel.dat and spider spi/dat @recheck
- (optional) spider spi/dat @viewaverage
- spider spi/dat @goodparticlesbydf
- spider spi/dat @dfgoodapsh
- (optional) spider spi/dat @select
- (optional) spider spi/dat @average
- (optional) spider spi/dat @plotview
- (optional) spider spi/dat @bestim
- spider spi/dat @bps-by-df
- (optional) spider spi/dat @slices
- spider spi/dat @ctf
- spider spi/dat @res
- spider spi/dat @plotres
- spider spi/dat @filt
- spider spi/dat @consecprepare
In Refinement/:
- spider pam/dat @pub_refine
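The Alignment/ steps in the list above are all non-interactive, so they can be chained from a script. Below is a minimal Python sketch. spider_command and run_steps are hypothetical helpers and are not part of either tarball. The sketch assumes that spider is on your PATH and that SPIDER returns a non-zero exit status when a batch file fails; confirm both on your installation. By default it only prints the commands (a dry run).

```python
import subprocess

# Non-interactive Alignment/ steps from the quick-start list, in order.
ALIGNMENT_STEPS = ["refproj", "sel_by_group", "win2stk", "apshgrp"]


def spider_command(procedure, project_ext="spi", data_ext="dat"):
    """Argument list for 'spider spi/dat @procedure'."""
    return ["spider", f"{project_ext}/{data_ext}", f"@{procedure}"]


def run_steps(steps, workdir, dry_run=True, data_ext="dat"):
    """Run SPIDER procedures in order from workdir.

    With dry_run=True (the default) the commands are only returned.
    Assumes 'spider' is on PATH and exits non-zero on failure, so that
    check=True stops the chain at the first failing step.
    """
    commands = [spider_command(step, data_ext=data_ext) for step in steps]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_steps(ALIGNMENT_STEPS, "Alignment"):
        print(" ".join(cmd))
```

For the PubSub version of apshgrp.spi described in the Alignment section, the executable would be ./spider rather than spider.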
Getting started
- Download the "normal" projection-matching tarball and the supplemental verification tarball,
both archived here.
Unpackage spiproject.tar.gz first, and in the top-level directory
(e.g., myproject/), unpackage the verification tarball.
- If using SPIRE:
spire &
Type something, anything, under Project title.
Enter data extension under data extension.
Here, I will assume .dat, so adjust accordingly.
Under Directory for this project,
make sure that the current directory is entered.
Sometimes the data extension is appended to the directory name, which would define a new directory.
Under Configuration file, select verify.xml,
using the Browse button if necessary.
Uncheck the button Create directories and load batch files.
I haven't chosen a convention for where to store the batch files.
- makeparams.spi
- Skip makefilelist.spi -- the Python script in Micrographs/ is more general.
- Get reference volume
- (optional) resizevol.spi -- interpolates the reference volume
Micrographs -- in Micrographs/ directory
- Micrographs are assumed to have the file pattern mic****
- To generate a SPIDER doc file containing a list of micrographs, type
(substituting the appropriate data extension):
mkfilenums.py ../filenums.dat mic*.dat
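To show what the resulting doc file holds, here is a rough Python equivalent of this step. It is not the code in mkfilenums.py, and the function names are hypothetical. The layout is a comment line starting with " ;", then one line per key giving the key, the number of registers, and the register value. This only approximates SPIDER's doc-file format; the exact column widths are an assumption.

```python
import re
from pathlib import Path


def micrograph_numbers(filenames):
    """Sorted file numbers from names like 'mic0012.dat' (names without digits are skipped)."""
    numbers = []
    for name in filenames:
        match = re.search(r"(\d+)$", Path(name).stem)   # 'mic0012.dat' -> '0012'
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def doc_lines(values):
    """One doc-file line per value: key, register count (1), register value.

    Column widths are approximate and may not match SPIDER's own layout.
    """
    return [f"{key:5d} 1 {value:11.4f}" for key, value in enumerate(values, start=1)]


def write_filenums(doc_path, filenames):
    """Write a filenums-style doc file listing the micrograph numbers."""
    lines = [" ; micrograph file numbers"] + doc_lines(micrograph_numbers(filenames))
    Path(doc_path).write_text("\n".join(lines) + "\n")
```

For example, write_filenums("../filenums.dat", glob.glob("mic*.dat")) would mirror the command line above (after importing glob).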
Shrink the micrographs
- shrink.spi
- PARAMETER: decimation factor (so that
the micrograph fits on the screen at 1X)
- INPUT: mic****
- OUTPUT: sm-mic****
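If you're unsure what decimation factor to use, take the larger of the two ratios of micrograph size to screen size, rounded up. Here is a small sketch. It assumes shrink.spi takes an integer factor and ignores window borders, and the function name is hypothetical.

```python
import math


def decimation_factor(mic_width, mic_height, screen_width, screen_height):
    """Smallest integer factor that fits the shrunken micrograph on screen at 1X."""
    return max(1,
               math.ceil(mic_width / screen_width),
               math.ceil(mic_height / screen_height))
```

For example, a 4096 x 4096 micrograph on a 1280 x 1024 display needs a factor of 4.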
Screen the micrographs using montagefromdoc.py.
montagefromdoc.py
The first popup window will hopefully contain reasonable settings, or
you can enter filenums and the micrograph file-pattern on the command line
(in that order).
Alternatively, to keep all the micrographs, copy filenums to sel_micrograph
in the top-level directory.
CTF estimation -- in Power_Spectra/ directory
- power.spi (slow)
- INPUT: ../Micrographs/mic****
- OUTPUT: power/pw_avg***, power/roo***
- Screen the power spectra visually using montagefromdoc.py.
- Run:
montagefromdoc.py
The first popup window will hopefully contain reasonable settings.
If not, the input doc file is sel_micrograph and
the image file pattern is power/pw_avg****.
- To enhance the contrast more than is possible in
montagefromdoc.py, you may need to view a power spectrum in JWEB.
- The output selection doc file, sel_micrograph,
will overwrite the original; however, a backup copy will be saved.
- Determine the defocus.
- I recommend defocus.spi, which uses the command TF ED.
CTF from doc in WEB, def.spi, and mrc.spi are also options.
- INPUT: power/pw_avg***
- OUTPUT: power/ctf***, defocus
- ctfmatch.py is a nice program to display the fitting.
If removing bad micrographs, name the list of remaining good micrographs sel_micrograph.
- defsort.spi
- INPUT: defocus
- OUTPUT: def_sort
- Check the defocus groups
- ctfgroup.py is a nice program to edit the defocus groups.
- To run ctfgroup.py from the command line, type:
ctfgroup.py def_sort.dat
If you need to remove odd micrographs, save any new groupings in ctfgroup.py,
remove the offending micrographs from def_sort in a text editor,
and re-open ctfgroup.py. If you've removed many micrographs, you may want to remove
these file numbers from sel_micrograph in the top-level directory to save time.
If you don't, however, it won't cause any errors downstream.
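Outside ctfgroup.py, one simple way to propose defocus groups is to walk through the sorted defocus values. Start a new group whenever a value lies more than some maximum spread from the first value of the current group. The sketch below does that and also reports each group's mean defocus, which is presumably the kind of value defavg.spi writes. The greedy rule and the max_spread parameter are assumptions for illustration. They are not necessarily how defsort.spi or ctfgroup.py assign groups.

```python
def group_by_defocus(defocus_by_mic, max_spread):
    """Greedily group micrographs whose defocus lies within max_spread.

    defocus_by_mic: dict micrograph number -> defocus (Angstroms).
    Returns a list of groups (lists of micrograph numbers), lowest defocus first.
    """
    ordered = sorted(defocus_by_mic.items(), key=lambda item: item[1])
    groups = []
    group_start = None
    for mic, defocus in ordered:
        # Open a new group if this value is too far from the group's first value
        if not groups or defocus - group_start > max_spread:
            groups.append([])
            group_start = defocus
        groups[-1].append(mic)
    return groups


def group_averages(groups, defocus_by_mic):
    """Mean defocus of each group, in group order."""
    return [sum(defocus_by_mic[m] for m in group) / len(group) for group in groups]
```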
defavg.spi
- INPUT: def_sort
- OUTPUT: def_avg, order_defgrps
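The filtering step in Reconstruction/ uses the first CTF zero of the most-defocused micrograph as the Butterworth stop-band, and findctfminima.spi is the SPIDER route to that value. As an independent cross-check, the Python sketch below makes several assumptions. It uses the weak-phase CTF phase chi(k) = pi*lambda*dz*k^2 - (pi/2)*Cs*lambda^3*k^4 and ignores amplitude contrast. Defocus is in Angstroms (positive for underfocus) and Cs is in mm. Setting chi = pi gives lambda*dz*k^2 - 0.5*Cs*lambda^3*k^4 = 1, whose smallest root is the first zero. The final conversion to 1/pixel (0.5 = Nyquist) assumes that is the unit of your filter radii.

```python
import math


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in Angstroms for a voltage in kV."""
    volts = voltage_kv * 1000.0
    return 12.2643 / math.sqrt(volts * (1.0 + 0.97845e-6 * volts))


def first_ctf_zero(defocus, voltage_kv, cs_mm):
    """Spatial frequency (1/Angstrom) of the first CTF zero.

    Solves lam*dz*k^2 - 0.5*Cs*lam^3*k^4 = 1 (chi = pi, amplitude
    contrast ignored). Defocus in Angstroms, positive for underfocus.
    """
    if defocus <= 0:
        raise ValueError("defocus must be positive (underfocus)")
    lam = electron_wavelength(voltage_kv)
    a = lam * defocus
    b = cs_mm * 1.0e7 * lam ** 3           # Cs converted from mm to Angstroms
    disc = a * a - 2.0 * b
    if disc < 0:
        raise ValueError("no real first zero for these parameters")
    # Smaller root of 0.5*b*u^2 - a*u + 1 = 0 with u = k^2, rearranged as
    # 2 / (a + sqrt(disc)) to avoid cancellation when b is small.
    u = 2.0 / (a + math.sqrt(disc))
    return math.sqrt(u)


def stopband_radius(max_defocus, voltage_kv, cs_mm, pixel_size):
    """First zero of the most-defocused micrograph in 1/pixel (0.5 = Nyquist)."""
    return first_ctf_zero(max_defocus, voltage_kv, cs_mm) * pixel_size
```

For example, a 30000 A defocus at 200 kV with Cs = 2 mm and 2.8 A pixels puts the first zero at about 0.10 per pixel (roughly 27 A).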
Particle-picking -- in Particles/ directory
- Get a noise file, either from a previous project or from noise.spi
- Window particles (slow).
- I recommend lfc_pick.spi, which calls pickparticle.spi.
pick.spi or selecting particles from micrographs in WEB are also options.
- INPUT: ../Micrographs/mic****, ../reference
- OUTPUT: win/winser_****, coords/sndc****
- Generate table of first and last particles for each micrograph
- pnums.spi
- INPUT: coords/sndc****
- OUTPUT: order_picked
- Skip renumber.spi
- Skip snums.spi
- (Optional) Truncate particle lists to omit bad particles.
lfc_pick.spi sorts particles from highest cross-correlation to lowest,
so each micrograph will typically lead off with ice-condensation blobs
and end with noise. You can exclude these bad images at the
extremes by specifying the contiguous range that includes all of the good particles.
This step will save time and disk space and will help the classification.
There are two ways to visualize the particles: using WEB or the
Python utility montagefromdoc.py.
- Using montagefromdoc.py --
this will require less preprocessing
(e.g., filtershrink.spi), but Python display is slower than WEB.
- Run:
montagefromdoc.py
The first popup window will hopefully contain reasonable settings, or
you can enter the doc-file and particle-file names on the command line (in that order).
- Using WEB
- Under Options/Image turn on filenames, and
you may need to use a small font, set under Options/Font.
Open the stacks win/winser_****.
- There used to be batch files
filtershrink.spi and
negmontagedocs.spi
to filter and break the stacks into a bite-sized number of images, respectively, but
I haven't updated them since lfc_pick started using stacks. See me if you would like
to have these batch files updated.
- Look through some montages.
If each micrograph has a similar number of good particles, note that number for the next step,
listallparticles.spi.
- If the number of good particles varies widely, then:
- Using a text-editor, replace in order_picked the first & last particle-number with
the first & last good particle-number.
Recalculating total particle-number (2nd column) is unnecessary.
- Subsequent procedures that refer to order_picked will thus
ignore the excluded particles.
- See here for an illustrated example using WEB.
- Make total-particle list for alignment
- listallparticles.spi
- PARAMETER: maximum particles per micrograph
(it is probably safe to err on the side of too many)
- INPUT: order_picked
- OUTPUT: coords/docall****, coords/mic2global
- subsequent alignment will require coords/docall****
instead of good/ngood****
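Conceptually, listallparticles.spi takes each micrograph's kept range from order_picked and turns it into a per-micrograph particle list plus a global numbering (coords/mic2global). The Python below is a hypothetical sketch of that bookkeeping, not a port of the batch file. In particular, treating the maximum particles per micrograph as a hard cap is an assumption. Under that assumption, a value that is too small silently drops particles, which is why erring on the side of too many is safer.

```python
def number_particles(ranges, max_per_mic):
    """Assign consecutive global numbers to the particles kept on each micrograph.

    ranges: dict micrograph -> (first, last) particle number to keep,
            as edited in order_picked.
    Returns (per_mic, mic2global): per_mic maps micrograph -> list of local
    particle numbers; mic2global maps (micrograph, local) -> global number.
    """
    per_mic = {}
    mic2global = {}
    global_num = 0
    for mic in sorted(ranges):
        first, last = ranges[mic]
        # Assumed cap: particles beyond max_per_mic are dropped
        last = min(last, first + max_per_mic - 1)
        local = list(range(first, last + 1))
        per_mic[mic] = local
        for particle in local:
            global_num += 1
            mic2global[(mic, particle)] = global_num
    return per_mic, mic2global
```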
Alignment -- in Alignment/ directory
- refproj.spi
- INPUT: reference, ../Power_Spectra/order_defgrps
- OUTPUT: refangles, prj_####@****
- sel_by_group.spi
- INPUT: use ../Particles/coords/docall{****[mic]}
(output of listallparticles.spi)
instead of ../Particles/good/ngood{****[mic]}
- OUTPUT: sel_particles_***, sel_group
- win2stk.spi
- INPUT: ../Particles/win/winser_***@, ../Particles/coords/docall{****[mic]}
instead of ../Particles/good/ngood{****[mic]}
- OUTPUT: data***
- apshgrp.spi
- INPUT: prj_####@****, data***, sel_particles_***
- OUTPUT: align_01_***, dala01_***
- To run it using PubSub, change the [pubsub] flag in the batch file, and
copy the SPIDER executable to the local directory, i.e., ./spider
The syntax will be:
./spider spi/dat @apshgrp 1 > log1.txt &
where:
- 1 -- is the number of the master results file, and
- log1.txt -- is a file that contains the screen output,
so that you can easily monitor the progress remotely.
To monitor this file in real time, type:
tail -f log1.txt
Verify Particles -- in Reconstruction/ directory
- Make selection doc for each reference view.
- Filter and (optionally) shrink particles.
- filterbyview.spi
- INPUT: dala01_***, align_01_***
- OUTPUT: select/prj***/stkfilt
- The goal with the filter parameters was to be able to ignore CTF effects.
So, I chose the first CTF zero of the most-defocused micrograph as the Butterworth stop-band.
If you're not sure what filter radii to use, try
findctfminima.spi
- To test the filter parameters,
try running this batch file on just the particles in one reference view,
by setting parameter [last-view] to 1.
- If you're using old, X-Window WEB, which
can't montage a doc file from stacks, run
filtershrinksh.spi,
which instead writes ../Particles/flt/flt******
- (Optional) Screen particles without classification.
- You can optionally screen the particles without classification.
This could be useful for small data sets that don't warrant classification,
or if you otherwise want to see all particles before classification. Run:
montagefromdoc.py
The first popup window will hopefully have reasonable settings.
If not, enter select/sortsel001.dat, select/prj001/stkfilt.dat, and
select/prj001/goodsel.dat for the input selection filename, particle filename,
and output selection file, respectively.
- If you perform this step, skip ahead past combinegoodclasses.spi
- The particles are sorted by correlation coefficient, from highest to lowest.
You can display the CCROT values by clicking the checkbutton under Display/Labels,
but you'll probably need to resize the window.
- Run correspondence analysis and separate particles into classes.
- classify.spi
- INPUT: select/sel***, select/prj***/stkfilt
- OUTPUT: select/prj***/{docclass###, classavg###,
classes_by_ccc}
- If you're running X-Windows WEB (or for some other reason ran
filtershrinksh.spi) instead run
unstacked-classify.spi,
which uses ../Particles/flt/flt****** as an input.
- You can select good class-averages for a reference-view as
soon as the batch file starts on a subsequent reference-view (as
printed to the screen). As of 2004, you can probably sift through
classes faster than SPIDER can calculate them.
- If you would like to verify particles on a different computer,
you should be able to copy the contents of the Reconstruction/select/ directory.
There are a couple thousand files there total, including the particle stacks,
in contrast to ~100,000 files in the case of unstacked particles.
- There are four options for keeping particles,
depending on how much control you want/need:
- Option using Python/Tkinter:
Select classes and particles therein using verifybyview.py.
More information can be found here.
- Options using WEB:
- First, select good class-averages in WEB using Categorize/Sequential montage
(to show them in sequential order) or Categorize/Doc. file montage
(using classes_by_ccc) to show them from worst cross-correlation to best.
- Name the resulting list of good classes goodclasses and
click on the good classes. WEB by default will write a separate
document file for each reference view.
- When unsure about a class, check its member particles in a
separate WEB window using Montage from doc file, with
the appropriate document file docclass{***class#}.
The Image file template in WEB should be
"../../../Particles/flt******"
Particles will be sorted from worst cross-correlation to best.
See here for an illustrated example.
- There are three levels of control using WEB:
- Simply click on the good classes as described above.
This will keep all of the particles in the good classes.
- Manually click on the first good particle in each class.
I recommend doing this until you get a feel for which classes are bad.
See here for an illustrated example.
- Click on the good classes as described above.
- Instead of displaying class-montages with Montage
from doc file, use Categorize/Doc. file montage.
- Name the output file firstgoodparticle. There
should be one file for each reference-view.
- When prompted for the key, enter the class-number.
If there isn't a key for each good class, the next batch
file (combinegoodclasses.spi) will crash.
- Click on the first good particle for each class that has
particles you want to keep, e.g., the first one to keep all of them.
- Manually click on the good particles in each class.
The only advantages over fully manual particle verification are
that the particles are separated by view and aligned.
This is useful if classification didn't do a good job on your particles.
- Click on the good classes as described above.
- Display the class-montages using Categorize/Doc. file montage.
- Name the output file byhand{***class#}.
- Click on each particle that you would like to keep.
If using WEB, for a given reference-view, you have to use the same method for all
classes.
That is, if you use the "whole class" mode (option 1) for one class in a
reference-view, you must use it for all classes.
The same goes for options 2 and 3.
However, you can mix and match methods across different reference-views.
The next batch file will write to the screen which method was used for each view.
- Combine particles from good classes.
- combinegoodclasses.spi
- SUBROUTINE: reversedoc_7col.spi
- INPUT: select/prj***/{docclass###, goodclasses, select/sel***,
firstgoodparticle (optional), byhand (optional)}
- OUTPUT: select/prj***/goodsel (one for each reference-view),
select/combinestats
- (Optional) Re-screen the particles by view.
- Screen particles using montagefromdoc.py.
- In select/, I included a .montagefromdoc file with hopefully reasonable settings.
If not, type:
montagefromdoc.py prj001/goodsel.dat prj001/stkfilt.dat
For the output filename, use prj001/notgood.
- To salvage bad particles, do the converse, i.e.,
use prj001/badsel.dat and prj001/notbad.dat as the input and output.
- recheck.spi
- SUBROUTINE: reversedoc_7col.spi
- INPUT: select/prj***/sortsel, select/prj***/goodsel,
select/prj***/badsel, select/prj***/notbad, select/prj***/notgood
- OUTPUT: select/prj***/goodselB, select/prj***/badselB
- For the output doc files, I used a letter suffix to distinguish them from prior output.
A number might cause problems for montagefromdoc.py.
- (Optional) Average images by view
- viewaverage.spi
- INPUT: select/prj***/goodsel
If you ran recheck.spi, add the letter suffix (e.g., B) to the filename.
- OUTPUT: select/prj***/goodavg, select/prj***/goodvar (optional)
- One advantage of this batch file over average.spi is that it combines all defocus groups.
- You can use verifybyview.py to link the averages to the retained particles.
To do so, run verifybyview.py from Reconstruction/select/
(so as to not overwrite the settings in Reconstruction/).
The default settings in .verifybyview will hopefully be reasonable.
In principle, you could re-screen your particles if there are still bad ones.
- Compute CCC histogram of particles
- histgoodccc.spi
- INPUT: select/prj***/goodsel, select/prj***/badsel
If you ran recheck.spi, add the letter suffix (e.g., B) to the filename.
- OUTPUT: combinedgood, histcccgood, histcccbad
- Check the histogram using fit.gnu,
modifying the file extension and fit if needed. Type
gnuplot
and then at the prompt, type (including the single quotes):
load 'fit.gnu'
Normally, the histogram should look Gaussian. If there is a tail or a second mode
at the low-CCC end, there may be non-particles remaining. You can filter them out
by using a fractional cutoff in a later step, or you can go back and re-screen
the particles more stringently.
- Separate total good-particle list by defocus-group.
- goodparticlesbydf.spi
- PARAMETER: fractional cutoff (optional, use 0.0 to keep all)
- INPUTS: combinedgood, stack2particle***
- OUTPUT: df***/goodparticles, sel_group_cclim, sel_group_cclim_sorted
- For large particle sets, this batch file may crash, in which case try
big-goodparticlesbydf.spi
- Generate alignment documents with only good particles
- dfgoodapsh.spi
- INPUT: ../Alignment/align_01_***, df***/goodparticles
- OUTPUT: ../Alignment/goodalign_01_***, sel_particles_***
- The sel_particles_*** files are essentially equivalent to
df***/goodparticles, lacking only the CCROT column.
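combinegoodclasses.spi distinguishes three WEB modes: whole class, first good particle, and by hand. Because class members are listed from worst cross-correlation to best, each mode reduces to a simple list operation on each class. Below is a hypothetical Python sketch for one reference view. It returns sorted good and bad lists in the spirit of goodsel and badsel. The function name and data layout are assumptions, not the batch file's internals.

```python
def combine_view(classes, good_classes, mode="whole", first_good=None, by_hand=None):
    """Good and bad particle lists for one reference view.

    classes: dict class number -> particle numbers, worst CCC first
    good_classes: class numbers selected as good
    mode: "whole"  keeps every particle in each good class
          "first"  keeps first_good[class] and all better-correlated particles
          "byhand" keeps only the particles listed in by_hand[class]
    """
    kept = []
    for cls in good_classes:
        members = classes[cls]
        if mode == "whole":
            kept.extend(members)
        elif mode == "first":
            # Mirrors the requirement of a key for every good class
            if first_good is None or cls not in first_good:
                raise KeyError(f"no first good particle given for class {cls}")
            start = members.index(first_good[cls])
            kept.extend(members[start:])
        elif mode == "byhand":
            chosen = set((by_hand or {}).get(cls, ()))
            kept.extend(p for p in members if p in chosen)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    everything = {p for members in classes.values() for p in members}
    return sorted(kept), sorted(everything - set(kept))
```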
Compute averages -- also in Reconstruction/ directory
- select.spi (optional, necessary for display.spi or plotview.spi)
-- for each defocus-group, separates particles by reference-view
- INPUT: Use goodalign_01_*** instead of align_01_***
- OUTPUT: df***/select/sel###, df***/how_many, how_many
- Skip average.spi -- The equivalent was performed above by viewaverage.spi
- Skip cchistogram.spi -- The equivalent was performed above by histgoodccc.spi
- Skip ccthresh.spi -- The equivalent was performed above by histgoodccc.spi
- Skip dftotals.spi -- The equivalent was performed above by goodparticlesbydf.spi
- plotview.spi, display.spi (optional) -- graphically show distribution of views
- INPUT: df***/how_many
- OUTPUT: plotview, display/cndis***
- bestim.spi (optional) -- truncate seltotal files for overrepresented views
I don't think I've tested this step.
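Counting particles per reference view (what plotview.spi and display.spi visualize) and capping overrepresented views (the purpose of bestim.spi) can be sketched in a few lines of Python. The assumption that bestim.spi keeps the highest-CCC particles in each view is mine. As noted above, this step may be untested, so treat the sketch as illustration only.

```python
from collections import Counter, defaultdict


def view_counts(assignments):
    """Particles per reference view; assignments are (particle, view, ccc) tuples."""
    return Counter(view for _, view, _ in assignments)


def cap_views(assignments, max_per_view):
    """Keep at most max_per_view particles per view, highest CCC first (assumed rule)."""
    by_view = defaultdict(list)
    for particle, view, ccc in assignments:
        by_view[view].append((ccc, particle))
    kept = []
    for members in by_view.values():
        members.sort(reverse=True)          # best correlation first
        kept.extend(particle for _, particle in members[:max_per_view])
    return sorted(kept)
```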
3D reconstruction -- still in Reconstruction/ directory
- Generate two half-set, 3D reconstructions for each defocus-group
- instead of deffsc.spi, run
bps-by-df.spi,
which calls bpcg.spi.
- PARAMETER: backprojection method: BP CG, BP 32F, or BP RP
- INPUT: dala01_***, ../Alignment/align_01_***,
sel_particles_***
- OUTPUT: df***/{vol001_odd, vol001_even}, df***/doccmp001
- slices.spi (optional)
- INPUT: df***/vol001_odd
- OUTPUT: slices/slice***
- ctf.spi
- INPUT: df***/{vol001_odd, vol001_even}
- OUTPUT: ctf/ctf***, dfselect, combires, vol001
- res.spi
- INPUT: combires
- OUTPUT: resolution
- plotres.spi
- INPUT: combires, resolution
- OUTPUT: plot_res
- filt.spi
- INPUT: resolution
- OUTPUT: volfq001
- Prepare files for refinement
- consecprepare.spi
- INPUT: lots
- OUTPUT: lots
- This was adapted from Refinement/copyin.pam, but
I didn't want to put new files in yet another directory.
So, this refinement batch file is also kept in Reconstruction.
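res.spi turns the FSC curve in combires into a resolution estimate. As a sketch of that calculation, the Python below finds where the FSC first falls below 0.5, interpolating linearly between shells. It then converts the crossing frequency (1/pixel) to Angstroms. The 0.5 threshold, the interpolation, and the input layout are assumptions, so check them against res.spi before comparing numbers.

```python
def resolution_at_threshold(freqs, fsc, pixel_size, threshold=0.5):
    """Resolution in Angstroms where the FSC first drops below threshold.

    freqs: increasing spatial frequencies in 1/pixel (0 to 0.5)
    fsc:   FSC value at each frequency
    pixel_size: Angstroms per pixel
    """
    for i in range(1, len(freqs)):
        c0, c1 = fsc[i - 1], fsc[i]
        if c0 >= threshold > c1:
            f0, f1 = freqs[i - 1], freqs[i]
            # Linear interpolation between the two bracketing shells
            crossing = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return pixel_size / crossing
    raise ValueError("FSC never falls below the threshold")
```

For example, take shells at 0.0, 0.1, 0.2, and 0.3 per pixel with FSC values 1.0, 0.9, 0.6, and 0.4, and 2 A pixels. The crossing falls at 0.25 per pixel, i.e. 8 A.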
Refinement -- in Refinement/ directory
Additional batch files
I have some miscellaneous batch files here.