CORAV.DOC 3/3/93 CORRELATION AVERAGING OF CRYSTALS
====================================================================
Note: 4/7/93 the "WV P" option has a bug and is not working
properly. It will be fixed in the next release
This document is an introduction into correlation averaging of
crystalline lattices using both global and patchwork averaging.
1) OVERVIEW
===========
The method of Correlation Averaging, in general, works on the
principle of locating the precise position of each repeat within
a crystal by cross-correlation with a reference. In this way,
dislocations, fault lines, and any local translational variations
in the position of the lattice repeat will not result in a resolution
loss.
After the peaks of the cross-correlation function have been located,
there are two ways to proceed:
A) Real-space averaging using a "hopping window" that is placed
into each of the peak positions (wherever the peak height exceeds
a certain threshold) (SPIDER method and method adopted by Saxton);
B) Fourier averaging after a process of "unbending" whereby the
raw image is un-distorted so that the resulting image depicts the
crystal in perfect order (MRC method, developed by Henderson).
In Albany, the Real-space method is implemented. The real-space
summation of windows is done either globally, over the entire
field (WV) or locally, over each of a set of patches (WV P; patch-
work averaging). In the former case, a single average is produced;
in the latter case, an entire set of averages is produced, one for
each patch.
In the patchwork averaging, the set of averages is subjected to
multivariate statistical analysis (MSA), and according to the
results of this analysis, statistically meaningful subset averages
are formed, each depicting the structure in a different state,
conformation, or orientation.
The global average or the subset averages can then be used to extract
crystallographically meaningful information.
======================================================================
======================================================================
2) OUTLINE OF THE SEQUENCE OF STEPS
===================================
(STEPS (a) THROUGH (c) ARE SHARED BY GLOBAL AND PATCHWORK AVERAGING;
STARTING WITH (d), THE TWO METHODS DIVERGE, TO COME BACK TOGETHER
AGAIN IN STEP (e))
a) prepare a reference. This is either a small subfield containing
a well-ordered portion of the crystal (single-layer case), or
a small subfield of a quasi-optically filtered version of the
crystal (multiple-layer case).
b) run procedure CORAV1. This pads the subfield into a field whose
size matches the size of the raw data field (but only powers of
two allowed, see below), computes the high-pass filtered CCF,
and produces a list of refined peak positions and heights, which
is stored in a document file. It also produces a version of the
CCF that has the peaks found overlaid.
c) inspect the CCF in order to determine the sensible value range
of the peaks to be used in correlation averaging. The CCF with
overlayed peak map is displayed on the monitor with a special
color table that allows the range to be assessed in one glance.
------------------------------------------------------------------
GLOBAL:
======
d) run WV. This produces a global average, or two subset averages
picked according to even and odd peak numbers.
PATCHWORK:
=========
d0) run WV P. Decide on a regular division of the field into patches.
A set of "local" averages is produced, one for every patch.
d1) display the local averages in the form of a patchwork.
Decide if the variations are strong enough to warrant application
of MSA. If not, then form the average of all patch averages, and
proceed with step (e).
d2) Submit the set of patch averages to MSA: CA SI (initialize),
then CA S (run CORAN), followed by CA SM (map). Inspect the map
or maps for evidence of clustering or obvious data trends.
If clustering suspected, run CA CLA.
d3) Based on (d2), form subset averages by reconstitution (CA SR).
These subset averages are virtually equivalent in terms of
statistical definition, to the global average resulting from
global averaging.
----------------------------------------------------------------------
END OF SEPARATE BRANCH. BOTH LOCAL AND GLOBAL AVERAGING CONTINUE HERE.
e) Extract crystallographic parameters from the average, or averages.
This is done in the following steps:
Mask file; Fourier-transform; run MMBOX (MRC) or an equivalent
operation.
======================================================================
======================================================================
3) THE STEPS IN DETAIL
======================
a) prepare a reference
===================
There are basically two kinds of references that can be used
for cross-correlating with the crystal field: either a
small, well-ordered subfield of the crystal or a small
subfield from a quasi-optically filtered Fourier average of
all or part of the crystal.
The unprocessed reference is adequate for single layer
crystals with recognizable repeating units (i.e. good
signal-to-noise situation).
However, if the contrast is low (frozen-hydrated or
glucose-embedded specimens) or more than one independent
crystal layer is present (collapsed membrane vesicle), a
Fourier average should be used.
The reference subfield should contain 4-9 unit cells, i.e.
one unit cell (centered) bounded by one-half to one unit
cell in each lattice direction.
The first step in preparing a reference is to display the raw field
from which a subfield is to selected:
In WEB/IMAGES:
SHOW IMAGE: RAW001 (name of the "raw" field)
Next, there are two optional ways to prepare a reference, as described
above:
Option (i) To prepare an unprocessed subfield, simply window an
appropriately sized section from a region of the crystal that appears
to be representive and well-ordered.
In DRIVER:
WI ; window
RAW001 ; input file
REF001 ; reference file
32,32 ; size of reference
230,200 ; starting coos
Option (ii) To prepare a quasi-optically filtered subfield, the
procedures QUASIOPT1 and QUASIOPT2 are used in conjunction with
the operation WT TV.
MD ; Mode
TR ON ; Trace on (to see the steps in the procedure
as they are executed)
@QUASIOPT1
1 ?RAW IMAGE? RAW001
[the name of the image file from which a subfield is
to be selected for filtering.]
2 ?FINAL SIZE? 256,256
[the size of the subfield to be quasi-optically
filtered, restricted in this version to less than or
equal to 256 x 256.]
3 ?UPPER LEFT COOS? 230,200
[the starting coos in the raw field.]
4 ?FOURIER FILE? FOU001
[the name of the Fourier transformed subfield.]
5 ?POWER SPECTRUM? POW001
[the name of the power spectrum of the transform.]
The subfield will be circularly padded into a 512 x 512
file, transformed, and its power spectrum (diffraction
pattern) created. (The zero-order maximum in the power
spectrum will also be masked out for display purposes.)
Next, the power spectrum is displayed on the workstation
monitor with WEB/IMAGES:
SHOW IMAGE: POW001
If a well-ordered region of the crystal was chosen, the
power spectrum will contain strong, sharp maxima
falling on one or more reciprocal lattices. If the
diffraction pattern is not satisfactory, repeat QUASIOPT1
for a different subfield.
(It is possible to automatically determine the power spectrum for
subfields in the image using the EDIT/POWER SPECTRUM option in WEB.)
At this point, the two-dimensional reciprocal lattice in the
power spectrum must be indexed. This can often be done
visually on the monitor. However, in the case of multiple
crystal layers, a hard copy of the power spectrum may be
needed to sort out the different lattices.
In WEB/IMAGES, select the REFLECTIONS option:
DOCFILE: DOC001 (name of file in which the indices and
coos of the selected reflections will be
stored)
KEY NUMBER: 1
MARKER RAD: 5 (radius of circle in which the desired
reflection is located)
Cursor is placed over a strong reflection with unambiguous
lattice indices and left button clicked.
2,0 indices of this maximum are entered in box
Cursor is clicked on 2nd reflection.
2,2 indices of 2nd reflection entered
.
.
.
Additional strong, unambiguous reflections are entered, at least
3 or 4 but the more the better. (Be sure they are not all on
the same h,0 or 0,k lattice line.) Click the right button after all
reflections are entered.
Next, return to DRIVER:
WT TV
POW001 ; power spectrum
FOU001 ; Fourier transform
DOC001 ; file containing lattice indexing information
7,5 ; size of box in reciprocal space in which
a search will be made for true maxima,
size of box over which refinement will be done
At this point the program will list 3 columns of coos:
Column 1 = entered coos
Column 2 = coos transformed to reciprocal
space coo system, i.e. x*
positive down, y* positive right)
Column 3 = coos of pixel with highest value
inside the box searched.)
180 ; enter the maximum radius in Fourier
space to be searched for maxima (i.e.
usually the radius within which all the
visible maxima in the power spectrum are
contained)
(At this point, the program returns
unit vector information first on a
crude lattice, then on the refined
lattice derived from the positions of
the entered maxima and those found inside
the maximum radius. Refined parameters
should be recorded for future reference.)
FIL001 ; name of the filter file which will
contain information about the refined
lattice
Next, FL may be used to check the refined lattice.
FL ; Fourier list
FOU001 ; name of transform of subfield
F ; file-select mode
1 ; scaling factor
FIL001 ; name of filter file
11 ; window size
The program will window out the amplitudes and phases inside
an 11x11 box for each maximum at the coordinates specified by
the filter file. These are stored in the RESULTS file,
which can be printed out and checked at this point. In
general, most (70% or so) of the "energy" at each maximum
should be contained in a 3x3 or 5x5 box centered on the
indicated coordinates and the phases within the box should be
relatively constant. If this is not the case, double check
your choice of reflections and indexing back in the REFLECTIONS option
of WEB.
Now, the procedure QUASIOPT2 is run which applies a mask to
the Fourier transform (FOU001) passing only maxima and phases
at the coos specified by the filter file, FIL001, and then
calculates the inverse transform. The output file is a
quasi-optically filtered version of the original subfield.
In addition, the power spectra of the filtered transform is
computed (POG999) and may be examined with TV W. (This is a fast way
to check the quality of the lattice mask.)
@QUASIOPT2
1 ?FOURIER FILE? FOU001
[name of transform of subfield being filtered]
2 ?FILTER FILE? FIL001
[name of filter file]
3 ?MASK SIZE? 3
[size of smallest square box which passes most of
"energy" at each maximum, usually 3x3 or 5x5]
4 ?OUTPUT FILE? OUT001
[name of filtered output file]
Now, an appropriate reference subfield may be windowed out
of the filtered field using WEB/IMAGES:
SHOW IMAGE: OUT001
WINDOW FILE/FIXED SIZE: 32,32
Position box on filtered crystal image so that it is centered on
one unit cell. Best region is normally at the middle of the
filtered field. Push left button on mouse.
WINDOW FILE NAME: REF001
Press right button on mouse to end windowing operation.
SHOW IMAGE: REF001 (display the reference field; repeat
windowing procedure if unit cell
poorly centered).
b) run procedure CORAV1
====================
Following are instructions on how to run the procedure CORAV1, which
is used to produce the peak list from the raw data file and the
reference file. The actual procedure is listed at the end of this
section.
As the first step, the raw data file must be cut down to a size
that is equal or smaller than the working size. The dimensions of
the working image must be powers of two. Use the window (WI) command.
Let's say the dimensions of the original image (RAW001) are 894x1232.
The closest power-of-two dimensions are 1024x1024:
WI ; window
RAW001 ; input file
WIN001 ; subfield to be averaged
894,1024 ; size of area cut out
130,420 ; starting coos
Here the y-starting coo is chosen such that the strips cut off on
either side are equal. Usually, the edges of the crystal field
are of inferior quality, so if something needs to be sacrificed,
it might as well be edge regions.
Next, the procedure is called (the input statements are numbered
consecutively):
@CORAV1
1 ?WORKING DIM (POWERS OF 2)? 1024
[the working size used in the procedure. Currently,
this is laid out for square format only]
2 ?RAW FIELD (DIMS .LE. WORKING DIMS)? WIN001
[the image subfield created above]
3 ?REFERENCE IMAGE? REF001
[the image created in step (a) of this manual.
This is allowed to have dimensions that are not
powers of two.]
4 ?NAME OF CCF? CCF001
[the name of the file where the CCF is to be stored.
Even though the only important output of the procedure
is the peak list, obtained by searching the CCF, it
is useful to save the CCF at least temporarily, to
be able to repeat the peak search (PK C) operation with
different values of the search parameters (e.g., neigh-
borhood exclusion).]
The file CCF001 will now be created with working dims 1024,1024,
and the input image WIN001 will be padded into CCF001 in such
a way that the margins are equal on both sides.
Next, a file PIC999 is created with the same working dims, and
the reference field REF001 is padded into PIC999, with the average
of REF001 used to fill the background.
Next, the two files CCF001 and PIC999 are cross-correlated.
In the process of computing the CCF, a high-pass filter is
applied:
5 ?FOURIER RADIUS FOR CCF HIGH-PASS FILTRATION? 45.
[the radius of a high-pass filter. This prevents back-
ground fluctuations unrelated to the crystal structure
from interfering with the peak ranking. Be sure to
chose a radius that is smaller than the smallest
reciprocal vector, so that no first order reflection
is eliminated. To be sure, you may wish to check after-
wards, by displaying the power spectrum of the FT of
the working file (to compute power spectrum, use PW).
The CCF is stored in the file CCF001. This file is now searched
by operation PK C. The next four questions have to do with
parameters of the searching:
6 ?MAX. NUMBER OF PEAKS EXPECTED? 1000
[currently, the largest number that can be entered
here is 4000. If a larger number is entered, it will be
changed to 4000. Peaks will be searched in the order of
descending heights. As a guide for the number of peaks,
consider the number of unit cells that fit into the field
analyzed, but increase this estimate, allowing for the fact
that a good number of incorrect peaks (not on the lattice)
will be found whose heights are larger than the heights of
some of the correct ones.]
7 ?NEAREST NEIGHBOR EXCL. DIST.? 9
[enter the radius of the neighborhood exclusion. The
significance of this parameter is as follows: for a
peak to be accepted into the final list, which is written
into the peak list file (a document file, see below), it
has to fulfill the following criteria:
p(x,y)> p(x+1,y)
p(x,y)> p(x-1,y)
p(x,y)> p(x,y+1)
(i) p(x,y)> p(x+1,y+1)
p(x,y)> p(x-1,y+1)
p(x,y)> p(x+1,y-1)
p(x,y)> p(x,y-1)
p(x,y)> p(x-1,y-1)
(ii) (x-xn)**2 + (y-yn)**2 > (neighborhood excl.)**2
(iii) x > (x-exclusion)
x < NSAM-(x-exclusion)
y > (y-exclusion)
y < NROW-(y-exclusion)
Condition (i) is the basic 8-neighbor criterion.
Condition (ii) eliminates peaks that lie so close to peaks
already found that they cannot be possibly on the same
lattice. This exclusion works on the assumption that the
initial list of peaks already found is correct. Under
unfortunate circumstances, it may cause the search to
"lock" into an incorrect lattice, or exclude a portion
of the correct lattice around a high artifactual peak.
Condition (iii) requires no decision, but follows from the
geometry of the correlation search, which requires exclusion
of a margin whose size is equal to half the size of the
reference plus any padding around WIN001.]
8 ?NAME OF PEAK FILE? DOC010
[enter the name of the document file into which the peak
values will be written. The important parameters to be
picked up later by PP as well as WV are
KEY, XPOS, YPOS, HEIGHT
where KEY is the peak with rank KEY in the list of
all peaks that have passed conditions (i),
(ii), and (iii);
XPOS is the x-position of the peak relative
to the origin of the working field;
YPOS is the y-position
HEIGHT is the absolute height of the peak.]
After the peak list has been obtained, it is used to make two
point maps (using operation PP), which give a survey of peaks found
and allows a judgement to be made if the peaks, or a subset
are lying on the lattice. The first map has the points superimposed
on the CCF. The second map shows the peaks against a uniform background.
9 ?CCF/PEAK MAP? CCP001
[the name of the file where the modified CCF with peaks
overlaid is to be stored. This file is used in step (c)
below to assess the peak value range to be used in
the averaging step (step (d)).]
10 ?RANGE OF PEAKS TO BE MARKED (e.g., 1-500)? 1-1000
[give the range of peaks to be marked as points in the
CCF/PEAK map.]
11 ?PLAIN PEAK MAP? MAP001
[the name of the point map with a uniform background.
This map is useful for assessing lattice order.]
================================================================
3) c) interactive assessment of peak range
====================================
The CCF/PEAK map is identical to the CCF except that in each of
the peak positions found and selected for marking, a single
point with highest intensity is inserted. This point appears
white on a normal display with TV W. In order to assess the
useful value range of the peaks, the CCF/PEAK map must be
displayed with the color table CAM02. This has the value range
between MAX and MIN equally divided into 8 colors. In descending
order, the colors are:
WHITE
YELLOW
ORANGE
RED
PURPLE
VIOLET
BLUE
BLACK
In addition, the peaks selected are marked with a GREEN point in
the center.
-----------------------------------------------------------------
AVERAGING PROCEDURES
====================
At this point, the user has the choice of making a global average
using the operation WV or proceding with a patchwork analysis of local
averages in the crystal field.
3) d) global average
==============
The operation WV can be used to sum windows extracted from the entire
field specified in CORAV1 at the coos contained in the peak list file:
WV ; Window aVerage
PIC999 ; The padded version of the raw field produced
in CORAV1
AVE001 ; Name of correlation average file
32,32 ; Dimensions of average
DOC010 ; Name of file containing peak list
1-250 ; Peaks to be included in average
N ; Fix the number of peaks (Y/N)? Peaks that are too
close to the edge of the field are rejected. "Y"
causes additional peaks to be included in the average
to compensate for those skipped
1 ; Peak number increment: 1= use each peak in order
2= use every 2nd peak, etc.
Y ; Make "control windows", i.e. image files containing
the areas windowed-out
1 ; Control interval, i.e. make a control window for:
1= every peak used, 2= every 2nd peak used, etc.
WND*** ; Prefix of control windows
The decision of how many peaks to use in the final average may be based
on examination of the peak maps generated in CORAV1 or by information in
the peak list file (e.g. the user may decide to use only peaks with a
relative or absolute height greater than some cutoff).
A more general approach is to make control windows for all the
peaks in the peak list file and to use an objective criterion to decide
at which peak number inclusion of additional peaks no longer improves
the quality of the average. For this, the procedures CORAV2, CORAV3,
CORAV4 are used. CORAV2 makes an odd and an even series of images from
the control windows, CORAV3 makes a series of masked odd/even
subaverages from the windows, each i-th subaverage containing the first
i*100 windows in its respective series (i.e. odd or even), CORAV4
finally calculates phase residuals between the cumulative odd and even
sums:
@CORAV2
1 ?TOTAL NUMBER OF PEAKS (TO NEAREST 100) TO BE USED? 800
[this must be .le. number of peaks specified in WV]
@CORAV3
1 ?NUMBER OF SUBAVERAGES DESIRED PER ODD/EVEN SERIES? 4
[enter number of averages to be created per series]
2 ?AVERAGING INCREMENT? 100
[enter number of images per average]
@CORAV4
1 ?NUMBER OF SUBAVERAGES PER ODD/EVEN SERIES? 4
[re-enter the number of averages computed in CORAV3]
2 ?NAME OF DOC FILE TO CONTAIN RESULTS? DOC100
[enter name of file to contain results]
The three columns in the DOC file correspond to:
1) the index i (1=1st 200 windows, 100 odd + 100 even
2=1st 400 windows, 200 odd + 200 even...)
2) the inner radius of the annulus (3 Fourier units wide)
in which the phase residual is calculated
3) the phase residual in the annulus.
Examination of the results in the DOC file indicates at which point
inclusion of groups of windows (subaverages) corresponding to higher
peak numbers no longer improves the agreement between the ODD and EVN
series (i.e. phase residuals no longer decrease and may increase). Once
this number of peaks is determined, AS may be used to make a final
average using only the corresponding number of control windows. (For
example, if phase residuals increase from i = 6 to i = 7, the number of
peaks that should be included in the final average is (6 x 50) x 2 =
600.)
3) d) analysis by patchwork averaging
===============================
This computation consists of four steps:
compute set of averages
display set of averages and decide if the next two steps needed
use MSA (and classification if required)
compute reconstituted images
d0) Compute set of averages.
=======================
First, decide on the size of patches to be used. For frozen-hydrated
material, around 200 unit cells should be in a patch. For stained
material, a much smaller number still works; e.g., 50.
Now, run the procedure PATCHAV. According to the number of patches
across the scanned data file, the field is subdivided into square
patches.
@PATCHAV
1 ?WORKING RAW DATA FILE? PIC999
[the working version of the raw data file, created by
procedure CORAV1 in step (b)]
2 ?FILE TO STORE AVERAGE SET? AVG001
[name of the file where the average set is stored in 3-D
format]
3 ?DIMENSIONS OF AVERAGE (SQUARE FIELD)? 64
[self-explanatory. Give single number equal or close to the
size of the reference field]
4 ?NUMBER OF PATCHES ACROSS? 4
[this number determines the patch size and the total number
of patches used]
5 ?PEAK LIST FILE? DOC001
[the document file into which the peak coos have been saved
in the CORAV1 procedure, step (b)]
6 ?RANGE OF PEAK VALUES TO BE PASSED? 0.0,1.0
[this may be estimated from the false color display of the
CCF/PEAK map generated by CORAV1]
Check the RESULTS file for the number of peaks left over as the
program passes through various steps of selection. These selection
steps are:
value range
rank range (unused in above example)
eliminate peaks within margin
check if total number exceeds total specified by user
At the very end of the RESULTS listing, the numbers of windows
actually used for each patch are listed. This number varies from
one patch to the other because of the difference in quality,
which is reflected in a difference in the number of peaks found
in step (3b) or in the number of peaks passed in the above four
selection steps of the patchwork averaging procedure.
d1) Display the set of averages stored in AVG001 as a montage,
by placing each average into a position that corresponds
to the position of its associated patch. This is done in
DRIVERW either by applying TV M (interactive montage) directly
to the 3-D file, or by extracting the averages by a PS Z
(pick slice in z-direction) operation.
Direct display:
TV M
AVG ; 3-letter prefix
1 ; file number
<CR> ; first and last x-column numbers to be displayed [all]
<CR> ; first and last y-row numbers [all]
1-16 ; first and last z-slice numbers
2,2 ; x and y margins
<CR> ; slice axis (x, y, or z) [z]
<CR> ; normalize slice by slice? [no]
4 ; number of pictures per row; should correspond
to the number of patches per row used in WV P.
<CR> ; default display origin [0,0]
Display after extracting images from 3-D file:
DO LB1 I=1,16
PS Z ; pick slice in z-direction
AVG001 ; file name of 3-D file
PAT00I ; file name given to extracted patch average
X0 ; use DO-LOOP index for picking
LB1 ; end of DO-LOOP
RE ; return from batch
now display series with TV M:
TV M
PAT ; file prefix
1-16 ; file numbers
<CR> ; first and last x-column numbers
<CR> ; first and last y-row numbers
4 ; number of pictures per row
2,2 ; x and y margins
<CR> ; default display origin
The resulting montage will appear on the screen. Now check for
presence of variations. Visual check may indicate absence of
variations. In that case, proceed by summing the image series
to obtain the global average, and skip the next two steps.
d2) In the presence of variations, proceed by running
an MSA analysis: The initialization operation (CA SI) accepts
both a file series and a 3-D formatted file, so it is not
necessary to go through the step of PS Z file extraction.
CA SI creates a single Sequential File in which each image
of the series is stored as a single record.
Immediately after the initialization, the actual analysis is
run.
CA SI ; initialize Correspondence analysis
AVG001 ; average stack file
1-16 ; averages to be analysed
MSK001 ; mask file defining the pixels to be used
SEQ001 ; name of sequential file
Y ; yes, all files selected exist (no gap)
0.0 ; additive constant to assure non-negativity
CA S ; run Correspondence analysis
SEQ001 ; sequential file created above
S ; stochastic version of algorithm
8,1 ; number of factors to be used, file code
The file code is attached to three generic
files created in the analysis. They have
the prefices IMC, PIX, and EIG. In our
example, IMC001, PIX001, and EIG001 are
created.
N ; inactive image input
N ; no background correction
0 ; number of inactive images to be set inactive.
To interpret the results, the eigenvalue histogram, factor maps,
and selected reconstitutions are inspected. Printouts of the
eigenvalue histogram and selected factor maps are created by
operation CA SME, which can be run interactively:
CA SME ; create factor map
I ; map of images
1,4 ; file code used in CA S, no. of patches across
1,2 ; factors to be used
I ; use image ID on map
N ; no plot file to be created
<CR> ; default no. of page width (1)
<CR> ; default no. of lines (60)
<CR> ; default no. of standard deviations (2.3)
<CR> ; default handedness of map (no flipping)
TITLE (up to 80 chars of title info to appear on map)
The resulting eigenvalue histogram and the map selected will be
written into the RESULTS file, and this can be printed on the
line printer.
© Copyright Notice / Enquiries: spider@wadsworth.org