Multivariate Statistical Analysis (MSA)


There are essentially only two steps here:

  1. Dimension-reduction -- expression of a set of m x n images using only a few terms of an expansion into eigenvectors, or factors. This expansion results from an analysis of the inter-image variability of the entire image set. A low-dimensional space spanned by only a few factors is often sufficient to represent each image of the set.
  2. Classification of the images in the low-dimensional factor space.
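As a rough illustration of step 1, the sketch below finds the leading factor of a tiny image set by power iteration and projects each image onto it. It uses a plain PCA-style centering rather than correspondence analysis, and the function names (center, leading_factor, project) and the toy data are assumptions made for this example; they are not SPIDER operations.

```python
import math

def center(images):
    """Subtract the mean image from every image (each image is a flat pixel list)."""
    n = len(images)
    mean = [sum(col) / n for col in zip(*images)]
    return [[p - m for p, m in zip(img, mean)] for img in images]

def leading_factor(centered, iterations=200):
    """Power iteration for the dominant eigenvector of X^T X (X = centered images)."""
    n_pixels = len(centered[0])
    # Deterministic starting vector; assumed not orthogonal to the leading factor.
    v = [1.0 + 0.1 * i for i in range(n_pixels)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        v = [sum(s * row[j] for s, row in zip(scores, centered))           # X^T (X v)
             for j in range(n_pixels)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v

def project(centered, factor):
    """Coordinate of each image along the given factor."""
    return [sum(x * w for x, w in zip(row, factor)) for row in centered]

# Toy data: four 2 x 2 "images", flattened to 4 pixels each.
images = [
    [1.0, 0.0, 1.0, 0.0],
    [0.9, 0.1, 1.1, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.1, 0.9, 0.0, 1.1],
]
centered = center(images)
factor = leading_factor(centered)
coords = project(centered, factor)  # one number per image instead of four pixels
```

In this toy set, the first two images fall on one side of the factor axis and the last two on the other, so a single coordinate already separates the two groups. The sign of the factor is arbitrary.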

For more details, consult pp. 145-192 in Frank, Oxford University Press (2006).

Note that for the purpose of classification, the dimension-reduction step is optional. In principle, one could classify the raw images (which is what SPIDER operation 'AP CM' does). Dimension-reduction has two purposes: (1) it greatly reduces the amount of data that needs to be analyzed, and (2) it removes a large amount of noise, or information without any systematic trend among the images.

The example given below uses correspondence analysis for the dimension-reduction. A similar method is principal component analysis (PCA); to run PCA, one needs to change an option under SPIDER operation 'CA S' in the batch file ca-pca.spi.
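In their textbook forms, the two methods differ mainly in how the data are normalized before the eigenvector expansion: PCA subtracts the mean image, whereas correspondence analysis works with chi-square standardized residuals. The sketch below shows only this preprocessing, under the assumption that all pixel values are positive. The function names are hypothetical, and nothing here is meant to describe how 'CA S' is implemented internally.

```python
import math

def pca_preprocess(data):
    """PCA-style preprocessing: subtract the mean image from each image."""
    n = len(data)
    mean = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, mean)] for row in data]

def ca_preprocess(data):
    """CA-style preprocessing: chi-square standardized residuals.

    Assumes positive entries, so that every row and column sum is nonzero.
    """
    total = sum(sum(row) for row in data)
    row_mass = [sum(row) / total for row in data]        # one weight per image
    col_mass = [sum(col) / total for col in zip(*data)]  # one weight per pixel
    return [
        [(x / total - r * c) / math.sqrt(r * c) for x, c in zip(row, col_mass)]
        for row, r in zip(data, row_mass)
    ]

# Two "images" that differ only by an overall intensity factor of 2.
data = [[1.0, 2.0],
        [2.0, 4.0]]
pca_view = pca_preprocess(data)  # nonzero: PCA sees the intensity difference
ca_view = ca_preprocess(data)    # all (numerically) zero: CA compares profiles
```

The toy input illustrates one practical consequence: images that differ only by a multiplicative intensity factor look identical to correspondence analysis but not to PCA.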

There are three methods for classification presented here: Diday's method, Ward's method, and K-means.
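To give a feel for classification in factor space, here is a minimal sketch of generic K-means (Lloyd's iteration) applied to a handful of made-up factor coordinates. The kmeans function, the starting centers, and the data are assumptions for illustration only; the sketch is not the SPIDER implementation, and Diday's and Ward's methods are not shown.

```python
def kmeans(points, centers, iterations=20):
    """Generic K-means: alternate nearest-center assignment and center update."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center
        # (squared Euclidean distance in factor space).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a class is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical coordinates of four images along two factors.
points = [(1.0, 0.1), (0.95, -0.1), (-1.0, 0.0), (-0.95, 0.2)]
labels, centers = kmeans(points, centers=[points[0], points[2]])
```

K-means needs the number of classes and a set of starting centers up front, and the result can depend on that starting choice; here the starting centers are simply picked by hand.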

The use of the individual SPIDER operations is described in more depth here.

A Python utility, classavg.py, displays the constituent individual particles of a class average when that average is clicked.

Source: techs/MSA/index.htm     Page updated: 01/20/05     Tanvir Shaikh