Classification and Clustering

Table of Contents                                                      SPIDER operation
Correspondence Analysis or Principal Component Analysis                CA S
Eigenvalues: determine what variations are attribute or noise based    view with Gnuplot
Factor Maps                                                            CA SM
Clustering and Hierarchical Classification                             CL HC
Reconstitute images from eigenvectors                                  CA SR
Create "virtual" images from eigenvectors                              CA SR
Difference images                                                      CA SRD
Dendrograms                                                            View in WEB
Subgrouping images                                                     CA SMI
Viewing Eigenimages                                                    CA SRE
Reconstitute Arbitrary (Virtual) Images                                CA SRA
References

Automatic Classification/Clustering -- General Steps

Create Orthogonal Vectors
Determine What Factors Are Important
Cluster & Classification
View Images
WEB Tricks

Source Data

Makefaces.bat and face.bat were used to create the eight original faces below. The faces differ in three ways: oval vs. round head, left vs. right eyes, and big vs. small mouth. The sample data set consists of ten copies of each face, each with random noise added. These procedure files create four kinds of files: scr* files are the face templates, seen here; sma* files are the noise-filled data set (example below); sca* files hold the average of the ten noisy images for each template; and scv* files hold the variance for each template.
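
The layout of this data set can be sketched outside SPIDER. The Python below is only an illustration of the structure described above, not a translation of makefaces.bat or face.bat: the toy 12-pixel "faces", the noise level, and all function names are assumptions, and the use of the sample variance is a guess at what the scv* files contain.

```python
import itertools
import random
import statistics

# Three binary attributes give 2**3 = 8 templates.
HEADS = ("oval", "round")
EYES = ("left", "right")
MOUTHS = ("big", "small")


def make_template(head, eyes, mouth, n_pixels=12):
    """Toy stand-in for a face image: one block of pixels per attribute."""
    block = n_pixels // 3
    return ([1.0 if head == "oval" else 2.0] * block
            + [1.0 if eyes == "left" else 2.0] * block
            + [1.0 if mouth == "big" else 2.0] * block)


def noisy_copies(template, rng, copies=10, sigma=0.5):
    """Return `copies` versions of the template with Gaussian noise added."""
    return [[p + rng.gauss(0.0, sigma) for p in template]
            for _ in range(copies)]


def pixel_mean_and_variance(images):
    """Per-pixel average and sample variance across a stack of images."""
    columns = list(zip(*images))
    mean = [statistics.fmean(c) for c in columns]
    var = [statistics.variance(c) for c in columns]
    return mean, var


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Analogue of the scr* files: one template per attribute combination.
templates = {combo: make_template(*combo)
             for combo in itertools.product(HEADS, EYES, MOUTHS)}

# Analogue of the sma* files: ten noisy copies of each template.
dataset = {combo: noisy_copies(t, rng) for combo, t in templates.items()}

# Analogue of the sca* (average) and scv* (variance) files.
stats = {combo: pixel_mean_and_variance(imgs)
         for combo, imgs in dataset.items()}

print(len(templates), sum(len(v) for v in dataset.values()))
```

The eight templates times ten copies give the 80 images that the later classification steps operate on.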



Correspondence Analysis (CA) or Principal Component Analysis (PCA)

CA is the preferred method of finding variations, and we will principally be discussing inter-image variance. PCA measures the distance between data vectors with Euclidean distance, while CA uses chi-squared distance. The chi-squared distance is superior here because it ignores differences in exposure between images, eliminating the need to rescale between images.
Cas.bat is a procedure file that runs the CA S command. The procedure file assumes you prefer CA and creates a user-defined circular mask. Cas.bat also creates eigendoc.dat.
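
The difference between the two distance measures can be illustrated with a small numerical sketch. The Python below is not what CA S runs internally; it uses the textbook chi-squared distance between row profiles of a non-negative data table, and the pixel values and the exposure factor of 2.5 are made up for the example. It shows why a uniform change in exposure (every pixel multiplied by the same factor) moves an image under Euclidean distance but not under chi-squared distance.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two pixel vectors (as in PCA)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_squared(images, i, k):
    """Chi-squared distance between images i and k of a non-negative table.

    Each image is first divided by its own sum (its "profile"), so a
    multiplicative exposure factor cancels. Each pixel's contribution is
    then weighted by the inverse of that pixel's share of the table total.
    """
    total = sum(sum(row) for row in images)
    col_mass = [sum(col) / total for col in zip(*images)]
    sum_i = sum(images[i])
    sum_k = sum(images[k])
    return math.sqrt(sum(
        (x / sum_i - y / sum_k) ** 2 / m
        for x, y, m in zip(images[i], images[k], col_mass)))


image = [1.0, 4.0, 2.0, 3.0]          # made-up pixel values
brighter = [2.5 * p for p in image]   # same image, higher exposure
other = [4.0, 1.0, 3.0, 2.0]          # a genuinely different image
table = [image, brighter, other]

d_euclid = euclidean(image, brighter)     # clearly non-zero
d_chi = chi_squared(table, 0, 1)          # zero up to rounding
d_chi_other = chi_squared(table, 0, 2)    # non-zero: real difference

print(d_euclid, d_chi, d_chi_other)
```

Note that this only covers a multiplicative exposure change; an additive offset would alter the profiles and therefore the chi-squared distance as well.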

CA S Hints

Description of Output Files (links lead to output from faces example)