
CL HC - Classification - Hierarchical Clustering

(5/30/09)

PURPOSE

Performs hierarchical clustering, according to one of several clustering criteria, on factors produced by CORAN. Produces a dendrogram PostScript plot and a dendrogram description doc. file. These files can be used to determine the images/elements assigned to each cluster (class).

SEE ALSO

CL CLA [Classification - Diday's Clustering]
CL HD [Classification - Hierarchical, get number of classes]
CL HE [Classification - Hierarchical, create cluster selection files]
CL KM [Classification - K Means clustering ||]

USAGE

.OPERATION: CL HC

.CORAN/PCA FILE (e.g. CORAN_01_IMC) : coran_t_IMC
[Enter the name of the raw image data sequential file (_SEQ), image factor coordinate file (_IMC), or pixel factor coordinate file (_PIX) containing your data. These files are created by 'CA S'. (This operation is usually used with _IMC files.)]

.FACTOR NUMBERS: 1,3,4,6
[Enter the factors to be included in the hierarchical clustering.]

.FACTOR WEIGHT: 1.5
[Enter a weight for each factor selected. If the answer zero is given at any point, all weights from the current factor onwards are set to one. This question is asked as many times as the number of factors specified, or is terminated by the answer zero.]

.FACTOR WEIGHT: 0
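
The weight prompts above can be sketched in a few lines of Python. This is only an illustration of the rule stated in the help text (a zero answer ends the prompting and sets all remaining weights to one). The function name `resolve_factor_weights` is hypothetical and is not part of SPIDER.

```python
def resolve_factor_weights(num_factors, answers):
    """Return one weight per selected factor.

    `answers` holds the values typed at successive .FACTOR WEIGHT prompts.
    Hypothetical helper: it only mirrors the rule described in the help text.
    """
    weights = []
    for answer in answers:
        if len(weights) == num_factors:
            break  # every factor already has a weight
        if answer == 0:
            break  # zero ends the prompting
        weights.append(float(answer))
    # Factors without an explicit weight get a weight of one.
    # (Assumption: the same default applies if the answers simply run out.)
    weights.extend([1.0] * (num_factors - len(weights)))
    return weights


# Factors 1,3,4,6 with the answers 1.5 and 0, as in the example above.
print(resolve_factor_weights(4, [1.5, 0]))  # [1.5, 1.0, 1.0, 1.0]
```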

.CLUSTERING CRITERION (1-5): 2
[Enter the number indicating clustering criterion to be used. Possible choices are:
Option 1:   Single linkage
Option 2:   Complete linkage
Option 3:   Average linkage
Option 4:   Centroid method
Option 5:   Ward's method]
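
As a rough guide to how the five criteria differ, the following Python sketch computes the distance between two clusters under the usual textbook definition of each criterion. It assumes plain Euclidean distance in factor space and is not taken from the SPIDER source, so details such as whether squared distances are used may differ from the actual implementation. All function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points in factor space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(cluster):
    """Mean coordinate of a cluster given as a list of points."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def cluster_distance(a, b, criterion):
    """Textbook distance between clusters a and b for criterion 1-5."""
    pairwise = [euclidean(p, q) for p in a for q in b]
    if criterion == 1:  # single linkage: closest pair
        return min(pairwise)
    if criterion == 2:  # complete linkage: farthest pair
        return max(pairwise)
    if criterion == 3:  # average linkage: mean over all pairs
        return sum(pairwise) / len(pairwise)
    if criterion == 4:  # centroid method: distance between centers
        return euclidean(centroid(a), centroid(b))
    if criterion == 5:  # Ward: increase in within-cluster sum of squares
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * euclidean(centroid(a), centroid(b)) ** 2
    raise ValueError("criterion must be in 1..5")


a = [[0.0, 0.0], [0.0, 2.0]]
b = [[3.0, 0.0], [3.0, 2.0]]
for criterion in range(1, 6):
    print(criterion, cluster_distance(a, b, criterion))
```

At each step of the clustering, the two clusters with the smallest such distance are merged, so the choice of criterion determines the shape of the resulting dendrogram.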

.DENDROGRAM POSTSCRIPT OUTPUT FILE: HC_DEND_PLOT
[Enter name of file where the dendrogram will be stored. Enter '*' to skip creating this file and skip the next question.]

.ENTER PLOT CUTOFF (0 ..100): 30
[Enter the scale value at which the dendrogram will be cut/truncated. Only the top portion will be plotted.]

.DENDROGRAM DOC. FILE? (Y/N): HC_DEND

[Enter the document file name where the UNTRUNCATED dendrogram information will be stored. This file contains the class numbers and height of all dendrogram branches. Using this information, one can retrieve the images/elements which are present in each of the classes. Enter '*' to skip creating this file.]

NOTES

  1. See: Classification and Clustering Summary and Classification and Clustering Tutorial for further info.

  2. The RESULTS file contains the following information:

    A) Aggregation history:
    For each of the NP partitions, the NS seeds and sizes of clusters are listed. The non-empty clusters in the crossed partition are given by size and cumulative percentage. A total of NS**NP clusters are possible, but in practice, only 10% of these are non-empty.

    B) Description of hierarchy nodes:
    The nodes are numbered starting from the number of the highest cluster. For each node, the Senior, Junior, size, weight and hierarchy index are given. The hierarchy index is printed out as a histogram.

    C) Description of the classes:
    For each node the constituent classes are listed.

    D) Dendrogram:
    The class relationships are represented in the form of a dendrogram (tree structure). The lengths of the branches (in horizontal print direction) are proportional to the hierarchy indices.

    E) List of class members:
    The members are listed for each of the basic non-empty classes.

    F) List of class center coordinates:
    For each class, the NFAC coordinates of its center are listed. These are contained in the cluster file.

    G) Re-classification lookup table:
    Each cutoff point in the dendrogram, from right to left, defines a new classification scheme with the number of classes increasing by 1 each time. The table gives the new class memberships for any cutoff point selected.

  3. To get the classification on a given "cutting" level use operation: 'CL HD'.

  4. To get the selection doc files corresponding to the given "cutting" use operation: 'CL HE'.

  5. To calculate averages for the classes use operation 'AS DC' and selection doc. files.

  6. Every vertical line at the bottom of the drawing represents an image that you input into 'CL HC'. Each vertical line is an average of the images, or vertical lines, below it.
    The threshold is a scaled value from 0 to 100 that tells 'CL HC' how far "up" the dendrogram you wish it to look. A threshold set at the bottom results in the number of classes being equal to the number of input images. An intermediate threshold value such as 50 results in fewer classes, and a top-level threshold gives a single class containing all the inputs.

  7. With untruncated results and/or very many classes, the PostScript plot lines/labels may be overwritten, sorry.

  8. Implemented by P. Penczek.
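
The threshold behaviour described in note 6 can be illustrated with a short Python sketch. It assumes that the branch heights are scaled linearly so that the topmost merge sits at 100, and that a merge is applied when its scaled height is at or below the threshold. Both points are assumptions made for this illustration and are not confirmed by the SPIDER source. The function name and the example heights are hypothetical.

```python
def classes_at_threshold(num_images, merge_heights, threshold):
    """Number of classes left when the dendrogram is cut at `threshold`.

    merge_heights: height of each of the num_images - 1 merges.
    threshold: cut level on the assumed 0..100 scale.
    """
    top = max(merge_heights)
    # Assumed scaling: the topmost merge is placed at 100.
    scaled = [100.0 * height / top for height in merge_heights]
    # Each merge at or below the cut joins two classes into one.
    merges_applied = sum(1 for s in scaled if s <= threshold)
    return num_images - merges_applied


heights = [1.0, 2.0, 4.0, 8.0]  # made-up heights for 5 input images
print(classes_at_threshold(5, heights, 0))    # 5
print(classes_at_threshold(5, heights, 50))   # 2
print(classes_at_threshold(5, heights, 100))  # 1
```

A cut at the bottom leaves every image in its own class, an intermediate cut leaves fewer classes, and a cut at the top leaves a single class, matching the description in note 6.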

SUBROUTINES

HCLS, HCLP, DIST_P, CHAVA, DENDRO

CALLER

UTIL1

© Copyright Notice /       Enquiries: spider@wadsworth.org