Classification of Images
See also the tutorial on classification and clustering.
Classification is a computational procedure that sorts images into groups
("classes") according to their similarity. Images can be similar in many
ways, but in EM-related image processing we use a very strict measure of
similarity based on a pixel-by-pixel comparison: the mean squared
difference, which is, up to a constant factor, the square of the
generalized Euclidean distance.
Images represented by N x N arrays of density values can be thought of as points
in a space of N x N (that is, N^2) dimensions, one dimension per pixel. Points
that are close to each other in that space represent images that are "similar",
since the mean squared difference between their pixel values is small.
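The relation between the pixel-by-pixel measure and the distance between points can be checked numerically. The following is a minimal sketch using NumPy; the function names and the toy 2 x 2 "images" are illustrative assumptions, not part of any EM package.

```python
import numpy as np

def mean_squared_difference(a, b):
    """Mean squared pixel-by-pixel difference between two equally sized images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))

def euclidean_distance(a, b):
    """Distance between two images viewed as points with one coordinate per pixel."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))

# Hypothetical toy images that differ in a single pixel.
img1 = np.array([[0.0, 1.0], [2.0, 3.0]])
img2 = np.array([[0.0, 1.0], [2.0, 5.0]])

msd = mean_squared_difference(img1, img2)
dist = euclidean_distance(img1, img2)

# The mean squared difference equals the squared distance divided by the pixel count.
print(msd, dist ** 2 / img1.size)
```

Because the two quantities differ only by the constant number of pixels and a monotonic square, they rank pairs of images in the same order of similarity.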
There are two approaches to classification: supervised and unsupervised.
Both make use of the similarity measure introduced above. Supervised
classification sorts a set of images according to their similarity
(that is, closeness in our high-dimensional space) to certain pre-given images
("references" or "templates"); unsupervised classification sorts the images
according to their intrinsic grouping or clustering within the set.
This is demonstrated schematically in the figure. The same set of images,
represented by a set of dots, is either classified by comparing each image with
a set of references (represented by fat dots), or by dividing the whole cloud
of dots into clusters (indicated by dashed lines).
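The difference between the two approaches can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: images are taken to be already flattened into rows of pixel values, the supervised case is shown as nearest-reference assignment, and the unsupervised case uses plain k-means as a stand-in for whatever clustering method a given package provides. All function names and the toy data are hypothetical.

```python
import numpy as np

def squared_distances(points, centers):
    """Squared Euclidean distance from every point to every center."""
    diff = points[:, None, :] - centers[None, :, :]
    return np.sum(diff ** 2, axis=2)

def classify_supervised(images, references):
    """Assign each image to the closest pre-given reference."""
    return np.argmin(squared_distances(images, references), axis=1)

def classify_unsupervised(images, n_classes, n_iter=20, seed=0):
    """Plain k-means: group images by their own clustering, without references."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(images), size=n_classes, replace=False)
    centers = images[start].copy()
    for _ in range(n_iter):
        labels = np.argmin(squared_distances(images, centers), axis=1)
        for k in range(n_classes):
            members = images[labels == k]
            if len(members) > 0:          # keep the old center if a class is empty
                centers[k] = members.mean(axis=0)
    labels = np.argmin(squared_distances(images, centers), axis=1)
    return labels, centers

# Hypothetical toy data: 20 flattened 2 x 2 images forming two well-separated groups.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, size=(10, 4))
group_b = rng.normal(5.0, 0.1, size=(10, 4))
images = np.vstack([group_a, group_b])

references = np.array([[0.0] * 4, [5.0] * 4])
sup_labels = classify_supervised(images, references)
unsup_labels, centers = classify_unsupervised(images, n_classes=2)
```

In the supervised case the class numbers are fixed by the order of the references; in the unsupervised case the numbering of the clusters is arbitrary and only the grouping itself is meaningful.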
To simplify the analysis, or to increase the signal-to-noise ratio,
classification is often carried out in a space of much lower dimensionality
than the initial N x N-dimensional space. This reduction of dimensionality
is achieved by multivariate data analysis (also known as multivariate
statistical analysis).
The two most common techniques in EM are correspondence analysis and
principal component analysis.
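As an illustration of the dimensionality reduction, the sketch below applies principal component analysis to a stack of images by way of a singular value decomposition. It covers only principal component analysis, not correspondence analysis, and the function name, the random test stack, and the choice of five components are assumptions made for the example.

```python
import numpy as np

def pca_reduce(images, n_components):
    """Project flattened images onto their first principal components."""
    data = np.asarray(images, dtype=float)
    data = data.reshape(len(data), -1)        # one row per image, N*N columns
    mean_image = data.mean(axis=0)
    centered = data - mean_image
    # Rows of vt are orthonormal principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    coords = centered @ axes.T                # coordinates in the reduced space
    return coords, axes, mean_image

# Hypothetical stack of 50 images of 8 x 8 pixels filled with random noise.
rng = np.random.default_rng(0)
stack = rng.normal(size=(50, 8, 8))

coords, axes, mean_image = pca_reduce(stack, n_components=5)
print(coords.shape)   # each image is now described by 5 numbers instead of 64
```

Classification can then be run on the rows of `coords` in place of the full pixel arrays, using the same distance measure in the reduced space.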