Feature Selection
In order to choose a subset of features from the combined Haralick and
Zernike sets described above, two feature selection methods were
applied to the training data. The first was the STEPDISC procedure in
SAS (SAS Institute, Cary, NC, USA), an implementation of stepwise
discriminant analysis (SDA) [30]; the default parameters of the
procedure were used.
The goal of stepwise discriminant analysis is to sequentially identify
those variables (features) that maximize a criterion describing their
ability to separate classes from one another while at the same time
keeping the individual classes as tightly clustered as possible. The
criterion used is Wilks' $\Lambda$, which is defined as

\begin{displaymath}
\Lambda(\mathbf{x}) = \frac{|\mathbf{W}|}{|\mathbf{T}|}
\end{displaymath} (2.7)

where $\mathbf{x}$ is a vector of the features that are currently
included in the system,

\begin{displaymath}
\mathbf{W}(i,j) = \sum_{g=1}^{q}\sum_{t=1}^{n_g}
(x_{igt}-\bar{x}_{ig})(x_{jgt}-\bar{x}_{jg})
\end{displaymath} (2.8)

is the matrix of within-groups sums of squares and cross products for
the features under consideration, and

\begin{displaymath}
\mathbf{T}(i,j) = \sum_{g=1}^{q}\sum_{t=1}^{n_g}
(x_{igt}-\bar{x}_{i})(x_{jgt}-\bar{x}_{j})
\end{displaymath} (2.9)

is the matrix of total sums of squares and cross products. Here $q$ is
the number of classes, $n_g$ is the number of samples in class $g$,
$x_{igt}$ is the value of feature $i$ for sample $t$ of class $g$,
$\bar{x}_{ig}$ is the mean of feature $i$ over class $g$, and
$\bar{x}_{i}$ is the mean of feature $i$ over all classes.
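As a concrete illustration, the following is a minimal NumPy sketch of
Eqs. 2.7-2.9; the function name and the (samples x features) array
layout are assumptions made here for illustration, not part of the
original analysis:

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' Lambda (Eq. 2.7) for the feature columns of X.

    X is an (n, p) array of feature values and y an (n,) array of
    class labels; both names are illustrative.
    """
    n_features = X.shape[1]

    # Within-groups sums of squares and cross products (Eq. 2.8)
    W = np.zeros((n_features, n_features))
    for g in np.unique(y):
        dev = X[y == g] - X[y == g].mean(axis=0)
        W += dev.T @ dev

    # Total sums of squares and cross products (Eq. 2.9)
    dev = X - X.mean(axis=0)
    T = dev.T @ dev

    return np.linalg.det(W) / np.linalg.det(T)
```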
Low values of $\Lambda$ indicate features that better discriminate the
classes. To accommodate the stepwise nature of the process, the
partial $\Lambda$ statistic is used. This statistic describes the
increase in the discrimination ability of a system after adding a new
feature, $x_{p+1}$:

\begin{displaymath}
\Lambda(x_{p+1}\cdot\mathbf{x}) =
\frac{\Lambda([\mathbf{x},x_{p+1}])}{\Lambda(\mathbf{x})}
\end{displaymath} (2.10)
To decide whether adding a new feature to the system will increase
discrimination significantly, Wilks' partial $\Lambda$ is converted to
an F-statistic to which a level of statistical significance can be
assigned: what is the probability, given the null hypothesis that
there is no separation between groups, that one would obtain a value
larger than

\begin{displaymath}
F = \frac{n-p-q}{q-1}\cdot
\frac{1-\Lambda(x_{p+1}\cdot\mathbf{x})}{\Lambda(x_{p+1}\cdot\mathbf{x})}
\end{displaymath} (2.11)
where $n$ is the number of data samples in all classes, $p$ is the
number of features currently in the analysis, and $q$ is the number
of classes. Large values of F indicate better discrimination for
a particular feature. This version of the F-statistic is called the
F-to-enter criterion because it is used to decide whether feature
$x_{p+1}$ should be entered into the system. A corresponding value,
the F-to-remove criterion, is used to decide whether a feature should
be taken out of the system.
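A hedged sketch of this conversion, building on the wilks_lambda
function above (scipy.stats.f.sf supplies the significance level; the
function and argument names are assumptions for illustration):

```python
import numpy as np
from scipy.stats import f as f_dist

def f_to_enter(X_current, x_new, y):
    """F-to-enter (Eq. 2.11) for a candidate feature x_new, given the
    p features already entered (the columns of X_current).

    Returns the F value and its significance level.
    """
    n = len(y)
    p = X_current.shape[1]
    q = len(np.unique(y))

    # Partial Lambda (Eq. 2.10): Lambda with the candidate included,
    # divided by Lambda of the current feature set.
    lam = (wilks_lambda(np.column_stack([X_current, x_new]), y)
           / wilks_lambda(X_current, y))

    F = (n - p - q) / (q - 1) * (1 - lam) / lam
    # P(F_{q-1, n-p-q} > F) under the null hypothesis of no separation
    return F, f_dist.sf(F, q - 1, n - p - q)
```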
The process of stepwise discriminant analysis involves the following
steps [31,32] (a simplified sketch of the full loop appears after the
list):
1. Calculate the within-groups ($\mathbf{W}$) and total ($\mathbf{T}$)
sums of squares and cross-products matrices for all features. Let
$w_{ii} = \mathbf{W}(i,i)$, $t_{ii} = \mathbf{T}(i,i)$, and
$V_i = w_{ii}/t_{ii}$. $V_i$ is therefore analogous to Wilks'
$\Lambda$.

2. Calculate the F-to-remove statistic for each feature $i$ already
included in the system:

\begin{displaymath}
F_{\mathrm{remove}}(i) = \frac{n-p-q+1}{q-1}\cdot\frac{1-V_i}{V_i}
\end{displaymath} (2.12)

The value of $F_{\mathrm{remove}}(i)$ is used to calculate a
significance level for an F random variable with degrees of freedom
$(n-p-q+1)$ and $(q-1)$. The feature with the lowest
$F_{\mathrm{remove}}$ value that also corresponds to a significance
level ($p$) greater than an assigned threshold ($p=0.15$ in STEPDISC)
is removed from the list of features. Note: this step is skipped when
entering the first feature (i.e., when no features have yet been
entered).

3. If a feature was removed in step 2, the $\mathbf{W}$ and
$\mathbf{T}$ matrices must be updated to reflect that change.
Jennrich [31] describes ``sweep'' operators that can be applied to
these matrices such that a feature is changed from a dependent
variable to a predictor variable (on entry), or from a predictor to a
dependent variable (on removal). This alleviates the need to
recalculate the $\mathbf{W}$ and $\mathbf{T}$ matrices at each step of
the process. The modified matrices are then passed to the next step.

4. Calculate the F-to-enter statistic for each feature $j$ not
already included:

\begin{displaymath}
F_{\mathrm{enter}}(j) = \frac{n-p-q}{q-1}\cdot\frac{1-V_j}{V_j}
\end{displaymath} (2.13)

This time, a significance level for an F random variable with degrees
of freedom $(n-p-q)$ and $(q-1)$ is calculated, and the feature with
the largest $F_{\mathrm{enter}}$ whose significance is below a
threshold ($p=0.15$ in STEPDISC) is entered.

5. Again, the $\mathbf{W}$ and $\mathbf{T}$ matrices are updated
using the sweep operator to change any entered feature from dependent
to predictor status.

6. Return to step 2 with the modified $\mathbf{W}$ and $\mathbf{T}$
matrices. When no features can be removed or entered (i.e., the
significance tests all fail to achieve their respective thresholds),
the process stops.
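The following is a minimal sketch of that loop, assuming the
wilks_lambda and f_to_enter functions above. For clarity it recomputes
Wilks' $\Lambda$ directly at each step instead of applying Jennrich's
sweep operators to $\mathbf{W}$ and $\mathbf{T}$, so it illustrates
the logic rather than the STEPDISC implementation:

```python
import numpy as np
from scipy.stats import f as f_dist

def stepwise_select(X, y, threshold=0.15):
    """Simplified stepwise discriminant analysis (steps 1-6 above).

    Recomputes Wilks' Lambda from scratch each step, trading the
    efficiency of the sweep operators for readability.
    """
    n, n_features = X.shape
    q = len(np.unique(y))
    included = []

    while True:
        changed = False

        # Step 2: look for the weakest included feature to remove
        # (skipped while no features have been entered).
        p = len(included)
        if included:
            lam_full = wilks_lambda(X[:, included], y)
            worst, worst_F = None, np.inf
            for i in included:
                rest = [j for j in included if j != i]
                lam = lam_full / wilks_lambda(X[:, rest], y)
                F = (n - p - q + 1) / (q - 1) * (1 - lam) / lam
                if F < worst_F:
                    worst, worst_F = i, F
            if f_dist.sf(worst_F, q - 1, n - p - q + 1) > threshold:
                included.remove(worst)  # steps 3/5: W, T re-swept here
                changed = True

        # Step 4: look for the strongest excluded feature to enter.
        p = len(included)
        best, best_F = None, -np.inf
        for j in range(n_features):
            if j not in included:
                F, _ = f_to_enter(X[:, included], X[:, j], y)
                if F > best_F:
                    best, best_F = j, F
        if best is not None and f_dist.sf(best_F, q - 1, n - p - q) < threshold:
            included.append(best)
            changed = True

        # Step 6: stop when no feature can be removed or entered.
        if not changed:
            return included
```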
The second method for identifying a subset of features used a modified
version of the multiple discriminant analysis criterion described by
Duda and Hart [10, p. 120]. Features selected were those
that had the largest ratio of the variance of that feature calculated
using all samples in the training set to the sum of the variances of
that feature calculated for each class (i.e., image type) in the
training set:

\begin{displaymath}
\mathrm{ratio}(f) = \frac{\mathrm{Var}(f)}{\sum_{c=1}^{q}\mathrm{Var}(f_c)}
\end{displaymath} (2.14)

where $f_c$ contains only feature values from class $c$ and $f$
contains feature values from all image classes. The goal of this
criterion is to identify features that widely separate the classes
from one another (total variance) while keeping the classes themselves
as tightly clustered as possible (sum of within-class variances).
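A minimal sketch of this criterion, assuming as before that the
features occupy the columns of a NumPy array X and that y holds the
class labels:

```python
import numpy as np

def variance_ratio(X, y):
    """Eq. 2.14 for every feature column of X: the total variance
    divided by the sum of the per-class variances. Features with
    larger values are preferred."""
    total_var = X.var(axis=0)
    within_var = sum(X[y == c].var(axis=0) for c in np.unique(y))
    return total_var / within_var

# Illustrative usage: indices of features ranked best-first.
# ranking = np.argsort(variance_ratio(X_train, y_train))[::-1]
```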