In a further attempt to reduce the dimensionality of the feature set,
a subset of 10 features was selected from the combined Zernike and
Haralick features using the stepwise discriminant analysis
functionality (i.e. the STEPDISC procedure) of SAS. This
method uses Wilks' lambda statistic to iteratively determine which
features are best able to separate the classes in feature space. The
10 features selected using this method are listed in Table
2.7. Using these 10 features as inputs to a BPNN
containing 20 hidden nodes resulted in correct classification rates of
97% for giantin, 93% for Hoechst, 82% for LAMP2, 88% for NOP4, and
54% for tubulin. Although the performance on the first four classes
is identical to that obtained with the Haralick features alone, the
performance on the tubulin images is significantly worse (54% vs. 81%)
and drops the average classification rate to 83%. Performance of the
stepwise discriminant procedure was therefore unsatisfactory in this case.
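
To make the selection criterion concrete, the following is a minimal sketch of forward feature selection driven by Wilks' lambda, computed as det(W) / det(T), where W is the pooled within-class scatter matrix and T is the total scatter matrix. It is not the SAS STEPDISC implementation: it only adds features greedily and omits the significance tests for entering and removing features that a full stepwise procedure applies. The function names and the toy data are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def scatter(rows, cols):
    """Sum of squares and cross-products about the mean, for columns `cols`."""
    n = len(rows)
    mean = [sum(r[c] for r in rows) / n for c in cols]
    return [[sum((r[ci] - mean[i]) * (r[cj] - mean[j]) for r in rows)
             for j, cj in enumerate(cols)]
            for i, ci in enumerate(cols)]


def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T); smaller values mean better separation."""
    total = scatter(X, cols)
    within = [[0.0] * len(cols) for _ in cols]
    for label in set(y):
        s = scatter([x for x, lab in zip(X, y) if lab == label], cols)
        for i in range(len(cols)):
            for j in range(len(cols)):
                within[i][j] += s[i][j]
    dt = det(total)
    return det(within) / dt if dt else 1.0


def forward_select(X, y, k):
    """Greedily add the feature that yields the smallest Wilks' lambda."""
    chosen = []
    remaining = list(range(len(X[0])))
    while len(chosen) < k and remaining:
        best = min(remaining, key=lambda f: wilks_lambda(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 1.0, 0.3], [0.2, 3.0, 0.1], [0.1, 2.0, 0.6],
     [5.0, 2.5, 0.4], [5.2, 1.5, 0.9], [5.1, 2.0, 0.5]]
y = ["a", "a", "a", "b", "b", "b"]
print(forward_select(X, y, 2))
```

In this toy example, feature 0 is chosen first because its Wilks' lambda is close to zero, whereas feature 1 has identical class means and a lambda of 1.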
As an alternative, a different subset of features was identified using Equation 2.14. This procedure selects those features that, on average, widely separate the classes from each other while at the same time keeping the individual classes tightly clustered. The 10 features selected using this method are included in Table 2.7. Although 7 of the 10 features selected using this approach are the same as those selected using stepwise discriminant analysis, the performance of the BPNN using these 10 features (Table 2.8) was better (88% vs. 83%). This result is important because it indicates that it is possible to achieve performance at least equal to that of the best single feature set using a smaller number of features selected from both feature sets.
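
The idea of rewarding features that spread the classes apart while keeping each class compact can be illustrated with a simple per-feature score. The sketch below is an assumption for illustration only and is not Equation 2.14: it ranks each feature by the variance of the class means divided by the average within-class variance. The function names and toy data are hypothetical.

```python
from statistics import mean, pvariance


def separation_score(values, labels):
    """Between-class variance of the class means over mean within-class variance."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    class_means = [mean(g) for g in groups.values()]
    between = pvariance(class_means)
    within = mean([pvariance(g) for g in groups.values()])
    return between / within if within > 0 else float("inf")


def rank_features(X, y, k):
    """Return the indices of the k features with the highest separation score."""
    n_features = len(X[0])
    scores = [separation_score([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: feature 0 separates the classes well, feature 2
# weakly, and feature 1 not at all (identical class means).
X = [[0.0, 1.0, 0.3], [0.2, 3.0, 0.1], [0.1, 2.0, 0.6],
     [5.0, 2.5, 0.4], [5.2, 1.5, 0.9], [5.1, 2.0, 0.5]]
y = ["a", "a", "a", "b", "b", "b"]
print(rank_features(X, y, 2))
```

Unlike a stepwise procedure, a score of this kind evaluates each feature independently, so it does not account for redundancy between the selected features.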
Rows: true classification. Columns: output of the BPNN.
True classification | Giantin | Hoechst | LAMP2 | NOP4 | Tubulin |
Giantin | 97% | 0% | 3% | 0% | 0% |
Hoechst | 3% | 97% | 0% | 0% | 0% |
LAMP2 | 12% | 0% | 83% | 2% | 3% |
NOP4 | 0% | 0% | 13% | 88% | 0% |
Tubulin | 0% | 0% | 19% | 4% | 77% |
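
The average classification rates quoted in the text follow from the per-class rates, that is, from the diagonal of the confusion matrix. As a quick check, the short sketch below recomputes both averages from the percentages reported above; the variable and function names are illustrative.

```python
classes = ["Giantin", "Hoechst", "LAMP2", "NOP4", "Tubulin"]

# Rows are the true class, columns are the BPNN output (percent), as in the table.
confusion = [
    [97, 0, 3, 0, 0],
    [3, 97, 0, 0, 0],
    [12, 0, 83, 2, 3],
    [0, 0, 13, 88, 0],
    [0, 0, 19, 4, 77],
]

# Per-class rates reported for the stepwise discriminant analysis feature subset.
stepwise_rates = [97, 93, 82, 88, 54]


def average_rate(matrix):
    """Mean of the diagonal entries, i.e. the mean per-class correct rate."""
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    return sum(diagonal) / len(diagonal)


print(f"{average_rate(confusion):.1f}")                    # 88.4
print(f"{sum(stepwise_rates) / len(stepwise_rates):.1f}")  # 82.8
```

Rounded to whole percentages, these values give the 88% and 83% compared in the text.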