Given that the performance of a BPNN is nearly as good with just the ad hoc features as it is with all of the features, it is a reasonable goal to find an "optimal" subset of the features. Optimal is placed in quotes because identifying the true best subset of features requires an exhaustive search of all possible subsets (2^84 - 1 nonempty subsets for the 84 features used here). As one might expect, this problem has been shown to be NP-complete, and therefore only suboptimal solutions are readily available. Such solutions typically define the best subset using some criterion other than the classification rate (but related to it). One of these approaches is stepwise discriminant analysis (SDA). The goal of SDA is to identify those variables in a multi-class system that best separate the classes from one another while keeping each class as tightly clustered as possible (see Section 2.2.5 for details).
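One common way to formalize "separate the classes while keeping each class tight" is Wilks' lambda, Λ = det(W)/det(T), where W is the pooled within-class scatter matrix and T the total scatter matrix; a stepwise method adds the feature with the largest partial F-to-enter, F = ((n - g - p)/(g - 1)) (Λ_p/Λ_{p+1} - 1), for n samples, g classes and p features already selected. The sketch below is a simplified illustration under stated assumptions, not the procedure used in this work: it is forward-only (full stepwise methods also test whether to remove features), and it compares F against a fixed, hypothetical `f_to_enter` threshold instead of a p-value, since turning p = 0.15 into an F cutoff needs an F-distribution quantile (e.g. `scipy.stats.f.ppf`) that the standard library lacks. All function names are illustrative.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def determinant(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def wilks_lambda(X: Sequence[Sequence[float]], y: Sequence[int],
                 features: Sequence[int]) -> float:
    """Wilks' lambda det(W)/det(T) for a feature subset (smaller = better)."""
    if not features:
        return 1.0
    n, k = len(X), len(features)
    grand = [sum(row[f] for row in X) / n for f in features]
    class_means: Dict[int, List[float]] = {}
    for c in set(y):
        rows = [X[i] for i in range(n) if y[i] == c]
        class_means[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    W = [[0.0] * k for _ in range(k)]  # within-class scatter
    T = [[0.0] * k for _ in range(k)]  # total scatter
    for i in range(n):
        dw = [X[i][f] - class_means[y[i]][j] for j, f in enumerate(features)]
        dt = [X[i][f] - grand[j] for j, f in enumerate(features)]
        for a in range(k):
            for b in range(k):
                W[a][b] += dw[a] * dw[b]
                T[a][b] += dt[a] * dt[b]
    det_t = determinant(T)
    return determinant(W) / det_t if det_t > 0 else 1.0


def forward_stepwise(X: Sequence[Sequence[float]], y: Sequence[int],
                     f_to_enter: float = 4.0) -> List[int]:
    """Greedy forward selection by partial F-to-enter (no removal step)."""
    n, g, n_feat = len(X), len(set(y)), len(X[0])
    selected: List[int] = []
    lam = 1.0
    while len(selected) < n_feat:
        df2 = n - g - len(selected)  # error degrees of freedom
        if df2 <= 0:
            break
        best = None
        for f in range(n_feat):
            if f in selected:
                continue
            lam_new = wilks_lambda(X, y, selected + [f])
            partial = lam_new / lam
            F = float("inf") if partial <= 0 else (1 - partial) / partial * df2 / (g - 1)
            if best is None or F > best[1]:
                best = (f, F, lam_new)
        if best is None or best[1] < f_to_enter:
            break
        selected.append(best[0])
        lam = best[2]
        if lam <= 0:
            break  # perfect separation; nothing left to gain
    return selected
```

The returned list is ordered by entry, which is what allows the SDA output to be read as a ranking with the "best" features first.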
Stepwise discriminant analysis was applied to the complete 10-class HeLa data set with all 84 Zernike, Haralick, and ad hoc features. Using the default significance level (p = 0.15) for the tests on the F-statistics used by SDA, 54 features were returned as contributing significantly to the separation of the classes. To reduce the number of features further, the SDA output was treated as an ordered list with the "best" features at the top, and a BPNN was trained and tested using subsets taken from the top of this list. The results of these trials are summarized in Table 3.10. The subset size of 37 in Table 3.10 was not chosen arbitrarily: it is the number of features with p-values less than 0.0001. In other words, these features would be very unlikely to produce the F-statistic values they do if the null hypothesis (that the class means are identical) were true. The 37 best features are listed in Table 3.11.
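The choice of subset sizes described above can be sketched as follows. This is a minimal illustration, assuming the SDA output is available as a ranked list of feature names with one p-value each; the function names and the placeholder p-values are hypothetical, and the BPNN training on each subset is omitted.

```python
from typing import Dict, List, Sequence


def candidate_sizes(p_values: Sequence[float],
                    base_sizes: Sequence[int] = (5, 10, 15, 20),
                    alpha: float = 1e-4) -> List[int]:
    """Subset sizes to try: fixed sizes, the count with p < alpha, and all."""
    n_strong = sum(1 for p in p_values if p < alpha)
    sizes = {k for k in base_sizes if k <= len(p_values)}
    sizes.update({n_strong, len(p_values)})
    sizes.discard(0)
    return sorted(sizes)


def nested_subsets(ranked: Sequence[str], sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Top-k prefix of the SDA-ordered feature list for each size k."""
    return {k: list(ranked[:k]) for k in sizes}


# Placeholder p-values shaped like the case in the text:
# 54 selected features, 37 of them with p < 0.0001.
p_values = [1e-6] * 37 + [0.01] * 17
print(candidate_sizes(p_values))  # [5, 10, 15, 20, 37, 54]
```

Because every subset is a prefix of the same ranking, the subsets are nested, so each larger subset only adds lower-ranked features to the smaller ones.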
| Number of Features | Classification Rate (mean) |
|---|---|
| 5 | |
| 10 | |
| 15 | |
| 20 | |
| 37 | |
| 54 | |
Corrigendum (10 April 2001, Michael Boland): An error was made in creating Table 3.11. Item 26 should be deleted, items 27-37 should be shifted up in the list, and Z8,8 should be added as item 37. The corrected table is below. Note that all analysis was done correctly; only the entries in this table were incorrect.
Since the top 37 features from SDA provided the best classification rate of any number of features tested, their performance was compared with previous results. The results of training and testing a BPNN with the 37 best features are summarized in Table 3.12. Overall, these 37 features perform slightly better than the complete 84-element feature set (83% vs. 81%). This improvement comes largely from increases in the correct classification of actin (96% vs. 91%), transferrin receptor (62% vs. 55%), and tubulin (81% vs. 77%). Despite the improved performance, however, these features still cannot completely distinguish giantin from GPP130 (19% of giantin images are assigned to GPP130) or transferrin receptor from LAMP2 (20% of transferrin receptor images are assigned to LAMP2).
| True Classification / Classifier Output | DNA | ER | Giant. | GPP | LAMP | Mito. | Nucle. | Actin | TfR | Tubul. |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 99% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| ER | 0% | 87% | 2% | 0% | 1% | 7% | 0% | 0% | 2% | 2% |
| Giantin | 0% | 1% | 77% | 19% | 1% | 0% | 1% | 0% | 1% | 0% |
| GPP130 | 0% | 0% | 16% | 78% | 2% | 1% | 1% | 0% | 1% | 0% |
| LAMP2 | 0% | 1% | 5% | 2% | 74% | 1% | 1% | 0% | 16% | 1% |
| Mito. | 0% | 8% | 2% | 0% | 2% | 79% | 0% | 1% | 2% | 6% |
| Nucleolin | 1% | 0% | 1% | 2% | 0% | 0% | 95% | 0% | 0% | 0% |
| Actin | 0% | 0% | 0% | 0% | 0% | 1% | 0% | 96% | 0% | 2% |
| TfR | 0% | 5% | 1% | 1% | 20% | 3% | 0% | 2% | 62% | 6% |
| Tubulin | 0% | 4% | 0% | 0% | 0% | 8% | 0% | 1% | 5% | 81% |
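The overall rate of 83% quoted for the 37 best features can be checked against the diagonal of Table 3.12. A minimal sketch, assuming the overall figure is the unweighted mean of the ten per-class rates (the text does not state how the average was formed); the names below are illustrative.

```python
# Per-class correct-classification rates (%) from the diagonal of Table 3.12.
TABLE_3_12_DIAGONAL = {
    "DNA": 99, "ER": 87, "Giantin": 77, "GPP130": 78, "LAMP2": 74,
    "Mito.": 79, "Nucleolin": 95, "Actin": 96, "TfR": 62, "Tubulin": 81,
}


def overall_rate(diagonal: dict) -> float:
    """Unweighted mean of the per-class rates (assumes equal class weight)."""
    return sum(diagonal.values()) / len(diagonal)


print(f"{overall_rate(TABLE_3_12_DIAGONAL):.1f}%")  # 82.8%
```

The per-class entries are already rounded, so 82.8% is consistent with the reported 83%.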
Given the performance of the 37 best features in a BPNN without thresholding, the results obtained with thresholded outputs are not surprising (see Table 3.13). Again, overall performance improves slightly, and some classes (mitochondria, actin, transferrin receptor, and tubulin) show modest gains compared with the all-features result.
| True Classification / Classifier Output | DNA | ER | Giant. | GPP | LAMP | Mito. | Nucle. | Actin | TfR | Tubul. | Unk. | Correct, excl. Unk. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 98% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | (99%) |
| ER | 0% | 79% | 0% | 0% | 0% | 3% | 0% | 0% | 0% | 1% | 16% | (94%) |
| Giantin | 0% | 0% | 68% | 15% | 0% | 0% | 0% | 0% | 1% | 0% | 16% | (81%) |
| GPP130 | 0% | 0% | 12% | 70% | 1% | 1% | 1% | 0% | 1% | 0% | 14% | (82%) |
| LAMP2 | 0% | 0% | 4% | 1% | 57% | 0% | 1% | 0% | 6% | 0% | 30% | (81%) |
| Mito. | 0% | 5% | 2% | 0% | 1% | 71% | 0% | 0% | 1% | 2% | 20% | (88%) |
| Nucleolin | 0% | 0% | 0% | 2% | 0% | 0% | 90% | 0% | 0% | 0% | 7% | (97%) |
| Actin | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 92% | 0% | 2% | 6% | (98%) |
| TfR | 0% | 1% | 0% | 0% | 15% | 1% | 0% | 0% | 49% | 1% | 33% | (73%) |
| Tubulin | 0% | 2% | 0% | 0% | 0% | 4% | 0% | 0% | 2% | 69% | 23% | (90%) |
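The parenthesized values in Table 3.13 are consistent with the correct-classification rate computed only over images not assigned to the unknown class, i.e. correct / (100 - unknown). This reading is inferred from the numbers rather than stated in the text; the sketch below recomputes it from the rounded table entries, with illustrative names.

```python
# (correct %, unknown %) per class, read from Table 3.13.
TABLE_3_13 = {
    "DNA": (98, 1), "ER": (79, 16), "Giantin": (68, 16), "GPP130": (70, 14),
    "LAMP2": (57, 30), "Mito.": (71, 20), "Nucleolin": (90, 7),
    "Actin": (92, 6), "TfR": (49, 33), "Tubulin": (69, 23),
}


def rate_excluding_unknown(correct: float, unknown: float) -> float:
    """Percent correct among the images the classifier did not reject."""
    return 100.0 * correct / (100.0 - unknown)


for name, (correct, unknown) in TABLE_3_13.items():
    print(f"{name}: {rate_excluding_unknown(correct, unknown):.0f}%")
```

The recomputed values agree with the parenthesized entries to within one percentage point (for example, 70 / 86 gives 81% for GPP130 against the listed 82%), the gap presumably coming from rounding of the underlying values. The same relation appears to hold for Table 3.14.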
The performance of the kNN classifier using the 37 best features is again similar to that obtained with all features (see Table 3.14). The overall performance is 4-5% better, with a substantial increase in the classification rate for actin patterns.
Based on both the BPNN and kNN results, it is possible to conclude that the first 37 features returned by stepwise discriminant analysis form a better feature set than the complete 84-feature set tested above. This conclusion rests not so much on the overall classification rate, which improves only slightly with 37 features, as on the reduction in the number of features used for classification (from 84 to 37). The "curse of dimensionality" is known to be a real effect [10, p. 95], so it is desirable to reduce the dimensionality of a feature-based classification problem whenever possible. These results indicate that reducing the dimensionality of this problem is, in fact, beneficial.
| True Classification / Classifier Output | DNA | ER | Giant. | GPP | LAMP | Mito. | Nucle. | Actin | TfR | Tubul. | Unk. | Correct, excl. Unk. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 97% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% | (99%) |
| ER | 0% | 84% | 0% | 0% | 1% | 4% | 0% | 0% | 0% | 3% | 8% | (91%) |
| Giantin | 0% | 1% | 71% | 13% | 1% | 1% | 0% | 0% | 0% | 0% | 11% | (81%) |
| GPP130 | 0% | 0% | 15% | 69% | 5% | 0% | 1% | 0% | 2% | 0% | 8% | (74%) |
| LAMP2 | 0% | 1% | 3% | 2% | 58% | 1% | 2% | 0% | 8% | 1% | 23% | (76%) |
| Mito. | 0% | 11% | 2% | 0% | 3% | 67% | 0% | 2% | 1% | 9% | 6% | (71%) |
| Nucleolin | 0% | 0% | 2% | 2% | 3% | 0% | 90% | 0% | 0% | 0% | 3% | (93%) |
| Actin | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 91% | 0% | 5% | 4% | (95%) |
| TfR | 0% | 4% | 1% | 1% | 19% | 5% | 0% | 7% | 33% | 10% | 21% | (42%) |
| Tubulin | 0% | 5% | 0% | 1% | 1% | 8% | 0% | 4% | 1% | 67% | 12% | (77%) |