
Discussion

While the results from Chapter 2 demonstrated that the general paradigm of pattern recognition can be used to quantitatively describe protein localization patterns, the new results described in this chapter provide a better characterization of both the problem and the approach. Specific experiments were targeted at assessing the ability of various feature sets to correctly classify the images, at comparing classification methods, and at leveraging the ability to collect images of sets of homogeneously prepared cells to improve the overall classification rate.

After using all available features (Zernike, Haralick, ad hoc) to establish a baseline classification rate, several trials were conducted with other feature sets. The classification results using only the ad hoc features, while not quite as good as those obtained with all features, support the contention that biologically motivated features can be useful in describing protein localization patterns. At the same time, these results indicate that the Zernike and Haralick features are useful complements to the ad hoc features and should be retained as valid descriptors of protein localization.

The results presented above also show that it is possible to obtain classification performance at least as good as that achieved with all features by selecting a subset of those features. The 37 top features selected by stepwise discriminant analysis can be considered the best feature set available thus far, both because the resulting classification is better than that for all features and because it is obtained with a lower-dimensional feature set. At this early point in the effort to develop methods for describing protein localization, these 37 features represent the best basis for such descriptions. Interestingly, each feature set contributes almost equally to the 37 best features: 14 ad hoc, 12 Haralick, and 11 Zernike features are among them. Because the Haralick and Zernike features lack the interpretability of the ad hoc features, however, further study may be needed to understand what information they capture and how it relates to the underlying biology.
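To make the selection procedure concrete, the following Python sketch implements a forward-only variant of stepwise discriminant analysis using the Wilks' lambda criterion, the statistic conventionally used for this purpose. The function and variable names are illustrative rather than taken from the original analysis, and a full stepwise procedure would also test previously selected features for removal at each step.

import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda, det(W)/det(T), for the feature subset `cols`;
    smaller values indicate better class separation."""
    Xs = X[:, cols]
    centered = Xs - Xs.mean(axis=0)
    T = centered.T @ centered                # total scatter matrix
    W = np.zeros_like(T)
    for c in np.unique(y):
        Xc = Xs[y == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc                       # within-class scatter matrix
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, n_keep=37):
    """Greedily add the feature that most reduces Wilks' lambda."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        _, best = min((wilks_lambda(X, y, selected + [j]), j) for j in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected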

As part of each of the BPNN classification trials, two classification rules were applied to the BPNN outputs. Working under the assumption that the BPNN outputs approximate Bayesian posterior probabilities [14], the first of these rules assigned a sample to the class corresponding to the largest output value (i.e., the most likely class). The alternative approach was essentially to allow the classifier to say ``I don't know'' whenever no single output value was above a threshold. In this way only those samples about which the classifier was most confident were assigned to a class. The net effect is that fewer samples are classified (some are designated unknown), but those that are classified are more frequently placed in the correct category. In the results presented here, this trade-off is frequently of questionable value. The gains in the classification rate for non-unknown samples achieved with thresholding are between 5 and 10%, while 17 to 30% of samples are left unclassified. Since the increase in classification rate is not clearly worth having that many samples classified as unknown, the thresholding technique can only be considered useful on a case-by-case basis. Another problem that arose in the thresholded classifier results involves classes for which the rate of classification as unknown was close to the rate of correct classification (e.g., Section 3.3.4). When this occurs, it indicates that the classifier cannot reliably recognize that class, and the result is high variability between classifiers trained on different subsets of the data. In such cases the data presented above support simply using the highest BPNN output to assign a class.
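As an illustration of these two rules, the following Python sketch (with hypothetical names; not the original implementation) assigns each sample either to the class with the largest network output or, when a threshold is supplied and no output reaches it, to an ``unknown'' category.

import numpy as np

UNKNOWN = -1  # label used when no output exceeds the threshold

def classify(outputs, threshold=None):
    """Assign each row of BPNN outputs to its argmax class, or to
    UNKNOWN when a threshold is given and no output reaches it."""
    outputs = np.asarray(outputs)
    labels = outputs.argmax(axis=1)
    if threshold is not None:
        labels = np.where(outputs.max(axis=1) >= threshold, labels, UNKNOWN)
    return labels

For example, classify([[0.2, 0.5, 0.3]]) assigns the sample to class 1, while classify([[0.2, 0.5, 0.3]], threshold=0.7) designates it unknown.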

Aside from determining the utility of BPNN thresholding, other classifier-related testing was aimed at comparing BPNNs to kNN classifiers. Although the kNN classifier has the advantage of requiring only a single parameter (the number of nearest neighbors to consider), it is unable to generate decision boundaries as complex as those of the BPNN. This deficiency turns out to be significant, as the results from the kNN classifiers are consistently worse than those from the corresponding BPNNs. The kNN classifiers are also less able to discriminate the most easily confused classes (e.g., giantin/GPP130 and LAMP2/transferrin receptor). For these reasons, the BPNN is the preferred classifier.
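For reference, a kNN classifier is only a few lines in its entirety; the following Python sketch (illustrative, not the original implementation) classifies a feature vector by majority vote among its k nearest training samples under Euclidean distance.

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify feature vector x by majority vote among its k nearest
    training samples (Euclidean distance); class labels are assumed
    to be non-negative integers."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(distances)[:k]]
    return np.bincount(nearest).argmax()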

In addition to the investigations of feature sets and classifiers, the effect of image resolution was also considered. It is known that the existing image data are not properly sampled due to hardware limitations. To estimate the effect of the undersampling, the images were scaled to one-half their original resolution and then reclassified. While most classes could still be recognized at the lower resolution, the classes that caused the most confusion before scaling were even more difficult to recognize afterward. As mentioned above, the subtle differences between some pairs of classes (giantin/GPP130, and LAMP2/transferrin receptor) are further obscured by the decreased resolution. Since the images, as collected, would require an increase in magnification of over 2X to be properly sampled, it is reasonable to expect that these difficult classes might be classified more accurately if sampled appropriately. This finding has clear implications for future work.
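The scaling operation itself can be as simple as block averaging; the following Python sketch shows one plausible implementation (the interpolation method used in the experiments is an assumption here), halving the resolution of a grayscale image by averaging non-overlapping 2x2 pixel blocks.

import numpy as np

def downscale_by_half(img):
    """Halve image resolution by averaging non-overlapping 2x2 pixel
    blocks, trimming a trailing row/column if the dimensions are odd."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w].astype(float)
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0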

The final and perhaps most significant contribution of these results is the experimental investigation of using sets of images to provide a single classification result. Reasons justifying the classification of sets of images as a biologically valid approach were mentioned above, but the bottom line is that the nature of many biological experiments allows an entire population of cells to receive a single classification. Integrating this knowledge into the existing classification system requires only that a voting scheme be applied to the single-cell classification results from each population under study. The majority rule approach to combining the single-cell classifications was discussed analytically in Section 2.3.3, and plurality rule was investigated above. Given a system in which the goal is to classify populations rather than individuals, both of these methods greatly increase the rate of correct classification. The only other requirement for applying these approaches is that the probability of a correct classification be greater than 50% (majority rule) or be the most likely single outcome (plurality rule). While analytical expressions can readily be developed for the majority rule case (Equation 2.16, which is equivalent to two-class plurality rule) and for the three-class plurality rule case (Equation 3.10), extending plurality rule to more classes was not tractable. The important aspect of these formulas is that for both majority and plurality rule, the probability of a correct classification increases as the number of samples examined increases. To demonstrate the power of this technique, consider the following: even if a particular class is recognized correctly only 55% of the time, the classification assigned to a set of 500 samples (a large but not unreasonable number) would be expected to be correct 99% of the time using majority rule. The population-based nature of many biological experiments, coupled with the power of majority or plurality rule, therefore helps to alleviate the typical pattern recognition requirement for near-perfect single-object classification.


\begin{displaymath}
P_{plurality}(3) = n! \sum_{n_1=\left\lfloor n/3 \right\rfloor + 1}^{n}
\;\sum_{n_2=\max(0,\,n-2n_1+1)}^{\min(n_1-1,\;n-n_1)}
\frac{p_1^{n_1}\,p_2^{n_2}\,p_3^{n-n_1-n_2}}{n_1!\,n_2!\,(n-n_1-n_2)!}
\end{displaymath} (3.10)
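Both probabilities are straightforward to evaluate numerically. The following Python sketch (an illustration, not part of the original analysis) computes the majority rule probability directly from the binomial distribution and the three-class plurality probability from Equation 3.10, reproducing the 55%-per-cell, 500-sample example above.

from math import comb, factorial

def p_majority_correct(p, n):
    """Probability that a strict majority of n independent single-cell
    classifications is correct, given per-cell accuracy p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def p_plurality3_correct(p1, p2, p3, n):
    """Probability that class 1 receives strictly more of the n votes
    than either other class in a trinomial model (Equation 3.10)."""
    total = 0.0
    for n1 in range(n // 3 + 1, n + 1):
        for n2 in range(max(0, n - 2 * n1 + 1), min(n1 - 1, n - n1) + 1):
            n3 = n - n1 - n2
            total += (factorial(n) // (factorial(n1) * factorial(n2) * factorial(n3))
                      * p1**n1 * p2**n2 * p3**n3)
    return total

print(p_majority_correct(0.55, 500))              # ~0.99, as in the example above
print(p_plurality3_correct(0.55, 0.25, 0.20, 100))  # hypothetical 3-class example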

