
Classification with All Features

The first experiment undertaken with the HeLa data and the available feature sets was to see how well the classes could be discriminated using all 84 features (Zernike, Haralick, and ad hoc). As part of this endeavor, several classification trials were completed to determine the number of hidden nodes that provided the best performance. BPNNs with 5, 10, 15, 20, and 30 hidden nodes were trained and tested. The average correct classification rate for each of these networks is summarized in Table 3.3. It was decided to use 20 hidden nodes because classifier performance improved up to that point and then plateaued. To maintain consistency among the experiments described below, all BPNNs were configured with 20 hidden nodes.


  
Table 3.3: Network performance on test samples for various numbers of hidden nodes and all features (Zernike, Haralick, ad hoc).
Hidden Nodes    Classification Rate (mean $\pm$ 95% CI)
5               $75\pm5.3\%$
10              $80\pm4.8\%$
15              $81\pm4.8\%$
20              $82\pm4.4\%$
30              $82\pm4.4\%$
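
The choice of 20 hidden nodes can be reproduced from the means in Table 3.3 with a short sketch. It assumes that "improved up to that point and then plateaued" means picking the smallest network whose mean rate equals the best rate observed; the function name and the `tolerance` parameter are illustrative and do not come from the original experiments.

```python
# Mean test classification rates (%) from Table 3.3, keyed by hidden-node count.
mean_rate = {5: 75, 10: 80, 15: 81, 20: 82, 30: 82}

def smallest_at_plateau(rates, tolerance=0.0):
    """Return the smallest node count whose mean rate is within `tolerance`
    percentage points of the best rate (an assumed reading of "plateau")."""
    best = max(rates.values())
    return min(n for n, r in rates.items() if best - r <= tolerance)

print(smallest_at_plateau(mean_rate))  # 20
```

A nonzero `tolerance` would select a smaller network, so the result depends on how strictly the plateau is defined.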


The performance of a BPNN with 20 hidden nodes, using all 84 features, is summarized in Table 3.4. These results were obtained using the highest network output value to classify each sample. Results were then obtained using the ``best threshold'' technique described in Section 3.2.7, and are summarized in Table 3.5. Briefly, a threshold was systematically identified for each BPNN using the stop data set before running the test data through the network. A valid classification was assigned to each test sample only if there was one and only one network output above the threshold; otherwise, the sample was considered ``unknown''. The threshold approach to classification is included because it applies to the reliable identification of single cells. By applying a threshold to the outputs of the BPNN and assigning a class only when a single output exceeds that threshold, one has more confidence in the resulting classification. If the goal of a particular experiment is to reliably identify single cells as belonging to a particular class, it may be preferable to make a classification of ``unknown'' rather than to misclassify a cell. In such a case, thresholding the BPNN outputs is one solution.
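
The two decision rules described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the example output values, and the use of a strict `>` comparison against the threshold are all assumptions.

```python
UNKNOWN = "unknown"

def classify_max(outputs, labels):
    """Assign the class whose network output is highest."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]

def classify_thresholded(outputs, labels, threshold):
    """Assign a class only if exactly one output exceeds the threshold;
    otherwise report the sample as unknown."""
    above = [i for i, o in enumerate(outputs) if o > threshold]
    if len(above) == 1:
        return labels[above[0]]
    return UNKNOWN

# Hypothetical outputs for a three-class network.
labels = ["DNA", "ER", "Giantin"]
print(classify_max([0.1, 0.7, 0.6], labels))               # ER
print(classify_thresholded([0.1, 0.7, 0.6], labels, 0.5))  # unknown (two outputs above)
print(classify_thresholded([0.1, 0.7, 0.2], labels, 0.5))  # ER
print(classify_thresholded([0.1, 0.3, 0.2], labels, 0.5))  # unknown (no output above)
```

The first two calls show how the same sample can receive a class under the highest-output rule but be set aside as unknown under the threshold rule.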


  
Table 3.4: Average performance on the test data of 10 BPNN trials using all features and no thresholding of the network outputs. The average rate of correct classification was $81\pm4.8\%$ (mean $\pm$ 95% CI) and the variance across all 10 networks was 2.2. The average performance on the training data was $95\pm2\%$ with a variance of 2.2. Instances of confusion greater than 10% on the part of the classifier are marked with square brackets. (19990527)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.
DNA             99%     1%      0%      0%      0%      0%      0%      0%      0%      0%
ER              0%      86%     3%      0%      0%      5%      0%      0%      0%      5%
Giantin         0%      0%      77%     [19%]   0%      1%      2%      0%      1%      0%
GPP130          0%      0%      [18%]   78%     2%      0%      2%      0%      1%      0%
LAMP2           0%      1%      3%      2%      73%     1%      2%      0%      [17%]   1%
Mito.           0%      9%      2%      0%      4%      77%     0%      0%      2%      6%
Nucleolin       2%      0%      1%      2%      1%      0%      94%     0%      0%      0%
Actin           0%      0%      0%      0%      0%      3%      0%      91%     0%      6%
TfR             0%      5%      3%      1%      [25%]   3%      0%      5%      55%     5%
Tubulin         0%      5%      0%      0%      1%      7%      1%      4%      5%      77%
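
A confusion matrix of the kind shown in Table 3.4 can be built from paired true and predicted labels. The sketch below assumes each row is normalized to percentages of the true class and flags off-diagonal entries above 10%. The function names and the toy labels are hypothetical, and `classes` is expected to contain every label that can be predicted.

```python
from collections import Counter

def confusion_percent(true_labels, predicted_labels, classes):
    """Return {true_class: {predicted_class: percent of that true class}}."""
    counts = Counter(zip(true_labels, predicted_labels))
    table = {}
    for t in classes:
        total = sum(counts[(t, p)] for p in classes)
        table[t] = {
            p: (100.0 * counts[(t, p)] / total if total else 0.0)
            for p in classes
        }
    return table

def confused_pairs(table, limit=10.0):
    """List (true, predicted, percent) for off-diagonal entries above `limit`."""
    return [
        (t, p, v)
        for t, row in table.items()
        for p, v in row.items()
        if t != p and v > limit
    ]

# Toy data: one of five giantin samples is labelled GPP130.
true = ["Giantin"] * 5 + ["GPP130"] * 5
pred = ["Giantin"] * 4 + ["GPP130"] + ["GPP130"] * 5
table = confusion_percent(true, pred, ["Giantin", "GPP130"])
print(confused_pairs(table))  # [('Giantin', 'GPP130', 20.0)]
```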


The criterion used to identify a good threshold for the outputs of a BPNN is to maximize the sum of the squares of the recall and accuracy of the classifier using a particular threshold. The accuracy of a classifier is the fraction of all classification attempts (total samples minus the number of unknowns) that are successful. The recall of that classifier, on the other hand, is the fraction of all samples that are classified correctly. For example, a classifier that assigns classes to a small fraction of all samples but does so correctly will have high accuracy, but low recall. Furthermore, a classifier with perfect accuracy and perfect recall will operate at a point farther from the origin of a recall vs. accuracy plot than any other possible classifier (see Figure 3.6). The $\mathrm{accuracy}^2 + \mathrm{recall}^2$ criterion was therefore chosen to find the threshold that causes a particular classifier to operate as far from the origin of the recall vs. accuracy plot as possible.
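
The criterion can be sketched as a search over candidate thresholds. The definitions of accuracy and recall follow the paragraph above. The candidate grid, the function names, and the toy outputs are assumptions, since the search procedure itself is described in Section 3.2.7 rather than here.

```python
def accuracy_recall(true_labels, predictions, unknown="unknown"):
    """Accuracy: correct / attempted (non-unknown). Recall: correct / all samples."""
    attempted = [(t, p) for t, p in zip(true_labels, predictions) if p != unknown]
    correct = sum(1 for t, p in attempted if t == p)
    accuracy = correct / len(attempted) if attempted else 0.0
    recall = correct / len(true_labels) if true_labels else 0.0
    return accuracy, recall

def best_threshold(true_labels, outputs, labels, candidates, unknown="unknown"):
    """Return the candidate threshold that maximizes accuracy^2 + recall^2."""
    def classify(row, threshold):
        above = [i for i, o in enumerate(row) if o > threshold]
        return labels[above[0]] if len(above) == 1 else unknown

    def score(threshold):
        preds = [classify(row, threshold) for row in outputs]
        accuracy, recall = accuracy_recall(true_labels, preds, unknown)
        return accuracy ** 2 + recall ** 2

    # On ties, max() keeps the first candidate in the list.
    return max(candidates, key=score)

# Hypothetical stop-set outputs for a two-class network.
labels = ["A", "B"]
true = ["A", "B", "A", "B"]
outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.55], [0.7, 0.4]]
print(best_threshold(true, outputs, labels, [0.5, 0.75]))  # 0.75
```

In this toy case a threshold of 0.5 yields one misclassification (accuracy 2/3, recall 1/2), whereas 0.75 turns that sample into an unknown (accuracy 1, recall 1/2), so the higher threshold scores better.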

The average performance of the non-thresholded classifier is acceptable (81% overall), but not entirely satisfactory for some pairs of classes. Two pairs that are frequently confused by the classifier are giantin/GPP130 and LAMP2/transferrin receptor. This result is not unexpected, however, as these are some of the same pairs that were intentionally included to create such confusion. Because the major instances of confusion can be explained biologically, they are not terribly troubling. First of all, both giantin and GPP130 reside in the same organelle (the Golgi), although apparently they do not colocalize entirely. Two possible explanations for this are: 1) they are in different subcompartments of the Golgi, and 2) one of the two proteins does completely co-localize with the other, but the second protein also resides in one or more Golgi subcompartments by itself (i.e., its localization is more extensive). The features must be able to capture such subtle differences between the two patterns.

The second set of significantly confused patterns is LAMP2 and transferrin receptor. Although these two proteins do not reside in the same compartments, their localization patterns tend to be similar (subjectively). Both of these proteins are found in vesicles throughout the cytoplasm (punctate), and tend to be concentrated in a compartment near the nucleus. Again, the features are able to capture enough information about the differences between these two visually similar patterns to distinguish them most of the time.

Finding a threshold for the network outputs and requiring that only one output exceed that threshold turns out to provide only a $\sim$5% increase in the average classification rate among the samples that receive a classification (from 81% to 86%), while also causing between 1% (DNA) and 32% (LAMP2) of the samples from any one class to be placed into the unknown category (Table 3.5). This performance increase, while not large, may nonetheless be useful for those experiments where correct identification of single cells is a top priority.


  
Table 3.5: Performance on the test data of 10 BPNN trials using all features and with thresholding of the network outputs. The average rate of correct classification is $72\pm5.4\%$ (mean $\pm$ 95% CI) with a variance of 2 across all 10 networks for all samples and $86\%$ (variance of 1.7%) for samples that are not classified as unknown. The average performance on the corresponding training data is $91\pm2.8\%$ with a variance of 7 for all samples and $97\%$ with a variance of 0.95 for those not placed in the unknown category. The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990527)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Correct, excl. Unk.)
DNA             99%     0%      0%      0%      0%      0%      0%      0%      0%      0%      1%      (100%)
ER              0%      76%     0%      0%      0%      2%      0%      0%      0%      3%      19%     (94%)
Giantin         0%      0%      70%     14%     0%      1%      0%      0%      1%      0%      14%     (81%)
GPP130          0%      0%      13%     70%     0%      0%      1%      0%      1%      0%      14%     (82%)
LAMP2           0%      0%      2%      0%      55%     1%      1%      0%      9%      0%      32%     (80%)
Mito.           0%      7%      2%      0%      2%      68%     0%      0%      1%      5%      15%     (80%)
Nucleolin       0%      0%      0%      2%      0%      0%      90%     0%      0%      0%      7%      (97%)
Actin           0%      0%      0%      0%      0%      1%      0%      86%     0%      3%      9%      (95%)
TfR             0%      3%      0%      0%      17%     0%      0%      3%      45%     2%      29%     (63%)
Tubulin         0%      1%      0%      0%      0%      4%      0%      1%      3%      61%     29%     (87%)
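
The parenthesized entries in Table 3.5 relate to the other columns through a simple ratio: the percentage classified correctly divided by the percentage that was not placed in the unknown category. The sketch below uses a hypothetical function name. Because the table reports averages over 10 trials, recomputing from the rounded entries may differ from the printed value by a point or two for some rows.

```python
def rate_among_classified(correct_pct, unknown_pct):
    """Percent correct among samples that were not labelled unknown."""
    classified = 100.0 - unknown_pct
    if classified <= 0:
        raise ValueError("no samples were classified")
    return 100.0 * correct_pct / classified

# ER row of Table 3.5: 76% correct, 19% unknown.
print(round(rate_among_classified(76, 19)))  # 94
# TfR row of Table 3.5: 45% correct, 29% unknown.
print(round(rate_among_classified(45, 29)))  # 63
```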

To compare the BPNN to another classifier, all 84 features were used as inputs to a kNN classifier. Table 3.6 summarizes the performance of the 10 kNN trials (see Section 3.2.8 for details). The overall performance of the kNN classifier is not as good as either of the BPNNs (thresholded or non-thresholded outputs). While the kNN classifier produces similar performance for the DNA, giantin, and ER classes as compared to the BPNNs, it is worse at identifying most classes and is much worse at identifying the mitochondrial patterns correctly. The most common misclassifications for mitochondria are ER and tubulin. From a biological perspective, this is expected (the ER and mitochondrial patterns were intended to be confusing) as all three patterns tend to have peri-nuclear concentrations but are also dispersed away from the nucleus. The BPNN classifiers are able to avoid this mistake, however. Even though the overall kNN performance on samples not classified as unknown is not significantly worse than that of the non-thresholded BPNN (77% vs. 81%), the inability of the kNN classifier to recognize mitochondrial and transferrin receptor patterns makes it a less desirable choice as a classifier.
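
For readers unfamiliar with kNN, a generic sketch follows. It is not the procedure of Section 3.2.8, which is not reproduced here. The Euclidean distance, the value of `k`, and the rule that a tied vote yields "unknown" are all assumptions made for illustration, as are the toy two-feature samples.

```python
import math
from collections import Counter

def knn_classify(sample, train_features, train_labels, k=3, unknown="unknown"):
    """Vote among the k nearest training samples (Euclidean distance).
    A tie for the top vote is reported as unknown (assumed rule)."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(sample, train_features[i]),
    )
    votes = Counter(train_labels[i] for i in order[:k]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return unknown
    return votes[0][0]

# Toy training set with two features per sample.
features = [[0, 0], [0, 1], [5, 5], [5, 6]]
classes = ["Mito.", "Mito.", "ER", "ER"]
print(knn_classify([0, 0.5], features, classes, k=3))  # Mito.
print(knn_classify([2.5, 3], features, classes, k=2))  # unknown (one vote each)
```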


  
Table 3.6: Performance on the test data of 10 kNN classifier trials using all features. The average rate of correct classification is $68\pm5.7\%$ (mean $\pm$ 95% CI) with a variance of 4.9 across all 10 classifiers for all samples and $77\%$ (variance of 3.9) for those samples not classified as unknown. The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990607)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Correct, excl. Unk.)
DNA             99%     1%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (99%)
ER              0%      84%     0%      0%      3%      2%      0%      0%      0%      6%      4%      (88%)
Giantin         0%      1%      69%     14%     1%      1%      0%      0%      0%      0%      13%     (79%)
GPP130          0%      1%      16%     65%     4%      0%      1%      0%      2%      0%      12%     (74%)
LAMP2           0%      1%      1%      3%      67%     1%      1%      0%      7%      1%      18%     (81%)
Mito.           0%      12%     0%      0%      2%      49%     0%      0%      5%      15%     17%     (59%)
Nucleolin       2%      0%      1%      2%      8%      0%      80%     0%      2%      0%      6%      (85%)
Actin           0%      0%      0%      0%      0%      1%      0%      72%     0%      12%     14%     (84%)
TfR             0%      7%      0%      1%      25%     5%      0%      4%      29%     7%      21%     (37%)
Tubulin         0%      8%      0%      2%      2%      8%      0%      2%      4%      62%     13%     (71%)


Copyright ©1999 Michael V. Boland
1999-09-18