
Classification of Sets of Images

The results described above for classification of single cells based on protein localization patterns are very good given the high degree of heterogeneity within the individual classes. Some may question these results, however, in light of pattern recognition results from other fields in which classification rates approach 100%. It should be reiterated that the primary goal of this work is not to maximize classification rates for these 10 classes, but rather to develop a more general approach to the systematic description of protein localization. That being said, there are biological applications of this work in which one would want the classification rate to be as high as possible. The primary example is an experiment that screens for cells expressing a particular protein localization pattern. Specifically, an investigator may have a population of cells in which each cell has a different protein labeled and may want to identify only those cells for which the labeled protein is in the nucleus. Another example is an experiment in which there are many populations of cells, each grown under different conditions, and the goal is to distinguish those populations in which a particular protein is found in the Golgi from those in which that protein is in the endoplasmic reticulum. Fortunately, the nature of many such experiments allows the methods described above to be used in a new way to achieve effective classification rates very near 100%.

Improvements in classification can be achieved by assigning a classification to sets of homogeneously prepared cells. Groups of cells that have been subject to the same preparation procedures (i.e., they were in the same petri dish throughout the experiment) can be assumed to belong to the same class for the purposes of assessing protein localization. In fact, the classes of data used for this project were determined in this way. Since biological experiments are frequently done using populations rather than individual cells to represent each set of conditions under study, classifying the populations is a valid approach to analyzing the experiment. Measurements can therefore be made on each of the cells in a particular population with the assumption that all such cells from that population belong to the same class in terms of protein localization.

To test this method experimentally, the same networks trained and tested for single cell classification (above) were used to classify random sets of 10 images from single classes of the test data. The entire set was then assigned the class to which a plurality of its 10 constituents were assigned. If no single class constituted a plurality then the set was classified as unknown. Starting with the random selection of 10 test samples, this procedure was repeated 1000 times for each network being tested. Since there were 10 networks trained for each set of features, there were 10,000 sets of images classified for each of the confusion matrices included below.
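The plurality rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments: the function names, the string label "unknown", and the choice to draw each set without replacement are all hypothetical, since the text does not say how the random sets were drawn.

```python
import random
from collections import Counter

UNKNOWN = "unknown"  # hypothetical label for sets with no plurality


def classify_set(labels):
    """Return the plurality label of a set, or UNKNOWN if the top count is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return UNKNOWN
    return counts[0][0]


def set_trials(single_cell_labels, set_size=10, n_trials=1000, seed=0):
    """Draw random sets from one class's single-cell outputs and tally set labels.

    Assumption: each set is drawn without replacement from the class's
    single-cell classifier outputs.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_trials):
        members = rng.sample(single_cell_labels, set_size)
        tally[classify_set(members)] += 1
    return tally


# Made-up single-cell outputs for one true class (not data from this work).
example_outputs = ["TfR"] * 80 + ["LAMP2"] * 20
example_tally = set_trials(example_outputs)
```

Here `example_tally` counts how many of the 1000 random sets were assigned to each label for one network; repeating this for each of the 10 networks would give the 10,000 sets per confusion matrix mentioned above.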

As with the single cell classification results, the first trials of set classification were conducted with all 84 available features. The results for non-thresholded and thresholded network outputs are summarized in Tables 3.21 and 3.22, respectively. The first observation is that the classification rates are much better than those for the corresponding single cell trials (see Tables 3.4 and 3.5, for example). With regard to the non-thresholded network results (Table 3.21), there are two interesting phenomena. First, there are very few cases of confusion between classes. Sets of GPP130 images are infrequently classified as giantin (1%), sets of LAMP2 images are rarely classified as transferrin receptor (3%), and, in the most significant error, sets of transferrin receptor images are classified as LAMP2 (10%). These now relatively minor error rates correspond to the classes that caused the most significant confusion in the single cell classification results. The most significant errors made with single cells have all been reduced, if not eliminated, with the set classification method. The second phenomenon that can be observed in Table 3.21 is that there are very few sets assigned to the unknown class. Again, the classes that produce unknown results are the same frequently confused classes described in the single cell classification results (giantin and GPP130, and LAMP2 and transferrin receptor).


  
Table 3.21: The performance of BPNNs and plurality rule for classifying sets of 10 images using all features. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 97% (variance over the 10 networks was 0.02), and the performance for non-unknown sets is 98% (variance of 0.03). The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990527)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      99.9%   0%      0%      0%      0%      0%      0%      0%      0%      0%      (99.9%)
Giantin         0%      0%      99%     0%      0%      0%      0%      0%      0%      0%      1%      (99.7%)
GPP130          0%      0%      1%      98%     0%      0%      0%      0%      0%      0%      1%      (99%)
LAMP2           0%      0%      0%      0%      95%     0%      0%      0%      3%      0%      3%      (97%)
Mito.           0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      0%      (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      (100%)
Actin           0%      0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      (100%)
TfR             0%      0%      0%      0%      10%     0%      0%      0%      82%     0%      8%      (89%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      99.5%   0%      (99.8%)


  
Table 3.22: The performance of BPNNs with thresholded outputs and plurality rule for classifying sets of 10 images using all features. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 90% (variance over the 10 networks was 0.16), and the performance for non-unknown sets is 99% (variance of 0.02). The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990527)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      99%     0%      0%      0%      0%      0%      0%      0%      0%      1%      (100%)
Giantin         0%      0%      98%     0%      0%      0%      0%      0%      0%      0%      2%      (99.8%)
GPP130          0%      0%      0%      96%     0%      0%      0%      0%      0%      0%      4%      (99.5%)
LAMP2           0%      0%      0%      0%      71%     0%      0%      0%      0%      0%      29%     (99.7%)
Mito.           0%      0%      0%      0%      0%      94%     0%      0%      0%      0%      6%      (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      (100%)
Actin           0%      0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      (100%)
TfR             0%      0%      0%      0%      6%      0%      0%      0%      60%     0%      34%     (92%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      83%     17%     (100%)

The results for the BPNNs with thresholded outputs also have some interesting characteristics. First, the thresholded classifier provides slightly better results on non-unknown sets. These increases raise the classification rate for non-unknown sets by only 1% (from 98% to 99%), however, while the rate over all sets falls from 97% to 90%. The cost of this slight increase is a significant increase in the number of sets classified as unknown. In the most severe cases, LAMP2 unknowns went from 3% with the non-thresholded classifier to 29% with thresholding, and transferrin receptor unknowns went from 8% to 34%. Because of the small increase in the classification rate (for non-unknown samples only), and the large increase in the number of samples classified as unknown for what have been shown to be confusing patterns, it is not clear that there is an advantage to using the thresholded BPNN with sets of images.
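The rate for non-unknown sets can be approximated from a single row of a confusion matrix. The sketch below assumes that the parenthesized values in the tables are the correctly classified sets divided by all sets not assigned to the unknown category; the function name and dictionary layout are hypothetical. Because the published rows are rounded, values recomputed this way may differ slightly from the parenthesized ones.

```python
def non_unknown_rate(row, true_class, unknown_key="Unk."):
    """Fraction of non-unknown sets in one confusion-matrix row assigned to the true class."""
    assigned = sum(value for key, value in row.items() if key != unknown_key)
    return row[true_class] / assigned


# Nonzero entries of two rows of Table 3.21 (percent of sets).
lamp2_row = {"LAMP": 95, "TfR": 3, "Unk.": 3}
tfr_row = {"LAMP": 10, "TfR": 82, "Unk.": 8}

lamp2_rate = non_unknown_rate(lamp2_row, "LAMP")  # 95 / (95 + 3)
tfr_rate = non_unknown_rate(tfr_row, "TfR")       # 82 / (82 + 10)
```

For these two rows the recomputed rates round to the 97% and 89% shown in parentheses in Table 3.21.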

After investigating the entire feature set, the ad hoc features were used to classify sets of images. Classification was carried out as described above, and the results for non-thresholded and thresholded BPNNs are summarized in Tables 3.23 and 3.24, respectively. The overall performance of the non-thresholded BPNN is essentially the same as it was using all of the features, but it achieves that performance by making fewer errors on the LAMP2/transferrin receptor pair and more errors on the giantin/GPP130 pair.


  
Table 3.23: The performance of BPNNs and plurality rule for classifying sets of 10 images using the ad hoc features. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 96% (variance over the 10 networks was 0.04), and the performance for non-unknown sets is 98% (variance of 0.01). The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990527)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
Giantin         0%      0%      89%     5%      0%      0%      0%      0%      0%      0%      6%      (95%)
GPP130          0%      0%      3%      91%     0%      0%      0%      0%      0%      0%      6%      (97%)
LAMP2           0%      0%      0%      0%      99%     0%      0%      0%      0%      0%      0%      (99.7%)
Mito.           0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      0%      (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      (100%)
Actin           0%      0%      0%      0%      0%      1%      0%      96%     0%      0%      2%      (98%)
TfR             0%      1%      0%      0%      5%      0%      0%      0%      89%     0%      6%      (94%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      99%     1%      (99.8%)


  
Table 3.24: The performance of BPNNs with thresholded outputs and plurality rule for classifying sets of 10 images using the ad hoc features. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 74% (variance over the 10 networks was 6), and the performance for non-unknown sets is 99% (variance of 6). The percentages of non-unknown samples that were classified correctly are included in parentheses.
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             99.7%   0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      87%     0%      0%      0%      0%      0%      0%      0%      0%      13%     (100%)
Giantin         0%      0%      72%     3%      0%      0%      0%      0%      0%      0%      26%     (96%)
GPP130          0%      0%      1%      77%     0%      0%      0%      0%      0%      0%      21%     (98%)
LAMP2           0%      0%      0%      0%      66%     0%      0%      0%      0%      0%      34%     (100%)
Mito.           0%      0%      0%      0%      0%      59%     0%      0%      0%      0%      41%     (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      90%     0%      0%      0%      10%     (100%)
Actin           0%      0%      0%      0%      0%      0%      0%      58%     0%      0%      42%     (99.8%)
TfR             0%      0%      0%      0%      3%      0%      0%      0%      61%     0%      36%     (95%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      72%     28%     (99.9%)

The results for the BPNN with thresholded outputs (Table 3.24) serve to demonstrate a problem with the classification of sets using the plurality rule technique. The symptoms of this problem are the relatively low classification rate for all samples (74%), and the relatively high variance of classification rates across the 10 networks, 6 vs. 0.04 for the non-thresholded results. The cause of these symptoms is that for the underlying single cell classifier, the probability of an unknown classification is very near the probability of a correct classification for several classes including LAMP2, mitochondria, transferrin receptor, and actin (see Table 3.8). When classifying sets of images with these classifiers, there are a large number of sets that end up in the unknown category, as reflected in Table 3.24. If the results from the 10 BPNNs are looked at individually, one finds that some of the BPNNs are unable to reliably classify sets from particular classes because their probability of an unknown classification for that class is, in fact, higher than the probability of making a correct classification. Although none of the results presented here demonstrate it, this phenomenon would clearly hold true for any BPNN for which the probability of a correct classification is near the probability of any other classification for a particular set. This problem would be present in all of the results if the confusion between giantin and GPP130 or between LAMP2 and transferrin receptor were more severe than it already is.
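The effect described above can be illustrated with a small exact calculation. The sketch below makes several simplifying assumptions that are not taken from this work: cells in a set are classified independently, every single cell error goes to one lumped "other" class, and a single cell "unknown" output counts as a vote like any other label. The function name and the probabilities in the example are hypothetical.

```python
from math import comb


def set_outcome_probs(p_correct, p_unknown, n=10):
    """Exact probabilities that a set of n cells is labelled correct, unknown, or other.

    Assumptions: independent cells, all single-cell errors lumped into one
    'other' label, and 'unknown' single-cell outputs counted as votes.
    """
    p_other = max(0.0, 1.0 - p_correct - p_unknown)
    result = {"correct": 0.0, "unknown": 0.0, "other": 0.0}
    for c in range(n + 1):             # cells classified correctly
        for u in range(n - c + 1):     # cells classified as unknown
            o = n - c - u              # cells assigned to the lumped wrong class
            prob = (comb(n, c) * comb(n - c, u)
                    * p_correct ** c * p_unknown ** u * p_other ** o)
            if c > u and c > o:
                result["correct"] += prob
            elif o > c and o > u:
                result["other"] += prob
            else:
                # 'unknown' wins the vote, or the top count is tied
                result["unknown"] += prob
    return result


# Hypothetical single-cell rates, chosen only to contrast the two regimes.
well_separated = set_outcome_probs(p_correct=0.90, p_unknown=0.05)
nearly_tied = set_outcome_probs(p_correct=0.45, p_unknown=0.45)
```

Under these assumptions, a classifier whose probability of a correct single cell output is well above every alternative yields sets that are almost always classified correctly, whereas one whose probability of "unknown" matches its probability of a correct output labels fewer than half of its sets correctly, and small differences between networks then decide which outcome dominates.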

As a final investigation into classification of sets, the BPNNs trained for single cell classification with the 37 best features were tested. The results are summarized as confusion matrices in Tables 3.25 and 3.26. These data follow some of the same trends noted with the other feature sets. First, there is a significant increase in the overall classification rate as well as in the rates for individual classes. Comparing the non-thresholded results (Table 3.25) with the corresponding single cell results (Table 3.12), there is also a decrease in the confusion between the troublesome classes (giantin/GPP130, LAMP2/transferrin receptor). Although the results from thresholded BPNNs were shown above to be problematic, the results from the best features are not as suspect. The low classification rate and high variance between the 10 BPNNs demonstrated for the ad hoc features are avoided here because the underlying single cell classifiers (Table 3.13) are able to discriminate each class more effectively (i.e., fewer samples are placed in the unknown category). The questionable trade-off still exists with the thresholded results, as a small increase in the classification rate for non-unknown sets comes at the expense of large increases in the number of unknowns. Once again, this is most prominent for LAMP2 and transferrin receptor.


  
Table 3.25: The performance of BPNNs and plurality rule for classifying sets of 10 images using the 37 best features as determined using stepwise discriminant analysis. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 98% (variance over the 10 networks was 0.02), and the performance for non-unknown sets is 99% (variance of 0.01). The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990608)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
Giantin         0%      0%      98%     0%      0%      0%      0%      0%      0%      0%      1%      (99.5%)
GPP130          0%      0%      0%      99%     0%      0%      0%      0%      0%      0%      1%      (99.7%)
LAMP2           0%      0%      0%      0%      97%     0%      0%      0%      1%      0%      2%      (99%)
Mito.           0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      0%      (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      (100%)
Actin           0%      0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      (100%)
TfR             0%      0%      0%      0%      6%      0%      0%      0%      88%     0%      6%      (93%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      99.9%   0%      (100%)


  
Table 3.26: The performance of BPNNs with thresholded outputs and plurality rule for classifying sets of 10 images using the 37 best features as determined using stepwise discriminant analysis. Each set of images was assigned a single classification based on the class to which a plurality of its members were assigned. The average performance of all networks over all sets is 90% (variance over the 10 networks was 0.3), and the performance for non-unknown sets is 99% (variance of 0.02). The percentages of non-unknown samples that were classified correctly are included in parentheses. ( 19990608)
True            Output of the Classifier
Classification  DNA     ER      Giant.  GPP     LAMP    Mito.   Nucle.  Actin   TfR     Tubul.  Unk.    (Non-unk.)
DNA             100%    0%      0%      0%      0%      0%      0%      0%      0%      0%      0%      (100%)
ER              0%      99%     0%      0%      0%      0%      0%      0%      0%      0%      1%      (100%)
Giantin         0%      0%      95%     0%      0%      0%      0%      0%      0%      0%      5%      (99.6%)
GPP130          0%      0%      0%      94%     0%      0%      0%      0%      0%      0%      6%      (99.6%)
LAMP2           0%      0%      0%      0%      73%     0%      0%      0%      0%      0%      27%     (99.9%)
Mito.           0%      0%      0%      0%      0%      92%     0%      0%      0%      0%      8%      (100%)
Nucleolin       0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      0%      (100%)
Actin           0%      0%      0%      0%      0%      0%      0%      100%    0%      0%      0%      (100%)
TfR             0%      0%      0%      0%      3%      0%      0%      0%      63%     0%      34%     (96%)
Tubulin         0%      0%      0%      0%      0%      0%      0%      0%      0%      89%     11%     (100%)


Copyright ©1999 Michael V. Boland
1999-09-18