
   
Classification of Images at Lower Resolution

One deficiency of the current images is that they are all significantly undersampled by the CCD camera. As with any undersampling, high-frequency information is aliased into low frequencies in the sampled image, which distorts the image relative to the actual microscope output. Given that the microscope used to collect these images can resolve objects as small as 0.2 $\mu$m (see Section 1.6), the Nyquist sampling theorem dictates that, for all spatial information to be retained, each sample must represent no more than one-half of the minimum resolvable distance at the specimen, that is, 0.1 $\mu$m (0.2 $\mu$m $\div$ 2). Since the CCD camera on the microscope has 23 $\mu$m pixels, a magnification of 230 or greater (23 $\mu$m $\div$ 0.1 $\mu$m) would be required to achieve Nyquist sampling. Unfortunately, such a high magnification is not practical with the microscope used here. Even with an objective magnification of 100, which was used for all of the images produced for this work, the longest dimension of a typical HeLa cell extends across most of the field of view. For properly sampled images to be acquired in future work, the microscope camera will have to be upgraded to have both more and smaller pixels so that the image magnification can be increased.
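The sampling arithmetic above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the only inputs are the figures quoted in the text (0.2 $\mu$m resolution, 23 $\mu$m camera pixels, magnification of 100).

```python
def nyquist_sample_spacing_um(resolution_um):
    """Largest sample spacing at the specimen that still satisfies Nyquist."""
    return resolution_um / 2.0


def required_magnification(pixel_size_um, resolution_um):
    """Magnification at which one camera pixel spans the Nyquist spacing."""
    return pixel_size_um / nyquist_sample_spacing_um(resolution_um)


def specimen_spacing_um(pixel_size_um, magnification):
    """Distance at the specimen covered by one camera pixel."""
    return pixel_size_um / magnification


# Values quoted in the text.
RESOLUTION_UM = 0.2
PIXEL_SIZE_UM = 23.0
MAGNIFICATION = 100.0

needed = required_magnification(PIXEL_SIZE_UM, RESOLUTION_UM)
actual = specimen_spacing_um(PIXEL_SIZE_UM, MAGNIFICATION)
limit = nyquist_sample_spacing_um(RESOLUTION_UM)

print("required magnification: %.0f" % needed)
print("sample spacing at magnification 100: %.2f um" % actual)
print("undersampled: %s" % (actual > limit))
```

With these inputs the required magnification is 230, while a magnification of 100 gives a sample spacing of 0.23 $\mu$m, about 2.3 times coarser than the 0.1 $\mu$m that Nyquist sampling requires.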

To gain some insight into how much this problem affects the classification of these 10 patterns, the 37 best features (see Section 3.3.5) were calculated for images that were scaled by half, as if they had been collected at a magnification of 50 rather than 100. Image scaling was accomplished using the Matlab imresize command with the bilinear interpolation option. The features calculated from the reduced-magnification images were then used as inputs to BPNN and kNN classifiers.

The results from the BPNN classifiers are summarized in Tables 3.18 and 3.19. The most significant aspect of the BPNN results is the large decrease in the ability of the classifier to discriminate giantin from GPP130. This particular decrease in performance is not unexpected, as both of these proteins are found in the same small organelle. Apparently the decreased magnification has diminished whatever subtle distinctions existed between the giantin and GPP130 patterns at the higher magnification. The BPNN classifiers also show a decrease in the discrimination of LAMP2 and transferrin receptor, although not as large as that for giantin and GPP130. This effect of lowered magnification is also not unexpected given the significant confusion between LAMP2 and transferrin receptor in earlier results. Furthermore, in the case of the BPNN with thresholding of outputs (Table 3.19), the number of giantin, GPP130, LAMP2 and transferrin receptor samples that are classified as unknown is increased. For example, not only is the classification rate for giantin down by more than 20 percentage points, but the percentage of giantin samples classified as unknown is up by more than 20 percentage points. Similar comments apply to GPP130, LAMP2 and transferrin receptor. Finally, the results obtained with the kNN classifier (Table 3.20) follow the pattern of the BPNN data, and also show decreased recognition of the ER and nucleolin patterns.
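The half-scale step described above can be illustrated with a small bilinear resampling routine. This Python sketch is not a reimplementation of the Matlab imresize command: the function name, the pixel-centre alignment convention and the clamping at image borders are all assumptions made for illustration, and imresize may differ in its handling of edges and filtering.

```python
import math


def resize_bilinear(img, scale):
    """Resample a 2-D image (list of rows) by `scale` using bilinear interpolation.

    Assumption: output pixel centres are mapped onto input pixel
    coordinates, and coordinates outside the image are clamped to the edge.
    """
    in_h, in_w = len(img), len(img[0])
    out_h = max(1, round(in_h * scale))
    out_w = max(1, round(in_w * scale))

    def source_coords(n_out, n_in):
        # For each output index: (lower neighbour, upper neighbour, weight of upper).
        coords = []
        for i in range(n_out):
            c = (i + 0.5) / scale - 0.5
            c = min(max(c, 0.0), n_in - 1.0)
            lo = int(math.floor(c))
            hi = min(lo + 1, n_in - 1)
            coords.append((lo, hi, c - lo))
        return coords

    rows = source_coords(out_h, in_h)
    cols = source_coords(out_w, in_w)

    out = []
    for y0, y1, wy in rows:
        out_row = []
        for x0, x1, wx in cols:
            # Interpolate along x on the two bracketing rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            out_row.append(top * (1 - wy) + bottom * wy)
        out.append(out_row)
    return out


# A 4x4 test image scaled by half, as in the magnification 100 -> 50 experiment.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
half = resize_bilinear(image, 0.5)
print(half)
```

Under this alignment convention, each output pixel of a half-scale image falls at the centre of a 2 x 2 block of input pixels, so bilinear interpolation reduces to the mean of that block.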

Taken together, these results indicate that image resolution is an important variable in the ability of numerical features to capture discriminatory information about protein localization patterns. Although not all of the patterns presented problems at the lower magnification, those patterns that did were the ones intentionally included to produce confusion. This supports the hypothesis that classification performance on future data sets could be improved by ensuring proper sampling of the microscope output.


  
Table 3.18: Performance on the low-resolution test data of a BPNN using the 37 best features (as determined using stepwise discriminant analysis on the unmodified data set) and no thresholding of the network outputs. The average rate of correct classification is $78\pm5\%$ (mean $\pm$ 95% CI) with a variance of 2.7 across all 10 networks. The average performance on the training data is $91\%$ with a variance of 3.3. (19990604)

True            Output of the Classifier
Classification  DNA    ER     Giant. GPP    LAMP   Mito.  Nucle. Actin  TfR    Tubul.

DNA             100%   0%     0%     0%     0%     0%     0%     0%     0%     0%
ER              0%     86%    0%     0%     5%     3%     0%     0%     1%     5%
Giantin         0%     0%     60%    30%    6%     0%     4%     0%     0%     0%
GPP130          0%     0%     26%    68%    3%     1%     1%     0%     0%     0%
LAMP2           0%     3%     9%     2%     65%    1%     2%     0%     16%    1%
Mito.           0%     8%     1%     0%     2%     78%    0%     2%     7%     4%
Nucleolin       1%     1%     2%     0%     1%     0%     95%    0%     0%     1%
Actin           0%     0%     0%     0%     0%     2%     0%     93%    1%     4%
TfR             0%     4%     4%     1%     24%    5%     1%     2%     56%    5%
Tubulin         0%     4%     0%     1%     1%     7%     0%     2%     4%     81%
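As a reading aid for Table 3.18, the following Python sketch recomputes the average rate of correct classification from the diagonal of the confusion matrix and locates the largest single confusion. The matrix is transcribed from the rounded percentages in the table, and the function names are hypothetical; the figure in the caption was averaged over the 10 networks, so agreement is expected only to within rounding.

```python
CLASSES = ["DNA", "ER", "Giantin", "GPP130", "LAMP2",
           "Mito.", "Nucleolin", "Actin", "TfR", "Tubulin"]

# Rows: true classification; columns: output of the classifier (percent).
CONFUSION_3_18 = [
    [100, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 86, 0, 0, 5, 3, 0, 0, 1, 5],
    [0, 0, 60, 30, 6, 0, 4, 0, 0, 0],
    [0, 0, 26, 68, 3, 1, 1, 0, 0, 0],
    [0, 3, 9, 2, 65, 1, 2, 0, 16, 1],
    [0, 8, 1, 0, 2, 78, 0, 2, 7, 4],
    [1, 1, 2, 0, 1, 0, 95, 0, 0, 1],
    [0, 0, 0, 0, 0, 2, 0, 93, 1, 4],
    [0, 4, 4, 1, 24, 5, 1, 2, 56, 5],
    [0, 4, 0, 1, 1, 7, 0, 2, 4, 81],
]


def mean_correct_rate(confusion):
    """Mean of the diagonal entries, i.e. the average per-class accuracy."""
    diagonal = [confusion[i][i] for i in range(len(confusion))]
    return sum(diagonal) / len(diagonal)


def most_confused_pair(confusion, labels):
    """Return (true class, output class, percent) for the largest off-diagonal entry."""
    best = None
    for i, row in enumerate(confusion):
        for j, value in enumerate(row):
            if i != j and (best is None or value > best[2]):
                best = (labels[i], labels[j], value)
    return best


print("mean correct rate: %.1f%%" % mean_correct_rate(CONFUSION_3_18))
print("largest confusion: %s -> %s (%d%%)" % most_confused_pair(CONFUSION_3_18, CLASSES))
```

The diagonal mean of the transcribed values is 78.2%, consistent with the 78% in the caption, and the largest off-diagonal entry is giantin classified as GPP130 (30%).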


  
Table 3.19: Performance on the low-resolution test data of a BPNN using the 37 best features (as determined using stepwise discriminant analysis on the original images) and with thresholding of the network outputs. The average rate of correct classification is $65\pm5.8\%$ (mean $\pm$ 95% CI) with a variance of 87 for all samples and $83\%$ (variance of 26) for samples that are not classified as unknown (average of values in parentheses, below). The average performance on the corresponding training data is $78\pm4.1\%$ with a variance of 140 for all samples and $95\%$ (variance of 3.8) for those not placed in the unknown category. The percentages of non-unknown samples that were classified correctly are included in parentheses. (19990604)

True            Output of the Classifier
Classification  DNA    ER     Giant. GPP    LAMP   Mito.  Nucle. Actin  TfR    Tubul. Unk.

DNA             100%   0%     0%     0%     0%     0%     0%     0%     0%     0%     0%
                (100%)
ER              0%     72%    0%     0%     3%     0%     0%     0%     0%     1%     24%
                       (95%)
Giantin         0%     0%     37%    16%    4%     0%     2%     0%     0%     0%     41%
                              (62%)
GPP130          0%     0%     15%    50%    2%     0%     0%     0%     0%     0%     32%
                                     (74%)
LAMP2           0%     0%     3%     0%     37%    0%     1%     0%     8%     0%     50%
                                            (73%)
Mito.           0%     2%     0%     0%     2%     70%    0%     1%     4%     3%     18%
                                                   (85%)
Nucleolin       0%     0%     1%     0%     0%     0%     88%    0%     0%     1%     11%
                                                          (98%)
Actin           0%     0%     0%     0%     0%     1%     0%     88%    0%     3%     9%
                                                                 (96%)
TfR             0%     0%     1%     0%     15%    3%     0%     1%     41%    1%     38%
                                                                        (65%)
Tubulin         0%     2%     0%     1%     0%     3%     0%     2%     1%     68%    24%
                                                                               (89%)
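The parenthesised values in Table 3.19 are the correct classifications expressed as a percentage of the samples that were not classified as unknown. A minimal Python sketch of that relationship follows; the dictionaries are transcribed from the rounded table entries and the function name is hypothetical, so the recomputed figures are approximations rather than the original calculation.

```python
# Diagonal (correct), "Unk." column and parenthesised values from Table 3.19, in percent.
CORRECT = {"DNA": 100, "ER": 72, "Giantin": 37, "GPP130": 50, "LAMP2": 37,
           "Mito.": 70, "Nucleolin": 88, "Actin": 88, "TfR": 41, "Tubulin": 68}
UNKNOWN = {"DNA": 0, "ER": 24, "Giantin": 41, "GPP130": 32, "LAMP2": 50,
           "Mito.": 18, "Nucleolin": 11, "Actin": 9, "TfR": 38, "Tubulin": 24}
REPORTED = {"DNA": 100, "ER": 95, "Giantin": 62, "GPP130": 74, "LAMP2": 73,
            "Mito.": 85, "Nucleolin": 98, "Actin": 96, "TfR": 65, "Tubulin": 89}


def rate_among_classified(correct_pct, unknown_pct):
    """Percent correct among the samples that were not placed in 'unknown'."""
    classified_pct = 100.0 - unknown_pct
    if classified_pct <= 0:
        return None  # every sample was rejected; the rate is undefined
    return 100.0 * correct_pct / classified_pct


for name in CORRECT:
    recomputed = rate_among_classified(CORRECT[name], UNKNOWN[name])
    print("%-10s recomputed %5.1f%%  reported %3d%%" % (name, recomputed, REPORTED[name]))
```

Because the table entries are rounded to whole percentages, the recomputed values agree with the parenthesised ones only to within about one percentage point.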


  
Table 3.20: Performance of a kNN classifier on the low magnification image test data using the 37 best features as determined using stepwise discriminant analysis on the unscaled data set. The average rate of correct classification is $64\pm5.8\%$ (mean $\pm$ 95% CI) with a variance of 4.4 across all 10 classifiers for all samples and $75\%$ (variance of 1.9) for those samples not classified as unknown. The percentages of non-unknown samples that were classified correctly are included in parentheses. (19990608)

True            Output of the Classifier
Classification  DNA    ER     Giant. GPP    LAMP   Mito.  Nucle. Actin  TfR    Tubul. Unk.

DNA             99%    0%     0%     0%     0%     0%     0%     0%     0%     0%     1%
                (100%)
ER              3%     70%    0%     0%     4%     2%     0%     0%     0%     8%     12%
                       (79%)
Giantin         0%     0%     43%    21%    5%     0%     3%     0%     1%     0%     26%
                              (59%)
GPP130          0%     0%     23%    48%    3%     0%     1%     0%     1%     0%     23%
                                     (63%)
LAMP2           0%     3%     9%     3%     54%    2%     1%     0%     10%    0%     18%
                                            (66%)
Mito.           0%     4%     1%     0%     1%     71%    0%     2%     2%     7%     12%
                                                   (81%)
Nucleolin       1%     1%     3%     2%     4%     0%     80%    0%     2%     0%     8%
                                                          (87%)
Actin           0%     0%     0%     0%     0%     2%     0%     86%    0%     6%     7%
                                                                 (92%)
TfR             0%     2%     0%     1%     21%    9%     0%     8%     25%    9%     24%
                                                                        (33%)
Tubulin         0%     5%     0%     2%     0%     3%     0%     5%     5%     67%    12%
                                                                               (76%)


Copyright ©1999 Michael V. Boland
1999-09-18