The goal of the work with the five-class data set described below was the initial development of methods for the numerical description and subsequent classification of the patterns characteristic of subcellular structures in fluorescence microscope images of eukaryotic cells. To this end, various numerical features were investigated, and some were implemented and used as inputs to standard pattern classifiers.
The extensive literature on pattern recognition describes its application to a wide variety of systems, but only sporadically to automated microscope image analysis. While the screening of Pap smears [19] has received significant attention from the pattern recognition community, the goal of recognizing potentially cancerous cells in a background of normal tissue stained with hematoxylin and eosin is inherently different from the problem of identifying a fluorescence pattern as being from one of a number of distinct classes.
In considering various pattern recognition applications as a starting point for classifying protein localization patterns, a parallel was found in the field of handwritten character recognition. The problems are similar in that, while there are distinct classes of images (numbers and letters; organelle-specific localization patterns), there is also considerable variability within each class (individual versions of the number "2" can be quite different; the appearance of the Golgi apparatus varies from cell to cell). Approaches that can recognize individual handwritten characters have been described [20,21]; initial work was therefore modeled on character recognition, and other approaches were subsequently incorporated.
The nature of the image data generated for this project helped to narrow the choice of numeric features suitable for describing protein localization patterns. First, because cultured cells are fairly heterogeneous in their morphology, any features chosen had to be invariant to the translation and rotation of the patterns within the field of view. Second, the features should not be tailored to a particular set of localization patterns. Since a long-term goal is the ability to describe as many cellular protein localization patterns as possible, it was important to find or design features that are "generic" in that they are useful for describing a variety of patterns.
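The invariance requirement can be illustrated with a minimal sketch. The passage above does not say which features were chosen, so the example assumes, purely for illustration, the first Hu moment invariant (the sum of the normalized second-order central moments), which is unchanged by translation and rotation of an intensity pattern. The function name `first_hu_invariant` and the toy pattern are hypothetical and are not taken from the work described here.

```python
import numpy as np


def first_hu_invariant(image):
    """Return eta20 + eta02, the first Hu moment invariant of a 2-D image.

    Illustrative stand-in only: not necessarily a feature used in this work.
    """
    img = np.asarray(image, dtype=float)
    total = img.sum()  # zeroth-order moment (total intensity)
    if total == 0:
        raise ValueError("image contains no intensity")

    rows, cols = np.indices(img.shape)
    # Intensity-weighted centroid; measuring from it removes translation.
    r_mean = (rows * img).sum() / total
    c_mean = (cols * img).sum() / total

    # Second-order central moments along each axis.
    mu20 = (((rows - r_mean) ** 2) * img).sum()
    mu02 = (((cols - c_mean) ** 2) * img).sum()

    # Normalizing a second-order moment divides by total ** (1 + 2 / 2).
    # The sum mu20 + mu02 depends only on squared distance from the
    # centroid, so it does not change when the pattern is rotated.
    return (mu20 + mu02) / total ** 2


# Hypothetical toy "localization pattern": an asymmetric blob.
pattern = np.zeros((32, 32))
pattern[5:9, 6:14] = 1.0
pattern[9:12, 6:8] = 2.0

# Translate (no wrap-around for this shift) and rotate by 90 degrees.
shifted = np.roll(pattern, shift=(10, 7), axis=(0, 1))
rotated = np.rot90(pattern)

base_value = first_hu_invariant(pattern)
shifted_value = first_hu_invariant(shifted)
rotated_value = first_hu_invariant(rotated)

print(np.isclose(base_value, shifted_value))
print(np.isclose(base_value, rotated_value))
```

A 90-degree rotation is used because it is exact on a pixel grid; rotation by an arbitrary angle requires interpolation, so the feature values would then agree only approximately.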
In contrast to feature selection, the choice of classifiers was subject to no clear restrictions. Since a long-term, subsidiary goal of this project is a more quantitative, systematic understanding of protein localization patterns, a classifier that could be easily interpreted was preferable to one that was more like a "black box". A more immediate goal, however, was to demonstrate that protein localization patterns could be described quantitatively and then reliably and automatically recognized. A high rate of correct classification was therefore deemed a more important criterion for classifier selection than interpretability. Finally, it was decided that the classifiers to be investigated should be chosen from among those that have been previously well described and tested. By avoiding the development of an application-specific classifier at this point, it was possible to focus on the goals of description and classification.