
Future Work

As is the nature of scientific investigation, this work answers some questions while also providing a basis for future study. In fact, improvements could be made to each of the major steps used here to describe protein localization patterns: image collection, image processing, feature extraction, and classification.

Based on the results of classifying reduced-magnification images in Section 3.3.7, it is clear that all future image data should be collected on microscope hardware that allows for proper sampling. Satisfying the Nyquist criterion ensures that the digitized images retain all of the spatial information that the optics can resolve. At the same time, future investigators should consider collecting 3-dimensional data sets. Although the cultured cells used for this work are nearly 2-dimensional, there is certain to be additional information contained in the third dimension. More importantly, 3-D data will be needed to generalize this approach to cell types that have more defined 3-D structure.
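As a rough illustration of what proper sampling requires, the sketch below computes the largest specimen-plane pixel size that satisfies the Nyquist criterion for lateral sampling. It assumes incoherent (fluorescence) imaging, for which the optics pass no spatial frequency above 2 NA / wavelength. The function names and all numerical values are hypothetical and are not taken from the hardware used in this work.

```python
def nyquist_pixel_size_nm(emission_wavelength_nm, numerical_aperture):
    """Largest specimen-plane pixel size (nm) that satisfies Nyquist.

    Assumption: incoherent imaging, so the optical cutoff frequency is
    2 * NA / wavelength (cycles per nm).
    """
    cutoff_cycles_per_nm = 2.0 * numerical_aperture / emission_wavelength_nm
    # Nyquist: sample at no less than twice the highest frequency present,
    # i.e. the pixel spacing may not exceed 1 / (2 * cutoff).
    return 1.0 / (2.0 * cutoff_cycles_per_nm)


def specimen_pixel_size_nm(camera_pixel_um, total_magnification):
    """Size of one camera pixel projected back into the specimen plane (nm)."""
    return camera_pixel_um * 1000.0 / total_magnification


def is_properly_sampled(camera_pixel_um, total_magnification,
                        emission_wavelength_nm, numerical_aperture):
    """True if the projected pixel size is at or below the Nyquist limit."""
    projected = specimen_pixel_size_nm(camera_pixel_um, total_magnification)
    limit = nyquist_pixel_size_nm(emission_wavelength_nm, numerical_aperture)
    return projected <= limit
```

For a hypothetical 520 nm emission wavelength and an NA 1.3 objective, the limit works out to 520 / (4 x 1.3) = 100 nm per pixel. This covers lateral sampling only; the spacing between focal planes in a 3-D stack would need a separate, axial calculation.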

A second improvement that should be made to the image collection phase of this work is the inclusion, in some fashion, of the cell boundary. This information would be useful in helping to "normalize" the features such that a particular localization pattern in a more-or-less triangular cell could be better compared to that same pattern in a more-or-less circular cell. The cell boundary itself could be obtained using a transmitted-light image of the cell, or by using a fluorescent label targeted to the cell surface.

Also with regard to image collection, future data should include as many classes of localization patterns as possible. If systematic description of protein localization is to be accepted as a biologically useful technique, it will have to continue to be applied to larger and larger numbers of proteins.

One improvement to the image processing phase would be the use of truly 3-D data. To maximize the usefulness of the third dimension, methods for removing out-of-focus fluorescence from each image plane will have to be investigated. Assuming that one of these methods is computational deconvolution, techniques other than nearest-neighbor deconvolution should be studied. Nearest-neighbor deconvolution, while computationally straightforward, removes out-of-focus fluorescence less effectively than other methods, notably expectation maximization (EM). EM is iterative and achieves its better results at the expense of much longer computation time. If and when 3-D data are available, it would be useful to compare classification rates for different deconvolution methods.
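To make the iterative character of EM-style deconvolution concrete, the following is a minimal sketch of Richardson-Lucy iteration, a widely used scheme of this type for Poisson-distributed image data. Whether it matches the particular EM variant intended above is an assumption. The sketch also assumes a known, non-negative point spread function supplied as an array of the same shape as the image and centered in that array, and it uses circular (FFT-based) convolution, which ignores edge effects. The names blur and richardson_lucy are illustrative only.

```python
import numpy as np


def _otf(psf, shape):
    """Fourier transform of the normalized PSF, with its center moved to index 0."""
    psf = np.asarray(psf, dtype=float)
    if psf.shape != tuple(shape):
        raise ValueError("psf must have the same shape as the image")
    psf = psf / psf.sum()
    return np.fft.fftn(np.fft.ifftshift(psf))


def blur(image, psf):
    """Circular convolution of an image (2-D or 3-D) with the PSF."""
    image = np.asarray(image, dtype=float)
    return np.real(np.fft.ifftn(np.fft.fftn(image) * _otf(psf, image.shape)))


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Iteratively estimate the unblurred image from a blurred observation."""
    observed = np.maximum(np.asarray(observed, dtype=float), 0.0)
    otf = _otf(psf, observed.shape)
    # Start from a flat image carrying the same total intensity.
    estimate = np.full(observed.shape, observed.mean())
    for _ in range(iterations):
        # Blur the current estimate to predict what the microscope would record.
        predicted = np.real(np.fft.ifftn(np.fft.fftn(estimate) * otf))
        ratio = observed / np.maximum(predicted, eps)
        # Correlate the ratio with the PSF (conjugate OTF) and apply it
        # as a multiplicative correction.
        correction = np.real(np.fft.ifftn(np.fft.fftn(ratio) * np.conj(otf)))
        estimate = np.maximum(estimate * correction, 0.0)
    return estimate
```

Each iteration costs several full-size FFTs, and many iterations are typically needed, which illustrates why such methods take much longer than a single-pass nearest-neighbor correction. Because the code uses N-dimensional transforms, the same sketch applies to a 3-D stack given a 3-D point spread function.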

Although the features calculated and tested thus far have been shown to be useful descriptors of protein localization, they by no means represent a complete sampling of possible features. Only through continued development of new features, preferably biologically motivated, and the application of those features to a common set of data will the "best" descriptors of protein localization be defined.

With respect to classification of localization patterns, now that a reasonable set of pattern descriptors (features) has been implemented and tested with common classification techniques, the natural next step is to use those features with more esoteric and application-specific classifiers. Such work will likely be done with the goal of maximizing the rate of correct classification for methods used in screening biological samples during an experiment, particularly where the cost of making a mistake is high.
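One simple way to account for unequal error costs, without changing the classifier itself, is to apply a minimum-expected-cost decision rule to its class probability estimates. The sketch below is only an illustration of that idea and assumes the classifier can supply posterior probabilities for each class. The function name and the cost matrix layout are hypothetical.

```python
import numpy as np


def min_expected_cost_decision(posteriors, cost):
    """Choose, for each sample, the class with the lowest expected cost.

    posteriors: array (n_samples, n_classes) of class probability estimates.
    cost: array (n_classes, n_classes); cost[i, j] is the cost of deciding
          class j when the true class is i (zero on the diagonal).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # expected[s, j] = sum over i of posteriors[s, i] * cost[i, j]
    expected = posteriors @ cost
    return expected.argmin(axis=1)
```

With equal costs for all errors this rule reduces to picking the most probable class. When mistaking one class for another is made much more expensive, the rule will favor the costly-to-miss class even when it is somewhat less probable.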

Finally, there is useful work that can be done after classification. Specifically, it would be interesting to determine which features are able to distinguish the various classes from one another. The benefits of such analysis would be twofold. First, by determining which of the biologically motivated features were responsible for distinguishing a particular pair of classes, one might glean some insight into the underlying biology of the corresponding proteins. Second, by finding out which of the non-biologically motivated features (e.g., Zernike moments, Haralick features) were responsible for discriminating a particular pair of classes, one might better understand what kind of useful biological information those features are capturing.
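As one possible starting point for such an analysis, the sketch below ranks features by a per-feature Fisher discriminant ratio for a chosen pair of classes: the squared difference of the class means divided by the sum of the class variances. This is only one of many possible criteria. It scores each feature in isolation and so ignores features that discriminate only in combination. The function name and argument layout are hypothetical.

```python
import numpy as np


def rank_features_for_pair(features, labels, class_a, class_b, feature_names):
    """Rank features by how well each one alone separates two classes.

    features: array (n_samples, n_features); labels: array (n_samples,).
    Returns (name, score) pairs sorted from most to least discriminating.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    a = features[labels == class_a]
    b = features[labels == class_b]
    mean_gap = a.mean(axis=0) - b.mean(axis=0)
    spread = a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)
    # Small constant guards against division by zero for constant features.
    scores = mean_gap ** 2 / (spread + 1e-12)
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]
```

Applied to every pair of classes, the top-ranked features give a first indication of which descriptors are responsible for a given distinction.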

Copyright ©1999 Michael V. Boland