
Introduction

While previous work (see Chapter 2) demonstrated that pattern recognition approaches could be used to describe and classify protein localization patterns, a better data set was required to test the limits of such a system. First, more classes were needed. While a subset of biological problems could be addressed with five classes, an initial goal of this project was to be able to describe as many patterns as possible. It was also important to include classes of patterns that were visually similar, to determine whether they could be distinguished from one another. Finally, although the Zernike and Haralick features were shown to be useful for classification, no biological insight could be gleaned from the resulting classifiers, because those features were not designed to capture biologically meaningful properties of the patterns.

The first step in addressing these needs was to choose a new cell type. The primary problem with CHO cells was that too few antibodies against their proteins were available to expand that data set easily. HeLa cells were chosen as a replacement because they are a human cell line commonly used in research, so many antibodies directed against their proteins are available. This greater availability of antibodies made it possible to image more localization patterns and, as a consequence, more patterns with similar visual appearance.

To address the concerns with the Zernike and Haralick features, new, more biologically intuitive features were designed. The primary motivation was to encapsulate numerically some of the subjective terms used to describe protein localization. Such subjective descriptions may refer, for example, to the number of distinct fluorescent objects in a cell, the arrangement of those objects with respect to one another, their sizes, their distances from the nucleus, and the fraction of the cell occupied by fluorescence. The new features described below are designed to capture such information.
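The exact feature definitions are not given at this point. As a rough illustration only, the following minimal sketch shows how a few of the quantities just listed (object count, object size, distance from the nucleus, fraction of the cell that is fluorescent) could be computed with NumPy and SciPy. It assumes precomputed boolean masks for the cell and the nucleus, a single global intensity threshold, and 8-connected objects; the function name `object_features` and all of these choices are hypothetical and are not the definitions used in this work.

```python
import numpy as np
from scipy import ndimage


def object_features(protein, nucleus_mask, cell_mask, threshold):
    """Hypothetical sketch of a few intuitive, object-based features.

    protein: 2-D array of protein fluorescence intensities.
    nucleus_mask, cell_mask: boolean arrays with the same shape as protein.
    threshold: intensity above which a pixel counts as fluorescent (assumed).
    """
    # Pixels inside the cell that are brighter than the (assumed) threshold
    fluorescent = (protein > threshold) & cell_mask

    # Treat each 8-connected component of fluorescent pixels as one object
    labels, n_objects = ndimage.label(fluorescent, structure=np.ones((3, 3)))

    # Size of each object in pixels (label 0 is background, so skip it)
    sizes = np.bincount(labels.ravel())[1:]

    # Unweighted centroid of the nucleus, in (row, column) pixel coordinates
    nuc_center = np.array(ndimage.center_of_mass(nucleus_mask.astype(float)))

    if n_objects:
        # Unweighted centroid of each labeled object
        centroids = np.array(ndimage.center_of_mass(
            fluorescent.astype(float), labels, list(range(1, n_objects + 1))))
        # Euclidean distance from each object centroid to the nuclear centroid
        distances = np.linalg.norm(centroids - nuc_center, axis=1)
        mean_size = float(sizes.mean())
        mean_distance = float(distances.mean())
    else:
        mean_size = 0.0
        mean_distance = 0.0

    return {
        "n_objects": int(n_objects),
        "mean_object_size": mean_size,
        "mean_distance_to_nucleus": mean_distance,
        # Fraction of cell pixels that are above threshold
        "fraction_cell_fluorescent": float(fluorescent.sum() / cell_mask.sum()),
    }
```

For example, a 10 x 10 cell containing a 2 x 2 object and a single-pixel object yields two objects with a mean size of 2.5 pixels and a fluorescent fraction of 0.05. Thresholding and connected-component labeling are used here simply because they are a common, easily reproduced way to turn an image into discrete "objects".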

Even though the previous results obtained with the 5-class CHO data are considered adequate, both because of the heterogeneity within each class and because they represent an important proof of concept, it is desirable to improve upon the classification rates. One way to do so, without resorting to feature sets and classifiers designed specifically for a given set of images (and thereby sacrificing the generalizability of the approach), is to classify several identically prepared samples together. Fortunately, this work applies to a variety of experiments that could be configured to present a classifier with a set of images rather than a single image. Whereas set classification was treated only analytically in Section 2.3.3, it is explored experimentally below.
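One simple way to turn per-image decisions into a single decision for a set of identically prepared samples is a plurality vote. The sketch below is an assumption for illustration, not necessarily the scheme used in this work: it takes the labels that some single-image classifier assigned to each image in the set and returns the most frequent one. The function name `classify_set` and the tie-breaking rule (the tied label that appears first in the input wins) are hypothetical choices.

```python
from collections import Counter
from typing import Hashable, Sequence


def classify_set(image_predictions: Sequence[Hashable]) -> Hashable:
    """Assign one label to a set of images by plurality vote (hypothetical sketch).

    image_predictions: per-image class labels from any single-image classifier.
    Ties are broken in favor of the tied label that appears first in the input.
    """
    if not image_predictions:
        raise ValueError("need at least one image prediction")
    counts = Counter(image_predictions)
    best = max(counts.values())
    # Scan in input order so that ties resolve deterministically
    for label in image_predictions:
        if counts[label] == best:
            return label
```

For instance, `classify_set(["A", "B", "A", "C", "A"])` returns `"A"`, even though two of the five images were misclassified individually. This illustrates why presenting sets of images can raise accuracy: isolated per-image errors are outvoted as long as the correct class remains the most frequent prediction.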


Copyright ©1999 Michael V. Boland
1999-09-18