Automated Image Analysis of Protein Localization in Budding Yeast
The material on this page supplements our paper to be presented at ISMB/ECCB 2007:
Shann-Ching Chen, Ting Zhao, Geoffrey J. Gordon and Robert F. Murphy.
"Automated Image Analysis of Protein Localization in Budding Yeast,"
Bioinformatics (2007), in press
MOTIVATION: The yeast Saccharomyces cerevisiae was the
first eukaryotic organism to have its genome completely sequenced.
Since then, several large-scale analyses of the yeast genome have
provided extensive functional annotations of individual genes and
proteins. One fundamental property of a protein is its subcellular
localization, which provides critical information about how this
protein works in a cell. An important project therefore was the
creation of the yeast GFP fusion localization database
by the University of California, San Francisco (UCSF). This database
provides localization data for 75% of the proteins believed to be
encoded by the yeast genome. These proteins were classified into 22
distinct subcellular location categories by visual examination. Based
on our past success at building automated systems to classify
subcellular location patterns in mammalian cells, we sought to create a
similar system for yeast.
RESULTS: We developed computational methods to automatically analyze the images created by the UCSF yeast GFP fusion localization project.
The system was trained to recognize the same location categories that were used in that study.
We applied the system to 2640 images, and for 2139 of them (81%) it assigned the same label as the
previous visual assignment. When only the highest-confidence assignments were considered,
94.7% agreement was observed. Visual examination of the proteins for which the two approaches
disagreed suggests that at least some of the automated assignments may be more accurate.
The automated method provides an objective, quantitative, and repeatable assignment of
protein locations that can be applied to new collections of yeast images
(e.g., for different strains or the same strain under different conditions).
Notably, this performance was achieved without
requiring colocalization with any marker proteins.
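The agreement figures quoted above come down to counting matching labels, optionally restricted to high-confidence assignments. The following Python sketch illustrates that arithmetic only; it is not the classification system described in the paper. The function names, the confidence scores, the 0.8 threshold, and the five-item toy data set are all assumptions made for illustration. Only the counts 2139 and 2640 come from the text.

```python
from typing import Sequence, Tuple


def agreement(visual: Sequence[str], automated: Sequence[str]) -> Tuple[int, int, float]:
    """Return (matches, total, fraction) for two parallel label lists."""
    if len(visual) != len(automated):
        raise ValueError("label lists must have the same length")
    if not visual:
        raise ValueError("label lists must not be empty")
    matches = sum(v == a for v, a in zip(visual, automated))
    return matches, len(visual), matches / len(visual)


def agreement_above(
    visual: Sequence[str],
    automated: Sequence[str],
    confidence: Sequence[float],
    threshold: float,
) -> Tuple[int, int, float]:
    """Agreement restricted to items whose confidence is >= threshold."""
    kept = [(v, a) for v, a, c in zip(visual, automated, confidence) if c >= threshold]
    if not kept:
        raise ValueError("no items at or above the threshold")
    kept_visual, kept_auto = zip(*kept)
    return agreement(kept_visual, kept_auto)


# Hypothetical toy data: labels and confidence scores are made up.
visual = ["nucleus", "cytoplasm", "mitochondrion", "nucleus", "ER"]
automated = ["nucleus", "cytoplasm", "nucleus", "nucleus", "cytoplasm"]
confidence = [0.95, 0.90, 0.40, 0.85, 0.30]

print(agreement(visual, automated))
print(agreement_above(visual, automated, confidence, threshold=0.8))

# Counts reported in the text: 2139 matching labels out of 2640 images.
reported_matches, reported_total = 2139, 2640
print(f"{100 * reported_matches / reported_total:.1f}% agreement")
print(reported_total - reported_matches, "disagreements")
```

In the toy data, discarding low-confidence items raises the agreement rate, which is the same qualitative effect as the reported rise from 81% to 94.7%. The 501 disagreements implied by the reported counts match the size of the list in Supplement 1.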
Three files are available for download:
- Supplement 1 (web page; tab-delimited text)
  - List of 501 proteins whose label from visual assignment differs from that assigned by automated classification.
- Supplement 2 (web page; tab-delimited text)
  - List of computer-assigned labels for 156 proteins within the ambiguous category.
- Supplement 3 (web page; tab-delimited text)
  - List of computer-assigned labels for 72 proteins within the punctate_composite category.
Last modified: June 4, 2007