Extracting and Classifying Fluorescence Microscope Images of Cells from Online Journals

Jie Yao, graduate student at the Center for Automated Learning and Discovery
Meel Velliste, graduate student in Biomedical Engineering


We are interested in creating a self-populating knowledge base that automatically extracts and stores assertions about protein subcellular location from the published literature. Such a knowledge base can serve not only as a resource for biologists but also as a test bed for knowledge reasoning systems that generate new hypotheses under uncertainty.


As a starting point, we have developed an automated system to find fluorescence microscope images in online journal articles. Our system includes:

  1. a web robot to download articles from PubMed matching a keyword query,
  2. a tool to extract figures and captions from PDF files,
  3. an algorithm for splitting figures into individual panels,
  4. a program for distinguishing fluorescence microscope images from other types of images,
  5. a program to find scale information from the images and corresponding captions,
  6. a tool to remove annotations (such as characters and arrows) from the fluorescence microscope images,
  7. a segmentation program to isolate individual cells from images containing multiple cells,
  8. a classifier that can rank the returned images by their "likelihood" of belonging to a particular location class.
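
The ranking in the last step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the classifier yields a score per location class for each image; the function name, image identifiers, class names, and score values below are hypothetical and do not come from the system itself.

```python
def rank_by_class(scores, target_class):
    """Return image ids ordered from most to least likely for target_class.

    scores: dict mapping image id -> dict mapping class name -> score.
    Images with no score for target_class are treated as score 0.0.
    """
    return sorted(
        scores,
        key=lambda image_id: scores[image_id].get(target_class, 0.0),
        reverse=True,
    )


# Hypothetical scores for three extracted image panels.
example_scores = {
    "panel_a": {"tubulin": 0.20, "nuclear": 0.70},
    "panel_b": {"tubulin": 0.85, "nuclear": 0.05},
    "panel_c": {"tubulin": 0.55, "nuclear": 0.30},
}

ranked = rank_by_class(example_scores, "tubulin")
print(ranked)
```

Sorting on the score for a single class lets the same set of extracted images be re-ranked for any location class of interest without re-running the earlier steps.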


Evaluation of each part of this system revealed good precision (the number of correct results out of all results returned) and reasonable recall (the number of correct results out of all possible correct results). To demonstrate the usefulness of the system, a search was performed using the keyword "Tubulin". Eight of the top 10 images returned were in fact images of tubulin.
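
The two measures defined above can be written out as a short sketch. The function names and the toy item sets are assumptions made for illustration; only the 8-out-of-10 figure for the "Tubulin" search comes from the text, and the total number of tubulin images available is not reported, so no recall is computed for that search.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned results.

    precision = correct results / all results returned
    recall    = correct results / all possible correct results
    """
    returned = set(returned)
    relevant = set(relevant)
    correct = returned & relevant
    precision = len(correct) / len(returned) if returned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Precision computed over only the top k items of a ranked list."""
    top = ranked[:k]
    if not top:
        return 0.0
    relevant = set(relevant)
    return sum(1 for item in top if item in relevant) / len(top)


# Toy example with made-up item labels: 3 of 4 returned items are correct,
# and 5 correct items exist in total.
toy_precision, toy_recall = precision_recall(
    returned=["a", "b", "c", "d"],
    relevant=["a", "b", "c", "e", "f"],
)

# The "Tubulin" search: 8 of the top 10 returned images were correct.
# The image ids are placeholders; which 8 were correct is not stated.
top_ten = [f"img{i}" for i in range(1, 11)]
tubulin_images = top_ten[:8]
tubulin_p_at_10 = precision_at_k(top_ten, tubulin_images, k=10)
```

With these inputs the toy example gives a precision of 3/4 and a recall of 3/5, and the "Tubulin" search corresponds to a precision of 8/10 over the top 10 results.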


When combined with utilities that extract assertions from figure captions and body text, this fully automated online image extractor will provide a truly useful tool for harnessing the vast amounts of information about protein subcellular location available in online journal articles.


R. F. Murphy, M. Velliste, J. Yao, and G. Porreca (2001). Searching Online Journals for Fluorescence Microscope Images Depicting Protein Subcellular Location Patterns. Proc IEEE Int Symp Bio-Informat Biomed Eng (BIBE 2001) 2; pp. 119-128.

