Report on Image Databases and Image Features from the 12th Cytometry Development Workshop held October 18-21, 2002 at Asilomar Conference Grounds, Pacific Grove, California

The workshop participants discussed a number of issues relating to specifications for image databases and image features. One breakout session focused specifically on image database schemas and image feature descriptions. A summary and recommendations are provided below.

Image Database Schemas

The participants discussed the schemas for two image databases. PSLID (Protein Subcellular Location Image Database) was developed by the Murphy group based on their experience with image interpretation and retrieval by pattern analysis, and has been discussed extensively at previous Cytometry Development Workshops. OME (Open Microscopy Environment) was developed by Jason Swedlow, Ilya Goldberg, and Peter Sorger. PSLID is described in K. Huang, J. Lin, J.A. Gajnak, and R.F. Murphy (2002) Image Content-based Retrieval and Automated Interpretation of Fluorescence Microscope Images via the Protein Subcellular Location Image Database. Proc 2002 IEEE Intl Symp Biomed Imaging (ISBI 2002), pp. 325-328 (available as a PDF file), and the FMAS database schema used by PSLID is available at http://murphylab.web.cmu.edu/services/FMAS/. OME is described at http://openmicroscopy.org/. The participants applauded the past efforts of both groups and their ongoing efforts to merge the two schemas.

Issues critical to the success of OME with respect to image retrieval were identified. The following recommendations were made:

Feature Semantics

The participants spent a significant amount of time discussing mechanisms for describing and classifying image features per se. Retrieving or classifying images from diverse sources (including collections of image databases distributed across many sites) requires the ability to query for specific types of features, but not all sites may have implemented or calculated all features for all images. Requiring each remote site to calculate one or more features on the fly for all of its images in order to satisfy a query (or to transmit all images so that those features can be calculated by the query source) was considered infeasible. It was proposed that each feature calculated by a program be uniquely identified but also be grouped into hierarchies, so that retrieval processes can specify which features can satisfy a query. For example, a number of different programs might be used to find objects in an image and calculate average properties (e.g., average object size) that are features of the image as a whole. A hierarchical grouping of these features could be created that captures the differences and similarities between them. One possibility is strings of the form [a.b.c.d.e], where a denotes the general class (e.g., the feature involves area), b denotes a more specific class (e.g., the feature involves object measurements), c denotes a specific method (e.g., the objects are found using Ridler-Calvard thresholding), d denotes a more specific method (e.g., object intensity is found by summing 8-connected neighbors), and e denotes a specific program. A program wanting to search in general for images with objects of a certain average size could request [a.b.*], while a specific pattern classifier might request [a.b.c.d.*]. The number of sublevels could vary at each level. Creating such a hierarchy will require considerable effort but would provide a valuable capability for the future.
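The hierarchical identifiers and wildcard requests described above can be illustrated with a minimal Python sketch. It is not part of any PSLID or OME implementation. The level names (such as "area", "object", "ridler_calvard", "sum8", "prog_a"), the feature values, and the function names are all hypothetical, and the sketch assumes that a trailing * matches any identifier lying one or more levels below the given prefix.

```python
from typing import Dict, List


def parse_feature_id(feature_id: str) -> List[str]:
    """Split an identifier such as '[a.b.c]' or 'a.b.c' into its levels."""
    levels = feature_id.strip("[]").split(".")
    if not all(levels):
        raise ValueError(f"empty level in feature identifier: {feature_id!r}")
    return levels


def matches(pattern: str, feature_id: str) -> bool:
    """Return True if feature_id satisfies pattern.

    Assumption: a trailing '*' matches any identifier that extends the
    prefix by at least one level; without '*', the match must be exact.
    """
    pat = parse_feature_id(pattern)
    fid = parse_feature_id(feature_id)
    if pat[-1] == "*":
        prefix = pat[:-1]
        return len(fid) > len(prefix) and fid[: len(prefix)] == prefix
    return pat == fid


def find_features(available: Dict[str, float], pattern: str) -> Dict[str, float]:
    """Select the features a site has already calculated that satisfy pattern."""
    return {fid: value for fid, value in available.items() if matches(pattern, fid)}


# Hypothetical features already stored at one site for a single image.
site_features = {
    "area.object.ridler_calvard.sum8.prog_a": 12.5,
    "area.object.ridler_calvard.sum8.prog_b": 12.9,
    "area.object.otsu.sum8.prog_c": 11.0,
    "texture.haralick.prog_d": 0.4,
}

# A general query accepts any object-area feature, whatever the method.
general = find_features(site_features, "[area.object.*]")

# A specific classifier accepts only features computed by one method chain.
specific = find_features(site_features, "[area.object.ridler_calvard.sum8.*]")
```

Because matching works on a prefix of arbitrary length, the number of sublevels can differ between branches of the hierarchy, and a site only needs to report the identifiers of features it has already calculated rather than compute new ones per query.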