Photography Media Journal
ISSN 1918-8153



Image Indexing

by: Tomasz Neugebauer

March 2005


page: 3 of 3


Indexing for Retrieval

In image database retrieval systems, the most common method of indexing and retrieval is the establishment of “fields containing bibliographic data (artist, photographer, title, date, etc.) together with a field containing some descriptive text to support the image field.” (Baxter 68) Shatford argues that although different image attributes need to be indexed depending on the type of collection and its users, four general categories can be used as a checklist: biographical, subject, exemplified and relationship attributes. (Shatford 1994: 584) The biographical attributes cover the history of the image’s creation, such as creator attributes and creation date, as well as its “travels […] Where it is now, where it has been, who has owned it, how much it costs or has cost, whether it has been altered in any way.” (Shatford 1994: 584) Exemplified attributes are distinct from the subject: they describe the visual object as an instance of a kind of thing, such as an etching, photograph, poster or 8-bit GIF. Relationship attributes are those between different images (for example, a digital image of a class hierarchy expressed in Unified Modeling Language and an image of the graphical user interface that implements that hierarchy), or between images and texts (for example, the list of functional requirements that were used to create the class hierarchy).
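
As a rough illustration of such a record, the sketch below pairs bibliographic fields with a descriptive text field and treats the four attribute categories as a checklist. The class name, field names and the `missing_categories` helper are assumptions made for this example; they are not taken from Baxter & Anderson or Shatford.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """One catalogue entry: bibliographic fields plus free descriptive text."""
    # Biographical attributes: the history of the image's creation and travels.
    creator: str
    title: str
    creation_date: str
    current_location: str = ""
    # Subject attributes: what the image depicts or is about.
    subjects: list[str] = field(default_factory=list)
    # Exemplified attributes: what kind of object the image is an instance of.
    exemplifies: list[str] = field(default_factory=list)
    # Relationship attributes: identifiers of related images or texts.
    related_to: list[str] = field(default_factory=list)
    # Descriptive text supporting the image field.
    description: str = ""


def missing_categories(record: ImageRecord) -> list[str]:
    """Use the four categories as a checklist; return the ones left empty."""
    checks = {
        "biographical": bool(record.creator and record.creation_date),
        "subject": bool(record.subjects),
        "exemplified": bool(record.exemplifies),
        "relationship": bool(record.related_to),
    }
    return [name for name, filled in checks.items() if not filled]


# Hypothetical record for the class hierarchy diagram mentioned above.
record = ImageRecord(
    creator="Unknown",
    title="Class hierarchy diagram",
    creation_date="2005",
    subjects=["class hierarchy"],
    exemplifies=["8-bit GIF"],
)
print(missing_categories(record))  # the relationship category is still unfilled
```

A checklist of this kind does not decide which attributes matter for a given collection; it only flags the categories an indexer has not yet considered.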

Jörgensen summarizes research in cognitive psychology suggesting that users search for objects in images at a ‘Basic Level’, which is “neither the most specific nor the most abstract level but is rather an intermediate level”, and carries out studies that confirm this (305). The layers and levels of indexing of images inevitably vary with each collection, its audience and purpose. Schroeder describes the General Motors Media Archive (GMMA) project, which deals with over 3 million still photographs and uses a layered indexing approach consisting of the object (trees, sky, dirt road, etc.), style (glamour, documentary, engineering) and implication (purpose, unique qualities) (Schroeder 1998). The implication layer is the greatest challenge to the indexer and of highest value to the searcher, as it provides the descriptions of unique qualities that differentiate the images, enabling an increase in precision.
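
The effect of the implication layer on precision can be sketched with a toy collection. The three records, their index terms and the `search` function below are invented for illustration and do not describe the actual GMMA system; precision is computed in the usual way, as the fraction of retrieved images that are relevant.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of retrieved images that are relevant (0.0 if nothing retrieved)."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical records: each image is indexed at three layers.
collection = {
    "img1": {"object": {"car", "dirt road"}, "style": {"documentary"},
             "implication": {"durability test"}},
    "img2": {"object": {"car", "sky"}, "style": {"glamour"},
             "implication": {"showroom launch"}},
    "img3": {"object": {"car", "trees"}, "style": {"engineering"},
             "implication": {"prototype"}},
}


def search(query: dict[str, str]) -> set[str]:
    """Return ids whose index contains every (layer, term) pair in the query."""
    return {
        image_id
        for image_id, layers in collection.items()
        if all(term in layers[layer] for layer, term in query.items())
    }


relevant = {"img1"}  # the searcher wants the durability-test photograph
broad = search({"object": "car"})
narrow = search({"object": "car", "implication": "durability test"})
print(precision(broad, relevant), precision(narrow, relevant))
```

In this toy case the object-only query retrieves all three images, while adding a term from the implication layer narrows the set to the single relevant one.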

The large number of terms necessary in image indexing has led to a large number of specialized thesaurus-based systems. The Art and Architecture Thesaurus uses hierarchically arranged “terminology describing physical attributes, styles and periods, agents, activities, materials and objects” (Baxter & Anderson). ICONCLASS is another classification system used extensively to index images in the domain of art history (Baxter & Anderson). Thesaurus-based systems raise issues of cost-effectiveness: it is time-consuming to translate the multitude of terms into structured vocabularies that are not standardized for the entire domain of images. The subjective elements of the ‘aboutness’ of an image, formed by the viewer during the sensory visual experience of understanding an image, require too many terms to be exhaustively indexed. This is the reason why content-based algorithmic techniques continue to be the focus of research and development.

A common misconception about algorithmic techniques is that they do not require human indexers. Modeling indexing activities for a computer requires the kind of sound theoretical foundation in indexing that only professional indexers and information professionals can provide. The separation of computer scientists working towards content-based indexing from traditional indexing theorists is not ideal. It seems unlikely that mathematical analysis of images as bit-depth digit maps alone will result in usable systems. It is the hybrid approaches which are most promising.

Besser’s proposed hybrid solution of “text-based cataloguing and indexing sufficient to allow the user to narrow a retrieval set to a reasonable size, coupled with some kind of procedure for browsing through the retrieved set of images” (790) anticipates the popular image search interfaces by Google, Yahoo! and MSN. A division into a pyramid of syntactic and semantic levels for visual descriptor terms has produced “consistent and positive results” for image representation and retrieval. (Jörgensen et al. 2001: 945) This approach satisfies Small’s Aristotelian “Principle Number One” for image indexing: “‘Do not make your datum more accurate than it is.’ This principle may be rephrased as, ‘Preserve the Mess.’ Preservation of ambiguity, however, does not mean a lack of either organization or controls.” (Small 52)

The reasonable approach is to combine algorithmic techniques for syntactic attributes that can be extracted from images automatically with human indexing for semantic attributes that require world knowledge. Jörgensen et al.’s pyramid contains the following syntactic elements that lend themselves to automatic (computer-generated) indexing: type/technique (e.g.: black & white, color), spectral sensitivity (color), frequency sensitivity (texture), image components such as dot, line and tone, and the spatial layout of elements. (Jörgensen et al. 2001: 940) Automated assignment of keywords saves time and money, so it should be used wherever possible. However, humans “mainly use higher level attributes to describe, classify and search for visual material” (Jörgensen et al. 2001: 940) and these semantic attributes cannot at this time be algorithmically assigned. The semantic attributes include: generic objects (e.g.: table, telephone, chair), generic scenes (e.g.: city, landscape, indoor, outdoor, portrait), specific objects (e.g.: Notre Dame Basilica, Bruce Lee), specific scenes (e.g.: Warsaw, Plains of Abraham), and abstract objects (e.g.: guilt, remorse) (Jörgensen et al. 2001: 940).
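
A minimal sketch of this division of labour follows. It assumes an image is available as a list of RGB pixel tuples and derives only one syntactic attribute, type/technique, by checking whether every pixel has equal color channels; the semantic terms are supplied by a human indexer. The names `image_type`, `HybridIndexEntry` and `index_image` are hypothetical, and the equal-channels test is a simplification chosen for this example rather than a method described by Jörgensen et al.

```python
from dataclasses import dataclass, field

Pixel = tuple[int, int, int]  # (red, green, blue), each channel 0-255


def image_type(pixels: list[Pixel], tolerance: int = 0) -> str:
    """Return 'black & white' if no pixel's channels differ by more than tolerance."""
    for red, green, blue in pixels:
        if max(red, green, blue) - min(red, green, blue) > tolerance:
            return "color"
    return "black & white"


@dataclass
class HybridIndexEntry:
    # Syntactic attributes, assigned by the program.
    syntactic: dict[str, str] = field(default_factory=dict)
    # Semantic attributes, assigned by a human indexer.
    semantic: dict[str, list[str]] = field(default_factory=dict)


def index_image(pixels: list[Pixel],
                human_terms: dict[str, list[str]]) -> HybridIndexEntry:
    """Combine the automatic type/technique attribute with human-supplied terms."""
    entry = HybridIndexEntry()
    entry.syntactic["type/technique"] = image_type(pixels)
    entry.semantic.update(human_terms)
    return entry


# Two gray pixels stand in for a scanned black & white photograph.
gray_pixels = [(12, 12, 12), (200, 200, 200)]
entry = index_image(
    gray_pixels,
    {"generic scene": ["city"], "specific scene": ["Warsaw"]},
)
print(entry.syntactic, entry.semantic)
```

Keeping the two kinds of attribute in separate fields makes it clear to a later user of the index which terms came from the program and which from a person.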

The popularity of the World Wide Web has emphasized the need to index images in a multimodal environment, which lends itself especially to hybrid approaches. The MARIE-3 system, for example, uses Web page layout and word syntax to extract captions of photographs (accompanying text) from the rest of the text on the page (Rowe & Frew 1998). Extracted captions from multimodal documents can be used as an automatic content-based indexing strategy for improving search precision (Srihari et al.). With the addition of human semantic-level indexing to the system, using a structured set of categories for visual descriptors (see the pyramid of Jörgensen et al. 2001), we have the kind of hybrid approach that results in usable image search applications for the web.
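
The idea of harvesting captions from the surrounding page can be sketched with Python's standard `html.parser` module. This is a much simpler stand-in than MARIE-3, which relies on page layout and word syntax: the sketch assumes captions are marked up explicitly, and reads only the `alt` attribute of each `img` element and the text of a `figcaption` that follows it inside the same `figure`. The class and function names are invented for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class CaptionExtractor(HTMLParser):
    """Collect alt text and <figcaption> text for each image on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.captions: dict[str, list[str]] = {}  # image src -> captions
        self._last_src: Optional[str] = None      # most recent image in the figure
        self._parts: Optional[list[str]] = None   # text of the open <figcaption>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "figure":
            self._last_src = None
        elif tag == "img" and attributes.get("src"):
            self._last_src = attributes["src"]
            found = self.captions.setdefault(self._last_src, [])
            if attributes.get("alt"):
                found.append(attributes["alt"].strip())
        elif tag == "figcaption":
            self._parts = []

    def handle_data(self, data):
        # Only text inside an open <figcaption> is treated as caption text.
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._parts is not None:
            text = " ".join("".join(self._parts).split())
            if text and self._last_src is not None:
                self.captions[self._last_src].append(text)
            self._parts = None
        elif tag == "figure":
            self._last_src = None


def extract_captions(html: str) -> dict[str, list[str]]:
    """Return a mapping from image source to the caption strings found for it."""
    parser = CaptionExtractor()
    parser.feed(html)
    parser.close()
    return parser.captions


page = """
<p>Body text that is not a caption.</p>
<figure>
  <img src="bridge.jpg" alt="Bridge at dusk">
  <figcaption>Stone bridge over a <em>river</em></figcaption>
</figure>
"""
print(extract_captions(page))
```

The extracted strings are still human-written descriptions; the program only locates them, and a human indexer would add semantic-level terms on top.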

Conclusion

This paper presents visual materials as distinct from the textual in that only the latter satisfy Nelson Goodman’s syntactic requirements for a language. When indexing images with textual descriptors, indexers are in fact creating linguistic interpretations of their subjective experience of the images. The end-users are similarly creative when searching for images using written language. The result is the need for a general theoretical basis for crosswalks from these image-experiences to linguistic descriptions, because, as Paula Berinstein points out, “If your inquiring mind and that of the cataloger don’t meet, you won’t find the pictures you need.” (85)

The theoretical distinction between the Of and About of a picture has been used to create structured layers of visual descriptors. Syntactic attributes of images can be indexed algorithmically; however, human information seekers use more abstract terms for searching that require the indexing and image recognition abilities of the human mind. Computer scientists are continually improving algorithms for automatic content-based image analysis and indexing. The use of image captions in multimodal environments as a source of automatically generated index terms has proven to be particularly successful. However, this is nothing more than the extraction of human-generated indexing and description from multimodal environments. End-users search for images using the kind of abstract concepts that require human processing and classification of the images. It is inevitable that in the absence of existing textual descriptions for images, the indexer will have to create these in order to provide access. A solid theoretical background in the types and classes of descriptors for the domain of images will improve inter-indexer consistency, but the interpretation of images will always be a creative process.

 

Images:

· All of the images used in this paper are by the author (Tomasz Neugebauer)

Works Cited:

Baxter, G., Anderson, D. “Image indexing and retrieval: some problems and proposed solutions.” Internet Research 6.4 (1996). Research Libraries. ProQuest. McGill University Libraries. 1 Mar. 2005 <http://www.proquest.com>.

Berinstein, P. “Do You See What I See? Image Indexing Principles for the Rest of Us.” Online March/April (1999): 85-88. Research Libraries. ProQuest. McGill University Libraries. 1 Mar. 2005 <http://www.proquest.com>.

Besser, H. “Visual Access to Visual Images: The UC Berkeley Image Database Project.” Library Trends 38.4 (1990): 787-98.

Chu, H. “Research in Image Indexing and Retrieval as Reflected in the Literature.” Journal of the American Society for Information Science and Technology 52.12 (2001): 1011-1018. Research Libraries. ProQuest. McGill University Libraries. 1 Mar. 2005 <http://www.proquest.com>.

Goodman, Nelson. Languages of Art : an approach to a theory of symbols. Indianapolis: Hackett Publishing Company, 1976.

Jacobs, C. “If a picture is worth a thousand words, then….” The Indexer 21.3 (1999): 119-121.

Jörgensen, C. “Access to Pictorial Material: A Review of Current Research and Future Prospects.” Computers and the Humanities 33 (1999): 293-318. Elsevier Science Direct. McGill University Libraries. 1 Mar. 2005 <http://www.sciencedirect.com/>.

Jörgensen, C., Jaimes, A., Benitez, A. B., Chang, S. “A Conceptual Framework and Empirical Research for Classifying Visual Descriptors.” Journal of the American Society for Information Science and Technology 52.11 (2001): 938-947. Research Libraries. ProQuest. McGill University Libraries. 1 Mar. 2005 <http://www.proquest.com>.

Rowe, N. C., Frew, B. “Automatic Caption Localization for Photographs on World Wide Web.” Information Processing & Management 34.1 (1998): 95-107. Elsevier Science Direct. McGill University Libraries. 1 Mar. 2005 <http://www.sciencedirect.com/>.

Shatford Layne, S. “Some issues in the indexing of images.” Journal of the American Society for Information Science 45.8 (1994): 583-588. ACM Portal. Google Scholar. 6 Mar. 2005 <http://scholar.google.com>.

Shatford, S. “Analyzing the Subject of a Picture: A Theoretical Approach.” Cataloging & Classification Quarterly 6.3 (1986): 39-62.

Schroeder, K. “Layered indexing of images.” The Indexer 21.1 (1998): 11-15.

Small, J. P. “Retrieving Images Verbally: No More Key Words and Other Heresies.” Library Hi Tech 9.1 (1991): 51-60.

Srihari, R. K., Zhang, Z., Rao, A. “Intelligent Indexing and Semantic Retrieval of Multimodal Documents.” Information Retrieval 2 (2000): 245-275. Elsevier Science Direct. McGill University Libraries. 1 Mar. 2005 <http://www.sciencedirect.com/>.

[*] Ansel Adams quotation source: The Most Notable Quotations: 1950-1988, Compiled by James B. Simpson. Originally published by Boston: Houghton Mifflin Company, 1988. 7 Mar. 2005. <http://www.bartleby.com/63/3/5803.html>

