8 Multimedia Information Retrieval

Vinit Kumar

I. Objectives

The objectives of this module are to:

• Describe the basic concept of a multimedia information system.

• Introduce the reader to various scenarios where non-textual or multimedia information plays a major role.

• Familiarize the reader with the various functions of Multimedia Information Retrieval.

• List the different types of Multimedia Information Retrieval.

• Introduce various approaches to handle different Multimedia Information Retrieval systems.

II. Learning Outcomes

After reading this module:

• The reader will gain the knowledge of different multimedia objects.

• The reader will gain a good understanding of Multimedia Information Retrieval System.

• The reader will know about the different subfields of Multimedia Information Retrieval.

• The reader will gain the knowledge of various information retrieval techniques for retrieval of multimedia information like Audio, Video, etc.

• The reader will understand the various research areas in Multimedia Information Retrieval.

III. Structure

1. Introduction

2. Multimedia Information Retrieval

3. Text Information Retrieval

4. Audio and Music Information Retrieval

5. Image Information Retrieval

6. Video Information Retrieval

7. Summary

8. References

1. Introduction

Text has long been the predominant medium for communicating information. With better computing capabilities, such as more disk space and memory, higher display resolution and greater processing power, other media such as image, audio and video are also gaining importance. Many fields of work require access to non-textual information. For example, medical professionals need access to medical images, architects to building plans, ornithologists to bird calls, estate agents to property photographs, and car engineers and buyers to photographs and sounds of car engines. This need gave rise to systems developed to handle information contained in more than one medium, known as multimedia information systems.

As other modules of this course have shown, text information retrieval is already well established. Multimedia information retrieval, however, is less established. To understand it well, it is necessary to understand the basics of text retrieval, audio and music information retrieval, and image and video information retrieval.

In this module we will discuss the definition, approaches and applications of each of these.

2. Multimedia Information Retrieval

A multimedia information retrieval system is concerned with the storage, indexing, search and delivery of multimedia data such as images, videos, sounds, 3D graphics, or their combination. The field includes work on, for example, extracting descriptive features from images, reducing high-dimensional indexes to low-dimensional ones, defining new similarity metrics, and delivering the retrieved data efficiently. Systems that provide all or part of these functions are multimedia retrieval systems. The Google image search engine is a typical example of such a system. A video-on-demand site that allows people to search for movies by title is another example.

Multimedia information has some specific characteristics that make it distinct from textual information; thus multimedia information retrieval systems differ from conventional text retrieval systems. A good multimedia information retrieval system should have the capability to store, retrieve and present heterogeneous data ranging from text to audio, still and moving images and digital video. The architecture of a multimedia information retrieval system depends on the characteristics of the multimedia data and the kind of operations to be performed on such data.

Multimedia information retrieval encompasses different subareas:

• Content representation and multimedia object representation.

• Feature extraction

• Query formulation to map high-level semantic concepts into low level features

• Query-by-example

• Relevance feedback and interactive queries

• Efficient feature indexing and cataloguing

• Integrated searching and browsing

• Techniques of searching multimedia based on their contents

In short, multimedia information retrieval is the retrieval of text, image, video and sound data relevant to the user, ranked according to some degree of similarity. The higher the degree of similarity, the greater the likelihood that the user finds relevant answers.
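Ranking by degree of similarity can be illustrated with a minimal sketch. It assumes that every object has already been reduced to a numeric feature vector and that cosine similarity is the chosen measure; the file names and vectors are hypothetical and serve only as an illustration, not as a description of any particular system.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_vector, collection):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vector, vector), name)
              for name, vector in collection.items()]
    return sorted(scored, reverse=True)

# Hypothetical feature vectors, assumed to have been extracted beforehand.
collection = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.0, 0.2, 0.8],
}
ranking = rank_by_similarity([1.0, 0.2, 0.0], collection)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

Other similarity measures could be substituted without changing the ranking step.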

3. Text Information Retrieval

Text information retrieval is now well established and developed. It basically involves answering user queries by means of a keyword index. In a typical session, the user frames a query as a few keywords typed into a search bar. These keywords are processed using techniques such as spell checking, tokenization and the application of logical operators. The processed query is then matched against a previously created index of the documents held in the information system. The result of this matching is displayed to the user, specifying the location of the documents. Finally, the user selects the relevant documents and, if dissatisfied, reframes the query.

Text information retrieval is discussed in detail in other modules.
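The session described above can be sketched in a few lines. This is a minimal illustration, assuming a simple inverted index and a Boolean AND between the keywords; the function names and the three-document collection are hypothetical, and spell checking is left out.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_and(index, query):
    """Return the ids of documents that contain every query keyword."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical collection used only for illustration.
documents = {
    1: "Bird calls recorded in the field",
    2: "Medical images and image retrieval",
    3: "Retrieval of bird photographs",
}
index = build_index(documents)
print(search_and(index, "bird retrieval"))
```

Only document 3 contains both keywords, so it is the only one returned; a user who is dissatisfied would reframe the query and search again.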

4. Audio and Music Information Retrieval

Audio and music information retrieval have become prominent areas of research over the past few years. Speech is the easiest and most commonly used medium of communication. Although tools have been developed for capturing and storing audio information, its sequential nature makes it very challenging to retrieve a particular piece of audio from a long recording. Another problem is that the retrieved audio may lose its context.

Conventional text retrieval techniques can be applied to voice retrieval if transcripts of the spoken audio documents can be generated. Advances in speech recognition have made it possible to generate good-quality speech transcripts automatically. A perfect automatic speech recognition (ASR) system that could efficiently transcribe spoken audio documents would be an ideal solution. Hidden Markov Models (HMMs) have traditionally formed the backbone of ASR systems. An HMM is a statistical representation of a speech event such as a word. Model parameters are trained on a large corpus of labelled speech data. Once a trained set of HMMs is available, query speech can be matched against it to find the most likely model sequence (the recognized words). However, even a good-quality transcript lacks punctuation, paragraphs and all the other elements that provide structure. Although retrieval based on speech transcripts seems very close to text retrieval, in practice it is not. Out-of-vocabulary words, such as proper nouns, cannot be recognized by many ASR systems. Another problem is that the whole process is time-consuming and expensive, as audio documents usually have large file sizes.

Music information retrieval is another area of research in multimedia information retrieval. It is a much less developed field, because music does not contain specific words per se, so transcript generation is not possible for musical information. Music information consists of seven facets:
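Finding the most likely model sequence is commonly done with the Viterbi algorithm. The sketch below is a toy illustration only: the two states, the "low" and "high" observation symbols and all probabilities are invented assumptions, whereas a real ASR system uses trained models over acoustic feature vectors.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # Each column maps a state to (best log probability, previous state).
    first = observations[0]
    columns = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
                for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)
    # Backtrack from the best final state to recover the whole path.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Hypothetical toy model; all probabilities are non-zero so log() is defined.
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
decoded = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
print(decoded)
```

Log probabilities are used so that long observation sequences do not underflow to zero.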

• Pitch: a quality of sound that is related to the frequency

• Tempo: information concerning the duration of a musical event

• Harmony: related to the simultaneous sounding of two or more pitches

• Timbre: related to tone colour

• Editing: related to performance instructions such as fingering, ornamentation, articulation and so on.

• Text: related to the lyrics, symphonies and so on.

• Bibliography: information about the composer, performer, title of the piece, publisher etc.

A query-based music retrieval system relies on similarity matching between the query and the stored music. The user is given an interface for submitting a query by playing music or humming a tune. The query is then converted into digital format and matched against the recordings in the database, and the most relevant matches are returned to the user. More advanced approaches match the different attributes discussed above, for example by comparing edit distances, matching pitch contours, or matching the time contours that represent rhythm information.

Other approaches use text-based retrieval techniques for music retrieval, such as searching by the name of the artist, the title of the song, file type, popularity rating or keywords.

Audio retrieval research faces many challenges because of two distinct qualities of audio data: it is aurally rather than visually based, and it is time-dependent. Presenting the retrieved results is also a challenge: because audio is time-dependent, 20 clips played simultaneously would be of no use to the searcher. Efforts are under way to find ways of browsing and navigating through databases of audio.
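Matching pitch contours with an edit distance can be sketched as follows. The example assumes that the hummed query and the stored melodies are already available as sequences of MIDI note numbers; the up/down/same contour encoding, the function names and the tiny melody database are illustrative choices, not a description of any particular system.

```python
def pitch_contour(pitches):
    """Encode a pitch sequence as U (up), D (down) or S (same) steps."""
    steps = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("S")
    return "".join(steps)

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def rank_melodies(query_pitches, database):
    """Return (distance, title) pairs, closest contour first."""
    query = pitch_contour(query_pitches)
    return sorted((edit_distance(query, pitch_contour(pitches)), title)
                  for title, pitches in database.items())

# Hypothetical database of opening notes as MIDI note numbers.
database = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64],
    "Ascending scale": [60, 62, 64, 65, 67, 69, 71],
}
# The query is the first melody hummed two semitones higher.
ranked = rank_melodies([62, 62, 69, 69, 71, 71, 69], database)
print(ranked[0])
```

Because only the direction of each step is kept, the contour is unaffected when the user hums in a different key, while the edit distance tolerates a few wrong or missing notes.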

5. Image Information Retrieval

Among audio, video and image retrieval, image information retrieval is the best developed, perhaps because images have been used as a major medium of communication since before modern civilization. Image processing and retrieval activities began in the 1980s and became an active area of research after the creation of the web in the early 1990s, so image information retrieval has had time to grow and mature.

Several commercial image data management systems provide retrieval based on metadata and text keywords or assigned descriptors. These metadata and descriptors are provided manually by human indexers, who describe the various attributes of the image. Describing images in this way is very expensive and time-consuming, and it is also highly subjective. The attributes that can be used for retrieval are:

• A combination of colour, texture, shape and so on

• A specific arrangement of objects in the image

• Depiction of a particular event

• Presence of one or more persons or objects

• Presence of a specific location

• Emotions attached to an event or a person, and so on.

All these attributes are researched under the category of “content-based image retrieval (CBIR)”. Two different types of interfaces are used for querying images:

i. Browsing and navigation interface: here the user is allowed to browse and navigate through a structured collection of images in order to retrieve the desired image(s).

ii. Query interface: in most cases a query-by-example approach is followed, whereby the user specifies an image from a collection, which is then used as a query to search the database. The problem with this approach is that the system must have a selection of sample objects, with associated attributes, that can be used for querying the database. Some interfaces offer options for selection from a palette or for sketch input.

Similarity between a query image and the image objects in the collection is computed and images that fully or partially match the query are retrieved.

Other approaches answer the query on the basis of colour histograms. A colour histogram shows the proportion of pixels of each colour within the image. A colour histogram is computed for each image in the database and stored internally as a vector of values, which is easier for matching algorithms to compare with the query. This approach answers queries such as “find all images whose most frequently used colour is similar to that of this image”.
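A colour histogram comparison can be sketched as below. The example assumes that an image is available as a non-empty list of (R, G, B) tuples with values from 0 to 255, quantises each channel into four bins, and uses histogram intersection as the similarity measure; these are illustrative choices and the tiny "images" are invented.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Return the proportion of pixels falling into each quantised RGB bin."""
    histogram = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin number 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        histogram[index] += 1.0
    total = len(pixels)  # assumed to be non-zero
    return [count / total for count in histogram]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 means identical colour proportions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical images: each is just a short list of pixels.
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
all_blue = [(10, 10, 240)] * 4
query = [(240, 20, 20)] * 4  # an all-red query image

query_hist = colour_histogram(query)
red_score = histogram_intersection(query_hist, colour_histogram(mostly_red))
blue_score = histogram_intersection(query_hist, colour_histogram(all_blue))
print(red_score, blue_score)
```

The mostly red image scores higher than the blue one, so it would be ranked first for this query. Because the histogram ignores where the colours occur, two images with the same colour proportions but different layouts receive the same score.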

6. Video Information Retrieval

Video data retrieval shares some properties with image data retrieval because both are visual in nature. However, video information is time-dependent like audio information, which means that it changes as time passes. Video is also usually accompanied by a synchronized audio track, so image and audio retrieval techniques are applied to video retrieval too. Videos are usually made up of a number of distinct scenes, each of which can be further broken down into individual shots depicting a single view, conversation or action.

Video retrieval systems are still at an early stage of research and development, and several approaches are being followed. The application of text retrieval techniques combined with content-based image retrieval (CBIR) techniques is deemed the best solution. Another approach relies on indexing video metadata together with colour histograms, texture analysis, video segmentation (breaking the video into small segments, usually where the camera shot changes) and pattern recognition.

In another approach, if closed-captioning signals exist for a video, keywords are extracted from the text of the closed captions using well-understood text manipulation techniques. If closed captioning does not exist, the keywords are extracted from the audio stream.
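Video segmentation at camera shot changes is often approximated by comparing the colour histograms of consecutive frames. The sketch below assumes that a normalised histogram has already been computed for every frame; the sum-of-absolute-differences distance, the threshold of 0.5 and the toy three-bin histograms are illustrative assumptions rather than recommended settings.

```python
def histogram_distance(h1, h2):
    """Sum of absolute bin differences between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(frame_histograms, threshold=0.5):
    """Return the indices of frames at which a new shot is assumed to start."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        distance = histogram_distance(frame_histograms[i - 1], frame_histograms[i])
        if distance > threshold:
            boundaries.append(i)
    return boundaries

# Hypothetical three-bin histograms for five consecutive frames.
frames = [
    [0.90, 0.10, 0.00],
    [0.85, 0.15, 0.00],  # small change: same shot
    [0.10, 0.20, 0.70],  # large change: new shot
    [0.10, 0.25, 0.65],
    [0.60, 0.40, 0.00],  # large change: new shot
]
print(detect_shot_boundaries(frames))
```

A fixed threshold handles abrupt cuts like these; gradual transitions such as fades would need a more elaborate test.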

Another strategy involves use of key frames. Key frames are frames whose images represent a semantic unit of the stream, such as a scene. After extracting the key frames, image retrieval techniques could be applied to support queries on key frames.
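Key-frame selection can be sketched with a very simple heuristic: once the shot start positions are known, take the middle frame of each shot as its key frame. This heuristic and the function names are assumptions made for illustration; practical systems may choose key frames by other criteria.

```python
def shots_from_boundaries(num_frames, boundaries):
    """Turn sorted shot-start indices into (start, end) ranges, end exclusive."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return list(zip(starts, ends))

def middle_key_frames(num_frames, boundaries):
    """Pick the middle frame index of every shot as its key frame."""
    return [(start + end - 1) // 2
            for start, end in shots_from_boundaries(num_frames, boundaries)]

# Hypothetical video of 10 frames in which new shots start at frames 4 and 7.
key_frames = middle_key_frames(10, [4, 7])
print(key_frames)
```

The selected frames can then be indexed and queried with the image retrieval techniques discussed in the previous section.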

7. Summary

In this module we tried to answer the following questions:

• What is multimedia information retrieval?

• What is text, image, audio and video information retrieval?

• What are the various approaches being followed for the retrieval of the above?

• What are the areas of application of multimedia information retrieval?

Multimedia offers a richer experience than plain text because it carries details that cannot be expressed in a text-only format. We examined the various aspects and properties of text, image, audio, music and video information representations, the concept of multimedia information retrieval, and the types of media that make up multimedia. Though text is the most prevalent format for information, developments in the underlying technology and infrastructure are making other forms more visible. The approaches used for developing multimedia information retrieval systems were also discussed.

8. References

1. Chowdhury, G. G., Introduction to modern information retrieval, 2nd ed. London, Facet Publishing, 2004.

2. Downie, J. S., Music Information retrieval. In Cronin, B. (ed.) ARIST, 37, Information Today, 2002.

3. Eakins, J. P. and Graham, M. E., Content-based image retrieval: a report to the JISC technology applications programme, 1999.

4. Eakins, J. P., Techniques for image retrieval, Library and Information Briefings, 85, South Bank University, 1998.

5. Faloutsos, C., Multimedia IR: indexing and searching. In Baeza-Yates, R. and Ribeiro-Neto, B., Modern information retrieval, ACM Press, 1999.

6. Gudivada, V. N. and Raghavan, V. V., Modeling and retrieving images by content. Information Processing & Management, 33(4), 427-452, 1997.

7. Khosrow-Pour, Mehdi (ed.), Dictionary of information science and technology, Hershey, Idea Group Reference, 2006.

8. Lancaster, F.W., Indexing and abstracting theory and practice, 3rd ed. London, Facet Publishing, 2003.

9. Rasmussen, E., Libraries and bibliographic systems. In Baeza-Yates, R. and Ribeiro-Neto, B., Modern information retrieval, ACM Press, 1999.

10. Tang, Nelson, and Jonathan Furner. “Multimedia Information Retrieval Systems: An Overview”, 1999.