A Study on Unified Retrieval Methods for Multimedia Documents

Yu Suzuki (9951057)

Retrieval methods for electronic documents should now consider not only text but also multimedia information such as images and video, because current electronic documents are composed of many kinds of media. In this paper, we propose a method that integrates the retrieval methods of the individual media. In our method, we extract features from each document, such as term frequencies from the text, color histograms from the images, and the layout of these elements, and generate a feature vector for each medium. When users search, relevant documents are retrieved based on the similarities calculated by the retrieval technique for each medium. Users can therefore retrieve documents that match their interests more closely than with a text-only retrieval technique. We evaluate the proposed method using PDF files, because PDF, unlike HTML, has a strict format for layout information, and because PDF files contain images as well as text. This experiment lets us verify the efficiency of the proposed method.
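
The idea of scoring a document by its per-medium similarities can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used or how the per-medium scores are integrated, so the cosine similarity, the weighted average, the weight values, and all names below (cosine, combined_similarity, rank, and the media keys "text", "image", "layout") are assumptions made only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(query, doc, weights):
    """Weighted average of the per-medium similarities.

    query and doc map a medium name to its feature vector; weights maps
    a medium name to a non-negative weight. The weighting scheme is a
    hypothetical choice, not one taken from the paper.
    """
    total = sum(weights.values())
    score = sum(w * cosine(query[m], doc[m]) for m, w in weights.items())
    return score / total


def rank(query, docs, weights):
    """Return (document name, score) pairs sorted from most to least similar."""
    scored = [(name, combined_similarity(query, feats, weights))
              for name, feats in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy feature vectors: term frequencies, a color histogram, and layout features.
query = {"text": [1, 0, 1], "image": [0.5, 0.5, 0.0], "layout": [1, 0]}
docs = {
    "a.pdf": {"text": [1, 0, 1], "image": [0.5, 0.5, 0.0], "layout": [1, 0]},
    "b.pdf": {"text": [0, 1, 0], "image": [0.0, 0.0, 1.0], "layout": [0, 1]},
}
weights = {"text": 0.5, "image": 0.3, "layout": 0.2}  # illustrative values only

ranking = rank(query, docs, weights)
```

With these toy vectors, a.pdf has the same features as the query in every medium and is ranked first, while b.pdf shares no features with the query in any medium and is ranked last.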