Abstract
The amount of digital video being shot, captured, and
stored is growing faster than ever before. Without efficient video
indexing, retrieval, and browsing technology, such large amounts of
stored video are effectively inaccessible. Most prior work in
the field can be roughly categorized into two classes. One class
is based on image processing techniques, often called
content-based image and video retrieval, in which video frames
are indexed and searched for visual content. The other class is
based on spoken document retrieval, which relies on automatic
speech recognition and text queries. Both approaches have major
limitations. Semantic queries are a major challenge for the first
approach, while the second, speech-based approach does not
support efficient video browsing. This paper describes a system
in which speech is used for efficient searching and visual data for
efficient browsing, a combination that exploits the strengths of both
approaches. A fully automatic indexing and retrieval system has
been developed and tested. Automated speech recognition and
phonetic speech indexing support text-to-speech queries. New
browsable views are generated from the original video. A special
synchronized browser allows instantaneous, context-preserving
switching from one view to another. The system was successfully
used to produce searchable-browsable video proceedings for three
local conferences.