ABSTRACT
The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes terms learned by zero-resource term discovery techniques. Use of a new tool designed for exploration of spoken collections provides some additional insight into characteristics of the collection.
Index Terms
- A Test Collection for Spoken Gujarati Queries