short-paper

A Test Collection for Spoken Gujarati Queries

Authors:
Douglas W. Oard (University of Maryland, College Park, MD, USA),
Rashmi Sankepally (University of Maryland, College Park, MD, USA),
Jerome White (New York University, Abu Dhabi, UAE),
Aren Jansen (Johns Hopkins University, Baltimore, MD, USA),
Craig Harman (Johns Hopkins University, Baltimore, MD, USA)

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2015, Pages 919–922
https://doi.org/10.1145/2766462.2767791

Published: 09 August 2015

ABSTRACT

The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes terms learned by zero-resource term discovery techniques. Use of a new tool designed for exploration of spoken collections provides some additional insight into characteristics of the collection.

Index Terms

1. Information systems
  1. Information retrieval

Published in
SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN: 9781450336215
DOI: 10.1145/2766462
General Chair: Ricardo Baeza-Yates (Yahoo Labs, USA)
Program Chairs: Mounia Lalmas (Yahoo Labs, UK), Alistair Moffat (University of Melbourne, Australia), Berthier Ribeiro-Neto (Google, Brazil, and UFMG, Brazil)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
relevance judgment
speech retrieval
test collection
Acceptance Rates
SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%