Article

DP9: an OAI gateway service for web crawlers

Authors:
Xiaoming Liu, Old Dominion University, Norfolk, VA
Kurt Maly, Old Dominion University, Norfolk, VA
Mohammad Zubair, Old Dominion University, Norfolk, VA
Michael L. Nelson, NASA Langley Research Center, Hampton, VA

JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, July 2002, Pages 283–284. https://doi.org/10.1145/544220.544284

Published: 14 July 2002

ABSTRACT

Many libraries and databases are closed to general-purpose Web crawlers and expose their content only through their own search engines. At the same time, many researchers attempt to locate technical papers through general-purpose Web search engines. DP9 is an open source gateway service that allows general search engines (e.g., Google, Inktomi) to index OAI-compliant archives. DP9 does this by providing consistent URLs for repository records and converting each URL to an OAI query against the appropriate repository when it is requested. This allows search engines that do not support the OAI protocol to index the "deep Web" contained within OAI-compliant repositories.
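
The URL-to-query conversion described in the abstract might look roughly like the sketch below. This is not DP9's actual code. The gateway URL layout (`archive` and `id` query parameters), the `ARCHIVES` registry, and the example hostnames are all assumptions made for illustration. Only the `GetRecord` verb and its `identifier` and `metadataPrefix` arguments come from the OAI protocol itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical registry mapping an archive name to its OAI base URL.
# A real gateway would load this from configuration.
ARCHIVES = {"example": "http://archive.example.org/oai"}


def gateway_url_to_oai_request(gateway_url: str) -> str:
    """Translate a stable gateway URL into an OAI GetRecord request URL.

    Assumed (hypothetical) gateway URL layout:
        http://gateway.example.org/getrecord?archive=<name>&id=<oai-identifier>
    """
    query = parse_qs(urlsplit(gateway_url).query)
    archive = query["archive"][0]   # which repository holds the record
    identifier = query["id"][0]     # the record's OAI identifier
    base_url = ARCHIVES[archive]    # raises KeyError for an unknown archive
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",  # unqualified Dublin Core
    }
    return base_url + "?" + urlencode(params)
```

A crawler only ever sees the stable gateway URL. The gateway would issue the resulting OAI request on the crawler's behalf and return the record rendered as an ordinary HTML page.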

Index Terms

DP9: an OAI gateway service for web crawlers
1. Applied computing
  1. Computers in other domains
    1. Digital libraries and archives
2. Information systems
  1. Information systems applications
    1. Digital libraries and archives

Published in
JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
July 2002
448 pages
ISBN:1581135130
DOI:10.1145/544220
General Chair: William Hersh, Oregon Health & Science University
Program Chair: Gary Marchionini, University of North Carolina at Chapel Hill
Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep web
gateway service
open archives initiative
Acceptance Rates
JCDL '02 Paper Acceptance Rate: 69 of 240 submissions, 29%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%