Multiple Buffering for Parallel Approximate Sequence Matching using Disk-based Suffix Tree on Multi-core CPU

Tamura, Keiichi; Watanuki, Yousuke; Kitakami, Hajime; Takahashi, Yoshifumi

doi:10.7603/s40601-013-0022-0

Open access
Published: 28 February 2014

Volume 3, article number 22 (2013)

GSTF Journal on Computing (JoC)

Keiichi Tamura B.Eng., M.Eng., Ph.D.^1,2, Yousuke Watanuki^1, Hajime Kitakami M.Eng., Ph.D.^1,3 & Yoshifumi Takahashi^4

Abstract

Suffix trees, which are trie structures that represent the suffixes of sequences (e.g., strings), are widely used for sequence search in application domains such as text data mining, bioinformatics, and computational biology. In particular, suffix trees are useful in bioinformatics applications because they allow similar sub-sequences to be searched for and frequent sequence patterns to be extracted efficiently. In recent years, the efficient construction of suffix trees that support faster sequence searches has become one of the most important challenges, because the number and size of the sequences stored in sequence databases have been increasing exponentially. This paper proposes a novel parallelization model for approximate sequence matching on a multi-core CPU that uses disk-based suffix trees, which are built on hard disks rather than in main memory. In the proposed parallelization model, we divide an entire sequence database into two or more sub-databases called partitions. For each partition, we build a disk-based suffix tree, and we define a task as approximate sequence matching on one disk-based suffix tree. Moreover, the proposed parallelization model includes a multiple buffering management system to avoid conflicts among CPU cores. We evaluated the proposed parallelization model using an actual amino acid sequence database on a PC. The experimental results show a substantial improvement in computation performance.
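The task model described in the abstract (split the database into partitions, build one index per partition, run one approximate-matching task per index, and avoid buffer conflicts between cores) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a depth-bounded in-memory suffix trie stands in for the disk-based suffix tree, a Python list stands in for the on-disk node file, approximate matching is restricted to Hamming distance (mismatches only, no insertions or deletions), and the per-task LRU `NodeBuffer` is only one possible reading of "multiple buffering". All names (`build_suffix_trie`, `NodeBuffer`, `approx_search`, `parallel_search`) are hypothetical.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node layout: node_id -> (children {char: child_id}, hits [(seq_id, start)]).
# A plain Python list stands in for the disk-resident node file (an assumption).
def build_suffix_trie(partition, max_depth):
    """Build a depth-bounded suffix trie, a stand-in for a disk-based suffix tree."""
    nodes = [({}, [])]  # node 0 is the root
    for seq_id, seq in partition:
        for start in range(len(seq)):
            cur = 0
            for ch in seq[start:start + max_depth]:
                children = nodes[cur][0]
                if ch not in children:
                    children[ch] = len(nodes)
                    nodes.append(({}, []))
                cur = children[ch]
                nodes[cur][1].append((seq_id, start))  # this suffix passes through cur
    return nodes


class NodeBuffer:
    """Hypothetical per-task LRU buffer; each task owns one, so no locks are needed."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0  # number of simulated disk reads

    def get(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as most recently used
            return self.cache[node_id]
        self.misses += 1
        node = self.store[node_id]  # simulated disk read
        self.cache[node_id] = node
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used node
        return node


def approx_search(buf, pattern, k):
    """Return (seq_id, start) of every substring within Hamming distance k of pattern."""
    hits = set()
    stack = [(0, 0, 0)]  # (node_id, depth, mismatches so far)
    while stack:
        node_id, depth, mismatches = stack.pop()
        children, occ = buf.get(node_id)
        if depth == len(pattern):
            hits.update(occ)
            continue
        for ch, child in children.items():
            m = mismatches + (ch != pattern[depth])
            if m <= k:  # prune branches that already exceed k mismatches
                stack.append((child, depth + 1, m))
    return hits


def parallel_search(db, n_partitions, pattern, k, max_depth=16, capacity=64, workers=4):
    """Split db into partitions, index each one, and run one matching task per partition."""
    assert len(pattern) <= max_depth, "index depth must cover the pattern length"
    partitions = [db[i::n_partitions] for i in range(n_partitions)]
    stores = [build_suffix_trie(p, max_depth) for p in partitions]

    def task(store):
        # Each task gets its own buffer, so tasks never contend for a shared one.
        return approx_search(NodeBuffer(store, capacity), pattern, k)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(set().union(*pool.map(task, stores)))


if __name__ == "__main__":
    db = [(0, "ACDEFG"), (1, "ACDQFG"), (2, "WWWW")]
    print(parallel_search(db, 2, "CDEF", k=1))  # [(0, 1), (1, 1)]
```

Threads keep the sketch short, but in CPython the global interpreter lock prevents CPU-bound threads from executing Python bytecode in parallel, so a genuinely multi-core version would use processes or compiled code. Giving each task a private buffer removes contention on a shared buffer, at the cost of dividing the available buffer memory among tasks.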

Author information

Authors and Affiliations

Department of Intelligent Systems, Graduate School of Information Sciences, Kyushu University, Fukuoka, Japan
Keiichi Tamura B.Eng., M.Eng., Ph.D., Yousuke Watanuki & Hajime Kitakami M.Eng., Ph.D.
Information Science, Hiroshima City University, Hiroshima, Japan
Keiichi Tamura B.Eng., M.Eng., Ph.D.
Tohoku University, Tohoku, Japan
Hajime Kitakami M.Eng., Ph.D.
Graduate School of Information Science, Hiroshima City University, Hiroshima, Japan
Yoshifumi Takahashi

Corresponding author

Correspondence to Keiichi Tamura B.Eng., M.Eng., Ph.D.

Additional information

Y. Watanuki, K. Tamura, H. Kitakami, and Y. Takahashi are with the Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima 731-3194, Japan; corresponding e-mail: ktamura@hiroshima-cu.ac.jp.

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Yousuke Watanuki is a student at the Department of Intelligent Systems, Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan. His research interests include suffix trees and parallel computing.

Keiichi Tamura received his B.Eng., M.Eng., and Ph.D. degrees in Information Science from Kyushu University, Fukuoka, Japan, in 1998, 2000, and 2005, respectively. He is presently an Associate Professor at the Department of Intelligent Systems, Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan. He has been a member of the organizing committee of the IEEE SMC Hiroshima Chapter since 2012. His research interests include parallel computing, data engineering, data mining, and evolutionary computation. He is a member of the IEEE, the Information Processing Society of Japan, the Database Society of Japan, the Japanese Society for Artificial Intelligence, and the Japan Society for Fuzzy Theory and Intelligent Informatics.

Hajime Kitakami has been a Professor in the Department of Intelligent Systems, Graduate School of Information Sciences, Hiroshima City University, Japan, since 1994. He received his M.Eng. degree from Tohoku University in 1976 and his Ph.D. in engineering from Kyushu University in 1992. One of his papers was selected for the 25th Anniversary Best Paper Award of the Information Processing Society of Japan (IPSJ) in 1985. He received the Paper Award from the Japanese Society for Engineering Education (JSEE) in 2003. His research interests include databases, data mining, distributed parallel processing, and bioinformatics. He has been an editorial board member of Transactions on Mathematical Modeling and its Applications (TOM), a journal of the Information Processing Society of Japan (IPSJ), since 2006, and of the Journal of the Database Society of Japan (DBSJ) since 2008.

Yoshifumi Takahashi received a Master of Information Engineering degree from Hiroshima City University, Japan, in 2010. He is now a doctoral student in the Graduate School of Information Science, Hiroshima City University.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tamura, K., Watanuki, Y., Kitakami, H. et al. Multiple Buffering for Parallel Approximate Sequence Matching using Disk-based Suffix Tree on Multi-core CPU. GSTF J Comput 3, 22 (2013). https://doi.org/10.7603/s40601-013-0022-0

Published: 28 February 2014
DOI: https://doi.org/10.7603/s40601-013-0022-0
