Abstract
The Smith–Waterman (SW) algorithm is the most commonly used method for local sequence alignment, but its adoption is limited by its computational requirements on large protein databases. Although the acceleration of SW has been studied on many parallel platforms, hardly any studies take advantage of the latest Intel architectures based on AVX-512 vector extensions. This SIMD instruction set is currently supported by Intel’s Knights Landing (KNL) accelerator and Intel’s Skylake (SKL) general-purpose processors. In this paper, we present an SW implementation optimized for both architectures: SWIMM 2.0. Because this vector instruction set is new, previous programming and optimization techniques must be revised. SWIMM 2.0 relies on massive multi-threading and SIMD exploitation. It is competitive in performance with other state-of-the-art implementations, reaching 511 GCUPS on a single KNL node and 734 GCUPS on a server equipped with two SKL processors. Moreover, these performance rates make SWIMM 2.0 the most energy-efficient implementation in this study, achieving 2.94 GCUPS/Watt on the SKL processor.
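For readers unfamiliar with the performance metric, GCUPS (giga cell updates per second) counts how many cells of the SW dynamic-programming matrix are computed per second, in units of 10^9. The sketch below is not taken from SWIMM 2.0; it is a plain, scalar, single-threaded illustration of the SW recurrence with affine gaps, followed by the GCUPS calculation. The function names, the simple match/mismatch scoring (real protein searches use a substitution matrix such as BLOSUM62) and the gap penalties are assumptions chosen only for illustration.

```python
def sw_score(query, subject, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Optimal local alignment score (Smith-Waterman with affine gaps).

    Assumed gap model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n = len(subject)
    h_prev = [0] * (n + 1)   # previous row of the score matrix H
    f = [0] * (n + 1)        # best score ending in a vertical gap, per column
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        e = 0                # best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open)
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h = max(0, h_prev[j - 1] + s, e, f[j])
            h_cur[j] = h
            best = max(best, h)
        h_prev = h_cur
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second for one query against a whole database."""
    return (query_len * db_residues) / (seconds * 1e9)


if __name__ == "__main__":
    print(sw_score("ACGT", "ACGT"))          # identical sequences, 4 matches
    print(gcups(1000, 2_000_000_000, 10.0))  # hypothetical search timing
```

Every cell depends on its left, upper and upper-left neighbours, which is why optimized implementations vectorize either along anti-diagonals or across several database sequences at once rather than looping cell by cell as this sketch does.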
Notes
SWIMM 2.0 is available at https://github.com/enzorucci/SWIMM2.0.
SWIPE is available at a public repository: https://github.com/torognes/swipe.
Parasail is available at a public repository: https://github.com/jeffdaily/parasail.
libssa is available at a public repository: https://github.com/RonnySoak/libssa.
FASTA format description: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml.
Swiss-Prot: http://www.uniprot.org/downloads.
Environmental NR: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/env_nr.gz.
TrEMBL: http://www.uniprot.org/downloads.
SSE4.1 and AVX2 versions using the QP technique were excluded from the analysis to improve figure readability since we found that the SP scheme always achieved the best performance, as in previous works [14].
We have discarded the comparison with the SWhybrid framework [15] because we detected inconsistent alignment results in most of the experiments.
References
Bender, E.: Big data in biomedicine: 4 big questions. Nature 527, S19 (2015)
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389 (1997)
Pearson, W.R., Lipman, D.J.: Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85(8), 2444 (1988). https://doi.org/10.1073/pnas.85.8.2444
Sæbø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K., Rognes, T.: PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology. Nucleic Acids Res. 33(Suppl 2), W535 (2005)
Farrar, M.: Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156 (2007)
Rucci, E., García, C., Botella, G., De Giusti, A., Naiouf, M., Prieto-Matías, M.: State-of-the-Art in Smith–Waterman Protein Database Search on HPC Platforms, pp. 197–223. Springer, New York (2016). https://doi.org/10.1007/978-3-319-41279-5_6
Rognes, T.: Faster Smith–Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12(1), 221 (2011). https://doi.org/10.1186/1471-2105-12-221
Frielingsdorf, J.T.: Improving optimal sequence alignments through a SIMD-accelerated library. Master’s thesis, University of Oslo (2015)
Daily, J.: Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. 17, 81 (2016)
Liu, Y., Schmidt, B., Maskell, D.L.: CUDASW++2.0: enhanced Smith–Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Res. Notes 3(1), 1 (2010). https://doi.org/10.1186/1756-0500-3-93
Liu, Y., Wirawan, A., Schmidt, B.: CUDASW++ 3.0: accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinform. 14, 117 (2013)
Liu, Y., Schmidt, B.: SWAPHI: Smith–Waterman protein database search on Xeon Phi coprocessors. In: 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2014) (2014)
Lan, H., Liu, W., Schmidt, B., Wang, B.: Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015), pp. 503–510. https://doi.org/10.1109/BIBM.2015.7359735
Rucci, E., García, C., Botella, G., De Giusti, A., Naiouf, M., Prieto-Matías, M.: An energy-aware performance analysis of SWIMM: Smith–Waterman implementation on Intel’s Multicore and Manycore architectures. Concurr. Comput. Pract. Exp. 27(18), 5517 (2015). https://doi.org/10.1002/cpe.3598
Lan, H., Liu, W., Liu, Y., Schmidt, B.: SWhybrid: a hybrid-parallel framework for large-scale protein sequence database search. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2017), pp. 42–51. https://doi.org/10.1109/IPDPS.2017.42
Isa, M., Benkrid, K., Clayton, T., Ling, C., Erdogan, A.: An FPGA-based parameterised and scalable optimal solutions for pairwise biological sequence analysis. In: Adaptive Hardware and Systems (AHS), 2011 NASA/ESA Conference on (2011), pp. 344–351. https://doi.org/10.1109/AHS.2011.5963957
Oliver, T.F., Schmidt, B., Maskell, D.L.: Reconfigurable architectures for bio-sequence database scanning on FPGAs. IEEE Trans. Circuits Syst. II Express Briefs 52(12), 851 (2005). https://doi.org/10.1109/TCSII.2005.853340
Li, T.I., Shum, W., Truong, K.: 160-fold acceleration of the Smith–Waterman algorithm using a field programmable gate array (FPGA). BMC Bioinform. 8, 185 (2007)
Rucci, E., García, C., Botella, G., De Giusti, A., Naiouf, M., Prieto-Matías, M.: OSWALD: OpenCL Smith–Waterman algorithm on Altera FPGA for large protein databases. Int. J. High Perform. Comput. Appl. (2016). https://doi.org/10.1177/1094342016654215
Rucci, E., García, C., Botella, G., De Giusti, A., Naiouf, M., Prieto-Matías, M.: First experiences accelerating Smith–Waterman on Intel’s Knights Landing processor. In: Ibrahim, S., Choo, K.K.R., Yan, Z., Pedrycz, W. (eds.) Algorithms and Architectures for Parallel Processing: 17th International Conference, ICA3PP 2017, Helsinki, Finland, August 21–23, 2017, Proceedings, pp. 569–579. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65482-9_42
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195 (1981)
Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)
Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R., Liu, Y.C.: Knights landing: second-generation Intel Xeon Phi product. IEEE Micro 36(2), 34 (2016). https://doi.org/10.1109/MM.2016.25
Asai, R.: MCDRAM as High-Bandwidth Memory (HBM) in Knights Landing Processors: Developer’s Guide (2016). https://goparallel.sourceforge.net/wp-content/uploads/2016/05/Colfax_KNL_MCDRAM_Guide.pdf
Intel Corporation: Intel 64 and IA-32 Architectures Optimization Reference Manual (2017). https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
Rognes, T., Seeberg, E.: Six-fold speed-up of Smith–Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699 (2000). https://doi.org/10.1093/bioinformatics/16.8.699
Acknowledgements
This work has been supported by the EU (FEDER) and the Spanish MINECO, under Grant TIN2015-65277-R and the CAPAP-H6 network (TIN2016-81840-REDT).
Cite this article
Rucci, E., Garcia Sanchez, C., Botella Juan, G. et al. SWIMM 2.0: Enhanced Smith–Waterman on Intel’s Multicore and Manycore Architectures Based on AVX-512 Vector Extensions. Int J Parallel Prog 47, 296–316 (2019). https://doi.org/10.1007/s10766-018-0585-7
DOI: https://doi.org/10.1007/s10766-018-0585-7