A Linear Time Approximation Algorithm for the DCJ Distance for Genomes with Bounded Number of Duplicates

  • Conference paper
Algorithms in Bioinformatics (WABI 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9838)

Abstract

Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two genomes with the same content, so that we have exactly the same number of copies of each gene in each genome, we are interested in the problem of computing the rearrangement distance between them, i.e., finding the minimum number of DCJ operations that transform one genome into the other. We propose a linear time approximation algorithm with approximation factor O(k) for the DCJ distance problem, where k is the maximum number of duplicates of any gene in the input genomes. Our algorithm uses as an intermediate step an O(k)-approximation for the minimum common string partition problem, which is closely related to the DCJ distance problem. Experiments on simulated data sets show that the algorithm is very competitive both in efficiency and quality of the solutions.
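
To make the notion of DCJ distance concrete, here is a minimal sketch of the well-known duplicate-free special case, in which every gene occurs exactly once in each genome. This is not the approximation algorithm proposed in the paper, which handles duplicated genes. The sketch assumes each genome is a single circular chromosome given as a list of signed integers, where the distance equals the number of genes minus the number of cycles in the adjacency graph. The function names and the genome encoding are illustrative assumptions.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene in reading order."""
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    return (tail, head) if gene > 0 else (head, tail)


def adjacency_partners(chromosome):
    """Map each gene extremity to its neighbour in a circular chromosome."""
    partner = {}
    n = len(chromosome)
    for i in range(n):
        _, right = extremities(chromosome[i])
        left, _ = extremities(chromosome[(i + 1) % n])
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """DCJ distance between two circular genomes WITHOUT duplicate genes.

    Assumption: each genome is one circular chromosome over the same gene
    set, so the distance is n - c, with c the number of cycles formed by
    alternating adjacencies of a and b.
    """
    genes_a = sorted(abs(g) for g in a)
    genes_b = sorted(abs(g) for g in b)
    if genes_a != genes_b:
        raise ValueError("genomes must have the same gene content")
    if len(set(genes_a)) != len(genes_a):
        raise ValueError("this sketch does not handle duplicate genes")

    pa, pb = adjacency_partners(a), adjacency_partners(b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the alternating cycle: one adjacency of a, then one of b.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(a) - cycles
```

For example, `dcj_distance_circular([1, -2, 3], [1, 2, 3])` counts two cycles over three genes and returns 1, matching the single inversion that separates the two genomes. With duplicated genes, the adjacency graph is no longer uniquely determined, which is why the problem considered in the paper calls for an approximation algorithm.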

Author information

Corresponding author

Correspondence to Fábio V. Martinez.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rubert, D.P., Feijão, P., Braga, M.D.V., Stoye, J., Martinez, F.V. (2016). A Linear Time Approximation Algorithm for the DCJ Distance for Genomes with Bounded Number of Duplicates. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science, vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_24

  • DOI: https://doi.org/10.1007/978-3-319-43681-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43680-7

  • Online ISBN: 978-3-319-43681-4

  • eBook Packages: Computer Science, Computer Science (R0)
