De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease

Marchant, A.; Mougel, F.; Almeida, C.; Jacquin-Joly, E.; Costa, J.; Harry, M.

doi:10.1007/s10709-014-9790-5

Published: 19 September 2014

Genetica, Volume 143, pages 225–239 (2015)

A. Marchant^1,2,
F. Mougel^1,2,
C. Almeida^3,
E. Jacquin-Joly^4,
J. Costa^5 &
M. Harry^1,2

Abstract

High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly, and the choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assemblies were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from the antennae and mouthparts of single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed, yielding nearly 360 million single-end RNA-seq reads and 46 million paired-end RNA-seq reads, for a total of nearly 45 Gb. The 454 reads were assembled with three assemblers (Newbler, CAP3 and/or MIRA), and the Illumina reads with the Trinity assembler. Ten assembly workflows were compared, using these programs separately or in combination. The assemblies were compared using quantitative and qualitative criteria, including contig length, N50, contig number and the percentage of chimeric contigs, and their completeness was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, 80 % completeness, <1 % chimeric contigs) was a hybrid assembly. This leads us to recommend (1) using a single individual with a broad representation of biological tissues, (2) merging long reads with short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
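Several of the comparison criteria above (contig number, contig length, N50) are simple summary statistics of an assembly's contig lengths. As a minimal sketch, assuming contigs are supplied as plain FASTA text, they can be computed as follows. This is not the authors' pipeline, and the function names are hypothetical illustrations.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines (hypothetical helper)."""
    lengths: List[int] = []
    current = None  # length of the sequence being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached 50 % of the assembly
            return length
    return 0  # empty assembly


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Contig count, total size, mean and maximum contig length, and N50."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    fasta = ">c1\nACGTACGTAC\n>c2\nACGTA\nCGT\n>c3\nACG\n"
    print(assembly_summary(read_fasta_lengths(fasta.splitlines())))
```

For the toy input, the contig lengths are 10, 8 and 3 bp (21 bp in total); the 10 bp contig alone covers less than half of the assembly, so N50 is 8. Note that N50 rewards long contigs regardless of correctness, which is why it is best read alongside completeness and chimera rates rather than on its own.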

Acknowledgments

We would like to thank Rachel Legendre and Claire Toffano of the Institut de Génétique et Microbiologie (CNRS UMR 8621), who provided the script for 454 contig correction. We thank Marie-Christine François (iEES, INRA Versailles, France) for help with the T. brasiliensis RNA extractions. The authors are also very grateful to the engineers of the bioinformatics platforms Genouest at the University of Rennes 1 and eBio at the University Paris Sud for technical support. This work has benefited from the facilities and expertise of the HTS platform of IMAGIF (Centre de Recherche de Gif, www.imagif.cnrs.fr). This study was funded by the French Agence Nationale de la Recherche (ADAPTANTHROP project, ANR-097-PEXT-009) and supported by the labex Biodiversité, Agroécosystèmes, Société, Climat (BASC; University Paris Saclay, France). A. Marchant was funded by the Idex Paris Saclay, France.

Conflict of interest

The authors declare that they have no financial relationship with the organization that sponsored the research. The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

Laboratoire Evolution, Génomes et Spéciation LEGS, UPR 9034, CNRS, Avenue de la Terrasse, Bâtiment 13, BP1, 91198, Gif-sur-Yvette, France
A. Marchant, F. Mougel & M. Harry
Université Paris Sud, Orsay, France
A. Marchant, F. Mougel & M. Harry
Departamento de Ciências Biológicas, Faculdade de Ciências Farmacêuticas, UNESP, Araraquara, SP, Brazil
C. Almeida
INRA, UMR 1392, Institut d’Ecologie et des Sciences de l’Environnement de Paris, Route de Saint Cyr, 78026, Versailles, France
E. Jacquin-Joly
Laboratório de Biodiversidade Entomológica, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
J. Costa

Corresponding authors

Correspondence to A. Marchant or M. Harry.

Electronic supplementary material

The following electronic supplementary material is available for this article.

Supplementary material 1 (TAR 329 kb)

Supplementary material 2 (TAR 282 kb)

Supplementary material 3 (TAR 266 kb)

Supplementary material 4 (TIFF 13369 kb)

Supplementary material 5 (DOCX 12 kb)

About this article

Cite this article

Marchant, A., Mougel, F., Almeida, C. et al. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease. Genetica 143, 225–239 (2015). https://doi.org/10.1007/s10709-014-9790-5

Received: 30 April 2014
Accepted: 01 September 2014
Published: 19 September 2014
Issue Date: April 2015
DOI: https://doi.org/10.1007/s10709-014-9790-5
