Abstract
High-throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly, and the choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, we compared different de novo assembly protocols built from various datasets and software. Both 454 and Illumina sequencing technologies were applied to RNA extracted from antennae and mouthparts of single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads, totalling nearly 45 Gb. The 454 reads were assembled with Newbler, CAP3 and/or MIRA, and the Illumina reads with Trinity. Ten assembly workflows were compared, using these programs separately or in combination. The assemblies were compared on quantitative and qualitative criteria, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly. On this basis, we recommend (1) using a single individual with a broad representation of biological tissues, (2) merging long reads with short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
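The length-based criteria named in the abstract (contig number, contig length, N50) can be illustrated with a minimal Python sketch. It assumes the usual definition of N50: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembled bases. The function names and the toy contig lengths are hypothetical and are not the authors' scripts or data.

```python
from statistics import mean


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembly_summary(lengths):
    """Collect simple length-based statistics for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": mean(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), for illustration only.
    toy_lengths = [1200, 800, 600, 400, 300, 200]
    for name, value in assembly_summary(toy_lengths).items():
        print(f"{name}: {value}")
```

Length statistics alone do not capture assembly quality, which is why the study also reports the percentage of chimeric contigs and CEGMA-based completeness.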
Acknowledgments
We would like to thank Rachel Legendre and Claire Toffano of the Institut de Génétique et Microbiologie (CNRS - UMR 8621), who gave us the script for 454 contig correction. We thank Marie-Christine François (iEES, INRA Versailles, France) for help with the T. brasiliensis RNA extractions. The authors are also very grateful to the engineers of the bioinformatics platforms Genouest at the University of Rennes 1 and eBio at the University Paris Sud for technical support. This work has benefited from the facilities and expertise of the HTS platform of IMAGIF (Centre de Recherche de Gif - www.imagif.cnrs.fr). This study was funded by the French Agence Nationale de la Recherche (ADAPTANTHROP project, ANR-097-PEXT-009) and supported by the labex Biodiversité, Agroécosystèmes, Société, Climat (BASC; University Paris Saclay, France). A. Marchant was funded by the Idex Paris Saclay, France.
Conflict of interest
The authors declare that they have no financial relationship with the organization that sponsored the research and that they have no conflict of interest.
Cite this article
Marchant, A., Mougel, F., Almeida, C. et al. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease. Genetica 143, 225–239 (2015). https://doi.org/10.1007/s10709-014-9790-5
DOI: https://doi.org/10.1007/s10709-014-9790-5