Abstract
Gene annotation is essential for genome-based studies. However, algorithm-based genome annotation often fails to reveal genomic information fully and correctly, especially for species with complex genomes. Artemisia annua L. is the only commercial source of artemisinin, although its artemisinin content still needs to be improved. Genome-based genetic modification and breeding are useful strategies for boosting artemisinin content, thereby ensuring the supply of artemisinin and reducing costs, but these strategies urgently require better gene annotation. In this study, we manually corrected the newly released genome annotation of A. annua using second- and third-generation transcriptome data. We found that incorrect gene information may lead to differences in gene structure, function, and expression levels relative to the original expectations. We also identified alternative splicing events and found that the genome annotation affected the identification of alternatively spliced genes. We further demonstrated that both the genome annotation and alternative splicing could affect gene expression estimation and gene function prediction. Finally, we provided a corrected version of the A. annua genome annotation and demonstrated the importance of gene annotation for future research.
Data availability
The raw sequencing datasets of A. annua were downloaded from the NCBI Sequence Read Archive under the accession number PRJNA752933. The genome data were downloaded from NCBI under the accession number PRJNA416223 and http://www.gpgenome.com/species/92. The corrected genome annotation file can be downloaded from https://github.com/liuzy2008/Artemisia_annua_annotation2023.
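The corrected annotation can be used to list the genes that carry more than one annotated transcript. This list is a natural starting point for comparing multi-isoform genes between annotation versions. The sketch below is a minimal illustration only. It assumes that the repository file is standard GFF3, with transcripts typed `mRNA` or `transcript` and linked to their gene through the `Parent` attribute. The function names and the toy records (`geneA`, `geneB`) are hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Minimal sketch, assuming a standard GFF3 file (9 tab-separated columns).
# All names here are hypothetical illustrations, not the authors' code.


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the GFF3 column-9 'key=value;key=value' string into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def isoforms_per_gene(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each gene ID to the transcript IDs whose Parent points to it."""
    genes: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a GFF3 feature line
        if cols[2] in ("mRNA", "transcript"):  # assumed transcript feature types
            attrs = parse_attributes(cols[8])
            # GFF3 allows several comma-separated parents per feature.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    genes[parent].append(attrs.get("ID", ""))
    return dict(genes)


def multi_isoform_genes(lines: Iterable[str]) -> List[str]:
    """Return genes annotated with more than one transcript, sorted by ID."""
    return sorted(g for g, tx in isoforms_per_gene(lines).items() if len(tx) > 1)


# Toy annotation with hypothetical IDs: geneA has two isoforms, geneB has one.
toy_gff3 = """##gff-version 3
chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=geneA
chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA
chr1\tdemo\tmRNA\t100\t700\t.\t+\t.\tID=geneA.2;Parent=geneA
chr1\tdemo\tgene\t1200\t1800\t.\t-\t.\tID=geneB
chr1\tdemo\tmRNA\t1200\t1800\t.\t-\t.\tID=geneB.1;Parent=geneB
"""

print(isoforms_per_gene(toy_gff3.splitlines()))
# → {'geneA': ['geneA.1', 'geneA.2'], 'geneB': ['geneB.1']}
print(multi_isoform_genes(toy_gff3.splitlines()))
# → ['geneA']
```

To apply the sketch to a downloaded annotation, pass an open file handle instead of the toy string, for example `with open(path) as fh: multi_isoform_genes(fh)`, where `path` is the location of the saved file. Running it on both the original and the corrected annotation and comparing the two gene lists gives a rough view of how the corrections change the set of multi-isoform genes. The sketch only counts annotated isoforms. It does not detect or classify splicing events from read alignments.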
Acknowledgements
We thank Dr. Chengjie Chen for his support in using IGV-GSAman during this research.
Funding
This work was supported by the National Key Research and Development Program of China (2019YFE0108700), the National Natural Science Foundation of China (U1812403-1), and the Scientific Research Project of Hainan Academician Innovation Platform (SQ2021PTZ0052; YSPTZX202137).
Author information
Contributions
SC and LL supervised the study. LL, ZL, and ZS collected the data. YD, ZS, ZB, ZY, YL, HZ, RY, and SK manually corrected the genome annotation. ZL and YD analyzed the data. LX and DQ provided all plant materials. LX designed all validation experiments. BC and YS helped conduct the validation experiments. ZL, YD, ZS, LX, HW, and BC wrote the manuscript.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Communicated by Anastasios Melis.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Z., Du, Y., Sun, Z. et al. Manual correction of genome annotation improved alternative splicing identification of Artemisia annua. Planta 258, 83 (2023). https://doi.org/10.1007/s00425-023-04237-6
DOI: https://doi.org/10.1007/s00425-023-04237-6