Colletotrichum chrysophilum (Ascomycota, Sordariomycetes, Glomerellaceae) is a species belonging to the C. gloeosporioides complex. Described in 2017 as the agent responsible for anthracnose on Musa acuminata (banana; Vieira et al. 2017), C. chrysophilum has also been associated with Persea americana (avocado) and Prunus persica (peach) (Talhinhas and Baroncelli 2021). Moreover, together with Colletotrichum fructicola and C. noveboracense, it is considered one of the major causal agents of Glomerella leaf spot (GLS) and apple bitter rot (ABR) on Malus domestica (apple) (Astolfi et al. 2022; Khodadadi et al. 2020). Originally, C. chrysophilum was presumed to be limited to the American and Asian continents (Astolfi et al. 2022; Talhinhas and Baroncelli 2021); however, reports of GLS and ABR caused by this pathogen in European apple orchards, such as in Italy and Spain, started emerging in 2022 (Cabrefiga et al. 2022; Deltedesco and Oettl 2022).

Colletotrichum chrysophilum was isolated in September 2021 from leaves showing GLS symptoms, collected in an apple orchard in northern Italy (Province of Ferrara, Emilia-Romagna) with a disease incidence close to 50%. The monosporic strain M932 was transferred onto fresh potato dextrose agar (PDA) medium (supplemented with 200 ml/L streptomycin and 200 ml/L neomycin) and incubated at 20 °C for 10 days to obtain mycelium, from which genomic DNA was extracted using a modified CTAB method (Prodi et al. 2011).

The DNA of C. chrysophilum strain M932 was sequenced on the Illumina NovaSeq 6000 system with 150 bp paired-end reads. NovaSeq 6000 adapters were trimmed using Trimmomatic v0.39 (Bolger et al. 2014) and low-quality reads were removed using TrimGalore v0.6.4 (Krueger 2015). The quality of the reads before and after processing was assessed and compared using FastQC v0.11.9 (Andrews 2010). Illumina reads were assembled using SPAdes v3.15.1 (Bankevich et al. 2012). The first draft of the nuclear genome of C. chrysophilum consists of 1,497 scaffolds with a total length of 55.56 Mbp (N50 = 86,538 bp and N75 = 44,545 bp). BUSCO v5.2.2 (Seppey et al. 2019) was used to assess the completeness of the fungal genome assembly, while assembly statistics were evaluated with QUAST v5.0.2 (Gurevich et al. 2013). Results are reported in Table 1.
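For readers unfamiliar with the contiguity metrics quoted above, the following minimal Python sketch illustrates how N50 and N75 are conventionally defined: the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches 50% or 75% of the total assembly length. The function name `nx` and the toy scaffold lengths are hypothetical and used for illustration only; the values reported in this study were obtained with QUAST, not with this code.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx statistic for a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; Nx is the length of the
    scaffold at which the running total first reaches `fraction` of the
    total assembly length (fraction=0.50 gives N50, 0.75 gives N75).
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    threshold = fraction * sum(ordered)
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    return ordered[-1]  # only reached through floating-point rounding


# Hypothetical toy assembly (lengths in bp), not data from this study.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_scaffolds, 0.50)
toy_n75 = nx(toy_scaffolds, 0.75)
print(f"N50 = {toy_n50} bp, N75 = {toy_n75} bp")
```

By construction N75 can never exceed N50 for the same assembly, which is consistent with the values reported here.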

Table 1 Summary statistics of the Colletotrichum chrysophilum M932 genome

A total of 20,041 protein-coding genes were predicted in the nuclear genome using the MAKER v3.01.02 pipeline (Holt and Yandell 2011), with self-trained GeneMark-ES v4.10 (Borodovsky and Lomsadze 2011) and AUGUSTUS v3.3 predictions performed using the “Fusarium” model (Stanke et al. 2008). SignalP v5.0 (Almagro Armenteros et al. 2019) predicted that 2,350 proteins in C. chrysophilum are secreted and, among those, 991 were predicted to be candidate effectors by EffectorP v3.0 (Sperschneider and Dodds 2022). A comparative analysis of the newly sequenced genome with those publicly available (Gan et al. 2013; Armitage et al. 2020; Gan et al. 2021; Baroncelli et al. 2022) showed similar genomic features in terms of genome size and GC content but a high diversity in gene content among strains of C. chrysophilum and closely related species (Fig. 1). A phylogenomic approach, performed as described in Baroncelli et al. (2022), also highlighted incongruence in the taxonomic designation of deposited data, as strains of C. nupharicola and C. noveboracense do not form distinct clusters (Fig. 1); further analyses are needed to fully understand the diversity and the taxonomy of this group.
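To put the gene counts above in proportion, the short Python sketch below derives the share of secreted proteins and candidate effectors from the three counts given in the text. The variable names are illustrative; only the counts themselves come from this study.

```python
# Counts reported in the text for C. chrysophilum M932.
predicted_genes = 20041      # protein-coding genes (MAKER)
secreted = 2350              # predicted secreted proteins (SignalP)
candidate_effectors = 991    # candidate effectors (EffectorP)

# Percentages derived from the counts above.
secreted_pct = 100 * secreted / predicted_genes
effectors_pct_of_secreted = 100 * candidate_effectors / secreted
effectors_pct_of_genes = 100 * candidate_effectors / predicted_genes

print(f"Secreted proteins: {secreted_pct:.1f}% of predicted genes")
print(f"Candidate effectors: {effectors_pct_of_secreted:.1f}% of secreted proteins")
print(f"Candidate effectors: {effectors_pct_of_genes:.1f}% of predicted genes")
```

On these counts, roughly 11.7% of the predicted proteins are secreted, and about 42.2% of the secreted proteins (4.9% of all predicted genes) are candidate effectors.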

Fig. 1

Comparative analysis of the newly sequenced genome with publicly available genomes of closely related species. The genome sequenced in the present study is highlighted in bold. On the left, a phylogenomic tree shows the evolutionary relationships between genomes; numbers next to the nodes represent Bayesian posterior probability (BPP) values, while thicker branches indicate a support value of BPP = 1.00. In the center, four bubble plots illustrate assembly fragmentation, size, GC content and completeness. The bubble sizes have been scaled to each panel and are not comparable across panels. The bar diagram on the right reports the size of the non-secreted (in yellow) and secreted (in orange) sets of predicted protein-encoding genes.

The availability of the genome of C. chrysophilum M932 offers the possibility to perform further comparative analyses, to fully understand species boundaries within the Colletotrichum gloeosporioides species complex and to develop molecular diagnostic methods.

Nucleotide sequence accession numbers

This whole-genome shotgun project has been deposited in GenBank under the accession no. JAQOWY000000000 (BioProject: PRJNA928458; BioSample: SAMN32933927).