Introduction

Male infertility is a complex condition affecting millions of men of reproductive age with a diverse etiology including endocrine anomalies, structural defects, and hazard exposure [1]. Current clinical standards for diagnosis include examination for semen abnormalities, testis malformations, hormonal imbalances, numerical chromosome aberrations, and Y-chromosome microdeletions (YCMDs). Still, testing is conclusive (e.g., 47,XXY) for only a small fraction of patients with non-obstructive spermatogenic failure, i.e., azoospermia (NOA) or oligozoospermia (OS) [1, 2]. As such, most patients suffering from male infertility do not receive a definitive diagnosis [1].

Data from over 600 transgenic mouse models of NOA and OS have revealed its high genetic heterogeneity [3]. Consequently, the contribution of each individual gene is likely to be relatively small, with many pathogenic variants. Moreover, other than whole gene knockout, most reports focus on analysis of point mutations (a.k.a. single nucleotide variants, SNVs) with less attention given to identifying and assessing larger structural genomic changes. As structural sex chromosome aberrations are known to be elevated in NOA, while structural autosomal aberrations are more frequently observed in severe oligozoospermia (sOS), attention should be given to small deletions and duplications of gene regions in addition to single nucleotide changes [4]. To this end, advances in array comparative genomic hybridization (aCGH), whole-exome sequencing (WES), and whole-genome sequencing (WGS) applications have allowed for a number of successful large-scale genomic studies of male infertility [5,6,7,8,9]. In addition to YCMDs easily detectable with PCR, larger Y chromosome aberrations, including both deletions and duplications, have been observed in SPGF patients using fluorescence in situ hybridization (FISH) and CGH [10, 11]. CGH has also identified pathogenic X chromosome variants responsible for primary male infertility, many of which encompassed novel testis-specific genes [12,13,14,15]. Autosomal chromosome deletions, including doublesex- and mab-3-related transcription factor 1 (DMRT1) on chromosome 9, were found in a number of individuals with idiopathic azoospermia [16]. Additionally, deletions spanning dpy-19 like 2 (DPY19L2) on chromosome 12 have been observed in both familial and sporadic cases of globozoospermia (Online Mendelian Inheritance in Man (OMIM): SPGF9) [17]. Thus, we hypothesize that genetic variants, both single nucleotide and copy number, contribute significantly to the etiology of primary male infertility. Moreover, these variants are readily detectable with currently used genomic technologies. Here, we present a comprehensive study of clinically relevant genomic variants in males with severe SPGF based on known gene associations and variant classification in accordance with American College of Medical Genetics and Genomics (ACMGG) clinical genomic guidelines for pathogenic variant interpretation [18, 19].

Materials and methods

Subjects

This study was approved by the University of Pittsburgh Institutional Review Board (STUDY20110357) and the National Bioethical Committee, Ministry of Health, Warsaw, Poland (OKB-5–2/15). Prior to enrollment, all patients received routine male-factor evaluation based on current American Urology Association, American Society for Reproductive Medicine, and World Health Organization guidelines [20, 21]. For this study, we utilized modified sperm concentration ranges to classify non-obstructive azoospermia (0–0.001 million sperm/mL sperm in the ejaculate), severe oligozoospermia (0.001–10 million sperm/mL), and oligozoospermia (10–15 million sperm/mL). One hundred two patients previously underwent advanced clinical screening to rule out known causes of infertility such as injury, hormonal imbalance, abnormal karyotype, or vas deferens obstruction. Individuals with positive clinical tests for pathogenic CFTR mutations or numerical chromosome aberrations were excluded. Control DNA samples for copy number analysis were derived from 36 unrelated, fertile, normozoospermic males (22 previously described [12] and 14 new in-house controls).

DNA was isolated from peripheral blood using the Gentra Puregene kit (Qiagen). Quality and concentration were determined using a NanoDrop2000 (ThermoScientific). Patient DNA was screened for YCMDs of azoospermia factors AZFa, AZFb, AZFc, and sex-determining region Y (SRY) gene using the Kapa HiFi HotStart PCR kit (Kapa Biosystems) with primers as described by the European Academy of Andrology and the European Molecular Genetics Quality Network [22] (Supplemental Table 1). Samples with AZF or SRY microdeletions were excluded from further analysis.

Array comparative genomic hybridization

Analysis was performed on the 400 K SurePrint G3 Human CGH array (Agilent) which features 411,056 distinct biological 60-bp probes resulting in a median probe spacing of 4.6 Kb within RefSeq genes and 5.3 Kb across human genome hg19 (or hg38 if no gene regions were detected). Male DNA (#G1471, Promega), pooled from multiple healthy males, was used as a reference. DNA labeling, hybridization, washing, and scanning were performed according to the manufacturer’s protocol (Agilent). To maximize signal uniformity, each patient sample was paired with a Promega male reference sample of similar dye activity. Arrays were scanned on the SureScan Microarray C scanner (Agilent).

aCGH data were analyzed with default Aberration Detection Method 2 (ADM-2) algorithm settings via Cytogenomics 3.0.2.11 (Agilent). CNV calls were based on the log2 ratio of direct signal intensity of patient probe to control probe (negative = loss; positive = gain). To increase quality, noise signals were removed using a cutoff log ratio of |log2 > 0.2| for both losses and gains. Regions with three consecutive probes averaging <  − 0.2 or > 0.2 were considered true variants. Statistical significance (p value) is derived from a hypergeometric distribution. CNV zygosity was estimated on the basis of established log2 signal ratios as determined by the UPMC Clinical Genomics Laboratory (Cytogenetics Laboratory) in accordance with ACMG guidelines for chromosomal microarray analysis [23]. For autosomal chromosomes, a value between − 0.2 and − 1 was considered a heterozygous deletion, while values <  − 1 were regarded as homozygous deletions. Values between 0.2 and 1.32 were classified as mosaic duplications and values > 1.32 were called as true duplications. Zygosity of sex chromosome variants was classified as follows: mosaic deletion (− 0.2 to − 1), hemizygous deletion <  − 1, mosaic duplication (0.2 to 0.1.32), or duplication (> 1.32) (Supplemental Table 2). Samples with more than 90 calls (n = 18) were reevaluated with the ADM-2 noisy algorithm (Supplemental Table 3). Higher noise cutoff ratios of |log2 > 0.7| for losses and |log2 > 0.5| for gains were applied to variants assessed with the ADM-2 noisy algorithm. Genomic mapping of CNVs was verified in the UCSC Genome Browser, the Database of Genomic Variants (DGV), and DECIPHER.

Whole-exome sequencing

Whole-exome sequencing libraries were constructed with the SOPHiA Whole Exome Solution and sequenced on the Illumina NovoSeq 6000 (Novogene, Sacramento, CA). One hundred fifty bp paired end fragments were targeted to 20,133 genes (203,058 gene regions) with an average depth of coverage of 110X. Reads were mapped to the hg19 reference genome, and vcf files were generated by SOPHiA DDM version 5.10.12 (SOPHiA Genetics, Boston, MA). SNV quality control and filtering were conducted with Fabric Enterprise 6.13.1 (Oakland, CA) and set at coverage ≥ 10 reads, quality score ≥ 30, and allelic balance ≥ 0 and ≤ 0.7. A proprietary algorithm in SOPHiA DDM version 5.10.12 (Sophia Genetics) was also used to perform CNV analysis on samples with an average depth > 50X. A minimum of eight samples per run were required for the comparison of gene regions. To reduce noise in CNV calling, three standard settings of the algorithm were used: overall residual noise threshold of 0.3, low overall coverage of < 30 × in the respective regions, and overall number of CNV calls above 300.

Gene annotation and variant classification

Inheritance patterns were predicted using DOMINO [24]. Minor allele frequencies (MAF) in the general population were retrieved from gnomAD v2.1.1 for SNVs. CNV MAF is based on variants present in GnomAD SVs v2.1 that spanned the entire affected gene candidate region. Coding genes were classified on the basis of expression using Genotype Tissue Expression (GTEx), Aceview, BioGPS, and Human Protein Atlas (HPA), as well as previous reports of human male infertility-related phenotypes (ClinGen, OMIM, and ClinVar) and mouse (Mouse Genome Informatics (MGI)). CNVs affecting noncoding regions were classified according to the Human BodyMap (Illumina). Genomic variants were cross referenced against gene candidates with definitive, strong, moderate, or limited association with male infertility [3, 25, 26]. Variants were scored for clinical relevance (pathogenic, likely pathogenic, variant of uncertain significance, likely benign, benign) according to ACMG technical standards and guidelines.

Statistical analysis

Student’s t test was used to compare CNV parameters between patients and controls. One-way ANOVA was used to determine differences between the three patient groups. A p value of less than 0.05 was considered significant. Statistical tests were performed using Prism 8.2.1 software (GraphPad).

Results

DNA samples from 102 infertile males clinically diagnosed with SPGF and 36 fertile controls were initially screened for CNVs via aCGH. Patients were further classified into three major clinical categories, including 41 NOA, 44 sOS, and 17 OS (Table 1). In total, 4307 potential CNVs were detected in patients but not controls. Following noise removal, 2590 high-quality CNVs were retained, called here true variants (NOA n = 940, sOS n = 1222, OS n = 428). These CNVs ranged in size from 0.6 kb to 2.7 Mb, with ~ 90% of them less than 75 kb (Fig. 1). Comparative analysis identified significant differences in total count (n) and median CNV size (s) between losses/deletions (n = 1482; s = 22.8 kb) and gains/duplications (n = 1108; s = 31.8 kb) (p < 0.05). Following normalization based on chromosome size (total CNV count/chromosome kb), analysis of CNV distribution did not identify genomic hot spots or clustering with total CNVs mapping nearly equally across all chromosomes (Fig. 2).

Table 1 Patient samples by phenotype. One hundred two males with idiopathic spermatogenic failure (SPGF) were initially recruited. Three azoospermic males were excluded from the study following CGH detection of mosaic X chromosome amplifications (see Fig. 3). Three azoospermic males and two individuals with severe oligozoospermia were excluded from variant analysis due to the presence of AZFc microdeletions detected via concurrent PCR. A total of 27 clinically relevant structural CNVs were identified in azoospermic (n = 11), severe oligozoospermic (n = 11), or oligozoospermic (n = 3) individuals. Number in parentheses indicates likely diagnostic CNVs. Thirteen likely diagnostic SNVs were identified in azoospermic (n = 2), severe oligozoospermic (n = 5), or oligozoospermic (n = 2) individuals. (n = 94)
Fig. 1
figure 1

Patient variant size and frequency distribution. Total true variants (n = 2590) unique to individuals with spermatogenic failure (azoospermia n = 41, severe oligozoospermia n = 44, and oligozoospermia n = 17), detected by Agilent 400 K SurePrint G3 Human CGH array and calculated with Cytogenomics ADM-2 or ADM-2 Noisy algorithms, ranged in size from 0.6 kb-2.7 Mb, with > 95% of them less than 100 kb (X axis is shown in log2 scale)

Fig. 2
figure 2

Patient variant count and distribution per chromosome. Copy number variant detection by Agilent 400 K SurePrint G3 Human CGH array and quantification with Cytogenomics ADM-2 or ADM-2 Noisy algorithms showed no significant difference in average number of true CNVs per condition: azoospermia (n = 28), severe oligozoospermia (n = 28), oligozoospermia (n = 25). Total number of patient deletions (n = 1482) significantly outnumbered total number of patient amplifications (n = 1108) (p < 0.05). Normalization based on chromosome size showed no significant difference in chromosome distribution of copy number variants

Notably, our initial CNV analysis identified mosaic X chromosome gains (log2 ratio ~ 0.9) in three NOA patients (A1–A3) who initially presented with normal 46,XY clinical karyotypes (Fig. 3). As Klinefelter syndrome (47,XXY males) is a known cause of male infertility, we excluded these three mosaic patients (46,XY/47,XXY) from further analysis [27]. During concurrent analysis, testing for YCMDs identified 5 of the 102 patients (~ 5%) with AZFc microdeletions (Fig. 4). Since, AZFc deletions are known causes of male infertility, these patients were also excluded from further analysis [28].

Fig. 3
figure 3

Mosaic X chromosome amplification. A Agilent Cytogenomics chromosome view reveals amplification (blue) extending the entirety of the X chromosome. B Interval table. Highlighted row (gray) shows output for a 1.87-fold (2^0.905) amplification of a 93-Mb region of the X chromosome

Fig. 4
figure 4

Y Chromosome microdeletions detected by PCR. Agarose gel electrophoresis of amplified DNA shows the presence of SRY gene in eight patients and fertile control (left) and AZFc/sY254 (380 bp) deletions in samples A7, A21, A33, S7, and S25 (right)

The remaining 94 patients without numerical chromosome aberrations or YCMDs were analyzed for clinically significant CNVs associated with SPGF and male infertility in humans and/or mouse models. From the 2544 true CNVs present in infertile patients only, 589 variants (~ 23%) were unique and had not been reported as polymorphic (MAF > 0.01) in GnomAD SVs v2.1. Among these losses and gains, 451 CNVs contained coding and 138 noncoding regions (UCSC Genome Browser and Human BodyMap), herein referred to as coding and noncoding variants, respectively. According to established clinical and biological databases, 162 of the 451 coding variants (36%) contained genes having recognized roles in spermatogenesis and/or SPGF as well as significant or specific expression in the testis (BioGPS n = 152, MGI n = 49, OMIM n = 14). Among these variants, 71 were unique to NOA, 68 were observed only in sOS, and the remaining 23 were found only in OS cases (Supplemental Table 4). Of the noncoding variants, 19 (14%) contained testis-specific, long noncoding RNAs (lncRNA) as identified in the Illumina Human BodyMap (Supplemental Table 5). Both gains and losses were observed but were not seen in individuals with OS.

Coding variants (n = 164) were cross referenced against a list of genes from recent, comprehensive, systematic reviews of the genetic basis of male infertility and classified according to current ACMG guidelines for interpreting and reporting constitutional CNVs and SNVs [3, 18, 19, 25]. Using output from Agilent’s CytoGenomics browser, we identified 27 aberrations of potential clinical significance (pathogenic, likely pathogenic, variant of uncertain significance) in 24 patients (Table 2). Identified CNVs were then prioritized on the basis of genetic mode of inheritance, with X-linked and known dominant genes given highest priority. Notably, three unique, likely disease-causing CNVs were detected in three patients with NOA. These included a hemizygous, pathogenic ~ 90-kb deletion interrupting testis expressed 11 gene (TEX11) in patient A12. This variant is almost identical to the previously identified aberration detected in an individual with NOA of mixed testicular atrophy [15]. We also observed an ~ 70-kb hemizygous deletion in a variant containing the X-linked DEAD-box helicase 53 (DDX53) gene which is specific to testes in adult males. An 11.5-kb likely pathogenic homozygous deletion of exons 2–8 of serine/threonine kinase 11 (STK11) was detected in patient A18.

Table 2 Clinically relevant CNVs observed in spermatogenic failure. Azoospermic (n = 35), severe oligozoospermic (n = 42), and oligozoospermic males (n = 17). Gene association with male infertility as described by Houston et al. [9]. Inheritance predicted with DOMINO. Gene regions are continuous and include exons and interspersed introns. Deletion (negative) and amplification (positive) values produced with Agilent 400 K SurePrint G3 Human CGH array and Cytogenomics software are calculated as log2 (signal ratio) for patient versus paired Promega control. p value is equivalent to hypergeometric tail score (H.G.T.). Minor allele frequency (MAF) as reported in GnomAD SVs v2.1 for variants spanning entire gene candidate region (NF not found). Interpretation based on ACMG technical standards and guidelines. VUS variant of uncertain significance. *CNV confirmed with WES. **Alhathal et al. [25]. ***Equal probability of autosomal dominant or recessive inheritance

Subsequent WES analysis identified 13 clinically relevant SNVs/indels of uncertain significance in nine genes associated with male infertility in nine patients (Table 3). Notably, NOA patient A10 had a c.1675A > T; p.Thr559Ser substitution in the androgen receptor (AR) gene on the X chromosome as well as a two heterozygous variants, c.3520_3522del; p.Gln1174del and c.904A > G; p.Ile302Val, in the centrosomal protein 290 (CEP290) gene. An X-linked c.55_75del; p.Ser19_Ala25del deletion in Kallman syndrome interval gene 1 (KAL1) in addition to an amplification of uncertain significance of exons 2–28 of SET binding factor 1 (SBF1) were found in SO patient S28. OZ patient O2 had a hemizygous c.965C > G; p.Thr322Ser missense mutation in the nuclear receptor subfamily 0, group B, member 1 (NR0B1) gene. This individual also had a heterozygous deletion of exon 3 of serpin peptidase inhibitor clade E (nexin plasminogen activator inhibitor type 1) member 2 (SERPINE2) as detected by aCGH. In addition to variants in genes with definitive associations with male infertility, likely diagnostic hemizygous (MAGEE2), homozygous (CATSPER2, PKD1, and QRICH2), and multiple heterozygous (DNAH9) mutations were detected. WES analysis also confirmed 8/27 (30%) CNVs with an average depth of 76X for 90% of the targeted region, including hemizygous TEX11 and DDX53 deletions (Fig. 5, Table 2).

Table 3 Clinically relevant SNVs observed in spermatogenic failure. Azoospermic (n = 35), severe oligozoospermic (n = 42), and oligozoospermic males (n = 17). Gene association with male infertility as described by Houston et al. [9]. Inheritance predicted with DOMINO. Computational predictions from SIFT, PolyPhen, PhyloP, MutationTaster, Omicia, CADD, and VVP. Minor allele frequency (MAF) values from GnomAD v2.1.1 (NF not found). Computational predictions: B benign, D damaging, NA not applicable, PD possibly damaging. Interpretation based on ACMG technical standards and guidelines: VUS variant of uncertain significance. *Alhathal et al. [25]. **Documented phenotype does not match study participant phenotype [29, 30]
Fig. 5
figure 5

WES confirmation of selected likely pathogenic CNVs. Output from SOPHiA DDM version 5.10.12 showing intragenic deletion of TEX11 (upper) and full deletion of DDX53 (lower) in two separate NOA patients

Discussion

Despite alarming rates of infertility globally, progress in male infertility health care lags behind other recognized major health conditions in comprehensive diagnostics as well as therapeutic treatments of the underlying etiology. Recently, several international collaborative efforts have reported causative genetic findings associated with male infertility in general and SPGF in particular; yet, genetic/genomic evaluation in the clinic remains at an early stage in the field [15, 31,32,33,34,35].

Using aCGH and WES, we studied clinically significant genomic variants in SPGF patients. We identified 43 clinically relevant variants in 36 of the 97 (37%) patients with idiopathic male infertility. Over 90% of identified CNVs were smaller than 75 kb, a resolution surpassing routine fluorescence in situ hybridization (100–200 kb) and far superior to traditional karyotype Giemsa banding (3–5 Mb). Notable known findings in NOA patients include mosaic 47,XXY genomes in three patients and an X chromosome TEX11 intragenic deletion in one patient. TEX11 encodes a known fertility factor involved in meiotic recombination and mutations are estimated to account for 1-2% of all NOA diagnoses [15, 36, 37]. A homozygous intragenic deletion of STK11 is also a likely pathogenic variant. STK11 (previously known as LKB1) plays a role in DNA damage response, and gain-of-function mutations in humans are associated with multiple carcinomas while transgenic knockout mouse models exhibit decreased male germ cell number and abnormal spermatogenesis [38, 39]. Interestingly, the individual, harboring the homozygous STK11 deletion, also had heterozygous CNVs that resulted in a single copy losses of SBF1 exons 6–23 and exons 4–10 of DAZ-associated protein 1 (DAZAP1). Multiple heterozygous SBF1 point mutations, including one in exon 14, which our variant spans, have been associated with NOA, and knockout male mice are infertile [40]. DAZAP1 directly interacts with deleted in azoospermia-like (DAZL) in which genetic defects are linked to SPGF [41]. A full deletion of testis specific, X-linked DDX53, also known as cancer associated gene (CAGE) may also be causative; however, based on current data, we cannot unequivocally draw this conclusion [42]. Additionally, X-linked, unique SNVs detected in AR, KAL1, and NR0B1, all of which have definitive gene associations with male infertility, are potentially diagnostic. AR mutations are associated with androgen insensitivity syndrome, KAL1 variants have been shown to cause hypogonadotropic hypogonadism, and NR0B1 variants are associated with 46,XY sex reversal, type 2 and congenital adrenal hypoplasia [43,44,45,46,47].

While we did not identify common genomic regions or predominant CNVs responsible for SPGF, with the majority of patients being carriers of private, single, heterozygous aberrations in autosomal recessive genes, our data point to recurrent defects in steroid metabolism, endocrine system, regulation of meiotic division, and spermatid/spermatozoa maturation, all of which could alter or contribute to sperm production, and/or its viability [40, 47,48,49,50,51,52,53,54,55,56,57]. Additionally, multiple identified SNVs are presumed to affect sperm morphology, motility, and fertilization ability [29, 40, 47,48,49,50,51,52,53,54,55,56,57,58,59,60].

Our findings demonstrate the high heterogeneity and genomic complexity of unexplained severe male infertility. In addition to known pathogenic single nucleotide changes and deletions of essential gene regions, amplifications of significant portions of the genome may contribute to disease load through structural changes and alteration of function as a result of inversions and/or translocations [61]. Moreover, with the exception of pathogenic variants in known happloinsufficient genes, small hemizygous or autosomal alterations in which one allele is mutated and the second is deleted or inactivated are less likely to be detected via Sanger or next-generation sequencing (NGS) without the aid of CNV analysis [62].

It is estimated that single nucleotide variants comprise up to 80% of all mutations with CNVs contributing to the majority of the remaining fraction [63]. This general estimate varies significantly, however, for different disorders and depends on genomic architecture of the gene region. Limitations in NGS technology for detecting structural genomic variants are also a source of additional challenges. Taken together, this suggests that CNV testing should be considered as a necessary addition to genomic sequencing analysis. As advancements are made to NGS techniques and bioinformatics, where greater and sufficient read depth are reliably translated, high-resolution CNV detection will likely result in the incorporation of aberration detection via deep sequencing replacing CGH [64].