Study cohorts
Four cohorts of whole exome (WES) or whole genome sequencing data (WGS) of infertile men were included in this study. The MERGE cohort includes data of 2,412 men (2,352 with WES and 60 with WGS) with various infertility phenotypes and >90% of these men were recruited at the Centre of Reproductive Medicine and Andrology (CeRA), Münster. Most men of this cohort are azoospermic (N = 1,448) or have severely reduced sperm counts: N = 454 with cryptozoospermia (sperm only identified after centrifugation of the ejaculate); N = 158 with extreme oligozoospermia (sperm count <2 million); N = 67 with severe oligozoospermic (sperm count <10 million). Numerical chromosomal aberrations such as Klinefelter syndrome (karyotype 47,XXY) and Y-chromosomal AZF-deletions are exclusion criteria. Likely pathogenic monogenic causes for the infertile phenotype were already described in about 8% of cases24. Exome sequencing and bioinformatics analyses in the MERGE cohort were performed as previously described24.
The Strasbourg cohort comprises 23 men diagnosed with NOA. The Barcelona cohort (BCN) comprises 235 NOA men attending the Fundació Puigvert (Barcelona)51. The Nijmegen/Newcastle cohort includes exome sequencing data of 266 infertile men, 225 affected by azoospermia and 41 by cryptozoospermia52.
Ethical approval
All persons gave written consent compliant with local requirements. The study protocol was approved by the local ethics committees: MERGE Münster (2010-578-f-S) and Giessen (26/11); Strasbourg (CPP 09/40—WAC-2008-438 1W DC-2009-I 002), and Yeni Yüzyıl University, Scientific, social and noninterventional health sciences research ethics committee, Istanbul, Turkey (approval no: 2019/08); Barcelona: (2014/04c); Newcastle: (Newcastle:REC ref. 18/NE/0089), Nijmegen: (NL50495.091.14 version 5.0). All experiments were performed in accordance with relevant guidelines and regulations.
Whole exome and genome sequencing
Genomic DNA was extracted from peripheral blood leukocytes via standard methods. For exome sequencing of the MERGE and Strasbourg cohort, the samples were prepared and enrichment was carried out according to the protocol of either Agilent’s SureSelectQXT Target Enrichment for Illumina Multiplexed Sequencing Featuring Transposase-Based Library Prep Technology (Agilent) or Twist Bioscience’s Twist Human Core Exome. To capture libraries, Agilent’s SureSelect Human All Exon Kits V4, V5 and V6 or Twist Bioscience’s Human Core Exome plus RefSeq spike-in and Exome 2.0 plus comprehensive spike-in were used. For whole genome sequencing of samples from the MERGE cohort sequencing libraries were prepared according to Illumina’s DNA PCR-Free library kit. For multiplexed sequencing, the libraries were index tagged using appropriate pairs of index primers. Quantity and quality of the libraries were determined with the ThermoFisher Qubit, the Agilent TapeStation 2200 and Tecan Infinite 200 Pro Microplate reader, respectively. Sequencing was performed on the Illumina HiSeq 4000 System, the Illumina HiSeqX System, the Illumina NextSeq 500 System, the Illumina NextSeq 550 System, or the NovaSeq 6000 System, using the HiSeq 3000/4000 SBS Kit (300 cycles), the HiSeq X Ten Reagent Kit (300 cycles), the NextSeq 500 V2 High-Output Kit (300 cycles), or the NovaSeq 6000 S1 and S2 Reagent kits v1.5 (200 cycles), respectively. For the BCN cohort exome sequencing was carried out as a service by Macrogen Inc. 122 (Republic of Korea) using the 123 Agilent SureSelect_V6 enrichment and a NovaSeq 6000. Exome sequencing in the Nijmegen/Newcastle cohort was performed as previously described52.
Variant calling
After trimming of remaining adapter sequences and primers with Cutadapt v1.1553, reads were aligned against GRCh37.p13 using BWA Mem v0.7.1754. Base quality recalibration and variant calling were performed using the GATK toolkit v3.8 55 with haplotype caller according to the best practice recommendations. For more recent samples and whole genome samples Illumina Dragen Bio-IT platform v4.2 was used for alignment and variant calling. Both pipelines use GRCh37.7.p13 as reference genome. Resulting variants were annotated with Ensembl Variant Effect Predictor 56.
Gene Ontology analysis
WES data of infertile men from MERGE was first filtered for genes with rare (MAF£0.01 according to the gnomAD, v2.1.1) homozygous loss-of-function (LoF) variants (stop-gain, start-loss, splice acceptor, splice donor, frameshift). We then selected for genes preferentially expressed in human male germ cells according to single cell RNAseq data included in the human protein atlas (HPA)57. Gene ontology analysis (http://geneontology.org)58–60 was performed on this gene list (for “biological processes” and “homo sapiens”) and processed with PANTHER https://pantherdb.org/webservices/go/overrep.jsp60,61 (annotation dataset: “GO biological processes complete”, test type: ‘Fisher’s Exact”, Correction: “Bonferroni”, showing results with P <0.05). GO terms were then processed with Revigo62 (http://revigo.irb.hr/) using the P-value and a medium (0.7) list setting (yes to removal of obsolete GO terms, species "homo sapiens”, “SimRel” semantic similarity measure). The Revigo Table was exported and -log10(P-value) of representative GO terms (classed as representation: ‘null’) plotted as side-ways bar chart. Revigo tree data was processed with CirGO.py63 for visualization of the 2-tiered hierarchy of GO-terms.
Screening of WES data for biallelic high impact variants
To identify potentially harmful gene variants in genes of the piRNA pathway WES data of infertile men from all cohorts included in this study were screened to identify individuals with biallelic high-impact variants (stop-gain, start-loss, splice site and splice region, deletions and insertions as well as missense variants with CADD ≥15) in a total of 24 different genes of the pathway (Table S1). Only variants with a MAF £0.01 (gnomAD database, v 2.1.1) were taken into account.
To exclude presence of additional possibly pathogenic variants WES data were additionally screened for additional rare homozygous high-impact variants (LoF and missense variants with CADD ≥20) occurring in a list of 21 azoospermia-associated genes with at least moderate clinical validity24 and 363 candidate genes associated with the Gene Ontology classification “male infertility” in the Mouse Genome Informatics Database (MGI) revealing strong expression in human male germ cells. Patients in which additional candidate variants were identified were excluded from further analysis.
Further genetic analysis
Validation of prioritized variants as well as co-segregation analyses were performed by Sanger sequencing. The regions of interest were amplified from patients genomic DNA as well as available family members with the use of primers and conditions as listed in Table S2. The PCR products were then purified and sequenced using standard protocols. For validation of variants in GPAT2 long range PCR products using GPAT2 specific primers that do not bind to pseudogenes GPAT2P1 and GPAT2P2 were amplified and used as template for subsequent nested PCR and Sanger sequencing. If a variant was found in more than one individual in MERGE (GPAT2: c.1879C>T in M690 and M1844 and c.1130A>G in M13 and M454) the relationship between the two mutation carriers was determined using the Somalier tool64. In case that no parental DNA was available for analysis,biallelic occurrence of heterozygous variants was determined by long-read sequencing using long-range PCR products encompassing both genomic regions of interest amplified from variant carriers as template for library generation.
NGS library preparation and long read sequencing using the MinION system
To determine if two heterozygous variants identified in one gene of the same patient occur in cis or in trans, a long read sequencing approach was used. To this end, a long-range PCR product encompassing both variants was amplified (see Supplemental Table 2 for primer information) from patients genomic DNA using the TAKARA LA Taq® DNA Polymerase Hot-Start Version. 1 µg of PCR products was used for subsequent preparation of MinION sequencing library. Barcoding and sequencing was carried out according to manufacturer’s instructions (MinION, Oxford Nanopore Technologies). After demultiplexing of obtained reads, alignment to human reference hg19, quality control and variant calling phasing of variants on same/different alleles was determined.
Minigene assay
To determine the functional impact of splice site and splice region variants, an in vitro splicing assay based on a minigene construct was performed. The region of interest was amplified from genomic DNA of the respective patient as well as of a human control sample by standard PCR procedures. To analyse splice effect of variants GPAT2 c.1156-1G>A, MAEL c.908+1G>C and TDRD9 c.3716+3A>G products were cloned into pENTR™/D-TOPO® according to manufacturer’s instructions. The subsequent gateway cloning was performed using Gateway™ LR Clonase™ Enzyme Mix and pDESTsplice as destination vector (pDESTsplice was a gift from Stefan Stamm (Addgene plasmid #32484)65. To analyse TDRD12 c.963+1G>T amplified region encompassing exon 8-10 of TDRD12 was subcloned into pcDNA3.1 and for MOV10L1 c.2179+3A>G into pSPL3B. A transient transfection with mutant and wildtype Minigene constructs was performed using Human Embryonic Kidney cells (HEK293T Lenti-X, Clontech Laboratories, Inc.®). Total RNA was extracted using the RNeasy Plus Mini Kit (QIAGEN®) and reverse-transcribed into cDNA with the ProtoScript® II First Strand cDNA Synthesis Kit (New England Biolabs GmbH®). Amplification of the region of interest was performed and RT-PCR products were separated on a 2% agarose gel, cut out, extracted and sequenced.
AlphaFold2 protein structure
AlphaFold2 structure predictions were obtained from EBI (https://alphafold.ebi.ac.uk/), except for PNLDC1 and GPAT2, which were generated with the AlphaFold2 google colab (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) 66,67 using protein sequences encoded by the NCBI accession NM_001271862.2 for PNLDC1 and NM_001321526.1 for GPAT2. Images of protein structures were generated with Pymol (v.2.5.4, Schrödinger, LLC).
Histology and Immunohistochemical staining
Testis biopsies of patients from the MERGE cohort and control subjects were obtained from testicular sperm extraction (TESE) approaches or histological evaluation at the Department of Clinical and Surgical Andrology (University Hospital Münster, Germany) and research purposes. Biopsies were fixed in Bouin’s solution overnight, washed with 70% ethanol and embedded in paraffin for routine histological evaluation. Subsequently, 5 µm sections were stained with periodic acid-Schiff (PAS) according to previously published protocols68. Testis biopsies of patients from Giessen were processed equally but stained with hematoxylin and eosin (HE) following previously published protocols69. Testis biopsies of patients from BCN were treated as described previously51.
For immunohistochemical analyses, 3 µm sections of testicular tissue were de-paraffinized and rehydrated as described previously70. After rinsing with tap water (15 min, RT) heat-induced antigen retrieval was performed in HIER buffer (pH 6) or as indicated in table S3. This step was followed by cooling and washing with 1X Tris-buffered saline (TBS) before endogenous peroxidase activity was blocked using 3% hydrogen peroxide (15 min, RT). Blocking was performed by adding 25% goat serum (#ab7481, abcam) in TBS containing 0.5% bovine serum albumin (BSA, #A9647, Merck, 30 min, RT). Sections were incubated overnight at 4°C in primary antibody solution, including 5% BSA/TBS and primary antibody as indicated in table S3. The following day, sections were washed with 1x TBS and incubated with a corresponding biotinylated secondary antibody in 5% BSA/TBS for 1h. After washing with TBS, sections were incubated with streptavidin-horseradish peroxidase (Merck – 1:500, 45 min, RT) diluted in 5% BSA/TBS. Subsequently, sections were washed with TBS and incubated with 3,3’-Diaminobenzidine tetrahydrochloride (DAB, Merck) for visualisation of antibody binding. Staining was validated by microscopical acquisition and stopped with aqua bidest. Counterstaining was conducted using Mayer’s hematoxylin (Merck). Finally, sections were rinsed with tap water, dehydrated with increasing ethanol concentrations and mounted using M-GLAS® mounting medium (Merck).
Slides were evaluated and documented using a PreciPoint O8 Scanning Microsocope, Olympus BX61VS Virtual Slide System Axioskop (Zeiss, Oberkochen, Germany), or an Olympus BX61 microscope with an attached Retiga 400 R camera (Olympus, Melville, NY, USA) and integrated CellSens imaging software (Olympus, Melville, NY, USA).
RNA extraction and small RNA sequencing
RNA from snap-frozen testicular tissues of three controls with full spermatogenesis and infertile men with biallelic variants in PIWIL1 (M2006), TDRD1 (M1648), TDRD12 (M2317, M2595) and FKBP6 (M2546, M2548) was extracted using Direct-zol RNA Microprep kit (Zymo Research, Cat. #R2062). The quantity and quality of the isolated RNA were assessed with Qubit RNA High Sensitivity kit (Invitrogen, Cat. #Q32852) and Agilent RNA Nano kit (Agilent, Cat. #55067-1512), respectively.
300 ng of total RNA was used for small RNA library preparation using NEXTflex Small RNA-Seq Kit v3 (PerkinElmer, Cat. #NOVA-5132-05). In addition to the manufacturer’s protocol, a spike-in mix of 0.05 ng 5′P-cel-miR-39-3p-3′-OH and 0.05 ng 5′P-ath-159a-3′-2-OMe was added at the initial library preparation step, to check for technical errors at library preparation and sequencing steps. Sequencing was carried out at the OHSU Massively Parallel Sequencing Shared Resource facilities on Illumina NovaSeq 6000 S4 2x100 flow cell. For RNA-seq data processing and piRNA annotation sequencing reads were trimmed with Cutadapt (v.#3.0) according to instructions provided by CATS small RNAseq kit protocol (Diagenode, Cat. #C05010040, Doc. # v.2 I 09.17) or NEXTflex Small RNA-Seq Kit protocol (PerkinElmer, Cat. #NOVA-5132-05, Doc. # v.V19.01). Next, trimmed reads were aligned to reference genome (GRCh37) with Bowtie (v.#1.0.1)71 allowing only perfect matches, discarded miRNAs by selecting reads between 25 and 45 bases, and re-aligned to GRCh37 allowing one mismatch. Finally, known small non-coding RNAs, other than piRNAs, were removed from the dataset using DASHRv2 (v.#v2)72 and the remaining piRNA sequences were intersected with known piRNA loci detected in human adult testis3. For statistical analysis data from small RNA-seq experiments were evaluated using SciPy (ver.: 1.8.0) packages73. Shapiro-Wilk test for normality of the data and Student’s T-test for expression changes in piRNA quantities of different lengths were used.