Study design and IRB approval
This research was conducted under the Institutional Review Board (IRB) and local ethics committees of the three centers. All the procedures involving human subjects were carried out under the legacy and the ethical principles of the Declaration of Helsinki.
This is a retrospective study aiming at dissecting the genetic determinants of immune escape in AML and MDS patients undergoing allogeneic HCT; three centers contributed to this effort providing clinical and biological data and material: Cleveland Clinic (CCF, Ohio, USA), Nancy Hospital (France), Washington University in St. Louis (WashU, Missouri, USA), (Supplementary appendix, Supplementary Figure 1, Table 1). We integrated immunogenomic and transcriptomic data from 494 patients (Table 1, Data Source 1) who received allo-HCT for AML (N=294), MDS (N=125) and MPN (N=75) and followed in all three institutions: CCF (N=344), Nancy Hospital (N=143) and WashU (N=7). Pre-transplant clinical HLA genotyping was available in 487 patients and was used as benchmark for HED computation. An ad-hoc targeted high throughput sequencing for classical HLA loci and myeloid genes was performed on selected samples at disease onset (N=40), post-chemotherapy relapse (N=9) and post-transplant relapse (N=44). In addition, 13 patients presenting with disease recurrence were profiled by means of RNAseq on sequential diagnosis/post-transplant relapse samples (median 179 days post-HCT, IQR 129-297), whereas single cell transcriptomics was performed on paired specimens for a patient with early relapse (day +92).
Sample collection
Blood or bone marrow (BM) specimens at diagnosis, post-chemotherapy or post-HCT relapses were collected in ethylenediaminetetraacetic acid (EDTA) tubes, and cryopreserved until further use after Ficoll-Paque isolation and suspension in Dimethyl sulfoxide (DMSO) and Fetal bovine serum (FBS) containing media.For each sample we prioritized DNA extraction for HLA genotyping, and targeted myeloid high throughput sequencing. T-cell receptor immunosequencing, was performed on a subset of selected samples, taking into account recipients’ HED scores. In case of residual cells, RNA extraction and sequencing were performed.BM specimens collected at the moment of diagnosis and post-transplant relapse for a patient experiencing early relapse were profiled for single-cell RNA sequencing, after flow-based CD34+ cell separation.
Genotyping studies
HLA sequencing and myeloid targeted sequencing were performed on patient samples included in the biorepository of the Translational Hematology and Oncology Research Department at CCF.
Genomic DNA was isolated directly from cryopreserved unfractionated peripheral or BM blood mononuclear cells with the Nuclei Lysis Solution (Promega) according to manufacturer’s instructions.
HLA targeted sequencing was performed with TruSight HLA v2 (Illumina, San Diego California) as indicated previously.56 In brief, 11 HLA loci (Class I HLA-A, B, and C; Class II HLA-DRB1/3/4/5, HLA-DQA1, HLA-DQB1, HLA- DPA1, and HLA-DPB1) were amplified with a long-range polymerase chain reaction (PCR). After amplification, a transposon-based DNA tagmentation was applied to generate DNA amplicons, via DNA fragmentation and addition of adapter sequences. Additional PCR steps provided sequence adapters and indexing primers to generate sequencing-ready DNA libraries. Prepared libraries were then loaded directly onto a MiSeq System for sequencing. SSO-PCR methods were used to generate HLA genotypes for patients transplanted before 2013. HLA genotyping data used for HED computation were issued from the histocompatibility laboratories of the institutions involved and were coded based on a 2 or 4-field (4-6 digits) nomenclature.
Myeloid targeted sequencing was performed on 94 specimens paralleled sequenced for HLA, using a custom panel for detection of myeloid somatic variants from three sequencing platforms: TruSight, TruSeq, and Nextera (Illumina, San Diego, CA, USA). Forty-one most frequent leukemia associated genes common to the three panels were considered for this study (Supplementary Data 3). Sequencing libraries were generated according to an Illumina paired-end library protocol. The enriched targets were sequenced using a HiSeq 2000 or MiSeq (Illumina, San Diego, CA, USA). Paired-end sequenced reads were aligned with the Burrows-Wheeler Aligner (BWA)57 to GRCh37 reference and post-alignment processing included sorting, marking of duplicates, indexing, base recalibration, according to Genome Analysis Tool Kit v.4 best practices.58 Variant calling was performed with HaplotypeCaller.58 Variants were annotated using Annovar. Variants with minimum coverage less than 20 or number of high-quality reads less than 5 were filtered out. An in-house developed bio-analytic pipeline identified somatic/germline mutations using sequences derived from controls and mutational databases such as dbSNP138, 1000 Genomes or ESP 6500 database, and Exome Aggregation Consortium (ExAC) as previously published.59,60,61 Coverage of the myeloid panel was 826x. For our variant calling algorithm, the threshold of 20 reads’ depth was set-up in previous studies from our group.59,62,63Our somatic mutation detection algorithm was previously validated and demonstrated an overall accuracy of 98.7% through comparison with PCR sanger sequencing.62,64
Whole exome sequencing was applied to a subset of samples studied for the presence of molecular events in other immune non-HLA genes, related to antigen presentation machinery and T cell activation and potentially involved in immune escape (supplementary appendix).
HLA mutational analysis
The bioinformatic approach to investigate somatic HLA mutational status is reported in the supplementary appendix and has been previously described.56
KIR genotyping and HLA-C status assignment
KIR genotyping was performed as previously described, using a sequence-specific oligonucleotidic probe platform (One Lambda, West Hills, CA) and sequence-specific primers (Thermo Fisher Scientific, Waltham, MA). 67,68 Recipient HLA-C status was assigned according to the allelic C group configurations: I) C1 homozygous: if patients carried only alleles belonging to the supertypes: C*01, C*03, C*07, C*08, C*09, C*10, C*12, C*14, C*16, C*17; C2 homozygous, in case of presence of supertypes C*02, C*04, C*05, C*06, C*15 and C1/C2 heterozygous in presence of a alleles of both groups. 30,31
TCRβ chain sequencing and analysis
Immunosequencing of the CDR3 regions of human TCRβ chains was performed using the ImmunoSEQ Assay (Adaptive Biotechnologies, Seattle, WA), as previously described.69,70,71 In brief, extracted genomic DNA was amplified in a bias-controlled multiplex PCR, with i) a first PCR step consisting in forward and reverse amplification primers specific for every V and J gene segment, to allow the amplification of the hypervariable CDR3 region, and ii) a second PCR adding a proprietary barcode sequence and Illumina adapter sequences. CDR3 libraries were sequenced on an Illumina MiSeq system according to the manufacturer’s instructions. ImmunoSeq Analizer 3.0 suite was used for sample export and preliminary statistics and quality control steps while R Bioconductor 72 environment and Immunarch R 73 suite were used for all the downstream analyses as previously described (Supplementary Data 8).21
HED computation
High-quality, protein level (2nd field) HLA data in patient and control cohorts were used as input for HED computation.HED scores were generated for all the subjects and genotypes in the study using the algorithm published by Pierini and Lenz (https://sourceforge.net/projects/granthamdist/) for the calculation of the amino acid sequence divergence at the antigen binding sites of HLA molecules. Briefly, starting from a dictionary including all the protein sequences of exons 2 and 3 for class I alleles and exon 2 for class II alleles, assembled from the IPD-IMGT/HLA database v.3.41 we calculated HED for 6 class I (A, B, C) and II HLA loci (DRB1, DQB1, DPB1). The scores so computed were used for all the downstream analyses.
Bulk RNA sequencing and analysis
Paired diagnosis/post-transplant relapsed samples obtained from 13 patients (7 previously reported52) served for transcriptomic studies. For this last group of samples, total RNA was purified with the NucleoSpin RNA kit (Takara Bio USA, Inc.; Mountain View, CA, USA) according to the manufacturer’s instruction. RNA quality check was performed with Agilent 2100 Bioanalyzer. mRNA was enriched using oligo(dT) beads and then was fragmented randomly by adding fragmentation buffer. The cDNA was synthesized using mRNA template and random hexamers primer, after which a custom second-strand synthesis buffer (Illumina, dNTPs, RNase H, and DNA polymerase I) was added to initiate the second-strand synthesis. After a sequential process of terminal repair, a ligation, and sequencing adapter ligation, the double-stranded cDNA library was completed through size selection and PCR enrichment. Illumina technology (NovaSeq 6000) was used for sequencing, after pooling, and > 30 million reads were acquired for each sample. FastQC was used to check sequenced reads quality. After trimming and adapter removal, raw reads were mapped to the human hg19 reference genome using RNA STAR.74 For the transcriptomic study, cohort assembly was performed starting from read counts for all the samples in study. Batch effect was removed through a two-way ANOVA algorithm75 via limma package prior to further analyses, taking into account the design matrix used to describe comparisons between the samples (diagnosis vs post-HCT relapse), to mitigate the effects derived from different sequencing batches..
Genes that were lowly expressed across >80% of the samples were filtered out. For each sample a normalization factor was calculated through the trimmed mean of M values (TMM) method and final logarithmic counts per million were calculated (log2CPM). Differential expression and gene set enrichment analysis were assessed using edgeR 3.32.1 and limma 3.46 with R computational environment (v. 4).76 Benjamini-Hochberg procedure was used for multiple testing correction.
Single-cell RNA sequencing and analysis
Pre-sorted CD34+ cells BM cells from patient CCF#8 were processed for single-cell library preparation as previously described.77 Library preparation was performed as per manufacturer instructions with the Illumina Nextera XT DNA sample preparation kit (Illumina, San Diego, California). Briefly a first tagmentation reaction was carried out on 1 ng of cDNA at 55 C for 10 min. The amplification of adapter-ligated fragments was performed using Nextera PCR master mix, index primers and barcodes were added to identify each cell for 15 cycles. The PCR products from each cell were then pooled together and purified using Sera-mag SpeedBeads (0.6:1 ratio). A final quality check of the cDNA library was performed using an Agilent high-sensitivity DNA chip, obtaining a broad peak between 300-800bp. Single cells were sequenced using 100bp paired end sequencing on the HiSeq2500. Raw reads were assessed using FastQC v0.11.5 and aligned to the hg19 human reference genome using the STAR aligner v2.5.3a, after trimming and adapter removal. Rsubread v1.32.4 was used to assemble read matrices from aligned bam files. Cells with fewer than 150,000 reads, along with < 65% unique mapping and <35% exonic region coverage, were excluded from further analysis. Genes expressed in less than 10% of cells per patient were removed from downstream analysis. Only those cells that passed quality control were included in the downstream analysis. Seurat R package was used on feature-barcoded matrices for clustering and differential analyses and for visualization purposes.
Statistical analyses
Median, interquartile ranges (IQR), mean and 95%CI intervals were used where appropriate. Frequency and distribution of categorical variables were expressed as percentage. For all relevant comparisons, after testing for normal distribution, comparative analyses between two groups were performed, by Wilcoxon matched-pair signed rank test at 95% CI. Fisher’s exact test or Chi-square were applied for independent group comparisons.
Each HED variable was categorized in high and low HED based on the 50th percentile cutoff of the distribution in healthy controls.
Probabilities of survival for overall survival (OS), defined as the time from transplantation to death for any cause or last follow-up, was calculated using Kaplan-Meier estimates,78 with differences between the curves based on log-rank tests for univariate comparisons. Cumulative incidence (CI) of acute and chronic GVHD and relapse were calculated in a competing risk setting, where death was considered the competing event.79
Death was considered as a competing event for relapse. Multivariate cox regression models were built on cumulative incidence of relapse retaining class I and II HED considered as main effect variables.
All statistical tests were two-sided, and a P-value <0.05 was considered statistically significant.
All of the analyses and data visualization were performed using the statistical computing environment R (4.0.0 R Core Team, R Foundation for Statistical Computing, Vienna, Austria) and excel Microsoft 365.