Introduction

Rhesus macaques have been used extensively in infectious disease research. This model has provided key insights into disease pathogenesis and allowed for the evaluation of novel vaccine concepts. Two populations, Indian-origin and Chinese-origin rhesus macaques, have been utilized extensively in AIDS research and in models of other infectious diseases (Gardner and Luciw 2008; Giraldo-Vela et al. 2008; Miller et al. 1989; Smith et al. 1999). While these two populations appear to be indistinguishable physiologically, the genetic factors affecting their immune responses appear to be quite distinct. This is particularly true of the Major Histocompatibility Complex (MHC) class I molecules, which determine the repertoire of T-cell responses that an individual can develop against Simian Immunodeficiency Virus (SIV) or any other foreign pathogen (Parham 2005). Recent studies have shown that the MHC gene regions of Indian and Chinese macaques differ appreciably in their degree of polymorphism, their specific allelic variants, and their functional characteristics (Solomon et al. 2010; Southwood et al. 2011; Wiseman et al. 2009).

In previous studies, we have investigated the MHC:peptide-binding repertoire of several MHC class I and II molecules in Indian-origin rhesus macaques, such as Mamu-A*01 (Allen et al. 1998), -B*17 (Mothe et al. 2002), -B*01 (Loffredo et al. 2005), -DRB1*0406, and -DRB*w201 (Dzuris et al. 2001) among others (Dzuris et al. 2000; Giraldo-Vela et al. 2008; Loffredo et al. 2008; Loffredo et al. 2007; Loffredo et al. 2004; Reed et al. 2011; Sette et al. 2005). While Chinese rhesus macaques are of value as animal models for AIDS vaccine development and for other pathogens, their full potential has not been realized due to missing functional MHC and genetic information.

We have recently started to define peptide-binding motifs for common Chinese macaque MHC class I molecules and to date have elucidated detailed quantitative motifs for three alleles: Mamu-A1*026:01, Mamu-B*083:01 (Southwood et al. 2011), and Mamu-A1*022:01 (Solomon et al. 2010). Interestingly, we found that all of these molecules are associated with motifs that overlap with well-known HLA supermotifs. Specifically, Mamu-A1*026:01 shares a high degree of cross-reactivity with the HLA-A2 supertype allele HLA-A*02:02, as well as -A*02:01 and -A*02:03, while Mamu-B*083:01 is highly cross-reactive with the HLA-A3 supertype alleles HLA-A*31:01, -A*03:01, and -A*68:01 (Southwood et al. 2011). Additionally, Mamu-A1*022:01 is highly cross-reactive with the HLA-B7 supertype alleles HLA-B*07:02 and -B*35:01 (Solomon et al. 2010). The HLA-B7, -A3, and -A2 supertypes are the three most abundant supertypes in the human population, each present with a phenotypic frequency of greater than 40%, averaged across various ethnic groups (Sette and Sidney 1998; Sette and Sidney 1999).

To fully realize the potential of Chinese-origin macaques in infectious disease research, further investigation and characterization of the most commonly expressed MHC alleles is necessary. This will facilitate accurate assessment of cellular immunity and immune correlates of protection. In the present set of experiments, we focused on a high-frequency allele in Chinese rhesus macaques, Mamu-B*039:01, which is present in 5.8% of Chinese rhesus macaques (Solomon et al. 2010). We sought to characterize Mamu-B*039:01 by determining its specific MHC:peptide-binding motif. We were surprised to find that its motif consisted of a preference for glycine (G) at position 2. Analysis of published literature revealed that only the murine class I molecule H-2 Dd has been reported to share this specificity. The analysis of known Mamu and HLA sequences resulted in the definition of structural characteristics associated with the preference for glycine in position 2 and led us to discover another macaque allele that we subsequently demonstrated to be associated with similar peptide-binding specificity. Structural and phylogenetic analysis suggested that this MHC motif is not found among HLA alleles. Hence, herein we propose a new supertype motif present in macaques and mice, but not in humans.

Materials and methods

Creation of stable Mamu-B*039:01 and Mamu-B*052:01 transfectant cell lines

To produce secreted Mamu molecules for endogenous ligand identification, α-chain complementary DNAs (cDNAs) of Ch-Mamu-B*039:01 and In-Mamu-B*052:01 were modified at the 3' end by PCR mutagenesis to delete exons 5–7 encoding the transmembrane and cytoplasmic domains and to add a 30 bp tail encoding the ten amino acid rat very low-density lipoprotein receptor (VLDLr) epitope, SVVSTDDDLA, for purification purposes (Hickman et al. 2000). The resulting sMHC–VLDLr constructs were cloned into the mammalian expression vector pcDNA3.1 (Invitrogen). Cells of the MHC class I-deficient EBV-transformed B-lymphoblastoid cell line 721.221 were transfected with the sMHCs Mamu-B*39TVLDLr and Mamu-B*52TVLDLr by electroporation. After 48 h of incubation, the cells were plated in 96-well plates (Falcon) in RPMI 1640 containing the antibiotic Geneticin. Transfectants were tested for the production of sMHC molecules by a VLDLr-specific ELISA (Hawkins et al. 2008).

Mamu endogenous ligand determination

Separately, approximately 25 mg of Mamu-B*39TVLDLr and Mamu-B*52TVLDLr molecules from the 721.221 cell line were purified over an affinity column composed of anti-VLDLr antibody (ATCC clone CRL-2197) coupled to CNBr-activated Sepharose 4B (GE Healthcare, Piscataway, NJ). sMHC molecules were then eluted in 0.2 N acetic acid, brought up to 10% acetic acid, and heated to 76°C for 10 min. Peptides were separated from heavy and light chains by ultra-filtration in a stirred cell with a 3-kDa molecular weight cutoff cellulose membrane (Millipore, Bedford, MA). The peptide batch was flash frozen and lyophilized. The peptides were then reconstituted in 10% acetic acid.

Following isolation, 10% of the peptide pool was subjected to 14 rounds of N-terminal sequencing by Edman degradation. A motif was generated by calculating the fold increase of each amino acid over the prior round. A hierarchy was then determined based on the amino acid composition at each position (Falk et al. 1991).
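
The fold-increase calculation used for pool sequencing can be illustrated with a minimal Python sketch. It assumes that the sequencer output is available as one table of residue yields per Edman cycle; the function names and the twofold cutoff are hypothetical choices made for illustration and are not taken from the protocol above.

```python
def fold_increase(yields_by_cycle):
    """Return, for each cycle after the first, residue -> yield / yield in prior cycle.

    yields_by_cycle: list of dicts {residue: yield}, one dict per Edman cycle.
    Residues with no measurable yield in the prior cycle are skipped.
    """
    folds = []
    for prev, curr in zip(yields_by_cycle, yields_by_cycle[1:]):
        folds.append({
            aa: curr[aa] / prev[aa]
            for aa in curr
            if prev.get(aa, 0) > 0
        })
    return folds


def rank_residues(fold_row, threshold=2.0):
    """List residues whose fold increase meets the (assumed) threshold, highest first."""
    enriched = [aa for aa, fold in fold_row.items() if fold >= threshold]
    return sorted(enriched, key=lambda aa: -fold_row[aa])


# Toy yields (invented numbers, arbitrary units) for cycles 1 and 2
toy_yields = [{"G": 10.0, "F": 20.0}, {"G": 80.0, "F": 10.0}]
toy_folds = fold_increase(toy_yields)
cycle2_hierarchy = rank_residues(toy_folds[0])
```

In this toy input, glycine rises eightfold between cycles 1 and 2 and is therefore the only residue ranked at position 2.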

Peptides were reverse-phase HPLC fractionated using a Jupiter Proteo C12 column (Phenomenex, Torrance, CA) on a Paradigm MG4 system (Michrom Bioresources, Auburn, CA). A standard CH3CN gradient was employed to generate approximately 40 peptide-containing fractions. UV absorption was monitored at 215 nm. Peptide fractions were concentrated to dryness and reconstituted in 20 μl of nanospray buffer composed of 50% methanol, 50% H2O, and 0.5% acetic acid. Nano-electrospray capillaries (Proxeon, Denmark) were loaded with 1 μl of each peptide fraction and infused at 1,100 V on a Q-Star Elite quadrupole mass spectrometer with a time of flight detector (Applied Biosystems, Foster City, CA). Ion maps were generated for each fraction in a mass range of 300–1,200 amu. Using independent data acquisition for selection, ions (putative peptides) were fragmented by tandem mass spectrometry (MS/MS). An amino acid sequence was assigned using the publicly available, web-based MASCOT (Matrix Science Ltd., London, UK) and/or de novo sequencing.

Positional scanning combinatorial library and peptide synthesis

Positional scanning combinatorial libraries (PSCLs) were synthesized as previously described (Pinilla et al. 1999). Each pool in the PSCL contains randomized 9-mer peptides with one fixed residue at a single position. With each of the 20 naturally occurring residues represented at each position along a 9-mer backbone, the entire library consisted of 180 peptide mixtures.

Peptides utilized in screening studies were purchased as crude or purified material from Mimotopes (Minneapolis, MN, USA/Clayton, Victoria, Australia), Pepscan Systems B. V. (Lelystad, Netherlands), A&A Labs (San Diego, CA, USA), Genescript Corporation (Piscataway, NJ, USA), or the Biotechnology Center at the University of Wisconsin-Madison (Madison, WI, USA). Peptides synthesized for use as radiolabeled ligands were synthesized by A&A Labs and purified to >95% homogeneity by reverse-phase HPLC. Peptide purity was determined with analytical reverse-phase HPLC and amino acid analysis, sequencing, and/or mass spectrometry. Peptides were radiolabeled utilizing the chloramine T method (Sidney et al. 2001). Lyophilized peptides were resuspended at 20 mg/ml in 100% DMSO, then diluted to required concentrations in PBS + 0.05% (v/v) nonidet P40 (Fluka Biochemika, Buchs, Switzerland). SIV peptide sequences were derived from the SIVmac239 sequence, Genbank accession M33262 (Kestler et al. 1990).

MHC purification and peptide-binding assays

Mamu class I MHC purification was performed by affinity chromatography using the W6/32 and/or B123.2 class I antibodies, as previously described (Allen et al. 2001; Loffredo et al. 2009; Sidney et al. 2005). Protein purity, concentration, and the efficiency of the depletion steps were monitored by SDS-PAGE.

Quantitative assays for peptide binding to detergent-solubilized MHC class I molecules were based on the inhibition of binding of a high affinity radiolabeled standard probe peptide and performed as detailed in prior studies (Loffredo et al. 2009; Schneidewind et al. 2008; Sidney et al. 2001; Sidney et al. 2005). Peptides were tested at six different concentrations covering a 100,000-fold dose range in three or more independent assays. Control wells to measure nonspecific (background) binding were also included. In each experiment, a titration of the unlabeled version of the radiolabeled probe was also tested as a positive control for inhibition.

The radiolabeled peptides utilized for the Mamu-B*039:01 and Mamu-B*052:01 assays were 3422.03 (sequence YGFSDPLTF) and 3289.0028 (sequence VGNVYVKF), respectively, representing sequences identified by Edman degradation and mass spectrometry analysis (described above). For each peptide, the concentration of peptide yielding 50% inhibition of the binding of the radiolabeled probe peptide (IC50) was calculated. Under the conditions used, where [radiolabeled probe] < [MHC] and IC50 ≥ [MHC], the measured IC50 values are reasonable approximations of the true Kd values (Gulukota et al. 1997; Sette et al. 1994).
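
How an IC50 might be read off a six-point titration can be sketched as follows. The text does not state the calculation method, so the log-linear interpolation used here is an assumption, as are the function name and the example concentrations and inhibition values (invented, chosen only to span a 100,000-fold range).

```python
import math


def estimate_ic50(concs_nM, pct_inhibition):
    """Interpolate, on a log10 concentration scale, the dose giving 50% inhibition.

    Returns None when 50% inhibition is not bracketed by two adjacent doses
    (for example, when the peptide never reaches 50% in the tested range).
    """
    points = sorted(zip(concs_nM, pct_inhibition))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo < 50.0 <= i_hi:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Invented titration: six tenfold dilutions covering a 100,000-fold dose range
toy_concs = [0.3, 3, 30, 300, 3000, 30000]
toy_inhibition = [5, 20, 40, 60, 85, 95]
toy_ic50 = estimate_ic50(toy_concs, toy_inhibition)
```

With these toy values, the 50% point falls midway (on the log scale) between 30 and 300 nM, giving an estimate close to 95 nM.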

PCR-SSP

Peripheral blood mononuclear cells (PBMCs) from two SIVmac239 challenged and infected Chinese rhesus macaques were supplied by Bioqual (Rockville, MD). Total RNA was extracted from ten million PBMCs using the Qiagen Qiashredder tissue homogenizer and the AllPrep DNA/RNA Mini Kit (catalog #80204) according to the manufacturer’s protocol (Qiagen, Valencia, CA, USA). First-strand cDNA was synthesized for each sample from 50 ng of RNA using the Super Script III First-Strand Synthesis for RT-PCR kit (catalog #18080-51; Invitrogen, Carlsbad, CA, USA). The cDNA was then used for MHC typing by polymerase chain reaction with sequence-specific primers (PCR-SSP), as previously described (Karl et al. 2008). A Mamu-B*039:01-specific product of 500 bp was visualized for both animals by agarose gel electrophoresis.

IFN-γ ELISPOT assay

We performed enzyme-linked immunosorbent spot (ELISPOT) assays to assess SIV-specific T-cell responses, as described previously (Allen et al. 2001). Briefly, 1 × 10⁵ PBMCs were used per well in precoated ELISpotPLUS kits (Mabtech, Mariemont, OH) according to the manufacturer’s instructions for the detection of IFN-γ-secreting cells. All tests were performed in triplicate using individual peptides at 10 μM. The positive control, Con A (Sigma-Aldrich), was used at a final concentration of 5 μg/ml. The negative control wells received no stimulation. The 96-well plates were incubated for 20 h at 37°C in 5% CO2. Wells were imaged and analyzed with a Zeiss KS EliSpot reader. Spots were counted by an automated system with set parameters for size, intensity, and gradient. Background levels (the mean of wells without peptide stimulation) were subtracted from each well on the plate. Responses were considered positive if the mean number of spot-forming cells (SFC) was more than 50 per million cells and p ≤ 0.05. The level of statistical significance was determined with a Student’s t test comparing the mean of triplicate values of the response against individual peptides with the response against the negative control.
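
The positivity criteria can be sketched in Python as below. Several details are assumptions made for illustration and are not stated in the text: the negative control is taken to be run in triplicate, the t test is taken to be a two-tailed pooled-variance test (so that p ≤ 0.05 corresponds to |t| ≥ 2.776 at 4 degrees of freedom), and the function name and well counts are invented.

```python
import math
from statistics import mean, variance

# Two-tailed 0.05 critical value of Student's t with 4 degrees of freedom
# (triplicate peptide wells versus an assumed triplicate of control wells).
T_CRIT_DF4 = 2.776


def elispot_call(peptide_wells, control_wells, cells_per_well=1e5):
    """Return (background-subtracted SFC per million cells, positive?)."""
    if len(peptide_wells) != 3 or len(control_wells) != 3:
        raise ValueError("this sketch only handles triplicate versus triplicate")
    diff = mean(peptide_wells) - mean(control_wells)
    sfc_per_million = diff * (1e6 / cells_per_well)
    # Pooled variance for two groups of three wells each
    pooled = (2 * variance(peptide_wells) + 2 * variance(control_wells)) / 4
    std_err = math.sqrt(pooled * (1 / 3 + 1 / 3))
    if std_err == 0:
        t_stat = math.inf if diff != 0 else 0.0
    else:
        t_stat = diff / std_err
    significant = abs(t_stat) >= T_CRIT_DF4
    return sfc_per_million, (sfc_per_million > 50 and significant)


# Invented spot counts per well
strong_sfc, strong_positive = elispot_call([20, 22, 24], [1, 2, 3])
weak_sfc, weak_positive = elispot_call([3, 2, 4], [1, 2, 3])
```

With 1 × 10⁵ cells per well, a background-subtracted mean of 20 spots per well corresponds to 200 SFC per million cells, which passes both criteria in the first toy example; the second example fails the 50 SFC threshold.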

Bioinformatic analysis

Analysis of the PSCL data was performed as described previously (Sidney et al. 2008). Briefly, IC50 nM values for each mixture were standardized as a ratio to the geometric mean IC50 nM value of the entire set of 180 mixtures and then normalized at each position so that the value associated with optimal binding at each position corresponds to 1. For each position, an average (geometric) relative binding affinity (ARB) was calculated, and then, the ratio of the ARB for the entire library to the ARB for each position was derived. We have denominated this ratio, which describes the factor by which the normalized geometric average binding affinity associated with all 20 residues at a specified position differs from that of the average affinity of the entire library, as the specificity factor (SF). As calculated, positions with the highest specificity will have the highest SF value. Primary anchor positions were then defined as those with an SF ≥2.4. This criterion identifies positions where the majority of residues are associated with significant decreases in binding capacity.
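
The specificity factor calculation can be sketched as follows. This is one reading of the description above, with assumptions flagged: relative binding is taken as the library geometric mean IC50 divided by the pool IC50 (so that larger values mean better binding), and the function names and toy IC50 values are invented. It is not the published implementation.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def specificity_factors(ic50):
    """ic50: {position: {fixed residue: IC50 in nM of that pool}}.

    Returns (normalized relative binding per pool, SF per position).
    """
    library_gm = geomean([v for row in ic50.values() for v in row.values()])
    normalized = {}
    for pos, row in ic50.items():
        # Assumed direction: library geometric mean / pool IC50
        rel = {aa: library_gm / v for aa, v in row.items()}
        best = max(rel.values())
        # Normalize so that the optimal residue at this position equals 1
        normalized[pos] = {aa: r / best for aa, r in rel.items()}
    library_arb = geomean([v for row in normalized.values() for v in row.values()])
    sf = {pos: library_arb / geomean(list(row.values()))
          for pos, row in normalized.items()}
    return normalized, sf


# Toy library with two positions and two residues (invented IC50 values, nM)
toy_ic50 = {1: {"G": 10.0, "A": 1000.0}, 2: {"G": 100.0, "A": 100.0}}
toy_norm, toy_sf = specificity_factors(toy_ic50)
```

In the toy library, position 1 discriminates strongly between residues and receives the higher SF, while position 2, where both pools bind equally, receives an SF below 1.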

Secondary anchor designations were based on the standard deviation (SD) of residue-specific values at each position. Dominant secondary anchor positions were defined as those where the SD was >3 and the SF <2.4, as well as positions associated with an SD >2 if the SF was between 1.5 and 2.4. Weak secondary anchors were defined as positions associated with an SD in the 2.5–3 range with an SF <1.5, or an SF in the 1.5–2.4 range with an SD <2.

For single amino acid substitution (SAAS) panels, SD values are inherently larger than for PSCL, and therefore, more stringent criteria were employed for defining primary and secondary anchor positions. For SAAS, a primary anchor position is defined as one in which the SF is ≥3.5. Secondary anchor positions were defined as those where the SD was >12 with an SF <3.5 or positions with an SD >10 and an SF in the 1.5–3.5 range.
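
The anchor criteria for PSCL and SAAS data can be written as two small decision functions. This is a sketch of the thresholds as stated above; the function names are hypothetical, and the handling of values that fall exactly on a boundary (and the order in which overlapping rules are checked) is an assumption.

```python
def classify_pscl(sf, sd):
    """Classify a PSCL position from its specificity factor (SF) and SD."""
    if sf >= 2.4:
        return "primary"
    if sd > 3 or (sd > 2 and 1.5 <= sf < 2.4):
        return "dominant secondary"
    if (2.5 <= sd <= 3 and sf < 1.5) or (1.5 <= sf < 2.4 and sd < 2):
        return "weak secondary"
    return "none"


def classify_saas(sf, sd):
    """Classify a SAAS position using the more stringent SAAS thresholds."""
    if sf >= 3.5:
        return "primary"
    if sd > 12 or (sd > 10 and 1.5 <= sf < 3.5):
        return "secondary"
    return "none"
```

For example, `classify_pscl(3.0, 1.0)` returns "primary" and `classify_saas(2.0, 11)` returns "secondary".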

To identify predicted binders, all possible 9-mer peptides in SIVmac239 sequences were scored using the matrix values derived from the PSCL analyses of Mamu-B*039:01 and Mamu-B*052:01 (Sidney et al. 2008). The final score for each peptide represents the product of the corresponding matrix values for each peptide residue–position pair. Peptides scoring among the top 3.0% (n = 100) were selected for binding analysis.
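
The matrix-based scoring can be sketched in Python as follows. The product-of-matrix-values score and the 3.0% cutoff come from the description above; the function names, the toy matrix, and the short input sequence are invented for illustration and do not reproduce the actual PSCL matrices or the SIVmac239 proteome.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def score_peptide(peptide, matrix):
    """Product of matrix values for each residue-position pair (positions are 1-based)."""
    score = 1.0
    for pos, aa in enumerate(peptide, start=1):
        score *= matrix[pos][aa]
    return score


def top_scoring_9mers(protein_seqs, matrix, fraction=0.03):
    """Score every distinct 9-mer in the sequences and keep the top fraction."""
    nonamers = {seq[i:i + 9] for seq in protein_seqs for i in range(len(seq) - 8)}
    ranked = sorted(nonamers, key=lambda p: score_peptide(p, matrix), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# Toy matrix (invented): every value is 0.1 except G at position 2 and F at position 9
toy_matrix = {pos: {aa: 0.1 for aa in AMINO_ACIDS} for pos in range(1, 10)}
toy_matrix[2]["G"] = 1.0
toy_matrix[9]["F"] = 1.0

toy_hits = top_scoring_9mers(["AYGFSDPLTFAA"], toy_matrix)
```

With the toy matrix, the only 9-mer window carrying G at position 2 and F at position 9 is ranked first.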

Phylogenetic reconstruction

We obtained a multiple sequence alignment of 266 full-length Mamu-B nucleotide sequences from the IPD-MHC database release 1.6.0 (Robinson et al. 2010). The human allele HLA-B*07:02 was used as an outgroup. From the aligned MHC sequences, we deleted the 51 codons encoding residues that form the binding pockets of the MHC molecules, to prevent positions that determine binding specificity from biasing the phylogeny. For phylogenetic reconstruction, we used the maximum-likelihood-based method Phyml (Guindon and Gascuel 2003) with the HKY model of nucleotide substitution (Hasegawa et al. 1985), a maximum-likelihood estimate of the proportion of invariable sites, four discrete gamma-distributed substitution rate categories, and 1,000 bootstrap runs. The HKY model was selected based on jModeltest (Posada 2008). The phylogenetic trees were visualized using SplitsTree (Huson and Bryant 2006) and Dendroscope (Huson et al. 2007).
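
The codon-removal step applied before tree building can be sketched as below. The function name is hypothetical, the codon numbering is assumed to be 1-based relative to the start of an in-frame aligned sequence, and the list of 51 pocket codons is not reproduced here because it is not given in the text.

```python
def drop_codons(aligned_nt, codon_positions):
    """Remove the given 1-based codon positions from an in-frame nucleotide sequence."""
    drop = set(codon_positions)
    codons = [aligned_nt[i:i + 3] for i in range(0, len(aligned_nt), 3)]
    return "".join(codon for idx, codon in enumerate(codons, start=1)
                   if idx not in drop)


# Toy example: remove codons 2 and 4 from a four-codon sequence
trimmed = drop_codons("ATGGGCTTTAAA", [2, 4])
```

Applying the same position list to every sequence in the alignment keeps the remaining columns in register.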

Results

Preference for glycine in position 2 of peptides binding to Mamu-B*039:01

We were interested in defining the MHC:peptide-binding motif for Mamu-B*039:01 because, with a reported frequency of 5.8% in the captive-bred population (Solomon et al. 2010), it represents one of the most frequent Chinese rhesus macaque alleles whose binding specificity has yet to be characterized. Previous studies have demonstrated that the elution and characterization of naturally bound ligands is an effective method for determining the peptide-binding specificity of class I MHC molecules (Hickman-Miller et al. 2005; Kubo et al. 1994). Accordingly, we evaluated the natural ligands associated with soluble Mamu-B*039:01 by sequencing 21 different endogenously loaded Mamu-B*039:01 peptide ligands representing six different peptide fractions (Table 1). The majority of ligands (14 of 21; 67%) were nine residues in length. Ligands of ten residues (n = 4) were also identified. Ligands of 8, 11, and 12 residues were only rarely found (one instance each). These observations suggest that nonamers represent the preferred size for peptide ligands bound to Mamu-B*039:01.

Table 1 Endogenous Mamu-B*039:01 ligands identified by MS/MS sequencing analysis

Next, we aligned the endogenously bound peptides and tabulated the frequency of various amino acid residues at each position (Fig. 1). The results indicated an overwhelming preference for glycine (G) in position 2, as demonstrated by its presence in 20 of the 21 natural ligands (95%). At the C-terminus, aromatic residues phenylalanine (F) and tryptophan (W) accounted for 62% of residues observed (13/21) and the aliphatic residues leucine (L) and isoleucine (I) accounted for the remaining eight (38%). This information identifies position 2 and the C-terminus as the likely main anchor positions for Mamu-B*039:01 binding.
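
The tabulation at the two anchor positions can be sketched in Python as follows. Ligands of different lengths are compared at position 2 (counted from the N-terminus) and at the last residue. The function name is hypothetical, and because the Table 1 sequences are not reproduced in the text, the usage example takes only the two ligand sequences named in the Materials and methods as input.

```python
from collections import Counter


def anchor_frequencies(ligands):
    """Return residue frequencies at position 2 and at the C-terminus."""
    n = len(ligands)
    p2_counts = Counter(peptide[1] for peptide in ligands)
    cterm_counts = Counter(peptide[-1] for peptide in ligands)
    p2_freq = {aa: count / n for aa, count in p2_counts.items()}
    cterm_freq = {aa: count / n for aa, count in cterm_counts.items()}
    return p2_freq, cterm_freq


# Illustrative input only: the two probe ligand sequences named in the methods
p2_freq, cterm_freq = anchor_frequencies(["YGFSDPLTF", "VGNVYVKF"])
```

Both example sequences carry G at position 2 and F at the C-terminus, so each frequency table contains a single residue at 100%.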

Fig. 1

Residue frequency summary of sequenced Mamu-B*039:01 ligands. Specific amino acids are indicated and color-coded if present at a given position in greater than 10% of ligands identified. White bars represent all other unspecified residues

To confirm that the sequenced peptides were indeed natural ligands of B*039:01, a binding assay was developed using purified MHC, as described in the “Materials and methods” section. Using this assay, we were able to confirm that the eluted ligands, the basis of the preliminary motif, do indeed bind Mamu-B*039:01 with high affinity. Specifically, 20 of the 21 endogenous peptides identified bound with IC50s of 50 nM or better.

Derivation of a detailed quantitative motif

The Mamu-B*039:01 binding capacity of a nonamer PSCL was determined next. As shown in Fig. 2, the PSCL analysis confirmed the preliminary motif defined on the basis of the pool sequencing data. Specifically, using the criteria outlined in the “Materials and methods” section, position 2 and the C-terminus were identified as the primary binding anchors, with G being dominant at position 2 and aromatic residues F and W preferred at the C-terminus. Aliphatic, hydrophobic residues I and L were also tolerated at the C-terminus, as well as positively charged lysine (K). Additionally, positions 1 and 3 were identified as dominant secondary anchors. A pictorial summary representation of the Mamu-B*039:01 motif is shown in Fig. 3a.

Fig. 2

PSCL-derived matrix describing 9-mer binding to Mamu-B*039:01. The PSCL was tested for binding, the data analyzed, and primary and secondary anchor positions defined, as described in the “Materials and methods”. Values shown represent the average relative binding (ARB) of the corresponding library relative to other pools with the same fixed position. Values have been normalized to the optimal residue at the corresponding position. SD indicates the standard deviation between the ARB of pools at the same position. SF is the specificity factor, calculated as described in the “Materials and methods”, representing the ratio of the average binding of the entire library to the average of pools at the indicated position. At the primary anchor positions (SF ≥ 2.4; blue shading), the most preferred residues, associated with an ARB > 0.1, are highlighted by bold yellow font. Green shading highlights secondary anchor positions. The library average binding for Mamu-B*039:01 was 1,772 nM

Fig. 3

Maps of the (a) Mamu-B*039:01, (b) H-2 Dd, and (c) Mamu-B*052:01 motifs. Pictorial summary of the PSCL matrices, indicating primary (in blue) and secondary (in green) anchor positions with associated preferred and tolerated residues. Also represented are those residues that are deleterious to binding at indicated positions (brown font)

Peptides with a glycine at the second position are antigenic in Mamu-B*039:01 animals

To validate the proposed motif at the functional level, we analyzed cellular responses from two Chinese rhesus macaques previously infected with SIVmac239 and positive for Mamu-B*039:01 by PCR-SSP, as detailed in the “Materials and methods” section. Cellular immune responses in these animals were evaluated using IFN-γ ELISPOT assays and SIVmac239-derived peptides. Specifically, the PSCL data were used to generate a predictive algorithm by standard methods (Southwood et al. 2011). SIVmac239 sequences scoring in the upper 3.0% range (n = 100) were synthesized and tested for in vitro MHC:peptide-binding affinity; 40 of these bound Mamu-B*039:01 with an IC50 <500 nM. These 40 peptides were tested in IFN-γ ELISPOT assays using cells from Mamu-B*039:01-positive, SIVmac239-infected animals to identify antigenic peptides (Table 2). Of the 40 SIV peptides tested, 28 contained a glycine at the second position.

Table 2 SIVmac239 peptides in Mamu-B*039:01-positive Chinese rhesus macaques

In terms of responses, a total of 8 of the 40 peptides generated positive responses in at least one animal (see Table 2). Specifically, ChRh 1 had three peptides with average SFCs of >50. ChRh 2 had five peptides with an average SFC of >50. Out of those eight responses, three responses were induced by peptides which had a glycine at position 2, indicating that these peptides are antigenic in the setting of SIV infection.

Similarities of the B39 motif with the known H-2 Dd motif

Given the unusual preference for glycine at the second position of the MHC:peptide-binding motif, we investigated whether other MHC molecules might also possess this particular characteristic. A thorough literature search revealed that the only other known MHC class I molecules with a reported preference for glycine at position 2 were the murine Dd MHC molecule (Corr et al. 1993; Li et al. 1998) and the Mamu-B*004 molecule (Dzuris et al. 2000). Previously published data for H-2 Dd demonstrated a strong requirement for G in position 2, along with proline (P) in position 3 and hydrophobic residues at the C-terminus (Fig. 3b) (Corr et al. 1993). Data for Mamu-B*004 likewise showed a specificity for G in position 2, but displayed no strict dependence on a C-terminal anchor (Dzuris et al. 2000).

Similarities of the Mamu-B*039:01 motif with the Mamu-B*052:01 motif

Concurrently with these experiments, Mamu-B*052:01, expressed in 8.4% of Indian rhesus macaques, was also being characterized. As for Mamu-B*039:01, we eluted and sequenced peptides from this MHC molecule. Sixteen peptides, consisting of 8-mers, 9-mers, and 10-mers, were eluted and characterized. The most frequent size observed was 8-mers, representing half of the endogenous peptides identified. The other half was split equally between 9- and 10-mers at four peptides (25%) each. Similar to Mamu-B*039:01, a dominant preference for G at position 2 was noted, as all peptides possessed this residue (Table 3).

Table 3 Endogenous Mamu-B*052:01 ligands identified by MS/MS sequencing analysis

The motif was further investigated by assessing the binding capacity using a PSCL (Fig. 4), as utilized in Mamu-B*039:01 characterization. Analyzing the data as described in the “Materials and methods” section, position 2 was identified as a secondary anchor position, where G was identified as a preferential, albeit not dominant residue. At this point, it was noted that 8-mer peptides were the dominant species for Mamu-B*052:01, comprising half of endogenous peptides sequenced. The PSCL, however, was based on 9-mers. To ensure thorough characterization of the motif, and since an 8-mer PSCL was not available, we tested a set of SAAS of an 8-mer ligand eluted from Mamu-B*052:01 (sequence VGNVYVKF; IC50 7.4 nM). This method identified position 2 as a primary anchor with a dominant preference for G (Fig. 5), with every other residue associated with a greater than two orders of magnitude reduction in binding capacity. Similarly to the case of Mamu-B*039:01, the C-terminus was also identified as a primary anchor position, with aromatic and aliphatic residues being preferred. Position 5 was identified as a dominant secondary anchor position by definition. The summary motif for Mamu-B*052:01 is shown in Fig. 3c.

Fig. 4

PSCL-derived matrix describing 9-mer binding to Mamu-B*052:01. The PSCL was tested for binding, the data analyzed, and primary and secondary anchor positions defined, as described in the “Materials and methods”. Values shown represent the average relative binding (ARB) of the corresponding library relative to other pools with the same fixed position. Values have been normalized to the optimal residue at the corresponding position. SD indicates the standard deviation between the ARB of pools at the same position. SF is the specificity factor, calculated as described in the “Materials and methods”, representing the ratio of the average binding of the entire library to the average of pools at the indicated position. At the primary anchor positions (SF ≥ 2.4; blue shading), the most preferred residues, associated with an ARB > 0.1, are highlighted by bold yellow font. Green shading highlights secondary anchor positions. The library average binding for Mamu-B*052:01 was 1,422 nM

Fig. 5

SAAS-derived matrix describing 8-mer binding to Mamu-B*052:01. A panel of single amino acid substitution analogs of the Mamu-B*052:01-binding peptide VGNVYVKF (IC50 7.4 nM) was tested for binding, and the data analyzed, as described for the PSCL in the “Materials and methods”. Because of the larger SD in the panels of SAAS, compared to PSCL, more stringent criteria to define primary and secondary anchor positions have been utilized, as also described in the “Materials and methods”. At the primary anchor positions (SF ≥ 3.5; blue shading), the most preferred residues, associated with an ARB > 0.1, are highlighted by bold yellow font. Green shading highlights secondary anchor positions. The library average binding for Mamu-B*052:01 was 18.0 nM

A unique B pocket structural motif is shared between Mamu-B*039:01, H-2 Dd, and Mamu-B*052:01

Based on the crystal structures of MHC class I molecules (Achour et al. 1998; Li et al. 1998; Madden 1995; Saper et al. 1991), the MHC B pocket is hypothesized to interact with the residue in position 2 of peptide ligands. Previously (Sette et al. 2003), we had investigated similarities in position 2 peptide-binding specificity between human and mouse class I molecules and found that the similarities were associated with very different B pocket structures, reflecting convergent evolution. Given the similar and unusual preferences of Mamu-B*052:01, Mamu-B*039:01, and murine H-2 Dd for G in position 2, we next investigated whether a common structural motif could be identified. We further sought to determine if the shared specificity would be reflective of convergent evolution or the result of common ancestry.

Accordingly, we analyzed the residues in positions 7, 9, 24, 34, 45, 63, 66, 67, 70, and 99, which compose the B pocket of MHC class I molecules (Achour et al. 1998; Li et al. 1998; Madden 1995; Saper et al. 1991), of Mamu-B*039:01, H-2 Dd, and Mamu-B*052:01. Surprisingly, this analysis revealed that all three alleles had very similar B pocket structures (Table 4). Specifically, Mamu-B*039:01, H-2 Dd, and Mamu-B*052:01 shared identical residues in positions 7, 24, 34, 63, 66, and 67, where tyrosine (Y), glutamic acid (E), valine (V), E, arginine (R), and alanine (A) were found, respectively. With the exception of an A/Y dichotomy in position 99, in which alanine is a small nonpolar residue and tyrosine is a large polar aromatic, variations at other positions were more conservative in nature.

Table 4 Alignment of B pocket residues of alleles with a preference for G2 reveals a unique structural motif

To determine if this structural motif was found in other primate MHC class I alleles, we next compiled the corresponding B pocket residues for all other available full Mamu and HLA sequences. This analysis revealed that the residues present in positions 24 (E) and 66 (R) of Mamu-B*039:01, H-2 Dd, and Mamu-B*052:01 were relatively frequent in other macaque class I sequences (Table 4). In macaques, R was present in position 66 in 13% and E in position 24 in 15% of the 650 Mamu alleles analyzed; as a tandem, they were found in an appreciable 5.7% of the macaque sequences. As a whole, six Mamu alleles (Mamu-B*004:01, Mamu-B*029:01:01, Mamu-B*029:01:02, Mamu-B*029:02, Mamu-B*029:03, and Mamu-B*032:01) matched the entire consensus sequence shown in Table 4. By contrast, these residues, as a pair or individually, were not found in any of the more than 945 HLA alleles analyzed (A in position 67, fairly common in macaques, was also not found in any HLA sequence). These differences in frequency are highly significant (p < 0.0001 by Fisher’s exact test). These observations demonstrate that the structural features associated with the specificity for G in position 2 are not uncommon in macaques, while this type of peptide motif is not found in humans.
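
The reported significance can be checked with a self-contained Fisher’s exact test. The sketch below makes assumptions that should be read with care: the count of 37 macaque alleles carrying the E24/R66 tandem is back-calculated from the reported 5.7% of 650 and is not stated in the text, the HLA total is taken as exactly 945, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Assumed counts: 37 of 650 Mamu alleles with the tandem versus 0 of 945 HLA alleles
p_tandem = fisher_exact_two_sided(37, 650 - 37, 0, 945)
```

Under these assumed counts the p value is far below 0.0001, in line with the significance level reported above.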

Taken together, the data in this section have defined a common B pocket structural motif for alleles with specificity for G in position 2. This structural motif was associated with the presence of E in position 24 and R in position 66. These residues are found in the corresponding positions at appreciable frequencies in macaques, at both the phenotypic and genotypic levels, and are absent in humans.

Evolutionary origin of the pocket

We next examined whether the G2 structural motif in macaques might be the result of convergent evolution of Mamu-B*039:01 and Mamu-B*052:01 or a reflection of common ancestry. For this analysis, we reconstructed a phylogenetic tree for all available full-length Mamu allele sequences, five common mouse alleles, and a set of common HLA alleles (Online resource 1). As expected, the mouse alleles are only distantly related to the primate MHC alleles, confirming that the emergence of the G2 motif in macaques and mice is the result of convergent evolution. Mamu-B*039:01 and Mamu-B*052:01, however, cluster together, suggesting that the shared structural motif is due to a common ancestry. In order to reveal the phylogenetic relationship between Mamu-B alleles in more detail and independently of the binding motif determining residues, we omitted 51 codons encoding the residues that form the pockets in the peptide-binding groove. In the resulting phylogenetic tree, Mamu-B*039:01 and Mamu-B*052:01 cluster together (Online resource 2). Hence, within the same species, the shared binding specificity of these alleles is the result of a common ancestry, while between species, it is the result of convergent evolution.

Discussion

In this study, we have described the peptide-binding specificity of the macaque MHC class I molecules Mamu-B*039:01 and Mamu-B*052:01. Both molecules were found to preferentially bind peptides with glycine at the second residue. Previously, Mamu-B*004 binding of a dominant T-cell epitope was found to be dependent upon G in position 2 (Dzuris et al. 2000). While more detailed analyses of the specificity of this molecule were not undertaken due to the unavailability of reagents, pocket analyses suggest that this allele may also be a member of the supertype delineated in the present study. Previous studies also revealed that the mouse MHC molecule H-2 Dd has a preference for glycine at the second position (Corr et al. 1993; Li et al. 1998). Strikingly, our analysis demonstrated that this commonality hinges on convergent evolution independently generating the same pocket structure in distantly related MHC genes. Even more strikingly, this unique specificity is encoded in about 6% of macaque class I genes but is totally absent in humans.

An analysis of all the residues comprising the MHC B pocket of these macaque (Mamu) molecules, as well as the mouse molecule Dd, revealed that they share very similar B pocket structures. Together, the three molecules have identical residues in six of the ten positions hypothesized to form the B pocket and largely chemically conserved residues at the remaining positions. Most striking, however, is the shared presence of E in position 24 and R in position 66. These two residues, in these positions, are not found in any of the almost 1,000 HLA alleles sequenced thus far.

Crystal structures of peptides bound to H-2 Dd (Achour et al. 1998; Li et al. 1998) have suggested that the presence of R in position 66 likely drives the requirement for glycine in position 2 of the corresponding peptide ligands. In both reported structures, the arginine at position 66 is shown to point down into the binding groove, where it forms salt bridges with E24 and E63. Contact between R66 and Y45 is also noted. Together, these contacts result in a narrowing of the B pocket, such that the presence of any residue other than G in position 2 would likely result in steric clashes with R66.

Since Mamu-B*039:01 and -B*052:01 share the E24-E63-R66 structural motif with H-2 Dd, it is reasonable to assume that a similar narrowing of the B pocket is present and drives the preference for G in position 2 of their ligands. The observation that both Mamu-B*039:01 and -B*052:01 can bind ligands with residues other than G in position 2, albeit with lower avidity, suggests that there may be correspondingly less hindrance in their B pockets than is the case for H-2 Dd. This difference may arise because these two Mamu molecules have, respectively, leucine (L) and methionine (M) in position 45, located on the floor of the B pocket. These residues occupy less volume than the corresponding Y present in H-2 Dd and accordingly result in a B pocket with more space available to accommodate larger residues.
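The residue comparison discussed above can be laid out as a small lookup, sketched here for illustration only. The table holds just the four positions whose residues are stated in the text (24, 45, 63, and 66); the remaining B pocket positions are not enumerated in this section and are deliberately left out. The names POCKET_RESIDUES and compare_positions are hypothetical.

```python
from typing import Dict, Tuple

# Residues at the positions named in the text; other pocket positions are omitted.
POCKET_RESIDUES: Dict[str, Dict[int, str]] = {
    "Mamu-B*039:01": {24: "E", 45: "L", 63: "E", 66: "R"},
    "Mamu-B*052:01": {24: "E", 45: "M", 63: "E", 66: "R"},
    "H-2 Dd": {24: "E", 45: "Y", 63: "E", 66: "R"},
}


def compare_positions(
    residues_by_molecule: Dict[str, Dict[int, str]]
) -> Tuple[Dict[int, str], Dict[int, Dict[str, str]]]:
    """Split positions into those shared by all molecules and those that differ."""
    # Only compare positions that are recorded for every molecule.
    shared_keys = set.intersection(*(set(r) for r in residues_by_molecule.values()))
    conserved: Dict[int, str] = {}
    variable: Dict[int, Dict[str, str]] = {}
    for position in sorted(shared_keys):
        observed = {name: res[position] for name, res in residues_by_molecule.items()}
        distinct = set(observed.values())
        if len(distinct) == 1:
            conserved[position] = distinct.pop()
        else:
            variable[position] = observed
    return conserved, variable


conserved, variable = compare_positions(POCKET_RESIDUES)
```

Under these inputs, positions 24, 63, and 66 fall into the conserved group, while position 45 is the only one that differs, which mirrors the argument that the residue at position 45 may account for the looser position 2 preference of the two Mamu molecules.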

Unlike Mamu-B*039:01 and -B*052:01, H-2 Dd has a very strong preference for proline in position 3 (Corr et al. 1993). This specificity is hypothesized to be driven by the presence of a narrow hydrophobic ridge, formed by a pair of tryptophans in positions 97 and 114, that splits the D pocket and reduces its depth and volume (Achour et al. 1998; Li et al. 1998). Besides providing stabilizing interactions with the resulting small pocket, the proline may allow the peptide to adopt the conformation necessary to ascend the wall formed by the tryptophan pair. Neither Mamu molecule studied here contains the tryptophan pair present in the murine Dd molecule; the corresponding hydrophobic ridge is therefore likely absent, liberating Mamu-B*039:01 and -B*052:01 from a similar constraint in position 3.

Given the absence of these particular amino acid preferences in HLA sequences, we were interested in how the nonhuman primate sequences and the mouse sequence have evolved, and we therefore analyzed macaque and mouse sequences from a phylogenetic perspective. These analyses revealed that within the same species, the evolution of these sequences is the result of common ancestry, while between species, it is the result of convergent evolution. It is therefore interesting to speculate as to why these sequences are absent in humans. It is possible that HLA sequences have not evolved this particular motif because the evolutionary pressure leading to its origination was absent, or because the expression of such a specificity might for some reason be detrimental in humans. For example, these binding specificities may be associated with disease susceptibility or autoimmunity. A specific event, such as a selective sweep (de Groot et al. 2008; de Groot et al. 2002) or a bottleneck effect, can also be considered as a possible explanation.

We cannot speculate on the biological significance of the peptide-binding motif elucidated herein for mice and macaques. Besides not being found in humans, the hallmark E24-R66 structural motif has also not been reported to date in most other nonhuman primates, including gorillas, chimpanzees, orangutans, olive baboons, pigtail macaques, and crab-eating macaques, as well as rats, sheep, and canines (Robinson et al. 2010).

On the other hand, it is found in a few cases in African green monkeys and swine and in several instances in cattle. The disparate phylogenetic relationships between the species that do show the motif strengthen the idea that it is the result of convergent evolution. At this time, however, we cannot speculate as to the specific mechanism(s) driving the generation and maintenance of this motif, although MHC-I richness, as seen in rhesus macaques, African green monkeys, rodents, and swine, might play a role in maintaining some “specialized” peptide-binding specificities.

In summary, we describe herein a new supertype motif that encompasses MHC molecules that preferentially bind peptides containing a glycine residue at the second position. These findings suggest that novel evolutionary mechanisms led this particular MHC peptide-binding motif to originate independently in nonhuman primates and murine species, while it remains totally absent from human MHC molecules.