Introduction

Coacervation1 refers to the liquid–liquid phase separation (LLPS) of a homogeneous polymer solution into two distinct phases: a concentrated macromolecule-rich (or coacervate) phase and a dilute macromolecule-depleted phase2. Coacervation can occur between two oppositely charged polyelectrolytes (complex coacervation)3 or from self-association of a single polymer (self- or simple-coacervation)4. While coacervation studies were initiated in the field of biopolymeric colloids, in recent years LLPS has attracted considerable interest from life scientists5,6 with numerous studies showing its role in organizing biomolecules in living cells via formation of membraneless organelles7,8,9,10,11. Another less recognized but increasingly appreciated biological role of LLPS is associated with the assembly of extracellular, load-bearing structures12. A well-known example is tropoelastin, which undergoes self-coacervation upon secretion into the extracellular matrix where it self-assembles to form elastic fibers that provide strength and resilience to elastic tissues4. Coacervation has also been recognized to play a key role in natural bioadhesives secreted by marine invertebrates (for example the sandcastle tubeworm3 or mussels13) and to be involved in the formation of biological composite materials. In particular, we have recently identified and sequenced a family of proteins called histidine-rich beak proteins (HBPs) that are the main load-bearing component of the hard beak14 of the jumbo squid (Dosidicus gigas).

Recent studies of proteins involved in LLPS have revealed that such proteins usually belong to the family of intrinsically disordered proteins (IDPs) or contain intrinsically disordered regions (IDRs). IDPs that drive LLPS are typically characterized by conformational heterogeneity at equilibrium and by molecular motions that span timescales from ns to ms15. They usually also exhibit low sequence complexity with a modular organization of their primary structure6,16,17. As a result, they lack the well-defined three-dimensional structure typical of globular proteins. It has been suggested that various intra- or intermolecular interactions are involved during LLPS of IDPs/IDRs, for example multivalent (cooperative), electrostatic, hydrophobic, or cation-π interactions6,10. Structure–function relationships of IDPs have primarily been obtained by site-directed mutagenesis, establishing the contributions of individual residues to the phase separation process18,19,20,21,22. However, the molecular-scale interactions underlying LLPS are still poorly understood. A few NMR studies have provided direct experimental evidence linking protein sequence and structure with the ability to undergo LLPS. For example, a combined solution and solid-state NMR study on elastin-like peptides (ELPs) that undergo LLPS through hydrophobic interactions triggered by temperature changes established a model by which the final biomaterial structure self-assembles23. Solid-state NMR experiments have also been used to study the low-complexity domains of the RNA-binding proteins FUS24 and TDP-43 (ref. 25), which undergo LLPS and, in the pathological state, may form insoluble fibril-like structures26.

A central characteristic of HBPs is the presence of repetitive regions of low-complexity amino acid sequence in their C-termini. Such molecular architecture is often found in extracellular IDPs with LLPS properties that are involved in the formation of biological structures with a load-bearing function27, for example tropoelastin28, resilin29, abductin30, and spider silk31. These repetitive regions are often enriched in hydrophobic residues that interact under specific conditions to trigger LLPS, which is the first step in the self-assembly of the load-bearing tissue12. Besides hydrophobic residues, the repeats found in HBPs also contain a significant fraction of ionizable histidine (His) side chains. This feature is unique, and we therefore selected HBP-1 as a model structural IDP to shed light on the sequence motifs that govern LLPS as well as on the intermolecular interactions stabilizing the coacervate phase.

Here, we combine mutagenesis studies with solution- and solid-state NMR spectroscopy to investigate the self-coacervation of HBPs. Systematically exploring the HBP-1 sequence, we identify the GHGLY repeat motif as a driver of LLPS. By studying various HBP-1-derived peptides, we find that when at least two copies of this repeat are separated by a linker sequence, LLPS can be induced over a broader range of conditions (pH and salt concentration). Alternatively, at least four tandem GHGLY repeats must be present to trigger self-coacervation. Within this motif, His residues act as a molecular switch: upon a pH increase, they first deprotonate and then form hydrogen bonds with Tyr. Finally, using solution- and solid-state NMR and small-angle X-ray scattering (SAXS), we demonstrate that clustering of Tyr residues is critical for stabilizing coacervate microdroplets.

Results

HBP-1 is structurally disordered in solution

HBP-1 possesses primary structure features characteristic of IDPs with LLPS properties (Supplementary Fig. 1). In a recent study, we used circular dichroism (CD) and SAXS to show that it has a disordered structure in solution that transitions to a more ordered form in the coacervate state, and we proposed that hydrophobic modular penta-repeats in the C-terminus are key to its self-coacervation32. To test these assumptions and investigate the structural features of the protein, we carried out a standard set of double- and triple-resonance NMR experiments on soluble recombinant HBP-1. As expected, the NMR results indicated that the protein lacked a well-defined three-dimensional structure in solution: the 1H-15N heteronuclear single quantum coherence (HSQC) spectrum (Fig. 1a) showed a narrow distribution of cross-peaks, as is typically observed for IDPs with LLPS properties25,33,34. The Cα and Cβ chemical shifts of the assigned residues did not deviate significantly from random-coil values, confirming that monomeric HBP-1 is disordered throughout (Supplementary Fig. 2).
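As an illustration of the random-coil comparison above, the sketch below computes the combined secondary chemical shift ΔCα − ΔCβ for assigned residues. It is a minimal sketch only: the random-coil reference values and the 1 ppm tolerance are approximate numbers assumed for illustration, not those used in this study, and a published random-coil table with sequence corrections should be substituted for real data.

```python
# Approximate random-coil shifts in ppm (ASSUMED, illustrative only).
# Substitute a published random-coil table with sequence corrections for real data.
RANDOM_COIL = {
    "G": {"CA": 45.1, "CB": None},  # Gly has no CB
    "A": {"CA": 52.5, "CB": 19.1},
    "H": {"CA": 55.0, "CB": 29.0},
    "L": {"CA": 55.1, "CB": 42.4},
    "Y": {"CA": 57.9, "CB": 38.8},
    "F": {"CA": 57.7, "CB": 39.6},
}


def secondary_shift(residue, ca_obs, cb_obs=None):
    """Return (CA_obs - CA_rc) - (CB_obs - CB_rc) in ppm.

    Runs of positive values suggest helix, runs of negative values suggest
    beta-strand, and values near zero are consistent with random coil.
    """
    rc = RANDOM_COIL[residue]
    d_ca = ca_obs - rc["CA"]
    if rc["CB"] is None or cb_obs is None:
        return d_ca  # fall back to the CA deviation alone
    return d_ca - (cb_obs - rc["CB"])


def is_random_coil(assignments, tolerance=1.0):
    """True if every (residue, CA, CB) triple deviates by less than `tolerance` ppm."""
    return all(abs(secondary_shift(*a)) < tolerance for a in assignments)
```

For example, `is_random_coil([("G", 45.3, None), ("H", 55.2, 29.1)])` returns `True`, because both residues deviate by only 0.1–0.2 ppm from the assumed reference values.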

Fig. 1

1H-15N-HSQC spectra of HBP-1 at different pH values. a HBP-1 in the initial solution state at pH 3.3. b Dilute phase after LLPS at pH 6.5 (after sedimentation of coacervate microdroplets). c Overlay of the two spectra. Spectra acquired at 298 K and a protein concentration of 2 mg mL−1 (130 μM).
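The two concentrations given in the caption can be cross-checked with simple arithmetic: 2 mg mL−1 corresponding to 130 μM implies a molar mass of roughly 15.4 kDa for HBP-1. The helper names below are hypothetical, and the implied mass only reflects the rounded caption values, not a measured molecular mass.

```python
def micromolar(mass_conc_mg_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration (mg/mL, numerically equal to g/L) to µM."""
    return mass_conc_mg_per_ml / molar_mass_g_per_mol * 1e6


def implied_molar_mass(mass_conc_mg_per_ml, conc_micromolar):
    """Molar mass (g/mol) implied by a mass and a molar concentration."""
    return mass_conc_mg_per_ml / (conc_micromolar * 1e-6)


# Values from the Fig. 1 caption: 2 mg/mL and 130 µM.
print(round(implied_molar_mass(2.0, 130.0)))  # 15385 g/mol, i.e. ~15.4 kDa
```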

C-terminal region of HBPs is involved in pH-dependent LLPS

LLPS of HBPs is triggered by changes in pH and ionic strength32. HBP-1 underwent LLPS at a minimal concentration of 20–30 µM within a narrow pH range of 6.5–7.5, close to the protein’s isoelectric point (predicted pI = 6.03); this range could be broadened by increasing the protein and salt concentrations (phase diagrams in Supplementary Fig. 3). To probe the residues involved in LLPS of HBP-1, we recorded a set of 1H-15N-HSQC spectra while gradually increasing the pH from 3.3 (soluble state; Fig. 1a) to 6.5, at which point LLPS was initiated (Supplementary Fig. 4). Finally, we measured the spectrum of the dilute phase after LLPS, once the coacervate microdroplets had sedimented (Fig. 1b). The overlay with the spectrum acquired under initial conditions (Fig. 1c) showed that resonances assigned to glycine (Gly), His, alanine (Ala), and leucine (Leu) residues, located mainly in the C-terminal modular repetitive region, were absent, suggesting that these residues were involved in transient interactions that did not occur at acidic pH. As controls, we acquired spectra at a 75% lower protein concentration than in the initial conditions (Supplementary Fig. 5) and at a lower temperature (279 K vs. 298 K; Supplementary Fig. 6) to probe possible exchange between monomeric and oligomeric states or exchange with water molecules, respectively. In both experiments at pH 6.5, the same cross-peaks lost intensity, confirming the specific involvement of these residues (located mostly in the modular repeats of HBP-1) in LLPS.
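The comparison of the dilute-phase spectrum with the initial spectrum amounts to computing per-residue intensity ratios and flagging strongly attenuated or missing cross-peaks. The following is a minimal sketch, assuming both peak lists are already scaled to a common reference; the function name, the residue labels in the example and the 0.5 cut-off are hypothetical.

```python
def attenuated_residues(reference, dilute, threshold=0.5):
    """Return residues whose intensity ratio dilute/reference is below `threshold`.

    Both dicts map a residue label to a peak intensity and are assumed to be
    scaled to a common reference. Peaks absent from `dilute` count as ratio 0.
    """
    flagged = []
    for residue, i_ref in reference.items():
        if i_ref <= 0:
            continue  # skip unusable reference peaks
        ratio = dilute.get(residue, 0.0) / i_ref
        if ratio < threshold:
            flagged.append(residue)
    return flagged


# Hypothetical peak lists: G70 is attenuated, H71 has disappeared, S10 is unchanged.
print(attenuated_residues({"G70": 1.0, "H71": 0.8, "S10": 1.2},
                          {"G70": 0.1, "S10": 1.1}))
```

Treating missing peaks as a ratio of zero mirrors the observation above that some resonances vanish entirely rather than merely weakening.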

Analysis of modular repeats driving phase separation of HBPs

To study how the arrangement of the C-terminal modular domains influences self-coacervation of HBP-1, we designed a series of sequence variants (Fig. 2a–d; full sequences in Supplementary Fig. 7) and investigated their ability to phase separate at various pH values and salt (NaCl) concentrations by optical microscopy (Fig. 2e, f). First, we created a mutant lacking the first 66 amino acids but containing all the C-terminal modular repeats (V1-C). This mutant phase separated and formed coacervates at a protein concentration and pH range similar to those of the full-length protein, confirming our hypothesis that the C-terminal modular repeats are responsible for its phase separation behavior. Next, we studied a variant lacking the first 31 amino acids of the repetitive region (V2-C). This variant formed coacervates similarly to V1-C and wild-type HBP-1 but required a slightly higher protein concentration (ca. 30 µM), indicating that the full length of the modular region is not required to induce phase separation.

Fig. 2

Analysis of LLPS properties of HBP-1 N- and C-terminal variants and peptides. a Amino acid sequence representation of HBP-1 protein. The repetitive region (G67–G145) is presented with modular repeats indicated with different color shades for motifs containing His (blue) or hydrophobic residues (red), and for the GHGLY motif (green). Non-repetitive N- and C-terminal regions are marked in gray. b C-terminal variants (V1-C containing the whole repetitive region, and V2-C truncated at position G98). c N- and C-terminal variants obtained by trypsin cleavage. d Synthetic peptides. The same color marking was used for all peptides shown. Full amino acid sequences of all proteins and peptides are presented in Supplementary Figs. 1 and 7. The region of the HBP-1 sequence is indicated in brackets. Variants that undergo LLPS are marked with *. e Phase diagrams (protein or peptide concentration (C) on the x-axis and pH on the y-axis) at low (0.1 M) and high (1 M) salt concentrations, illustrating the conditions required to induce LLPS. As indicated in the upper-left panel (HBP-1), only one phase (soluble protein) is present at low protein concentration. When LLPS occurs, two phases co-exist, i.e. a protein-rich phase (coacervate microdroplets/hydrogel) and a protein-depleted dilute phase (boundary lines between the two phases are drawn as a guide for the eye). Black empty dots indicate the pH and protein concentration at which the optical micrographs in panel (f) were obtained. Source data are provided as a Source Data file. f Examples of optical micrographs taken after LLPS of all the variants and peptides described above and of HBP-1 (used as a control). Micrographs of V5-N, V6-N and V7-N represent hydrogels.

To map the minimal sequence length required for phase separation, we designed a series of HBP-1 mutants with various lengths of the repetitive region. The mutants were created by introducing a single Lys at different pre-selected locations, so that trypsin digestion yielded fragments of defined length and variants with repetitive domains of different lengths (Fig. 2c and Supplementary Fig. 7b).
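The Lys-insertion strategy relies on trypsin cleaving C-terminal to Lys and Arg (conventionally not when the next residue is Pro). The in silico digest below is a minimal sketch of that rule for planning fragment lengths; the function name and the example sequences are made up and are not actual HBP-1 variants.

```python
def trypsin_digest(seq, proline_rule=True):
    """Split `seq` after every K or R, optionally skipping sites followed by P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and i < len(seq) - 1:  # a C-terminal K/R leaves no fragment
            if proline_rule and seq[i + 1] == "P":
                continue  # K/R-P bonds are conventionally treated as resistant
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


# A single inserted Lys splits a hypothetical sequence into two fragments.
print(trypsin_digest("MAGHGLYKGAGFA"))  # ['MAGHGLYK', 'GAGFA']
```

Moving the single Lys along the sequence shifts the cleavage point and therefore the length of the repetitive region retained in each fragment.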

We then analyzed the LLPS behavior of all variants as a function of protein concentration, pH, and salt concentration, and constructed the phase diagrams shown in Fig. 2e. Among the N variants, V5-N to V7-N underwent LLPS only at high salt concentration, whereas V3-N and V4-N did not phase separate under any tested condition. LLPS also occurred over a broader range of conditions as peptide length increased. For V7-N, LLPS could be induced at pH values as high as 8, provided the peptide concentration was at least 500 µM. For V6-N, the highest pH at which LLPS was observed was 7 (at a minimal peptide concentration of 400 µM), whereas V5-N did not undergo LLPS above pH 6. Correlating these results with the peptide design points to the importance of both the GHGLY motif (green in Fig. 2) and peptide length. The longer V6-N and V7-N peptides, containing two GHGLY motifs, phase separated over a wider range of conditions, whereas the shorter V3-N and V4-N variants, containing only one GHGLY copy, did not phase separate under any condition. The intermediate-length V5-N, with one GHGLY motif, phase separated but only under narrow conditions. Moreover, the separated phases of the longer variants exhibited a different morphology from that of the full-length protein (Fig. 2f), forming dense hydrogel-like structures that did not disperse into the surrounding buffer. This behavior may be linked to the higher hydrophobicity of V5-N to V7-N compared with the other variants, which may favor hydrogel formation through hydrophobic interactions.

A similar trend was observed for the C-terminal variants. V3-C, which contained the longest section of the repetitive region, phase separated at the lowest protein concentration (30 µM at pH 8) and over the broadest pH range among all tested variants. In contrast, the shorter Vx-C variants exhibited LLPS over a narrower range of conditions and required higher protein concentrations.

To further assess the role of the GHGLY motif, we compared the coacervation ability of the HBP-1-derived GY-23 peptide (containing two GHGLY copies)32 with two other synthetic peptides made of very similar fragments of HBP-1 repeats (GA-25 and GH-25) but harboring only one GHGLY motif (Fig. 2d and Supplementary Fig. 7c). Only GY-23 phase separated, forming coacervate microdroplets suspended in solution as well as a dense hydrogel-like structure (condensed, solid-like coacervates; Fig. 2f). In contrast, GA-25 and GH-25 remained in solution under all tested buffer conditions (Fig. 2f). We note that sequence motifs similar to GHGLY are also present in the C-terminus of HBP-2, which contains seven tandem copies of the GHGxY motif (where x is Val, Pro, or Leu) (Supplementary Fig. 8). A peptide composed of five copies of GHGxY (HBP-2-pep) was previously shown to phase separate and form coacervates in the same way as the full-length protein14.

To confirm the central role of GHGxY motifs in LLPS of HBP-2, we used trypsin cleavage to obtain shorter HBP-2 fragments and tested their ability to phase separate. Since the protein possesses only two trypsin recognition sites, at R81 and R172, digestion yielded an N-terminal fragment (M1-R81) lacking the modular repeats, a C-terminal fragment (A82-R172) containing the whole repetitive region, and a short G173-Y175 peptide that was discarded (Supplementary Fig. 8b). As expected, only the C-terminal fragment phase separated into coacervates (Supplementary Fig. 8c). Next, we designed a series of short peptides containing different arrangements of the repetitive units present in HBP-1 and HBP-2 (Fig. 3a) and analyzed their phase separation behavior as for the HBP-1 variants (Fig. 3b, c). Phase separation was observed for all 25-mer peptides containing two GHGLY motifs flanking a central region composed of three copies of the GAGFA or GHGLH sequences, as well as for a 20-mer peptide (GY-20) made of four tandem GHGLY repeats. In contrast, no phase separation was observed when the peptide length was reduced to 15 residues, for example when three GHGLY copies were arranged in tandem (GY-15-V1) or when a GAGFA motif was flanked by two GHGLY motifs (GY-15-V2). Similarly, no phase separation was observed for decapeptides composed of one or two GHGxY motifs, or for the pentapeptides GHGLY and GAGFA. Moreover, the peptides capable of LLPS yielded separated phases with different rheological characteristics. The GY-25-V1 peptide, containing three copies of the hydrophobic GAGFA motif, phase separated into a dense, compact hydrogel. In contrast, the GY-25-V2 and GY-20 peptides, composed of less hydrophobic, His-rich motifs, formed only microdroplets (Fig. 3c), while the GY-23 peptide, containing both types of motifs, separated into microdroplets as well as hydrogel-like condensed coacervates (Fig. 2f).

Fig. 3

LLPS properties of HBP-1- and HBP-2-derived peptides. a Sequences and their ability to undergo LLPS. b Phase diagrams of the peptides that exhibited LLPS. Source data are provided as a Source Data file. c Sample morphology after LLPS by optical microscopy (left micrograph: hydrogel; middle and right micrographs: microdroplets). d Site-directed mutants of GY-23 peptide and their LLPS ability. Color marking of HBP-1 modular repeats is identical to the color-coding described in Fig. 2. All samples were tested under the same conditions at various pH values and salt concentrations.

Taken together, these results indicate that the presence of at least two GHGLY copies in the tandem repeats greatly enhances phase separation ability. However, this condition is not sufficient: the GHGLY copies must also be separated by a spacer composed of at least three GAGFA or GHGLH motifs, or a combination of GAGFA/GFA and GHGLH motifs. Alternatively, the peptide must contain at least four tandem GHGLY repeats to phase separate. To corroborate the role of Tyr in phase separation, we prepared two GY-23 variants in which one of the two Tyr residues was substituted with Ala (Fig. 3d). Neither variant phase separated under any tested condition, suggesting that both Tyr residues are required to drive phase separation. Finally, we investigated the LLPS ability of the GY23(H/K) mutant, in which all His residues were substituted with Lys. This peptide did not undergo LLPS under any tested condition, showing that the role of His residues in HBP peptides is not limited to shifting the net charge of the peptide as the pH changes. Instead, this result indicates that His residues are involved in additional interactions driving LLPS, as discussed below.
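For the short peptides in Fig. 3, the empirical rule above can be restated as a heuristic check on the sequence. This is a sketch, not a validated predictor: the function names are hypothetical, the 13-residue minimum spacer is an assumption inferred from GY-23 (its two Tyr residues, Tyr 5 and Tyr 23, place the GHGLY copies at the termini, leaving 13 residues between them), and the rule ignores the pH and salt dependence and the longer protein variants.

```python
import re

MOTIF = "GHGLY"


def motif_starts(seq, motif=MOTIF):
    """0-based start positions of every occurrence of `motif` in `seq`."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def longest_tandem_run(starts, motif_len=len(MOTIF)):
    """Length of the longest run of motif copies that directly follow each other."""
    best, run, prev = 0, 0, None
    for s in starts:
        run = run + 1 if prev is not None and s - prev == motif_len else 1
        best = max(best, run)
        prev = s
    return best


def predicts_llps(seq, min_spacer=13, min_tandem=4):
    """Hypothetical heuristic restating the GHGLY rule for short peptides."""
    starts = motif_starts(seq)
    if longest_tandem_run(starts) >= min_tandem:
        return True  # e.g. four tandem GHGLY repeats, as in GY-20
    # Otherwise require two GHGLY copies separated by a long enough spacer.
    gaps = [b - (a + len(MOTIF)) for a, b in zip(starts, starts[1:])]
    return any(g >= min_spacer for g in gaps)


print(predicts_llps("GHGLY" * 4))                     # GY-20-like: True
print(predicts_llps("GHGLY" + "GAGFA" + "GHGLY"))     # GY-15-V2-like: False
```

Because a Tyr-to-Ala or His-to-Lys substitution destroys the GHGLY motif itself, such mutants of a GY-23-like sequence are predicted not to phase separate, which matches the observations in Fig. 3d.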

Molecular interactions initiating LLPS

To assess the role of Tyr residues and identify the detailed molecular interactions triggering and stabilizing LLPS, we carried out NMR spectroscopy studies. First, we acquired a 1H-15N-HMQC spectrum in solution as well as a set of triple-resonance NMR spectra for backbone assignment of soluble GY-23 at pH 3.3. The 1H-15N-HMQC spectrum yielded well-resolved peaks that could be fully assigned based on the carbon chemical shift values obtained from the 3D experiments (Fig. 4a). The observed Cα and Cβ chemical shifts showed no significant differences from average random-coil values (Supplementary Fig. 9), confirming that the peptide displayed no propensity towards a specific secondary structure.

Fig. 4

NMR spectra of GY-23 peptide at different pH values (cross-peak trajectories marked with dashed lines). a 1H-15N-HMQC spectrum at initial conditions of pH 3.3. b Overlay of 1H-15N-HMQC spectra acquired between pH 3.3 and 7 (pH 7: initiation of LLPS). c, d Overlay of 1H-13C-HSQC spectra of aliphatic (c) and aromatic (d) side chains at pH 3.3 and 7. The inset shows Tyr 1Hδ-13Cζ cross-peaks at pH 7. e Overlay of long-range 1H-15N-HMQC spectra of His side chains. The resonance assignments in the protonated state (pH 3.3) are indicated. f Long-range 1H-15N-HMQC spectrum at pH 7 acquired within 5 min of pH adjustment, showing transient stabilization of the His ε tautomer with its characteristic resonance at ca. 250 ppm (arrow). In the spectrum acquired 30 min after pH adjustment, this cross-peak was significantly attenuated (Supplementary Fig. 11). Spectra acquired at 298 K and a peptide concentration of 1.5 mM. The trajectories (chemical shift values vs. pH) of Tyr 13C atoms and of His 13C and 15N atoms are provided in Supplementary Fig. 12.

Next, we titrated the pH of the peptide solution and monitored changes in the 1H-15N-HMQC spectra (Fig. 4b). Between pH 4 and 6, where the peptide remained fully soluble, we did not observe major variations in peak distribution or relative intensity (Supplementary Fig. 10) compared with the initial state (pH 3.3, Fig. 4a). The exception was the Tyr 23 cross-peak, which shifted significantly between pH 3.3 and 4 owing to deprotonation of the C-terminal carboxyl group. However, close to the LLPS point (pH 6–7), all cross-peaks assigned to His residues shifted and decreased in relative intensity (Supplementary Fig. 10), and the peaks of all flanking Gly residues shifted as well. In addition, the cross-peak assigned to Tyr 5 shifted (Fig. 4b).
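The concentration of spectral changes near pH 6–7 is what a simple Henderson–Hasselbalch picture of His deprotonation would predict. A minimal sketch, assuming a single, unperturbed imidazole pKa of 6.0 (a typical textbook value, not one measured for GY-23); the function name is hypothetical.

```python
def fraction_protonated(ph, pka=6.0):
    """Henderson-Hasselbalch fraction of His imidazole rings carrying the extra proton.

    pka=6.0 is an ASSUMED typical value; site-specific pKa values would come
    from fitting the titration data.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (3.3, 4.0, 5.0, 6.0, 7.0):
    print(ph, round(fraction_protonated(ph), 3))
```

With this assumed pKa, about 99.8% of His side chains are protonated at pH 3.3, half at pH 6 and about 9% at pH 7, so most of the loss of positive charge falls in the window where the spectra change.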

These results indicated that His and Tyr residues are involved in initiating LLPS. To investigate their role in the initial steps of aggregation, we carried out pH titration experiments on GY-23, where we recorded 1H-13C-HSQC spectra of aliphatic (Fig. 4c) and aromatic (Fig. 4d) side chains of all residues, as well as the long-range 1H-15N-HMQC spectra to monitor the protonation state of nitrogen atoms in the imidazole ring of His (Fig. 4e, f and Supplementary Fig. 11). Increasing the pH led to gradual changes of the chemical shifts of His 13Cα and 13Cβ atoms (Fig. 4c), as well as of 13Cδ and 13Cε atoms of the imidazole ring (Fig. 4d). In addition, when the pH was raised from 3 to 4 the cross-peaks assigned to 13Cα and 13Cβ of the C-terminal residue Tyr 23 significantly shifted in the 1H and 13C dimensions, suggesting that the shift is caused by deprotonation of the C-terminal carboxylic group. We also observed a major shift of the 13Cα cross-peak assigned to Gly 1 (Fig. 4c).
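The gradual shift trajectories can, in principle, be fitted to obtain an apparent pKa for each His. The sketch below assumes a single titrating site in fast exchange, so that the observed shift is the population-weighted average of the protonated and deprotonated shifts; it scans pKa on a grid and solves for the two limiting shifts by linear least squares. The function name, grid and example numbers are hypothetical, and data limited to pH ≤ 7 may not reach the deprotonated plateau, leaving such a fit poorly constrained.

```python
import numpy as np


def fit_pka(ph, shifts, pka_grid=None):
    """Fit delta(pH) = delta_HA * (1 - f) + delta_A * f, f = 1 / (1 + 10**(pKa - pH)).

    Returns (pKa, delta_HA, delta_A) for the grid pKa with the smallest residual.
    """
    ph = np.asarray(ph, dtype=float)
    y = np.asarray(shifts, dtype=float)
    if pka_grid is None:
        pka_grid = np.arange(3.0, 9.0, 0.01)
    best = None
    for pka in pka_grid:
        f = 1.0 / (1.0 + 10.0 ** (pka - ph))    # deprotonated fraction
        design = np.column_stack([1.0 - f, f])   # model is linear in delta_HA, delta_A
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((design @ coef - y) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(pka), float(coef[0]), float(coef[1]))
    return best[1:]


# Synthetic, noise-free trajectory generated with pKa = 6.3 (hypothetical values).
ph = np.array([3.3, 4.0, 5.0, 6.0, 7.0, 8.0])
f = 1.0 / (1.0 + 10.0 ** (6.3 - ph))
print(fit_pka(ph, 136.0 * (1 - f) + 130.0 * f))
```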

Aromatic 1H-13C-HSQC spectra (Fig. 4d) showed that increasing the pH resulted in gradual shifts of the cross-peaks assigned to His 13Cδ2 and 13Cε1, caused by deprotonation of the imidazole ring. Phe resonances were unaffected by pH changes between 3.3 and 6.0, but at pH 7.0 all Phe 1H resonances shifted (Fig. 4d). Tyr resonances showed a similar trend, except for 13Cζ, which started to split at pH 5.0; with a further increase in pH, two distinct cross-peaks were observed for the 13Cζ atoms of Tyr 5 and Tyr 23 (Fig. 4d, inset). In addition, the Tyr 13Cδ resonances split into two cross-peaks when the pH increased from 6.0 to 7.0. Figure 4e shows changes in the chemical shifts of the His imidazole 15N atoms during the pH titration. At pH 3.3 and 4, all His residues were fully protonated, as indicated by the characteristic 15Nε2 and 15Nδ1 chemical shifts of 173 ppm and 176 ppm, respectively35. Increasing the pH from 4 to 7 led to gradual deprotonation of all His imidazole rings, resulting in the co-existence of the protonated state with the two tautomeric forms of the imidazole ring. Critically, immediately after raising the pH from 6 to 7, only one of the four His residues showed transient stabilization of its ε tautomer, as shown by the appearance of a 15Nδ1 cross-peak at 250 ppm36 within 5 min of pH adjustment (Fig. 4f). This cross-peak was significantly attenuated 30 min after pH adjustment (Supplementary Fig. 11a, b), indicating that the tautomeric state of this single His residue was only transiently stabilized, likely by hydrogen bonding. Because the chemical shifts of the Tyr 13Cζ atoms split into two distinct values between pH 5 and 7 (Fig. 4d, inset), we suggest that this hydrogen bond forms between the hydroxyl group of Tyr (donor) and 15Nδ1 of His (acceptor), which may be the first step in the oligomerization cascade.
Detailed analysis of the chemical shift trajectories presented in Supplementary Fig. 12 further confirms these observations. Moreover, 3D 15N- and 13C-NOESY experiments with long mixing times showed no NOEs between His and Tyr, further supporting the transient character of the Tyr/His interactions.
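The 15N-based assignment of His states described above can be summarized as a simple rule: a protonated (N–H) imidazole nitrogen resonates near 170–180 ppm, whereas an unprotonated nitrogen appears near 250 ppm. The sketch below encodes that rule; the function name and the 210 ppm cut-off are assumptions (an arbitrary midpoint), and in fast exchange the observed shifts are population averages, so the label only reflects the dominant form.

```python
def his_state(n_delta1, n_epsilon2, cutoff=210.0):
    """Classify a His imidazole from its 15N shifts in ppm (hypothetical helper).

    Protonated ring nitrogens resonate near 170-180 ppm; an unprotonated
    nitrogen resonates near 250 ppm. `cutoff` is an ASSUMED midpoint.
    """
    d_free = n_delta1 > cutoff    # Nd1 carries no proton
    e_free = n_epsilon2 > cutoff  # Ne2 carries no proton
    if not d_free and not e_free:
        return "cationic (both N protonated)"
    if d_free and not e_free:
        return "epsilon tautomer (Ne2-H)"
    if e_free and not d_free:
        return "delta tautomer (Nd1-H)"
    return "ambiguous"


print(his_state(176.0, 173.0))  # fully protonated, as at pH 3.3
print(his_state(250.0, 170.0))  # Nd1 near 250 ppm: epsilon tautomer
```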

GY-23 peptide shows partially ordered structure after LLPS

Although IDPs do not exhibit well-defined tertiary structures, there is evidence that coacervate microdroplets of IDPs contain short-range order10. To study coacervation at the nanostructural level and assess whether GY-23 coacervate microdroplets exhibit such internal ordering, we investigated their structural features by SAXS. Scattering profiles of GY-23 in acetic acid (pH 3.3) before LLPS and in the coacervation buffer (pH 7.0) after LLPS (both the coacervate and the coexisting dilute phases) are presented in Fig. 5a and were very distinct from each other. The scattering of GY-23 in acetic acid and of the dilute phase after centrifugation had a very low signal-to-noise ratio. Nevertheless, for GY-23 in acetic acid, a weak low-q upturn with an indication of a broad correlation peak between 0.3 and 2 nm−1 was observed, which may be attributed to nanometer-sized peptide oligomers. Dynamic light scattering (DLS) of the peptide in acetic acid (Fig. 5c) indicated structures with a hydrodynamic diameter (DH) of ca. 8 nm, corroborating the presence of small oligomeric units (assuming a DH on the order of 4–8 nm for the 23-residue monomeric peptide). As expected, DH increased drastically to around 50 nm at pH 7.0 owing to the initiation of LLPS.
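DLS derives DH from the measured translational diffusion coefficient D through the Stokes–Einstein relation, DH = kBT/(3πηD). The sketch below converts between the two, assuming a water-like viscosity of 0.89 mPa s at 298.15 K; the buffer viscosities were not reported here, instrument software normally performs this conversion internally, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def hydrodynamic_diameter(diff_coeff, temperature=298.15, viscosity=0.89e-3):
    """Stokes-Einstein: D_H (m) from D (m^2/s); viscosity in Pa s is ASSUMED water-like."""
    return K_B * temperature / (3.0 * math.pi * viscosity * diff_coeff)


def diffusion_coefficient(d_h, temperature=298.15, viscosity=0.89e-3):
    """Inverse relation: D (m^2/s) expected for a sphere of hydrodynamic diameter d_h (m)."""
    return K_B * temperature / (3.0 * math.pi * viscosity * d_h)


print(diffusion_coefficient(8e-9))   # oligomers in acetic acid
print(diffusion_coefficient(50e-9))  # assemblies at pH 7.0
```

Under these assumptions, DH = 8 nm corresponds to D of roughly 6 × 10−11 m2 s−1, and the increase to 50 nm reflects a roughly six-fold slower diffusion.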

Fig. 5

SAXS and DLS of GY-23 peptide. a SAXS experimental curves of the peptide before and after coacervation. After LLPS, the dilute and coacervate-rich phases were measured following a centrifugation step. The q−3 power-law region of the scattering data is highlighted, with the black line as a guide for the eye. The calculated fit for the peptide assemblies from the IFT method is also presented as a continuous red line. b Corresponding p(r) profile calculated from the SAXS data in (a) using Supplementary Eq. 2. c Hydrodynamic diameter (DH) measured by DLS of GY-23 before (pH 3.3) and after (pH 7.0) coacervation. Correlation functions showing the raw data are presented in Supplementary Fig. 14. Source data are provided as a Source Data file.

In contrast, the scattering profile of GY-23 in the peptide-rich phase (Fig. 5a) indicated much larger peptide aggregates, typical of coacervate microdroplets, with overall dimensions exceeding the resolution limit of the SAXS set-up. An indication of a broad correlation peak at q ≈ 1.5 nm−1 suggested structural features arising from peptide self-assemblies within the coacervate microdroplets. The low signal-to-noise ratio in this q-region makes it difficult to analyze this feature in detail (this correlation peak was, however, confirmed using a more intense synchrotron X-ray source; Supplementary Fig. 13). At q < 1 nm−1, the scattering curve showed an approximate power-law dependence over at least an order of magnitude in q, indicating fractal scattering from the dense peptide assemblies within the coacervate phase.
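A power-law exponent of this kind is usually estimated as the slope of log I versus log q over the chosen q-range. The sketch below does this with a linear fit and maps the exponent onto the standard small-angle scattering convention (|slope| between 1 and 3: mass fractal with dimension |slope|; between 3 and 4: surface fractal with dimension 6 − |slope|). The function names and the synthetic test curve are hypothetical, not the analysis used for Fig. 5a.

```python
import numpy as np


def power_law_exponent(q, intensity, q_min, q_max):
    """Slope of log10 I versus log10 q within [q_min, q_max]."""
    q = np.asarray(q, dtype=float)
    i = np.asarray(intensity, dtype=float)
    mask = (q >= q_min) & (q <= q_max) & (i > 0)
    slope, _intercept = np.polyfit(np.log10(q[mask]), np.log10(i[mask]), 1)
    return float(slope)


def interpret_exponent(slope):
    """Map I ~ q**slope onto a fractal type and dimension."""
    alpha = -slope
    if 1.0 < alpha <= 3.0:
        return ("mass fractal", alpha)           # D_m = alpha
    if 3.0 < alpha < 4.0:
        return ("surface fractal", 6.0 - alpha)  # D_s = 6 - alpha
    return ("outside fractal range", None)


# Synthetic curve I = 2.5 q^-3 over one decade (0.1-1 nm^-1).
q = np.logspace(-1, 0, 50)
print(power_law_exponent(q, 2.5 * q ** -3.0, 0.1, 1.0))
```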

To further investigate the internal structure of the coacervate microdroplets, the pair distance distribution function p(r) was calculated from the SAXS curve using the indirect Fourier transformation (IFT) method (Fig. 5b). The p(r) function reflected large peptide aggregates in the microdroplets with dimensions well beyond the resolution limit of the SAXS set-up used in this study (around 50 nm in real space). Hence, p(r) was mathematically forced to 0 at r ≈ 100 nm, which does not represent the overall dimension of the coacervate microdroplets. Analysis of the corresponding SAXS data of the coacervate droplets in buffer, recorded at a synchrotron with a higher signal-to-noise ratio, is presented in Supplementary Fig. 13. The results indicated that the coacervate microdroplets contained nanostructural features of ca. 2 nm, most likely attributable to peptide oligomers forming internal domain structures within the microdroplets.
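IFT itself is a regularized fit; the relation it inverts is, in one common normalization, I(q) = 4π ∫ p(r) sin(qr)/(qr) dr. The sketch below only evaluates this forward transform numerically, which is useful for checking that a candidate p(r) reproduces a measured curve; it is not an implementation of the IFT fit, and the function names and test profile are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def intensity_from_pr(q, r, p):
    """Forward transform I(q) = 4*pi * integral of p(r) sin(qr)/(qr) dr."""
    q = np.asarray(q, dtype=float)
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    # np.sinc(x) = sin(pi x)/(pi x), so sin(qr)/(qr) = np.sinc(q r / pi)
    kernel = np.sinc(np.outer(q, r) / np.pi)
    return 4.0 * np.pi * np.array([trapezoid(p * k, r) for k in kernel])


# Hypothetical p(r) vanishing at r = 0 and r = 10 nm.
r = np.linspace(0.0, 10.0, 201)
p = r ** 2 * (10.0 - r)
print(intensity_from_pr(np.linspace(0.0, 1.0, 5), r, p))
```

Forcing p(r) to zero at a chosen maximum distance, as described above, simply truncates the integration range of this transform.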

Analysis of tyrosine–tyrosine interactions by ssNMR

Since site-directed mutagenesis suggested a critical role for Tyr residues, we synthesized GY-23 with uniformly 13C- and 15N-labeled Tyr residues (Tyr 5 and Tyr 23) and analyzed possible Tyr–Tyr interactions in the condensed, solid-like phase by solid-state NMR. Figure 6a, b compares the one-dimensional directly polarized and 1H-13C cross-polarization (CP)-based 13C spectra. Both spectra contain relatively broad lines and multiple peaks for each Tyr carbon, indicating that the Tyr residues were present in heterogeneous conformational environments. For example, 13Cα resonances were detected at 53.2, 57.5, and 58.7 ppm, and carbonyl 13C resonances at 173.3, 176.8, and 180.8 ppm. The strong signals in the CP-based spectrum indicated that most Tyr moieties were locked in a rigid structure with high dipolar order. No additional sharp peaks were observed in the direct-polarization 13C spectrum compared with the CP-based spectrum, indicating the absence of highly flexible Tyr residues and further supporting that the Tyr residues were rigidly locked. This was confirmed by a control experiment in which the same set of spectra was recorded from samples at pH 6 and 7 (Supplementary Fig. 15). At pH 6, the CP signal intensity decreased relative to pH 7, revealing increased Tyr mobility. Moreover, the directly detected 13C spectrum displayed sharper peaks at pH 6, corroborating the increased mobility of Tyr at lower pH.

Fig. 6

Characterization of molecular interactions driving LLPS of GY-23 peptide by ssNMR. Spectra of GY-23 selectively 13C-labeled at Tyr 5 and Tyr 23: (a) directly observed carbon, (b) 1H-13C cross-polarization (CP)-based with carbon detection, (c) DARR (100 ms mixing time), and (d) 1H-13C HETCOR (100 μs mixing time). Examples of correlations indicating Tyr–Tyr interactions are marked with arrows.

The two-dimensional 13C–13C dipolar assisted rotational resonance (DARR) spectrum (Fig. 6c) showed correlations between the two Tyr residues of the peptide, suggesting that they interacted with each other. Moreover, the DARR data clearly indicated that Tyr residues were in heterogeneous chemical environments, implying clustering of Tyr residues close to each other. Tyr–Tyr direct interactions were also corroborated by the heteronuclear correlation (HETCOR) spectrum (Fig. 6d), which shows correlations between aliphatic and aromatic carbon atoms of Tyr attributed to the stacked clustering of two or more Tyr side groups.

Discussion

There has been growing recognition that LLPS operates inside cells through membraneless organelles6,7,8,10,11,26 as well as in the processing of extracellular load-bearing structures and bioadhesives of various organisms3,4,12,13,14,32,37,38,39. However, the sequence motifs and the associated inter- and intramolecular interactions driving phase separation remain poorly understood. This study advances our understanding of LLPS at both the sequence and the molecular level. Our findings show that phase separation of HBPs is mediated by specific GHGxY modular repeats that must be arranged in a specific configuration. Our results also show that the morphology and rheology of the separated phase can be tuned from coacervate microdroplets to hydrogels by incorporating hydrophobic GAGFA repeats into the peptide sequence. Based on solution-state NMR measurements, LLPS of HBPs is a multistep process initially triggered by deprotonation of His residues upon a pH increase, followed by stabilization of the His ε tautomer through transient hydrogen bonding with the OH group of Tyr residues. We propose that these events eventually promote hydrophobic intermolecular interactions largely controlled by Tyr residues, as well as hydrophobic collapse of the peptides’ central domains, as schematically illustrated in Fig. 7. Investigation of the GY-23 coacervate phase by SAXS and solid-state NMR indicated partial internal ordering on the nanometer scale, stabilized by aromatic stacking and clustering of Tyr residues. These findings agree with earlier biophysical studies of the full-length HBPs showing that a certain degree of protein folding is achieved in the coacervate state32.

Fig. 7

Proposed model of pH-dependent LLPS of HBP-derived peptides. At pH 3–4, His residues are protonated, and the peptides form soluble oligomeric units owing to electrostatic repulsion between positively charged His side chains. At pH 4–6, His residues are gradually deprotonated; repulsive forces weaken but remain strong enough to keep the peptide oligomers soluble. At pH 6–7, transient interactions take place between His and Tyr residues located within GHGxY repeats (marked in green), leading to specific peptide–peptide interactions that act as nuclei for LLPS. A further increase of pH above 7 leads to intermolecular Tyr–Tyr stacking and intramolecular interactions of hydrophobic residues, which together trigger LLPS and the formation of microdroplets. If the central domain of the peptide is enriched with the hydrophobic motif GAGFA (marked in red) or with the His-rich motif GHGLH (marked in blue), LLPS is driven by the same sequence of molecular events but eventually leads to the formation of a hydrogel or of coacervate microdroplets, respectively.

Only a few reports provide a full picture of the molecular events leading to LLPS of IDPs23,24,25. One relevant study by Reichheld et al.23 showed that self-coacervation of ELPs is an entropy-driven process mediated by transient interactions between the highly dynamic and disordered hydrophobic domains of ELPs. These hydrophobic interactions led to gradual exclusion of water and salt, eventually allowing chemical crosslinking of ELP monomers into an elastic network. According to our recent studies, self-coacervation of HBPs is also entropy-driven and involves hydrophobic interactions of repetitive domains32. In contrast to ELPs, however, these interactions are triggered by deprotonation of His residues, and the subsequent hydrophobic interactions lead to gradual condensation of the HBP coacervates. This is in line with our model of squid beak processing in vivo, in which HBP coacervates infiltrate a chitin nanofiber scaffold present in the squid beak, condense and dehydrate within it, and finally undergo chemical crosslinking14,40. The partial ordering of the HBP coacervates that we observed by SAXS and solid-state NMR may therefore be an intermediate step before the final crosslinking that takes place in vivo. Moreover, the formation of solid materials through condensation of macromolecular assemblies formed by LLPS (i.e. liquid-to-solid transitions) is not unique to extracellular structures; it is also commonly observed inside cells. For example, heterochromatin protein 1α (HP1α)41 can undergo time-dependent condensation into a gel, while RNA-binding proteins42 or stress granule proteins43 can form insoluble aggregates, often associated with pathological states.

There is increasing evidence that π–π stacking is critical for driving LLPS and stabilizing phase-separated structures, for example in the mitotic spindle regulatory protein BuGZ19, the nuclear pore protein Nsp1 (ref. 22), and FUS44,45. Another model of LLPS involving aromatic residues is based on cation–π interactions between positively charged residues (Arg or Lys) and the aromatic moieties of Phe or Tyr18,46,47. Our study shows that Tyr–Tyr interactions are critical for stabilizing the biopolymer-rich phase after phase separation, but that they must first be activated through interactions with His side chains in a pH-dependent mechanism. To the best of our knowledge, this multistep interaction mechanism has not previously been reported in IDPs, and it provides a better understanding of pH-responsive LLPS.

Our findings also have implications for the design of stimuli-responsive protein carriers for various therapeutic treatments. The family of GHGxY-containing peptides described in this study expands our molecular toolbox of coacervate-forming peptides for therapeutic delivery48,49,50,51 beyond the classical ELPs52,53, offering in particular the added advantages of designing and tuning pH-responsive carriers de novo and of packaging hydrophilic drugs inside the coacervate microdroplets.

Methods

Assessment of LLPS properties

LLPS properties of HBP-1 protein, its variants, and HBP peptides under different buffer conditions (Figs. 2 and 3 and Supplementary Fig. 3; the buffers are listed in Supplementary Table 1) were assessed using the method described by Tan et al.14. Briefly, the protein/peptide stock solution (in 10 mM acetic acid, pH 3.3) was added to a buffer solution at a 1:5 volume ratio (protein/peptide stock:buffer). The mixture was then pipetted onto a glass microscope slide and imaged using an optical microscope.

Optical microscopy

The phase separation behavior of protein variants and peptides was studied using a Zeiss Axio Scope A1 microscope (Carl Zeiss Pte Ltd., Germany) in the reflection mode, with differential interference contrast (DIC) filters. Images were taken with an AxioCam MRc 5 camera under the control of AxioVision software.

Solution-state NMR spectroscopy

Lyophilized HBP-1 protein or GY-23 peptide samples were dissolved in 10 mM acetic acid (pH 3.3) containing 10% D2O and 0.2 mM DSS prior to the NMR experiments. During pH titration experiments, the pH was adjusted with 0.5 M NaOH.
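His protonation during the pH titration is followed through chemical-shift changes. A common way to extract an apparent pKa from such data, assuming fast exchange between the protonated and neutral imidazole, is to fit a two-state Henderson–Hasselbalch model in which the observed shift is the population-weighted average of two limiting shifts. The Methods do not describe such a fit, so the stdlib-only sketch below is illustrative; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def hh_shift(ph: float, pka: float, delta_prot: float, delta_neutral: float) -> float:
    """Two-state fast-exchange model: population-weighted average shift (ppm)."""
    # Fraction of neutral (deprotonated) imidazole from Henderson–Hasselbalch.
    f_neutral = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return (1.0 - f_neutral) * delta_prot + f_neutral * delta_neutral


def fit_pka(ph: Sequence[float], shifts: Sequence[float],
            pka_grid: Sequence[float]) -> Tuple[float, float, float, float]:
    """Grid search over pKa; the two limiting shifts come from linear least squares.

    Returns (pka, delta_prot, delta_neutral, sum_of_squared_residuals).
    """
    best = None
    for pka in pka_grid:
        # For a fixed pKa the model is linear in (a, b): shift = a*(1 - f) + b*f.
        f = [1.0 / (1.0 + 10.0 ** (pka - x)) for x in ph]
        g = [1.0 - fi for fi in f]
        # Normal equations [[sgg, sgf], [sgf, sff]] @ [a, b] = [sgy, sfy].
        sgg = sum(gi * gi for gi in g)
        sff = sum(fi * fi for fi in f)
        sgf = sum(gi * fi for gi, fi in zip(g, f))
        sgy = sum(gi * y for gi, y in zip(g, shifts))
        sfy = sum(fi * y for fi, y in zip(f, shifts))
        det = sgg * sff - sgf * sgf
        if abs(det) < 1e-12:
            continue  # degenerate: this pKa lies far outside the sampled pH range
        a = (sgy * sff - sgf * sfy) / det
        b = (sgg * sfy - sgf * sgy) / det
        ssr = sum((y - (a * gi + b * fi)) ** 2 for y, gi, fi in zip(shifts, g, f))
        if best is None or ssr < best[3]:
            best = (pka, a, b, ssr)
    if best is None:
        raise ValueError("no pKa in the grid could be fitted")
    return best
```

For example, `fit_pka(ph_values, shifts, [i / 100 for i in range(400, 801)])` scans pKa from 4.00 to 8.00 in 0.01 steps; a nonlinear optimizer could refine the estimate further. The fast-exchange assumption should be checked against the spectra (a single, shifting peak rather than two peaks changing intensity).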

For HBP-1 protein backbone assignment, three-dimensional BEST-TROSY HNCO, HNCA, HN(CO)CA, HNCACB, HN(CO)CACB, and HN(CA)CO experiments54 were recorded at 298 K on a 700 MHz Bruker Avance III NMR spectrometer equipped with a 5 mm z-gradient TXI cryoprobe. The spectra were acquired using non-uniform sampling (NUS) with a 30% sampling density. The NUS spectra were processed with the MDDNMR program55 implemented in TopSpin 3.5 (Bruker). Backbone assignment was carried out using CARA software (http://cara.nmr.ch/). 1H-15N-HMQC spectra at different pH values were acquired using the SOFAST-HMQC pulse program56 on an 800 MHz Bruker Avance III NMR spectrometer equipped with a 5 mm QCI H/P/C/N solution cryoprobe, at 298 K or 279 K.
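With 30% NUS only a subset of the indirect-dimension increments is recorded. The Methods do not describe how the schedule was generated (TopSpin provides its own generators), so the sketch below only illustrates one common idea: draw increments from the 2D indirect grid of a 3D experiment with weights that decay with evolution time, so that early, high-signal points are favoured. It uses Efraimidis–Spirakis weighted sampling without replacement; the function name, the exponential weighting, and the `decay` parameter are assumptions.

```python
import math
import random
from typing import List, Tuple


def nus_schedule(n1: int, n2: int, fraction: float, decay: float = 1.0,
                 seed: int = 0) -> List[Tuple[int, int]]:
    """Choose round(fraction * n1 * n2) increments from an n1 x n2 indirect grid.

    Weighted sampling without replacement (Efraimidis-Spirakis keys u**(1/w)),
    with weights decaying exponentially along both evolution times so that
    early increments are favoured. The origin (0, 0) is always kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n_keep = max(1, round(fraction * n1 * n2))

    def weight(point: Tuple[int, int]) -> float:
        i, j = point
        return math.exp(-decay * (i / n1 + j / n2))

    candidates = [(i, j) for i in range(n1) for j in range(n2) if (i, j) != (0, 0)]
    # Each candidate receives one random key; the n_keep - 1 largest keys are kept.
    keyed = sorted(candidates, key=lambda p: rng.random() ** (1.0 / weight(p)),
                   reverse=True)
    return sorted([(0, 0)] + keyed[: n_keep - 1])
```

For instance, `nus_schedule(64, 48, 0.30)` returns a sorted list of (t1, t2) increment indices covering 30% of a 64 × 48 grid; the grid size here is arbitrary and not taken from the experiments.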

Data for GY-23 backbone assignment were collected on the 800 MHz spectrometer. The same set of BEST-TROSY experiments as for HBP-1 protein (except HN(CO)CACB) was recorded using NUS with a 10–30% sampling density. Data processing and backbone assignment were performed as described above. For the pH titration, 1H-15N-HMQC, 1H-13C-HSQC, and long-range 1H-15N-HMQC spectra of His side chains were acquired on the 700 MHz spectrometer using standard pulse programs from the TopSpin 3.5 repository. 15N- and 13C-HSQC-NOESY spectra with a 500 ms mixing time were acquired at 298 K on a 600 MHz Bruker Avance III spectrometer equipped with a 5 mm z-gradient TCI cryoprobe.
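The long-range 1H-15N-HMQC spectra of the His side chains report the 15N shifts of Nδ1 and Nε2, which are widely used to judge the tautomeric state of a neutral imidazole ring: a protonated (N–H) ring nitrogen resonates well upfield of an unprotonated one. A simple estimate of the Nε2–H tautomer fraction interpolates linearly between reference shifts for the two nitrogen types. The reference values used below (about 167.5 ppm for N–H and 249.5 ppm for unprotonated N) are often quoted in the literature but are treated here as adjustable assumptions, and the Methods do not state that this analysis was performed.

```python
from typing import Tuple

# Assumed reference 15N shifts (ppm) for a neutral imidazole ring nitrogen.
DELTA_NH = 167.5  # protonated (N-H) nitrogen; assumption
DELTA_N = 249.5   # unprotonated nitrogen; assumption


def epsilon_tautomer_fraction(delta_nd1: float, delta_ne2: float,
                              delta_nh: float = DELTA_NH,
                              delta_n: float = DELTA_N) -> Tuple[float, float, float]:
    """Estimate the Nε2-H tautomer fraction of a neutral His from its 15N shifts.

    Interpolates separately from the Nδ1 and Nε2 shifts and returns
    (from_nd1, from_ne2, mean). The two estimates should roughly agree for a
    neutral ring; a large disagreement hints at a cationic population, which
    this two-tautomer model ignores.
    """
    span = delta_n - delta_nh
    # Nε2-H tautomer: Nδ1 unprotonated (downfield), Nε2 protonated (upfield).
    from_nd1 = (delta_nd1 - delta_nh) / span
    from_ne2 = (delta_n - delta_ne2) / span
    return from_nd1, from_ne2, 0.5 * (from_nd1 + from_ne2)
```

Because the model only covers the neutral ring, it is most meaningful at pH values well above the His pKa.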

SAXS

The sample was prepared by dissolving 5.0 mg of lyophilized GY-23 peptide in 100 µL of 10 mM acetic acid (pH 3.3). Coacervation was induced by mixing the peptide stock with the coacervation buffer (50 mM Tris-HCl, pH 7.0, containing 1 M NaCl) at a 1:5 volume ratio. The coacervate-rich phase was collected by centrifugation (13,000g, 5 min, 25 °C) and transferred into a 1.5 mm quartz capillary together with some supernatant to prevent drying. The capillary was then aligned so that the beam hit the coacervate-rich phase.
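For orientation, the nominal composition of the mixture before phase separation follows from volume-weighted dilution, assuming additive volumes. The sketch below applies this to the SAXS sample described above; the helper name is hypothetical, and the values describe the overall mixture, not the coacervate-rich phase, which is enriched in peptide.

```python
from typing import Dict


def mix_concentrations(stock: Dict[str, float], buffer: Dict[str, float],
                       stock_parts: float = 1.0,
                       buffer_parts: float = 5.0) -> Dict[str, float]:
    """Nominal concentrations after mixing stock and buffer by volume.

    Assumes ideal, additive volumes. Each component keeps its own unit
    (e.g. mg/mL or mM); a component absent from one solution counts as zero.
    """
    total = stock_parts + buffer_parts
    out: Dict[str, float] = {}
    for name in set(stock) | set(buffer):
        out[name] = (stock.get(name, 0.0) * stock_parts
                     + buffer.get(name, 0.0) * buffer_parts) / total
    return out


# SAXS sample: 5.0 mg GY-23 in 100 µL (0.100 mL) of 10 mM acetic acid,
# mixed 1:5 with 50 mM Tris-HCl containing 1 M NaCl.
stock = {"GY-23 (mg/mL)": 5.0 / 0.100, "acetic acid (mM)": 10.0}
buffer = {"Tris-HCl (mM)": 50.0, "NaCl (mM)": 1000.0}
mixture = mix_concentrations(stock, buffer)
```

This gives about 8.3 mg/mL GY-23, 42 mM Tris-HCl, 833 mM NaCl, and 1.7 mM acetic acid in the mixture. The same 1:5 stock-to-buffer ratio is used in the LLPS assessment described under Assessment of LLPS properties.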

SAXS measurements were performed on a Bruker Nanostar U (Bruker AXS, Karlsruhe, Germany) equipped with a sealed-tube Cu anode X-ray source operating at 50 kV and 600 μA (Incoatec IμSCu, Geesthacht, Germany). A Göbel mirror was used to convert the divergent polychromatic X-ray beam into a focused beam of monochromatic Cu Kα radiation (λ = 0.154 nm). The beam size was 0.3 mm. A sample-to-detector distance of 1077 mm gave a q-range of 0.07 < q < 2.9 nm−1. The 2D SAXS patterns were acquired within 1 h using a VÅNTEC-2000 detector (Bruker AXS, Karlsruhe, Germany) with an active area of 140 × 140 mm2 and a pixel size of 68 μm.
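To relate the instrument geometry to the length scales discussed for the coacervate (partial ordering on the nanometer scale), the sketch below converts a radial detector position to the scattering vector q and converts q to a characteristic distance d = 2π/q. It assumes a flat detector perpendicular to the beam. The beam-centre position, which sets the exact q limits, is not given in the Methods, so the detector half-width value is only illustrative.

```python
import math


def q_at_radius(r_mm: float, distance_mm: float, wavelength_nm: float) -> float:
    """Magnitude of the scattering vector q (nm^-1) at radial distance r from the beam.

    Flat detector normal to the beam: 2θ = atan(r / L) and q = (4π/λ)·sin θ.
    """
    two_theta = math.atan2(r_mm, distance_mm)
    return 4.0 * math.pi / wavelength_nm * math.sin(two_theta / 2.0)


def d_spacing(q_per_nm: float) -> float:
    """Characteristic real-space length d = 2π/q (nm)."""
    return 2.0 * math.pi / q_per_nm


# Length scales spanned by the reported q-range of 0.07-2.9 nm^-1.
d_large = d_spacing(0.07)  # about 89.8 nm
d_small = d_spacing(2.9)   # about 2.17 nm

# Illustrative only: q at 70 mm (half the 140 mm detector width) from the beam.
q_half_width = q_at_radius(70.0, 1077.0, 0.154)
```

The reported q-range therefore probes structure between roughly 2 nm and 90 nm. The value at the detector half-width depends on where the beam centre sits, so it need not coincide with the reported upper limit.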

The samples were measured in 1.5 mm quartz capillaries. The scattering curves are plotted as intensity I versus scattering vector q. Scattering from the corresponding buffer was subtracted as background from all samples.
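The Methods state only that the buffer scattering was subtracted. A minimal sketch of such a subtraction is shown below, assuming both curves have already been azimuthally averaged onto the same q grid and normalized consistently; the `scale` factor is a hypothetical stand-in for any transmission or normalization correction, and the function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def subtract_background(i_sample: Sequence[float], err_sample: Sequence[float],
                        i_buffer: Sequence[float], err_buffer: Sequence[float],
                        scale: float = 1.0) -> Tuple[List[float], List[float]]:
    """Point-by-point buffer subtraction on a common q grid.

    I_net(q) = I_sample(q) - scale * I_buffer(q); independent uncertainties
    are combined in quadrature.
    """
    if not (len(i_sample) == len(err_sample) == len(i_buffer) == len(err_buffer)):
        raise ValueError("curves must share the same q grid")
    net = [s - scale * b for s, b in zip(i_sample, i_buffer)]
    err = [math.hypot(es, scale * eb) for es, eb in zip(err_sample, err_buffer)]
    return net, err
```

Keeping the propagated uncertainty matters at high q, where the coacervate signal approaches the buffer level and the net intensity becomes noisy.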

Dynamic light scattering

DLS measurements were performed on a ZetaPALS instrument (Brookhaven Instruments Corporation) equipped with a 35 mW red diode laser (640 nm wavelength). The scattering angle was set to 90°, and each sample was measured five times.

The sample was prepared by dissolving 1 mg of the GY-23 peptide in 100 µL of 10 mM acetic acid. Coacervation was induced by raising the pH to 7.0 (adjusted with 1 M NaOH) and adding salt (NaCl).
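DLS sizes are usually obtained from the decay rate Γ of the intensity autocorrelation function (for example from a cumulant fit) via Γ = Dq² and the Stokes–Einstein relation. The sketch below uses the reported 640 nm wavelength and 90° angle; the refractive index, temperature, and viscosity defaults describe water at 25 °C and are assumptions, since the Methods do not report them (the added NaCl would change them slightly).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def scattering_vector(wavelength_m: float, angle_deg: float, n_medium: float) -> float:
    """q = 4πn·sin(θ/2)/λ, in m^-1."""
    return 4.0 * math.pi * n_medium * math.sin(math.radians(angle_deg) / 2.0) / wavelength_m


def hydrodynamic_diameter(gamma_s: float, *, wavelength_m: float = 640e-9,
                          angle_deg: float = 90.0, n_medium: float = 1.33,
                          temperature_k: float = 298.15,
                          viscosity_pa_s: float = 0.89e-3) -> float:
    """Stokes–Einstein hydrodynamic diameter (m) from a mean decay rate Γ (s^-1).

    D = Γ / q^2 and d_H = k_B T / (3π η D). Defaults for n, T and η are
    assumptions (water at 25 °C).
    """
    q = scattering_vector(wavelength_m, angle_deg, n_medium)
    diffusion = gamma_s / q ** 2
    return K_B * temperature_k / (3.0 * math.pi * viscosity_pa_s * diffusion)
```

Because coacervate microdroplets can grow and coalesce, the apparent diameter may drift between the five repeat measurements, and the spread is worth reporting alongside the mean.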

Solid-state NMR spectroscopy

HBP-1 and GY-23 peptide coacervates were loaded directly into a 1.9 mm MAS rotor by ultracentrifugation (100,000g, 30 min, 20 °C) using a spiNpack rotor-packing device (Giotto Biotech, Italy). NMR data were collected on a 600 MHz Bruker Avance III spectrometer equipped with a 1.9 mm MAS probe operating in HX double-resonance mode. One-dimensional (1D) 1H–13C cross-polarization (CP), 13C direct-polarization (DP), and 2D 13C–13C dipolar assisted rotational resonance (DARR) experiments were performed at a MAS frequency of 18 kHz with the variable temperature set to 2 °C. The actual sample temperature was 10 °C, based on external calibration with ethylene glycol57. Chemical shifts were referenced to the DSS scale using adamantane as a secondary standard for 13C58 (downfield signal at 40.48 ppm), and 1H shifts were calculated indirectly. The 1H→13C CP transfer was achieved using 56 kHz 13C and 81 kHz (maximum power) 1H spin-lock rf fields, with a 90–100% linear ramp applied on the 1H channel and a contact time of 250 μs. SPINAL-64 1H decoupling at 80 kHz was applied during data acquisition. The recycle delays were 1.5 s and 5 s in the 1D CP and DP experiments, respectively, and the acquisition time was 19.1 ms in both experiments. Additional parameters of the 2D 13C–13C DARR experiment were a 1.5 s recycle delay, a 72115.4 Hz sweep width and 14.2 ms acquisition time in the direct dimension, a 36000 Hz sweep width and 7.1 ms acquisition time in the indirect dimension, and a 100 ms DARR mixing time. A dipolar-based 2D 1H–13C heteronuclear correlation (HETCOR) experiment was conducted at a MAS rate of 35 kHz, with the variable temperature maintained at 15 °C, corresponding to an actual sample temperature of 13 °C. 1H and 13C spin-lock rf fields of 86 kHz and 50 kHz (maximum power), respectively, with a 90–100% linear ramp applied on the 13C channel, were used for the 1H→13C and 13C→1H CP transfers, with a contact time of 100 μs.
Suppression of the water signal was achieved with the MISSISSIPPI scheme without the homospoil gradient58. Additional parameters of the 2D 1H–13C HETCOR experiment were a 1.5 s recycle delay, a 34722.2 Hz sweep width and 11.1 ms acquisition time in the direct dimension, a 35000 Hz sweep width and 7.3 ms acquisition time in the indirect dimension, 10 kHz XiX 1H decoupling during the 13C chemical shift evolution period, and 10 kHz WALTZ-16 13C decoupling during 1H acquisition.
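Several of the reported MAS parameters are linked by simple relations that can serve as a consistency check. The sketch below (hypothetical helper names) computes the number of complex points implied by AQ = N/SW, expresses the indirect-dimension increment 1/SW in rotor periods, and checks which Hartmann–Hahn sideband conditions |ν1H − ν1C| = n·νr are crossed by a linear 1H ramp. It assumes that the "90–100% linear ramp" spans 90–100% of the stated maximum 1H field. The Methods do not state whether the indirect dimensions were rotor-synchronized, although the numbers appear consistent with it (half a rotor period per increment for DARR at 18 kHz, one rotor period for HETCOR at 35 kHz).

```python
from typing import List, Tuple


def complex_points(sweep_width_hz: float, acq_time_s: float) -> float:
    """Complex points implied by AQ = N / SW, i.e. N = SW * AQ."""
    return sweep_width_hz * acq_time_s


def increment_in_rotor_periods(sweep_width_hz: float, mas_hz: float) -> float:
    """Indirect increment 1/SW expressed in units of the rotor period 1/nu_r."""
    return mas_hz / sweep_width_hz


def hh_orders_covered(nu_h_ramp_khz: Tuple[float, float], nu_c_khz: float,
                      mas_khz: float, max_order: int = 3) -> List[int]:
    """Orders n >= 1 with |nu_1H - nu_1C| = n * nu_r reached on a linear 1H ramp."""
    d_start, d_end = (nu_h - nu_c_khz for nu_h in nu_h_ramp_khz)
    lo, hi = sorted((abs(d_start), abs(d_end)))
    if d_start * d_end < 0:
        lo = 0.0  # the ramp passes through nu_1H = nu_1C, so the difference reaches zero
    return [n for n in range(1, max_order + 1) if lo <= n * mas_khz <= hi]


# DARR at 18 kHz MAS: direct and indirect dimensions.
darr_direct = complex_points(72115.4, 14.2e-3)    # close to 1024
darr_indirect = complex_points(36000.0, 7.1e-3)   # close to 256
darr_step = increment_in_rotor_periods(36000.0, 18000.0)    # 0.5 rotor period
# HETCOR indirect dimension at 35 kHz MAS.
hetcor_step = increment_in_rotor_periods(35000.0, 35000.0)  # 1 rotor period
# 1D CP at 18 kHz: 1H ramp from 90% to 100% of 81 kHz against 56 kHz on 13C.
cp_orders = hh_orders_covered((0.9 * 81.0, 81.0), 56.0, 18.0)
```

For the 1D CP, the ramp spans 72.9 to 81 kHz on 1H against 56 kHz on 13C, giving differences of 16.9 to 25 kHz, which include the n = 1 condition at the 18 kHz spinning rate.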