GOOSE enables the rational design of disordered regions
A major conceptual challenge in exploring IDR ensemble-to-function relationships is the difficulty of navigating IDR sequence space. For folded proteins, single-point mutations directly test sequence-structure relationships. In IDRs, single-point mutations are generally expected to have a negligible impact on conformational behavior (Fig. S1), and the reduced sequence conservation across homologous proteins suggests that single-point mutations often have a limited impact on function1,20–22. In contrast, sequence-ensemble relationships have been decoded effectively through mutations that titrate sequence properties (e.g., tuning the net charge per residue), but this has been difficult to do systematically without perturbing multiple sequence features at once17,20,22–25. Moreover, the rational design of IDRs with specified ensembles was not possible until very recently26. To enable the holistic investigation of sequence-ensemble-function relationships in IDRs, we developed GOOSE. GOOSE is a Python-based software package for the rational design of synthetic sequences or sequence variants based on user-defined parameters (see Methods and Table S2). These parameters include specific sequence features as well as predicted ensemble dimensions (e.g., the radius of gyration or end-to-end distance).
As a proof of concept, we used GOOSE to design a library of 32 IDRs that allowed us to systematically test the impact of charged residues and hydrophobicity on ensemble dimensions (Table S1, Fig. S2). All IDRs were 60 residues in length and were designed to enable the comparison between pairs of sequences that vary specific sequence properties while holding others fixed (Table S3). We used a recently developed genetically encoded FRET backbone to profile the dimensions of these sequences in live cells15,27. Briefly, by placing each IDR between two fluorescent proteins that form a FRET donor and acceptor pair, FRET efficiency becomes a proxy for the IDR’s ensemble-averaged end-to-end distance (i.e., ensemble dimensions): if the ensemble is compact, transfer efficiency is high, and if the ensemble is extended, transfer efficiency is low (Fig. 1F). In this way, we assessed how varying different sequence properties altered ensemble-average conformational behavior for IDRs in cells (Fig. S3, see also discussion on limitations of the method in Supplementary Information).
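The inverse relationship between distance and transfer efficiency described above follows the standard Förster equation, E = 1 / (1 + (r / R0)^6). The sketch below is illustrative only. The Förster radius `R0_NM` and the distance lists are hypothetical placeholder values, not parameters of the FRET construct used here, and the simple arithmetic average over distances is an assumption about how an ensemble measurement might be approximated.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Forster transfer efficiency for a single donor-acceptor distance."""
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def ensemble_fret_efficiency(distances_nm, r0_nm: float) -> float:
    """Average efficiency over sampled end-to-end distances (simple mean)."""
    distances = list(distances_nm)
    if not distances:
        raise ValueError("need at least one distance")
    return sum(fret_efficiency(d, r0_nm) for d in distances) / len(distances)


# Hypothetical values chosen only to illustrate the trend.
R0_NM = 6.0
compact_ensemble = [3.0, 4.0, 5.0]   # short end-to-end distances (nm)
extended_ensemble = [7.0, 8.0, 9.0]  # long end-to-end distances (nm)

e_compact = ensemble_fret_efficiency(compact_ensemble, R0_NM)
e_extended = ensemble_fret_efficiency(extended_ensemble, R0_NM)
print(f"compact: {e_compact:.2f}, extended: {e_extended:.2f}")
```

Because efficiency falls off with the sixth power of distance, the readout is most sensitive when the end-to-end distance is close to R0.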
Charged residues can compact or expand ensemble dimensions
We first designed sequences that test the effects of charged residues on ensemble dimensions. The importance of charged residues is driven by a combination of the long-range nature of electrostatic interactions and their favorable interaction with the solvent (Fig. 2A)8–10,28. Since the intracellular environment differs drastically from dilute aqueous buffers in its ionic composition, we sought to determine whether rules of thumb established in in vitro and in silico studies are recapitulated in cells.
We first asked whether increasing the fraction of charged residues (FCR) would alter IDR dimensions. Based on in vitro studies, we expected that increasing the FCR would lead to ensemble expansion9,10,29. To test this hypothesis, we designed sequence pairs where the FCR varied, but other features (e.g., net charge, hydrophobicity, charge patterning, Fig. 2B) were held fixed. In line with prior work, both polyanionic (negatively charged) and polycationic (positively charged) IDRs with evenly distributed charged residues became more expanded as the FCR increased (Fig. 2C)10,29.
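The two charge parameters used in this comparison can be computed directly from a sequence. The following is a minimal sketch, not GOOSE's implementation. It assumes that only K/R count as positive and D/E as negative (histidine is treated as neutral), and the two 60-residue sequences are hypothetical examples rather than members of the library.

```python
POSITIVE = set("KR")  # assumption: histidine treated as neutral
NEGATIVE = set("DE")


def fraction_charged_residues(sequence: str) -> float:
    """FCR: fraction of residues that carry a positive or negative charge."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    charged = sum(1 for aa in seq if aa in POSITIVE or aa in NEGATIVE)
    return charged / len(seq)


def net_charge_per_residue(sequence: str) -> float:
    """NCPR: (number positive - number negative) / sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(1 for aa in seq if aa in POSITIVE)
    negative = sum(1 for aa in seq if aa in NEGATIVE)
    return (positive - negative) / len(seq)


# Hypothetical pair: the FCR doubles while the net charge stays the same.
low_fcr = "EGSGSGEGSG" * 6   # 12 E in 60 residues
high_fcr = "EGKESGEGSG" * 6  # 18 E and 6 K in 60 residues

for name, seq in (("low FCR", low_fcr), ("high FCR", high_fcr)):
    print(name, round(fraction_charged_residues(seq), 2),
          round(net_charge_per_residue(seq), 2))
```

Checking both values for each member of a pair is one simple way to confirm that a design changes the FCR without also changing the net charge.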
Next, we wondered how charge patterning might alter this behavior11. In vitro and in silico work has established that clustering oppositely charged residues in an IDR can lead to intra-molecular interactions and ensemble compaction (Fig. S4)11. We tested if that behavior persists in cells using new sequence pairs that had identical sequence features to those in Fig. 2C, except that charged residues of the same sign were clustered together (Fig. 2B). In line with expectations, increasing the FCR for sequences with charge clusters led to ensemble compaction for IDRs with a net negative charge (Fig. 2D). In contrast, for IDRs with a net positive charge and charge clusters, increasing the FCR led to ensemble expansion, an unexpected result that we speculate may be driven by these polycationic IDRs interacting with other cellular components (see Discussion). Taken together, our results show that for IDRs with a net charge, both the FCR and the patterning of charged residues can strongly influence IDR dimensions in cells.
Having examined IDRs with a net positive or negative charge (polyelectrolytes), we next considered polyampholytes, sequences with equal fractions of positive and negative residues, making them net neutral. In agreement with prior in vitro and in silico work, we saw no statistically significant change in ensemble dimensions upon an increase in the FCR for polyampholytes with evenly distributed charged residues (Fig. 2E)12,17. However, when oppositely charged residues were clustered together, an increase in the FCR led to an expansion of ensemble dimensions, as was seen for sequences with a net positive charge (Fig. 2E). This unexpected result further points to the importance of positively charged clusters, rather than an overall positive charge, in altering ensemble dimensions. This may again be due to interactions of such positive charge clusters with cellular components (see Discussion).
Finally, we asked how IDR ensembles with equivalent FCR values but different charge signs (negative, neutral, or positive) behaved. We designed sequence triplets with an FCR of 0.3 in which the net charge per residue varied (from –0.3 to 0.0 to +0.3), with sets having either evenly distributed or clustered charged residues. In silico and in vitro studies predict that IDRs with the same magnitude of net charge but opposite signs should display similar ensemble dimensions8–10,12,17. To our surprise, this behavior was not recapitulated in our experiments (Fig. 2F). In two of the four triplets, we saw a monotonic decrease in IDR dimensions as positively charged residues were added. Moreover, in all four cases, the IDR with a +0.3 net charge was the most compact of the three sequences. This trend held true for a second set of sequence triplets with an FCR of 0.6 (Fig. 2G). In summary, these results suggest that IDRs with a net positive charge are more compact than equivalent IDRs with a neutral or negative charge.
Taken together, while these experiments do identify several differences between in vitro and in-cell observations, our results largely confirm that the same properties that influence IDR dimensions in vitro hold true in cells. Increasing the fraction of charged residues makes polyelectrolytic IDRs (sequences with a net charge) expand, while this effect is more muted for polyampholytic IDRs (sequences without a net charge). Most notably, negative polyelectrolytes are highly expanded, while positive polyelectrolytes are unexpectedly compact.
Modest increases in hydrophobicity do not lead to IDR compaction in cells
Having established that charged residues are key determinants of IDR dimensions in cells, as observed in vitro, we next sought to assess how sequence hydrophobicity influences IDR dimensions in cells. The role of hydrophobicity in the global dimensions of unfolded and disordered proteins has received substantial attention30–33. In the protein folding literature, understanding the impact of hydrophobicity on the ensemble dimensions of unfolded polypeptides under folding conditions has been hampered historically by technical challenges and, more recently and perhaps more fundamentally, by the limitation that aqueous buffer is not bona fide “native conditions”33,34.
We hypothesized that increasing IDR hydrophobicity would lead to measurable compaction of IDR dimensions in cells (Fig. 3A). To test this hypothesis, we designed twelve pairs of IDRs where hydrophobicity (as defined using the widely used Kyte-Doolittle hydropathy scale35) increases between the two sequences in each pair while many other sequence properties remain fixed (Fig. 3B). For the majority (8/12) of the pairs, increasing hydrophobicity had no impact on IDR dimensions, despite a wide range of sequence backgrounds (Fig. 3C). Beyond the implications for the impact of hydrophobicity, this result also indicates that IDRs with different sequences can have indistinguishable ensemble properties, at least as measured in our assay.
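For reference, a sequence-averaged Kyte-Doolittle hydropathy can be computed as sketched below. This is a minimal illustration using the original scale values (–4.5 to +4.5) and is not necessarily the normalization applied by GOOSE. The two sequences are hypothetical and are not members of the library.

```python
# Kyte-Doolittle hydropathy values (original scale, -4.5 to +4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    try:
        return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    except KeyError as exc:
        raise ValueError(f"non-standard residue: {exc.args[0]}") from exc


# Hypothetical pair: charged residues are untouched, while two serines per
# repeat are swapped for alanine and valine to raise the mean hydropathy.
less_hydrophobic = "GSGSGSEKGS" * 6
more_hydrophobic = "GAGVGSEKGS" * 6

print(round(mean_hydropathy(less_hydrophobic), 2),
      round(mean_hydropathy(more_hydrophobic), 2))
```

Because the substitutions leave every charged residue in place, the pair differs in hydropathy while charge content and charge positions are unchanged, which mirrors the logic of the paired design.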
For the four pairs where a change in ensemble dimensions was observed, molecular explanations for the differences are readily available. In one pair where increasing hydrophobicity leads to compaction, the specific sequence changes involve inserting three aromatic tryptophan residues into a subregion devoid of any charged residues (Fig. 3D). This result supports prior work showing a linear response between IDR dimensions and aromatic residue content7. For another pair, the increase in hydrophobicity accompanies the loss of proline residues along with the insertion of large, bulky aliphatic residues within two oppositely charged blocks (Fig. 3E). Both of these changes are expected to drive ensemble compaction independently, such that their combined effect may be cooperative13,36. Finally, two of the pairs show an increase in IDR dimensions upon an increase in hydrophobicity, driven by the exchange of glutamine and asparagine residues for serine and threonine. Prior work has shown that glutamine and asparagine can drive attractive interactions via the side-chain amide group, whereas serine and threonine are not expected to interact as strongly (Fig. 3F)37. In summary, while specific and interpretable sequence features may diverge from the overall trend, our results suggest that increasing IDR hydrophobicity within the bounds explored here does not, in general, lead to ensemble compaction.
Ensemble dimensions predict IDR function
Given their solvent-exposed nature and the small number of intramolecular bonds that dictate ensemble conformational biases, IDRs are poised to respond to changes in their physicochemical surroundings through a change in their ensemble dimensions15,16,27,38. We reasoned that we could use our GOOSE-generated library to test how sequence properties map to their sensing ability.
We used live-cell ensemble FRET to determine how each of our synthetic IDRs responded to cell volume changes induced by hyper- and hypo-osmotic shock (Fig. 4A). Osmotic perturbations are sufficiently fast (~30 seconds) that changes in FRET efficiencies reflect a consequence of the immediate change in cellular environment, as opposed to a secondary effect driven by signaling or transcriptional changes39. Sensing cellular volume changes is required for various regulatory and homeostatic processes, including cell division and growth, yet how this is accomplished at the molecular level remains unclear40. We hypothesized that an IDR-based cellular volume sensor would alter its ensemble when the cell experiences volume-induced changes. One mechanism often invoked to explain this is that a decrease in cell volume would increase macromolecular crowding in the cytoplasm, which in turn would drive IDR compaction41. Cell volume increase should have the opposite effect (Fig. 4B). With this mechanism in mind, we measured sequence sensitivity as the change in FRET efficiency following cell volume increase or decrease (∆Ef) in all 32 constructs in our library (Fig. S5).
Our results indicate that not all IDR sequences tested here can act as sensors of the cellular environment. Instead, sequences fell into three behavior types: naive (in line with the macromolecular crowding mechanism described above - cell volume reduction causes ensemble compaction), insensitive (non-responsive to volume changes), and inverse (cell volume reduction causes ensemble expansion) (Fig. 4B, C). Around one-quarter of the sequences with sufficient statistics showed naive behavior, at least half showed insensitive behavior, and a fifth showed inverse behavior (Fig. S6).
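The three behavior types can be expressed as a simple rule on the change in FRET efficiency following cell volume decrease. The sketch below is an assumption-laden illustration, not the classification procedure used in this work. The function names and the cutoff `threshold` are hypothetical, and in practice the call would rest on a statistical test across cells rather than a fixed cutoff.

```python
def delta_ef(ef_after: float, ef_before: float) -> float:
    """Change in FRET efficiency caused by the osmotic perturbation."""
    return ef_after - ef_before


def classify_response(delta_ef_volume_decrease: float, threshold: float) -> str:
    """Label a sequence by its response to cell volume decrease.

    naive:       FRET rises (ensemble compacts), as expected from crowding
    inverse:     FRET falls (ensemble expands)
    insensitive: change is within +/- threshold (hypothetical cutoff)
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if delta_ef_volume_decrease > threshold:
        return "naive"
    if delta_ef_volume_decrease < -threshold:
        return "inverse"
    return "insensitive"


# Made-up efficiencies, used only to exercise the three branches.
THRESHOLD = 0.02
for before, after in ((0.40, 0.47), (0.40, 0.405), (0.40, 0.34)):
    change = delta_ef(after, before)
    print(f"{change:+.3f} -> {classify_response(change, THRESHOLD)}")
```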
What about our library sequences determines their response to cell volume change? We proposed three possible explanations: (1) that changes in subcellular localization, whether in basal conditions or as a result of volume change, would alter ensemble dimensions and, therefore, predict sequence behavior; (2) that sequence features, such as NCPR, FCR, and other features that were varied (Figs. 2 and 3) dictate sequence response; or (3) that ensemble dimensions are the key determinant for sequence response. We test each of these hypotheses below.
We first hypothesized that nuclear localization would expose sequences to a different environment with different chemistry and molecular composition, which could explain the differences in response to cell volume changes. Indeed, a subset of our sequences (n=10) show nuclear localization, likely mediated by tracts of positively charged residues that are recognized by the nuclear import machinery42 (Fig. S7). Despite this, almost all sequences had a similar ratio of protein in the cytoplasm versus the nucleus (Fig. S8), and the ratio between protein in the nucleus and the cytoplasm showed little correlation with the response of the ensemble to changes in cell volume (Fig. S9). We therefore asked whether there were simple sequence features that could explain why different sequences showed different responses to cell volume change.
We hypothesized that specific sequence features (Fig. S10A) may govern the response to changes in cell volume. To test this, we correlated the change in FRET upon change in cell volume (∆Ef) with a range of sequence features, but no strong correlations were found (Fig. 4E top, Fig. S10B blue dashed region). We were surprised to find that even charge properties, which showed a strong effect in sequence pairs, had little correlation with the response to volume change. Thus, our data show that, in this case, molecular function cannot be predicted directly from the sequence features examined here.
Finally, we hypothesized that the response to volume change may be predictable based on IDR ensemble dimensions under iso-osmotic conditions. To test this, we compared the change in FRET upon change in cell volume (∆Ef) to the basal FRET efficiency (Ef) of the construct (Fig. 4E, Fig. 4F, G). The basal FRET efficiency showed a stronger correlation with ∆Ef than any other feature. To summarize this result, sequences that are more expanded prior to a change in cell volume are more sensitive to both volume increase and decrease (Fig. 4F, G, and Fig. S10B, black dashed region). This result is mirrored by results from simple coarse-grained simulations, where ensemble dimensions are predictive of an IDR’s responsiveness to cell volume perturbation (Fig. 4E, Fig. S11).
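The comparison described above amounts to correlating a per-sequence descriptor (a sequence feature or the basal Ef) with the magnitude of ∆Ef. A minimal sketch of a Pearson correlation is given below. The choice of Pearson's r is an assumption, since the correlation measure is not specified in this section, and the numbers are made-up toy values that carry no experimental meaning.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: lower basal Ef (more expanded) paired with larger |dEf|.
basal_ef = [0.20, 0.30, 0.40, 0.50]
abs_delta_ef = [0.08, 0.06, 0.03, 0.01]

r = pearson_r(basal_ef, abs_delta_ef)
print(f"r = {r:.2f}")
```

Running the same function over each sequence feature in turn would give a set of coefficients that can be ranked against the one obtained for basal Ef.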
Our work reveals that, for this system, any single sequence feature is a poor determinant of function. Instead, function correlates best with the average physical dimensions of the IDR ensemble. These results offer biophysical insight into the molecular basis for IDR sensitivity, with implications for the design of de novo disordered sensors43. More broadly, they point directly to the potential for an intimate relationship between IDR ensemble properties and molecular function in living cells.