Protein domains of low sequence complexity—dark matter of the proteome
Corresponding author: steven.mcknight{at}utsouthwestern.edu
Abstract
This perspective begins with a speculative consideration of the properties of the earliest proteins to appear during evolution. What did these primitive proteins look like, and how were they of benefit to early forms of life? I proceed to hypothesize that primitive proteins have been preserved through evolution and now serve diverse functions important to the dynamics of cell morphology and biological regulation. The primitive nature of these modern proteins is easy to spot. They are composed of a limited subset of the 20 amino acids used by traditionally evolved proteins and thus are of low sequence complexity. This chemical simplicity limits protein domains of low sequence complexity to forming only a crude and labile type of protein structure currently hidden from the computational powers of machine learning. I conclude by hypothesizing that this structural weakness represents the underlying virtue of proteins that, at least for the moment, constitute the dark matter of the proteome.
Footnotes
- Supplemental material is available for this article.
- Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.351465.123.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genesdev.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.