Recent work published in the Journal of Molecular Biology (e.g., "Tracing the Origin of the Genetic Code and Thermostability to Dipeptide Sequence in Proteomes") has provided compelling evidence that signatures of coevolution are embedded within proteomes. Specifically, correlations among dipeptide sequence patterns, thermostability, and codon usage point to a coevolutionary imprint linking genetic coding and protein structure. These findings highlight the simultaneous pressures that shaped both molecular codes and protein domains.
However, such analyses remain primarily descriptive and statistical. They identify correlations but do not explain why or how these interdependencies stabilize, nor do they formalize the mechanisms that synchronize mutational exploration, structural constraints, and adaptive pressures.
Our work builds directly on this foundation by introducing a mathematical complex adaptive systems (CAS) framework that explains how RNA and protein codes emerge not sequentially, but as mutually stabilizing attractor states within an evolving dynamical system.
Novelty and Significance Statement
Novelty. To our knowledge, this study provides the first rigorous CAS-based mathematical model of RNA--protein coevolution, moving beyond descriptive correlations to formalize emergent synchronization. Unlike RNA-first or protein-first models, our approach demonstrates that both codes emerge simultaneously as stable attractors of coupled evolutionary dynamics.
Significance. By reframing the origin of genetic and protein codes as a problem of complex adaptive dynamics, we unify disparate empirical observations (proteomic motifs, genomic correlations, and functional constraints) within a single predictive framework. This approach bridges molecular evolution, systems biology, and complexity science, offering testable hypotheses for comparative genomics, directed evolution experiments, and synthetic biology.
Executive Summary
Problem: Linear, sequential hypotheses (RNA-first vs. protein-first) cannot explain how interdependent systems such as genetic codes and proteins could evolve without foresight.
Empirical basis: Recent proteomic analyses reveal dipeptide-level correlations linking genetic coding and protein stability, suggesting coevolution.
Approach: We develop a Complex Adaptive Systems (CAS) mathematical framework combining replicator--mutator dynamics, genotype--phenotype mapping, and interdependent fitness functions.
Results: The model produces emergent attractors representing stable RNA--protein complexes, synchronization of evolutionary trajectories, and Red Queen-like cycles.
Contribution: This framework explains how molecular codes coevolve simultaneously, provides predictions for genomic and experimental tests, and reframes molecular evolution as a problem of self-organizing complexity.
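To make the ingredients named under Approach concrete, the following is a minimal illustrative sketch, not the model developed in this work. It assumes a discrete-time replicator--mutator update, a uniform mutation matrix, equal numbers of RNA and protein types, and a linear coupling in which each population's fitness depends on the other's frequencies. The function names (`mutation_matrix`, `coupled_fitness`, `replicator_mutator_step`, `simulate`) and all parameter values are hypothetical choices made for illustration only.

```python
"""Toy coupled replicator-mutator dynamics (illustrative assumptions throughout)."""


def mutation_matrix(n, mu):
    """Row-stochastic Q: a type is copied faithfully with probability 1 - mu,
    otherwise it mutates uniformly into one of the other n - 1 types."""
    off = mu / (n - 1)
    return [[1.0 - mu if i == j else off for j in range(n)] for i in range(n)]


def coupled_fitness(partner, coupling, base=1.0):
    """Assumed linear interdependence: f_i = base + sum_k coupling[i][k] * partner[k]."""
    return [base + sum(c * p for c, p in zip(row, partner)) for row in coupling]


def replicator_mutator_step(freqs, fitness, q):
    """Discrete-time update x_i' = sum_j x_j f_j Q[j][i] / phi, phi = mean fitness."""
    n = len(freqs)
    phi = sum(x * f for x, f in zip(freqs, fitness))
    return [sum(freqs[j] * fitness[j] * q[j][i] for j in range(n)) / phi
            for i in range(n)]


def simulate(coupling, rna, protein, mu=0.01, steps=500):
    """Iterate both populations; each one's fitness depends on the other's state.

    coupling[i][k] is the (hypothetical) benefit shared by RNA type i and
    protein type k; the matrix is assumed square.
    """
    n = len(coupling)
    # Transpose so that row k lists the benefits protein k receives from each RNA.
    coupling_t = [[coupling[i][k] for i in range(n)] for k in range(n)]
    q = mutation_matrix(n, mu)
    for _ in range(steps):
        f_rna = coupled_fitness(protein, coupling)
        f_protein = coupled_fitness(rna, coupling_t)
        # Update both populations from the same time step (synchronous update).
        rna, protein = (replicator_mutator_step(rna, f_rna, q),
                        replicator_mutator_step(protein, f_protein, q))
    return rna, protein


if __name__ == "__main__":
    # Diagonal coupling: RNA type i benefits only from protein type i, and vice versa.
    matched = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    rna, protein = simulate(matched, [0.4, 0.3, 0.3], [0.4, 0.3, 0.3])
    print([round(x, 3) for x in rna], [round(y, 3) for y in protein])
```

Under these assumptions, a diagonal coupling matrix with a slight initial excess of one matched pair drives both populations to concentrate on that pair, with mutation keeping the other types at low frequency. This is only a toy analogue of a mutually stabilizing attractor; richer genotype--phenotype maps or asymmetric couplings would be needed to explore synchronization or Red Queen-like cycles.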
Outline
I. Introduction
A. Limitations of RNA-first and protein-first models.
B. The puzzle of synchronized molecular codes.
C. Promise of CAS for emergent coevolution.
II. Background: Empirical and Theoretical Context