In contrast, reinforcement learning treats mutation selection as an active, state-aware decision process. An RL agent learns which mutations (or classes of mutations) are more likely to lead to viable and performant enzymes, based on feedback from the system's emergent dynamics.
2. Agent-Environment Design
The simulation engine defines a custom evolutionary environment in which RL agents interact with protein sequences as evolving states.
a. State Space
The agent's perception of the state includes:
Current amino acid sequence or encoded features (e.g., hydrophobicity, charge profile)
Folding metrics (e.g., predicted RMSD, contact map entropy)
Historical mutation trajectory
Functional scores (binding energy, catalytic efficiency)
Environmental parameters (e.g., simulated pH, temperature)
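The state components listed above could be packed into a single observation vector along the following lines. This is a minimal sketch, not the engine's actual representation: the class name EnzymeState, its field names, the to_features method, and the choice of summary features (mean hydropathy, net charge, mutation count) are all hypothetical. The hydropathy values are the standard Kyte-Doolittle scale; the charge table is a simplification that assumes near-neutral pH and treats histidine as neutral.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges (assumption: near-neutral pH, His neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


@dataclass
class EnzymeState:
    """Hypothetical container for one observation of the environment."""
    sequence: str                 # current amino acid sequence
    predicted_rmsd: float         # folding metric
    contact_map_entropy: float    # folding metric
    binding_energy: float         # functional score
    catalytic_efficiency: float   # functional score
    ph: float                     # simulated environmental parameter
    temperature: float            # simulated environmental parameter
    # Historical mutation trajectory as (position, old residue, new residue).
    mutation_history: List[Tuple[int, str, str]] = field(default_factory=list)

    def to_features(self) -> List[float]:
        """Flatten the state into a fixed-length numeric vector."""
        if not self.sequence:
            raise ValueError("sequence must not be empty")
        n = len(self.sequence)
        mean_hydropathy = sum(KD_HYDROPATHY[aa] for aa in self.sequence) / n
        net_charge = sum(CHARGE.get(aa, 0.0) for aa in self.sequence)
        return [
            mean_hydropathy,
            net_charge,
            self.predicted_rmsd,
            self.contact_map_entropy,
            float(len(self.mutation_history)),  # trajectory summarized by length
            self.binding_energy,
            self.catalytic_efficiency,
            self.ph,
            self.temperature,
        ]


# Illustrative values only; none of these numbers come from a real simulation.
state = EnzymeState(
    sequence="ACDK",
    predicted_rmsd=1.5,
    contact_map_entropy=0.8,
    binding_energy=-7.2,
    catalytic_efficiency=0.3,
    ph=7.0,
    temperature=310.0,
    mutation_history=[(2, "E", "D")],
)
features = state.to_features()
```

Summarizing the mutation trajectory by its length keeps the vector fixed-size, at the cost of discarding which positions were mutated. An agent that needs the full trajectory would require a sequence-aware encoding instead.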
b. Action Space