Understanding Codon Optimization
The genetic code is degenerate: most amino acids are encoded by two to six synonymous codons (in the standard code, only methionine and tryptophan have a single codon). Organisms differ in which synonymous codons they prefer, a phenomenon called codon usage bias. Highly expressed genes in a given organism tend to use codons that match abundant tRNA pools, which enables efficient translation.
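To make the degeneracy concrete, the short sketch below builds the standard genetic code (NCBI translation table 1) and counts how many codons encode each amino acid. The names BASES, AMINO, CODON_TO_AA and degeneracy are illustrative choices for this example, not part of any particular library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid, ignoring stop codons ("*").
degeneracy = Counter(aa for aa in CODON_TO_AA.values() if aa != "*")

# Amino acids with exactly one codon, and the largest synonymous family size.
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)
largest_family = max(degeneracy.values())

print(dict(degeneracy))
print(single_codon, largest_family)
```

Leucine, serine and arginine each have six codons, which is where the upper end of the "two to six" range comes from.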
Codon optimization replaces codons in a coding sequence (CDS) with synonymous alternatives preferred by the target host organism. The amino acid sequence is unchanged; only the nucleotide sequence that encodes it differs. The goal is to improve heterologous protein expression by matching codon choice to the host's translational machinery, in particular its tRNA availability.
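A minimal sketch of the simplest form of this substitution, a "pick the most frequent synonymous codon" strategy, is shown below. It is one possible approach under stated assumptions, not a description of any specific tool: the function names translate and optimize are hypothetical, and TOY_USAGE contains made-up placeholder numbers for three amino acids, not real Kazusa values.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder usage values for illustration only (NOT real host data).
TOY_USAGE = {
    "CTG": 0.50, "CTA": 0.04,   # leucine
    "AAA": 0.70, "AAG": 0.30,   # lysine
    "GGC": 0.40, "GGG": 0.15,   # glycine
}


def translate(cds):
    """Translate a CDS into one-letter amino acids ('*' marks a stop codon)."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize(cds, usage):
    """Swap each codon for the synonymous codon with the highest usage value.

    Codons absent from `usage` score 0. A codon is only replaced when a
    strictly better synonym exists, so codons without data are left as-is.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA[codon]
        synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]
        best = max(synonyms, key=lambda c: usage.get(c, 0.0))
        out.append(best if usage.get(best, 0.0) > usage.get(codon, 0.0) else codon)
    return "".join(out)


original = "ATGCTAAAGGGGTAA"            # encodes M L K G, then a stop codon
optimized = optimize(original, TOY_USAGE)
print(optimized)
print(translate(original) == translate(optimized))
```

The final comparison checks the invariant stated above: the translated protein is identical before and after optimization. Real workflows typically weigh additional constraints (for example GC content or unwanted restriction sites) that this sketch ignores.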
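The primary metric for optimization quality is the Codon Adaptation Index (CAI), a value between 0 and 1 that measures how closely a sequence's codon usage matches the host's preference. It is computed as the geometric mean, over the codons of the sequence, of each codon's relative adaptiveness: its frequency in the host divided by the frequency of the most common synonymous codon. A CAI of 1.0 means every codon is the host's most frequent choice for its amino acid. Typical targets are > 0.7 for E. coli and > 0.8 for mammalian expression systems.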
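As a rough illustration of how a CAI value can be computed, the sketch below uses the usual geometric-mean formulation: each codon gets a relative adaptiveness w (its usage divided by the highest usage among its synonyms), and CAI is the geometric mean of w over the scored codons. The function names, the two codon families and the counts in TOY_COUNTS are hypothetical placeholders, not real host data.

```python
import math

# Placeholder codon counts for two amino acids (NOT real host data).
TOY_COUNTS = {"AAA": 30, "AAG": 10, "GAA": 40, "GAG": 20}
TOY_FAMILIES = [["AAA", "AAG"], ["GAA", "GAG"]]   # lysine, glutamate


def relative_adaptiveness(counts, families):
    """w(codon) = count / highest count among its synonymous codons."""
    w = {}
    for family in families:
        top = max(counts[c] for c in family)
        for codon in family:
            w[codon] = counts[codon] / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of `cds` that have a w value.

    Codons without a w entry (for example single-codon amino acids or stop
    codons) are skipped. A w of 0 would make log() fail, so real tables
    usually need a small pseudocount for unobserved codons.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    scored = [w[c] for c in codons if c in w]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(x) for x in scored) / len(scored))


w = relative_adaptiveness(TOY_COUNTS, TOY_FAMILIES)
print(cai("AAAGAA", w))   # both codons are the preferred ones
print(cai("AAGGAG", w))   # both codons are the less preferred ones
```

With these toy counts, a sequence made only of preferred codons scores 1.0, while one made only of the rarer codons scores the geometric mean of 1/3 and 1/2.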
Codon usage frequency data is sourced from the Kazusa Codon Usage Database, the canonical public-domain reference for organism-specific codon frequencies. The related metric RSCU (Relative Synonymous Codon Usage) divides a codon's observed frequency by the frequency expected if all synonymous codons for that amino acid were used equally, so a value of 1 indicates no bias, values above 1 indicate a preferred codon, and values below 1 an avoided one.
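The RSCU normalization can be sketched in a few lines. The function name rscu and the counts below are hypothetical placeholders chosen for the example; they are not taken from the Kazusa database.

```python
def rscu(counts, family):
    """RSCU for each codon in one synonymous family.

    RSCU = observed count / (family total / number of synonymous codons).
    Codons missing from `counts` are treated as observed zero times.
    """
    total = sum(counts.get(c, 0) for c in family)
    if total == 0:
        raise ValueError("no observations for this codon family")
    expected = total / len(family)   # count per codon under equal usage
    return {c: counts.get(c, 0) / expected for c in family}


# Placeholder lysine counts (NOT real host data): AAA seen 30x, AAG 10x.
lys = rscu({"AAA": 30, "AAG": 10}, ["AAA", "AAG"])
print(lys)
```

Within a family the RSCU values always sum to the number of synonymous codons, which makes them comparable across amino acids with different degeneracy.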