Peptide Encoding

Peptide Encoding is the challenge of “backtracking” the central dogma of biology: finding DNA sequences within a genome that can encode a given peptide. Peptide sequencing presents some algorithms for solving this problem.

Approach

Computationally, the problem can be worked through in stages using reverse-lookup tables:

  1. For each amino acid in the peptide string, find the codons (RNA triplets) that encode it (e.g. using the inverse codon table below). Note that the resulting RNA is not unique:

    A. Multiple codons can encode the same amino acid. For example, Arginine has 6 codons: (CGU, CGC, CGA, CGG; AGA, AGG).

    B. If the peptide is cyclic, any rotation of the peptide string can be encoded and reported (note, however, that some cyclic peptides are non-ribosomal and do not appear in the genome at all).

Amino acid       RNA codons
Ala, A           GCU, GCC, GCA, GCG
Arg, R           CGU, CGC, CGA, CGG; AGA, AGG
Asn, N           AAU, AAC
Asp, D           GAU, GAC
Asn or Asp, B    AAU, AAC; GAU, GAC
Cys, C           UGU, UGC
Gln, Q           CAA, CAG
Glu, E           GAA, GAG
Gln or Glu, Z    CAA, CAG; GAA, GAG
Gly, G           GGU, GGC, GGA, GGG
His, H           CAU, CAC
START            AUG
Ile, I           AUU, AUC, AUA
Leu, L           CUU, CUC, CUA, CUG; UUA, UUG
Lys, K           AAA, AAG
Met, M           AUG
Phe, F           UUU, UUC
Pro, P           CCU, CCC, CCA, CCG
Ser, S           UCU, UCC, UCA, UCG; AGU, AGC
Thr, T           ACU, ACC, ACA, ACG
Trp, W           UGG
Tyr, Y           UAU, UAC
Val, V           GUU, GUC, GUA, GUG
STOP             UAA, UGA, UAG
Source: Inverse Codon Table
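
The inverse table above can be used directly to enumerate every RNA string that encodes a short peptide, which makes the non-uniqueness in step 1 concrete. The following is a minimal sketch in Python, assuming one-letter amino-acid codes and ignoring the ambiguity codes B and Z and the START/STOP rows; the names INVERSE_CODONS, count_encodings and rna_encodings are hypothetical.

```python
from itertools import product
from math import prod

# Inverse codon table transcribed from the table above (one-letter codes,
# ambiguity codes B/Z and START/STOP rows omitted).
INVERSE_CODONS = {
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAU", "AAC"],
    "D": ["GAU", "GAC"],
    "C": ["UGU", "UGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "H": ["CAU", "CAC"],
    "I": ["AUU", "AUC", "AUA"],
    "L": ["CUU", "CUC", "CUA", "CUG", "UUA", "UUG"],
    "K": ["AAA", "AAG"],
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "W": ["UGG"],
    "Y": ["UAU", "UAC"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
}


def count_encodings(peptide: str) -> int:
    """Number of distinct RNA strings that translate to `peptide`."""
    return prod(len(INVERSE_CODONS[aa]) for aa in peptide)


def rna_encodings(peptide: str) -> list[str]:
    """All RNA strings encoding `peptide` (grows exponentially with length)."""
    choices = (INVERSE_CODONS[aa] for aa in peptide)
    return ["".join(codons) for codons in product(*choices)]


print(count_encodings("MA"))  # 4  (1 codon for Met x 4 codons for Ala)
print(rna_encodings("MA"))    # ['AUGGCU', 'AUGGCC', 'AUGGCA', 'AUGGCG']
```

The count is a product over residues, so even a two-residue peptide such as Arg-Leu already has 6 × 6 = 36 encodings. This is why genome-scale searches usually translate candidate windows of the genome rather than enumerating every encoding.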

The codon wheel below shows which amino acids have similar codons:

Figure: codon wheel (start in the middle and work outward to read a codon). Source: Codon Table page

  2. Convert the RNA to DNA (replace each U with T).

  3. As always, the DNA sequence encoding the RNA sequence is also not unique, since:

    A. DNA is double-stranded, so the peptide may be encoded on either strand: the reverse complement of a matching sequence is equally valid.

    B. Genes contain coding regions (exons) interrupted by non-coding regions (introns), so the DNA encoding a peptide need not be contiguous in the genome.
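
Putting these steps together, a common way to search a genome is to slide a window of 3 × (peptide length) nucleotides along the DNA and keep the window if it, or its reverse complement, translates to the peptide. The following is a minimal sketch in Python, assuming an uppercase A/C/G/T genome string, the standard genetic code, and a contiguous coding sequence (introns are ignored); the names CODON_TABLE, translate, reverse_complement and peptide_encoding are hypothetical.

```python
# Standard genetic code, indexed by base order U, C, A, G
# (first, second, third position); '*' marks a stop codon.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (the opposite strand, read 5' to 3')."""
    return dna.translate(COMPLEMENT)[::-1]


def peptide_encoding(dna: str, peptide: str) -> list[str]:
    """Substrings of `dna` that encode `peptide` on either strand."""
    k = 3 * len(peptide)
    hits = []
    for start in range(len(dna) - k + 1):
        window = dna[start:start + k]
        # Forward strand: DNA -> RNA by replacing T with U.
        forward = window.replace("T", "U")
        # Opposite strand: take the reverse complement, then convert to RNA.
        backward = reverse_complement(window).replace("T", "U")
        if translate(forward) == peptide or translate(backward) == peptide:
            hits.append(window)
    return hits


genome = "ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"
print(peptide_encoding(genome, "MA"))  # ['ATGGCC', 'GGCCAT', 'ATGGCC']
```

Here GGCCAT is reported because its reverse complement, ATGGCC, encodes Met-Ala on the opposite strand. For a cyclic peptide, one could call peptide_encoding once per rotation of the peptide string.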

History

In 1961, Marshall Nirenberg discovered that RNA strands made only of uracil produced a peptide made only of phenylalanine (Phe). Scientists extended this technique to work out which amino acid each RNA codon encodes.