PAM Scoring Matrix

Overview

When scoring protein alignments, not all mismatches are equally likely: in nature, some amino acids are substituted for one another far more often than others.

The PAM (point accepted mutation) scoring matrix takes these biological observations into account. It is a 20x20 matrix (one row and one column per proteinogenic amino acid), and each entry is the score for aligning one amino acid with another: likely substitutions score higher, unlikely ones lower.

To determine the point accepted mutations, one first uses a very basic scoring scheme (e.g. match = +1, every indel or mismatch = -1) to find high-confidence alignments, then uses the substitutions observed in those alignments to construct the more nuanced scoring matrix, which penalizes substitutions between similar amino acids less.
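The counting step described above can be sketched as follows. This is a minimal illustration, not the original procedure: the function name `count_aligned_pairs`, the use of `-` as the gap symbol, and the toy alignments are all assumptions made for the example.

```python
from collections import Counter


def count_aligned_pairs(alignments):
    """Count how often each unordered residue pair shares an alignment column.

    `alignments` is an iterable of (seq_a, seq_b) pairs of equal length,
    with '-' (an assumed gap symbol) marking indels.
    """
    pair_counts = Counter()
    for seq_a, seq_b in alignments:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(seq_a, seq_b):
            if a == "-" or b == "-":
                continue  # columns with a gap carry no substitution information
            # Sort the pair so that (D, E) and (E, D) are counted together.
            pair_counts[tuple(sorted((a, b)))] += 1
    return pair_counts


# Toy alignments, invented for illustration only.
alignments = [("ACDE", "ACEE"), ("AC-E", "ACDE")]
counts = count_aligned_pairs(alignments)
```

Counting unordered pairs reflects the assumption that a substitution of D by E is as informative as one of E by D; the resulting counts would still need to be normalized before they can serve as the $M$ matrix.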

Unit

The $PAM$ unit measures how much mutation separates two sequences: 1 $PAM$ unit is (roughly) the amount of evolutionary time in which an “average” protein mutates 1% of its amino acids.

Definition

A $PAM_1$ scoring matrix is derived from the substitutions observed between proteins that are 99% similar.

$$PAM_1[i, j] = \log \left( \frac{M(i, j)}{f(j)} \right)$$

where $M(i, j)$ is derived from the number of times amino acids $i$ and $j$ appear at the same position in the aligned peptide strings (normalized so that it can be read as a substitution probability), and $f(j)$ is the overall frequency of amino acid $j$.
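As a rough sketch of the formula, the snippet below computes the log-odds entries for a toy two-letter alphabet. The alphabet, the values in `M` and `f`, and the helper name `log_odds` are invented for illustration, and `M` is assumed to be already normalized to probabilities. The base of the logarithm is also an assumption; published PAM tables are usually scaled and rounded to integers.

```python
import math


def log_odds(M, f, alphabet):
    """Return {(i, j): log10(M[i][j] / f[j])} for every pair of residues.

    M is assumed to hold substitution probabilities (rows and columns in
    `alphabet` order) and f maps each residue to its background frequency.
    """
    scores = {}
    for row, a in enumerate(alphabet):
        for col, b in enumerate(alphabet):
            scores[(a, b)] = math.log10(M[row][col] / f[b])
    return scores


# Toy values, not real amino acid data.
alphabet = "AB"
f = {"A": 0.5, "B": 0.5}
M = [[0.99, 0.01],
     [0.01, 0.99]]
scores = log_odds(M, f, alphabet)
```

A pair that co-occurs more often than its background frequency would predict gets a positive score, and a rarer pair gets a negative one, which is why identical residues score highest in this toy case.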

This definition can be generalized to span multiple ($n$) $PAM$ units by raising the $M$ matrix to the $n$-th power (multiplying it by itself $n$ times as a matrix, not entry by entry):

$$PAM_n[i, j] = \log \left( \frac{M^n(i, j)}{f(j)} \right)$$
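To make the matrix power concrete, here is a small pure-Python sketch that extrapolates a toy 1-PAM matrix to 250 PAM units. The 2x2 matrix and the helper names `mat_mul` and `mat_pow` are assumptions for the example; a real computation would use the full 20x20 matrix, most likely with a numerical library.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(M)
    # Start from the identity matrix, which corresponds to 0 PAM units.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


# Toy 1-PAM matrix: each residue stays unchanged with probability 0.99.
M1 = [[0.99, 0.01],
      [0.01, 0.99]]
M250 = mat_pow(M1, 250)
```

Each row of the result still sums to 1, but the diagonal is far less dominant than in `M1`: after many PAM units the residue at a position is only weakly related to the original, which is what makes high-$n$ matrices suitable for distantly related sequences.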

PAM250

Researchers often use $PAM_{250}$ for scoring alignments. Here is an example, hosted externally. Also available on my site as CSV.

For reference, here is the table for PAM250 inlined:

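| | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2 | -2 | 0 | 0 | -3 | 1 | -1 | -1 | -1 | -2 | -1 | 0 | 1 | 0 | -2 | 1 | 1 | 0 | -6 | -3 |
| C | -2 | 12 | -5 | -5 | -4 | -3 | -3 | -2 | -5 | -6 | -5 | -4 | -3 | -5 | -4 | 0 | -2 | -2 | -8 | 0 |
| D | 0 | -5 | 4 | 3 | -6 | 1 | 1 | -2 | 0 | -4 | -3 | 2 | -1 | 2 | -1 | 0 | 0 | -2 | -7 | -4 |
| E | 0 | -5 | 3 | 4 | -5 | 0 | 1 | -2 | 0 | -3 | -2 | 1 | -1 | 2 | -1 | 0 | 0 | -2 | -7 | -4 |
| F | -3 | -4 | -6 | -5 | 9 | -5 | -2 | 1 | -5 | 2 | 0 | -3 | -5 | -5 | -4 | -3 | -3 | -1 | 0 | 7 |
| G | 1 | -3 | 1 | 0 | -5 | 5 | -2 | -3 | -2 | -4 | -3 | 0 | 0 | -1 | -3 | 1 | 0 | -1 | -7 | -5 |
| H | -1 | -3 | 1 | 1 | -2 | -2 | 6 | -2 | 0 | -2 | -2 | 2 | 0 | 3 | 2 | -1 | -1 | -2 | -3 | 0 |
| I | -1 | -2 | -2 | -2 | 1 | -3 | -2 | 5 | -2 | 2 | 2 | -2 | -2 | -2 | -2 | -1 | 0 | 4 | -5 | -1 |
| K | -1 | -5 | 0 | 0 | -5 | -2 | 0 | -2 | 5 | -3 | 0 | 1 | -1 | 1 | 3 | 0 | 0 | -2 | -3 | -4 |
| L | -2 | -6 | -4 | -3 | 2 | -4 | -2 | 2 | -3 | 6 | 4 | -3 | -3 | -2 | -3 | -3 | -2 | 2 | -2 | -1 |
| M | -1 | -5 | -3 | -2 | 0 | -3 | -2 | 2 | 0 | 4 | 6 | -2 | -2 | -1 | 0 | -2 | -1 | 2 | -4 | -2 |
| N | 0 | -4 | 2 | 1 | -3 | 0 | 2 | -2 | 1 | -3 | -2 | 2 | 0 | 1 | 0 | 1 | 0 | -2 | -4 | -2 |
| P | 1 | -3 | -1 | -1 | -5 | 0 | 0 | -2 | -1 | -3 | -2 | 0 | 6 | 0 | 0 | 1 | 0 | -1 | -6 | -5 |
| Q | 0 | -5 | 2 | 2 | -5 | -1 | 3 | -2 | 1 | -2 | -1 | 1 | 0 | 4 | 1 | -1 | -1 | -2 | -5 | -4 |
| R | -2 | -4 | -1 | -1 | -4 | -3 | 2 | -2 | 3 | -3 | 0 | 0 | 0 | 1 | 6 | 0 | -1 | -2 | 2 | -4 |
| S | 1 | 0 | 0 | 0 | -3 | 1 | -1 | -1 | 0 | -3 | -2 | 1 | 1 | -1 | 0 | 2 | 1 | -1 | -2 | -3 |
| T | 1 | -2 | 0 | 0 | -3 | 0 | -1 | 0 | 0 | -2 | -1 | 0 | 0 | -1 | -1 | 1 | 3 | 0 | -5 | -3 |
| V | 0 | -2 | -2 | -2 | -1 | -1 | -2 | 4 | -2 | 2 | 2 | -2 | -1 | -2 | -2 | -1 | 0 | 4 | -6 | -2 |
| W | -6 | -8 | -7 | -7 | 0 | -7 | -3 | -5 | -3 | -2 | -4 | -4 | -6 | -5 | 2 | -2 | -5 | -6 | 17 | 0 |
| Y | -3 | 0 | -4 | -4 | 7 | -5 | 0 | -1 | -4 | -1 | -2 | -2 | -5 | -4 | -4 | -3 | -3 | -2 | 0 | 10 |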
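As an illustration of how such a table is used, the sketch below scores a short ungapped alignment with a handful of entries copied from the PAM250 table above. The names `PAM250_SUBSET`, `pair_score` and `score_ungapped` and the example sequences are hypothetical; a real scorer would load all 400 entries and also handle gaps.

```python
# A few PAM250 entries taken from the table above; the matrix is symmetric,
# so each unordered pair is stored only once.
PAM250_SUBSET = {
    ("A", "A"): 2,
    ("A", "S"): 1,
    ("I", "L"): 2,
    ("L", "L"): 6,
    ("W", "W"): 17,
    ("W", "Y"): 0,
}


def pair_score(a, b):
    """Look up the score for aligning residues a and b, in either order."""
    if (a, b) in PAM250_SUBSET:
        return PAM250_SUBSET[(a, b)]
    return PAM250_SUBSET[(b, a)]


def score_ungapped(seq_a, seq_b):
    """Sum the per-column scores of two equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


# A/S scores 1, L/I scores 2 and W/W scores 17.
score = score_ungapped("ALW", "SIW")
```

Even though only one of the three columns is an exact match, the alignment scores well because A/S and L/I are both common substitutions, which is the behaviour a flat match/mismatch scheme cannot capture.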