AlphaFold Outputs

November 8, 2021

AlphaFold is a tool from DeepMind to fold proteins.

Here, I present some background on the problem it solves, as well as some solved structures for proteins of interest. This post is intended to be more empirical in nature, while the AlphaFold Architecture post goes into more detail about how the AlphaFold model works.

Objective

The main goal is to visualize AlphaFold protein structure solutions for a few proteins and protein complexes. There are three tests: easy, medium, and hard.

The first test (easy) consists of simple, well-understood proteins of average length. AlphaFold should fold them nearly perfectly:

GFP (from the GFP bacterial transformation experiment): This protein fluoresces when exposed to light.
Hemoglobin: this ring-shaped protein encloses iron for oxygen transport.

The second test (medium) is to fold the SARS-CoV-2 spike protein, an intricate protein that binds to the ACE-2 receptor in human cells. This protein is more difficult to fold.

A final, more novel test (hard) is to fold antibody-antigen complexes. Generating antibodies for a particular antigen purely in-silico could revolutionize drug design, particularly given advancements in delivery (e.g. mRNA vaccines, other gene therapies). Accurate antigen-antibody complex structure predictions would be a good first step towards antibody generation for a particular antigen. Although AlphaFold is not trained explicitly for predicting antibody-antigen contact interfaces, AlphaFold Multimer can serve as a baseline for other, more sophisticated systems. Two areas of interest are:

Prostate Cancer (PCa)-antibody binding: according to Monoclonal Antibody Therapy for Prostate Cancer, Jakobovits 2008, PCa is a good candidate for monoclonal antibody treatment, since the organ is non-essential and the tumors aren’t localized to one location.
HIV-antibody binding: HIV itself is a difficult target for the immune system, due to its envelope glycoprotein and high genome mutation rate. However, targeting critical HIV enzymes has been effective in a number of drugs, and some antibodies achieve similar disruption.

Caveat: computational bottlenecks in my local setup limited the extent to which I could run the immunocomplex structure predictions, as ~1600aa hit the limits of my RTX 2080 Ti GPU. Further investigation of the AlphaFold model architecture, and of cloud hardware, could help to understand the plausibility of antibody-antigen binding.

Protein Structure Prediction Problem

The protein structure prediction problem is: given an input amino acid sequence $S$, identify the positions of all atoms in the residues once folded.

The CASP contest is the most famous competition for protein structure prediction.

Tractability of the Protein Structure Prediction Problem

This problem is difficult. With 20 amino acids, there are in theory $20^n$ possible polypeptide chains that are $n$ amino acids long. For a typical length of $n = 300$, that is roughly $10^{390}$ possible sequences, far more than there are atoms in the universe, and each chain can in principle adopt an enormous number of conformations.
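As a rough check of the counting argument above, the sketch below computes the order of magnitude of $20^n$ and the smallest $n$ at which the number of possible chains exceeds the number of atoms in the universe. The figure of about $10^{80}$ atoms is a commonly cited rough estimate and is an assumption here, not a number from this post.

```python
import math

# Assumption: ~10^80 atoms in the observable universe (rough, commonly cited estimate).
ATOMS_IN_UNIVERSE_LOG10 = 80


def log10_num_chains(n, alphabet=20):
    """log10 of the number of distinct chains of length n over `alphabet` residues."""
    return n * math.log10(alphabet)


def smallest_n_exceeding(log10_limit, alphabet=20):
    """Smallest chain length n with alphabet**n > 10**log10_limit."""
    return math.floor(log10_limit / math.log10(alphabet)) + 1


print(f"20^300 is about 10^{log10_num_chains(300):.0f}")  # 20^300 is about 10^390
print(smallest_n_exceeding(ATOMS_IN_UNIVERSE_LOG10))       # 62
```

Under that assumption, the sequence space already outgrows the atom count at around 62 residues, well below the length of a typical protein.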

Yet, there are a few points of hope for why in-silico methods could solve for the structures:

The proteins themselves fold in milliseconds, as noted in Levinthal’s paradox. The speed of the folding process hints that some relatively simple rules could guide the overall process, and final folded structure, for biologically useful proteins.
Natural selection favors biologically useful proteins, of which there are far fewer than the theoretical maximum ¹. Natural selection also favors tinkering, so similar structures tend to appear across species.

Empirical support for the tractability of the protein folding problem is the superfold: in 1994, it was found that nine protein folds accounted for 30% of all determined structures. Today, the CATH project groups ~150 million protein domains into roughly 6000 “superfamilies” of similar structure (visualization).

Quality Metrics

To evaluate structure solution quality, one can compare a reference model $M_R$ (solved with e.g. x-ray crystallography, NMR, or cryo-electron microscopy) to a predicted model $M_P$ (solved with e.g. AlphaFold, Rosetta).

The following metrics can evaluate a group of atoms against a reference. They can work at the level of residue, protein subunit, or even protein complex:

Global Distance Test, or GDT, determines the approximate maximum number of atoms in the predicted structure that are within a distance $d$ of the corresponding atom in the reference structure, once the two structures are superposed (higher is better). This number is then normalized by the total number of atoms. In CASP, GDT_TS (“total score”) is the average GDT for $d$ values of 1, 2, 4, and 8 angstroms.
Local Distance Difference Test, or $lDDT$, determines the fraction of preserved local distances (higher is better). A local distance is the distance between two atoms in the reference structure that are (a) within an inclusion radius $R_0$ of each other and (b) on different residues. The $lDDT$ score is the fraction of $M_R$ local distances preserved in $M_P$, where a distance $l$ is preserved if the difference between reference and prediction is less than a threshold ($|l_R - l_P| \lt \epsilon$). If an atom is not present in the prediction, then its missing distances count as not preserved.
Template modeling ($TM$) score determines the average inverse-square score over each pair of aligned residues (higher is better). The score function is $\frac{1}{1 + d_{i, norm}^2}$, where $d_{i, norm}$ is a normalized distance between the $i^{th}$ pair of (reference, predicted) residues in a global alignment between the predicted and reference amino acid sequences. In the alignment, only matching amino acids are scored; insertions/deletions in the alignment are not.
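The three scores above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation of any of them. It assumes one atom per residue (e.g. C-alpha), coordinates given as (x, y, z) tuples, and structures that are already superposed with residues paired one-to-one. The real GDT and TM-score programs also search over superpositions to maximize the score, which is omitted here. The $d_0$ normalization with its 0.5 floor, the 15 angstrom inclusion radius, and the averaging of $lDDT$ over four thresholds are conventions assumed from common usage, not taken from this post.

```python
import math


def gdt_ts(ref, pred, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean fraction of atoms within each cutoff (angstroms), for a fixed superposition."""
    dists = [math.dist(r, p) for r, p in zip(ref, pred)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def tm_score(ref, pred):
    """TM score for a fixed superposition, normalized by the reference length."""
    n = len(ref)
    # Assumed length-dependent scale d0, floored at 0.5 angstroms for short chains.
    d0 = 1.24 * (n - 15) ** (1 / 3) - 1.8 if n > 15 else 0.5
    d0 = max(d0, 0.5)
    return sum(1 / (1 + (math.dist(r, p) / d0) ** 2) for r, p in zip(ref, pred)) / n


def lddt(ref, pred, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference local distances preserved, averaged over thresholds."""
    preserved = total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # not a local distance in the reference
            d_pred = math.dist(pred[i], pred[j])
            for t in thresholds:
                total += 1
                preserved += abs(d_ref - d_pred) < t
    return preserved / total if total else 0.0


# Toy example: a straight 20-atom chain, and a copy shifted rigidly by 3 angstroms.
ref = [(3.8 * i, 0.0, 0.0) for i in range(20)]
shifted = [(x, y + 3.0, z) for x, y, z in ref]
print(gdt_ts(ref, shifted), lddt(ref, shifted))  # 0.5 1.0
```

The toy example shows one practical difference between the metrics: a rigid shift lowers GDT_TS and the TM score when no re-superposition is done, while $lDDT$ is unaffected because it only compares internal distances.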

The AlphaFold model also predicts some of these metrics for its own output. AlphaFold’s estimates can be used as noisy quality metrics:

$pLDDT$ is a predicted $lDDT$ for each residue in the sequence. On a held-out test set, the line of best fit between $pLDDT$ and $lDDT$ had a correlation coefficient of 0.76 (Fig 2.c of AlphaFold).
$pTM$ is a predicted $TM$ score. The line of best fit between $pTM$ and $TM$ had a correlation coefficient of 0.85 (Fig 2.d of AlphaFold).
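In practice, the per-residue pLDDT can be read back from a predicted structure, since AlphaFold writes it into the B-factor column of its output PDB files. The sketch below parses that column for C-alpha atoms and groups residues into the confidence bands used in the figure legends of this post. The function names are hypothetical, and the handling of values that fall exactly on a band boundary (50, 70, 90) is an assumption, since the legends use strict inequalities.

```python
from collections import Counter


def plddt_band(score):
    """Map a pLDDT value to a confidence band (boundary handling is assumed)."""
    if score < 50:
        return "Very low"
    if score < 70:
        return "Low"
    if score < 90:
        return "Confident"
    return "Very high"


def ca_plddt_from_pdb(pdb_text):
    """Read pLDDT from the B-factor field (columns 61-66) of C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name in the fixed-width PDB format.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def band_fractions(scores):
    """Fraction of residues falling in each confidence band."""
    counts = Counter(plddt_band(s) for s in scores)
    return {band: n / len(scores) for band, n in counts.items()}
```

Calling `band_fractions(ca_plddt_from_pdb(text))` on the text of a predicted PDB file gives a quick summary of how much of the chain is predicted with high confidence.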

For evaluating protein complex structure (quaternary structure), the contacts between the protein subunits are of critical interest. Specific metrics to evaluate the contact regions are:

DockQ is the average of three scoring functions used in the CAPRI competition: $F_{nat}$ (the fraction of native contacts recovered), $LRMS$ (ligand RMSD), and $iRMS$ (interface RMSD), with the two RMSD terms rescaled to lie between 0 and 1 (higher is better).
Predicted aligned error (pAE) is AlphaFold’s estimated error distance between pairs of residues in the structure. Low inter-chain error likely indicates good docking; low intra-chain error likely indicates good tertiary structure prediction.
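To make the DockQ combination concrete, the sketch below averages $F_{nat}$ with rescaled versions of $LRMS$ and $iRMS$, then maps the result to a quality category. The scaling constants (8.5 and 1.5 angstroms) and the category cutoffs (0.23, 0.49, 0.80) are recalled from the published DockQ definition; treat them as assumptions and check them against the DockQ tool before relying on them. The function names are hypothetical.

```python
def rms_scaled(rms, d):
    """Rescale an RMSD (angstroms) to (0, 1], where 1 means a perfect match."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat, lrms, irms, d_lrms=8.5, d_irms=1.5):
    """Average of Fnat and the two rescaled RMSD terms (constants are assumed)."""
    return (fnat + rms_scaled(lrms, d_lrms) + rms_scaled(irms, d_irms)) / 3.0


def dockq_category(score):
    """Map a DockQ score to a CAPRI-style quality label (cutoffs are assumed)."""
    if score >= 0.80:
        return "High"
    if score >= 0.49:
        return "Medium"
    if score >= 0.23:
        return "Acceptable"
    return "Incorrect"


print(dockq_category(0.936))  # High
print(dockq_category(0.004))  # Incorrect
```

With these cutoffs, the two DockQ scores reported later in this post (0.936 and 0.004) land in the "High" and "Incorrect" categories respectively.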

GFP

Green Fluorescent Protein (GFP, pdb:4kw4) is a good test-case for AlphaFold.

GFP is simple. Consisting of a single chain of 270 amino acids, GFP is the simplest structure investigated.
GFP is interesting: the spontaneous folding produces a beta-barrel surrounding a central chromophore.
GFP is well understood, and is even engineered (e.g. to create more colors or fluoresce brighter). There are many structures linked from UniProt to serve as a baseline to compare against AlphaFold predictions.
GFP templates are in the Protein Data Bank, so AlphaFold should perform well.

According to the Protein Data Bank, there are two main components of interest for the GFP:

A serine-tyrosine-glycine chromophore at the center of the barrel.
A beta-barrel structure surrounding it.

Analysis: AlphaFold did solve both components quite well (TM score of 0.996). The confidences of the model were strongest in the outer barrel, weaker on the internal structure, and weakest (low confidence) on some edge structures. The comparison with the x-ray crystallography structure is given below: AlphaFold on top, x-ray crystallography on the bottom.

Graphic Legend

Very low (pLDDT < 50)

Low (70 > pLDDT > 50)

Confident (90 > pLDDT > 70)

Very high (pLDDT > 90)

X-ray structure. Resolution of 1.75 Angstroms (Source, Download)

TM Score: 0.996

AlphaFold Input: GFP (Aequorea victoria) sequence. (FASTA)

Hemoglobin

Hemoglobin (pdb:4hhb) is another moderately-sized, well-understood protein. Hemoglobin is a complex of four protein subunits arranged in a ring, each of which binds a heme group. Each heme group contains an iron ion, which serves as a carrier for other molecules, particularly oxygen.

Two pairs of protein subunits form the ring complex. Each pair consists of an alpha unit (purple) and beta unit (green).

Analysis: AlphaFold’s solution of the structure (top) recapitulated the reference (bottom). The predicted aligned error also indicated high confidence in the complex overall, with the interfaces having the highest predicted error. The DockQ score of 0.936 is categorized as “high quality.”

Graphic Legend

Hemoglobin subunit alpha

Hemoglobin subunit beta

(Bottom): X-ray structure, resolution of 1.75 Angstroms (Source, Download)

TM Score: 0.997

DockQ Score: 0.936 (High quality)

Predicted Aligned Error: estimated accuracy of multimer folding. Red indicates chain boundary. Green indicates lower error; white indicates higher error. See: explanation here.

AlphaFold Input: Human Hemoglobin. (FASTA)

(Experimental) SARS-CoV-2 Spike Protein

Note: this protein folding experiment is tagged “experimental,” pending a further analysis of AlphaFold computational requirements and architecture.

The SARS-CoV-2 Spike Protein enables the virus that causes COVID-19 to bind to ACE-2 receptors in different types of human cells. Three identical protein subunits form the spike protein trimer. The depiction below shows the closed form of the protein.

Analysis: In this experiment, only one protein subunit (rather than three) of the trimer was folded. While the overall shape of the predicted model closely matches the reference model (TM score = 0.97), the off-template chain had low pLDDT values. It is possible that re-running AlphaFold with three copies of the subunit would result in a more accurate homomer; running that experiment may require cloud resources.
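For reference, AlphaFold-Multimer takes a FASTA file with one record per chain, so a homotrimer is expressed by repeating the same sequence three times. The sketch below builds such an input and checks the total residue count against a rough budget. The record names, the placeholder sequence, and the helper functions are hypothetical; the default budget of 1600 residues is only the approximate limit noted in the caveat earlier in this post, not a property of AlphaFold itself.

```python
def homomer_fasta(name, sequence, copies):
    """Build a multi-record FASTA string with `copies` identical chains."""
    records = [f">{name}_chain{i}\n{sequence}" for i in range(1, copies + 1)]
    return "\n".join(records) + "\n"


def fits_budget(sequence, copies, max_residues=1600):
    """Check the total residue count against a rough hardware limit (assumed value)."""
    return len(sequence) * copies <= max_residues


subunit = "MKTAYIAKQR"  # placeholder only, NOT the real spike sequence
print(homomer_fasta("spike", subunit, 3))
print(fits_budget(subunit, 3))
```

A check like `fits_budget` makes it easy to see in advance whether a full trimer is likely to exceed local GPU memory and needs to be run elsewhere.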

Graphic Legend

Very low (pLDDT < 50)

Low (70 > pLDDT > 50)

Confident (90 > pLDDT > 70)

Very high (pLDDT > 90)

Electron Microscopy structure. Resolution of 2.8 Angstroms. (Source, Download)

TM Score: 0.972

AlphaFold Input: SARS-CoV-2 Spike Protein. (FASTA)

(Experimental) Prostate Cancer Antibody

Note: this protein folding experiment is tagged “experimental,” pending a further analysis of AlphaFold computational requirements and architecture.

There are many candidate antigens for prostate cancer (PCa).

One over-expressed protein in PCa cells is STEAP-1 (Six-Transmembrane Epithelial Antigen of Prostate-1). The transmembrane complex appears to enhance tumor growth: certain naked antibodies helped reduce cell-cell communication in-vitro, and reduce tumor size in-vivo, according to Jakobovits 2008, section 3.2.5.

An antibody for STEAP-1 (mAb120.545) has been developed, and researchers imaged how fragments of the antibody bind to STEAP-1 (paper by Oosterheert and Gros, pdb:6Y9B). The structure is a trimeric homomer. Oosterheert and Gros found that STEAP-1 seems to act like an iron reductase when combined with STEAP-4, and that the antibody binding to STEAP-1 inhibited interaction with STEAP-4.

I ran AlphaFold on a single protein subdomain, to determine docking accuracy for an immunocomplex.

Analysis: AlphaFold’s model of the complex had a high template modeling (TM) score for each protein subunit (pdb:6Y9B), averaging 0.984. However, the solved model had a DockQ score of nearly zero. These quality numbers are quite extreme: it’s surprising to see a nearly perfect TM score alongside a nearly perfectly incorrect DockQ score. It is possible I have made some mistake in my use of the toolchain.

Caveats aside, however, qualitative analysis of the solved model seems consistent with these scores. While the structure seems quite close, the AlphaFold structure missed all critical interfaces of the immune complex. Both the binding site and orientation of the Fabs were incorrect. Further, the predicted aligned error emphasizes this uncertainty, as the inter-chain residue pairs have higher estimated error than the intra-chain residue pairs. It’s possible that a complete multimer fold of the complex would add more constraints, and perhaps produce better results (here, I was only able to fold one of the three trimeric protein subunits of the STEAP-1-Fab complexes, due to computational constraints).
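The inter-chain versus intra-chain comparison described above can be made quantitative by averaging the predicted aligned error over the two kinds of residue pairs. The sketch below assumes the pAE matrix is available as a square list of lists, with residues ordered chain by chain; the function name and the toy matrix are hypothetical, and the numbers are illustrative, not values from this experiment.

```python
def mean_pae_by_block(pae, chain_lengths):
    """Return (mean intra-chain pAE, mean inter-chain pAE) for a square pAE matrix."""
    # Chain index for every residue, e.g. lengths [2, 2] -> [0, 0, 1, 1].
    chain_of = [c for c, n in enumerate(chain_lengths) for _ in range(n)]
    intra, inter = [], []
    for i, row in enumerate(pae):
        for j, value in enumerate(row):
            (intra if chain_of[i] == chain_of[j] else inter).append(value)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)


# Toy 4-residue example with two chains of two residues each.
toy_pae = [
    [2.0, 2.0, 20.0, 20.0],
    [2.0, 2.0, 20.0, 20.0],
    [20.0, 20.0, 2.0, 2.0],
    [20.0, 20.0, 2.0, 2.0],
]
print(mean_pae_by_block(toy_pae, [2, 2]))  # (2.0, 20.0)
```

A large gap between the two means, as in the toy matrix, is the pattern one would expect when each chain is folded confidently but the relative placement of the chains is not.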

Graphic Legend

STEAP-1

Fab120.545 (light chain)

Fab120.545 (heavy chain)

(Bottom): Electron Microscopy of STEAP-1 (Source, Download)

TM Score: 0.984

DockQ Score: 0.004 (Incorrect)

Predicted Aligned Error: estimated accuracy of multimer folding. Red indicates chain boundary. Green indicates lower error; white indicates higher error. See: explanation here.

HIV-1 Protease Antibody

According to the Protein Data Bank, HIV-1 protease is critical for HIV to properly form. The HIV-1 protease carefully surrounds HIV protein chains and cuts them up into useful pieces, which then form a mature virus. Since this cutting step requires intricately timed operations, the protease is a good target for drugs. Some drugs mimic the viral protein chain to disrupt the HIV-1 protease’s timing.

There are also antibodies that bind to the protease, such as mab-1696. Rezacova et al. 2001 used x-ray crystallography to identify binding behavior to the N-terminus of the protease (aa sequence: PQITLWQ). However, the full immunocomplex with HIV-1 protease has not been imaged. Lack of experimental data provides an interesting niche for purely in-silico methods, such as AlphaFold’s multimer configuration.

Analysis: Interestingly, the solved structure from AlphaFold positions the antibody on the exact opposite side from the N-terminus. The TM scores were quite high, averaging 0.976 when aligning to the protease (pdb:7HVP) and to the antibody chain (pdb:1CL7). However, the predicted aligned error was high, particularly across the antigen-antibody boundaries. The result is curious, but I would not place much confidence in it.

Graphic Legend

HIV-1 Protease

mAb 1696 antibody (light chain)

mAb 1696 antibody (variable heavy chain)

mAb 1696 antibody (constant heavy chain)

TM Score: 0.976

Predicted Aligned Error: estimated accuracy of multimer folding. Red indicates chain boundary. Green indicates lower error; white indicates higher error. See: explanation here.

Summary

The AlphaFold structure solutions had high template modeling (TM) scores for each protein investigated, likely due to exact input template matches. Quaternary structure prediction worked well for an ordinary protein complex (hemoglobin), but had high error when predicting immunocomplexes. AlphaFold’s own confidence estimates ($pTM$, $pLDDT$, and $pAE$) correlated well with the conventional scoring mechanisms.

AlphaFold is an important breakthrough, and will be interesting to test going forward.

Sources

Highly accurate protein structure prediction with AlphaFold and Protein complex prediction with AlphaFold-Multimer, with code from GitHub. I was able to run the code with minimal issues, and it is high-quality and well documented. I also found Protein structure prediction by AlphaFold2: are attention and symmetries all you need? by Nazim Bouatta, Peter Sorger, and Mohammed Al-Quraishi to be a helpful reference.

Protein structures were found via UniProt and the Protein Data Bank. The Protein Data Bank’s Molecule of the Month provides high-quality, simply explained articles with great visuals.

This page uses the 3Dmol library for 3D interactivity, which is simple for its most basic task: viewing a single molecule, and coloring the atoms by property. PDB’s Mol* was also a good reference.

The TM-align program was used to compute the TM score.

¹ Molecular Biology of the Cell, 5th Edition, p. 136-137