Solving a 50-Year-Old Puzzle: AlphaFold 2 and Protein Folding
With the world watching on, DeepMind’s AlphaFold 2 outperformed every other team at CASP 14, the 2020 edition of the biennial protein-structure prediction challenge. This was a significant step forward for molecular biology and medicine, but it would not have been possible without the prior efforts that culminated in the creation of AlphaFold 2. Let’s take a look at the history of protein folding before delving into the significance of this milestone for molecular biology and beyond.
Proteins, Amino Acids and Folding
Proteins are molecules that perform many different roles within cells. The particular function of each protein comes from its 3-dimensional structure, which is in turn dictated by its primary amino acid sequence. Since Anfinsen’s experiments in the 1960s, molecular biologists from all around the globe have been trying to understand the nature of this amino acid code and how it is responsible for ‘protein folding’. After 50 years of tireless effort, a new piece of software has taken one of the most important steps towards understanding protein folding.
The software in question, AlphaFold 2, is able to predict the structure of a protein from its amino-acid sequence with unprecedented accuracy, making headway towards finally solving the 50-year-old puzzle. Created by DeepMind (of AlphaGo fame), AlphaFold 2 was made possible by the newly available technologies and the growing field of artificial intelligence at the disposal of its team of scientists, engineers, machine learning experts and others.
Nobel Prize laureate Christian B. Anfinsen postulated that the only information a protein needs to fold into its native structure is stored in its amino acid sequence. He worked with a pancreatic protein, ribonuclease A, for his experiments on protein folding [1].
This small protein, made of only 124 amino acids, catalyzes the breakdown of RNA. Within its structure there are 4 disulfide bonds, formed between cysteine residues, and the activity of the enzyme depends on their redox state. In the unfolded or denatured state, these 4 disulfide bonds are reduced and the protein shows no activity. Anfinsen denatured the protein in vitro by adding the reducing agent 2-mercaptoethanol (HOCH₂CH₂SH), or 2ME, together with the denaturant urea.
Once the protein was denatured, the next step was to try to reverse the process, so he removed the two agents in different orders:
- In one case, he removed 2ME first and then the urea. This restored only around 1% of the native-state activity.
- In another case, when removing urea first and 2ME afterward, almost all the original function was successfully restored.
The explanation for this phenomenon lies in the way the disulfide bonds form. In the first case, removing 2ME allowed these bonds to form again, but the denaturant urea was still present, so the cysteine residues paired up differently than in the native protein, changing the original folding conformation and altering the activity of the protein.
In the second case, since the urea was removed first, the protein could re-fold as it would in its natural environment, and once 2ME was removed, the disulfide bonds were able to form, this time between the right partners. The conclusion of Anfinsen’s experiments was that the primary structure of the protein, the amino acid sequence, held within it all the folding instructions necessary to generate the active form of the protein.
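A quick count suggests why random re-pairing leaves so little activity. The sketch below is a back-of-the-envelope illustration, not Anfinsen’s own analysis: it assumes that the 8 cysteines behind the 4 disulfide bonds pair up uniformly at random and that only the single native pairing is active. The helper name `disulfide_pairings` is made up for this example.

```python
from math import prod


def disulfide_pairings(n_cysteines: int) -> int:
    """Count the ways to pair every cysteine into disulfide bonds."""
    if n_cysteines % 2:
        raise ValueError("an even number of cysteines is needed to pair them all")
    # The first cysteine can pick any of (n - 1) partners, the next
    # unpaired one any of (n - 3), and so on down to 1.
    return prod(range(n_cysteines - 1, 0, -2))


n_cysteines = 8  # 4 disulfide bonds, 2 cysteines each
pairings = disulfide_pairings(n_cysteines)
# Assumption: exactly one pairing (the native one) yields an active enzyme.
random_native_fraction = 1 / pairings

print(pairings)                         # 105
print(f"{random_native_fraction:.2%}")  # 0.95%
```

Under these assumptions, roughly 1 in 105 randomly re-oxidized molecules would carry the native bonds, which is in line with the ~1% activity recovered when 2ME was removed before the urea.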
The Importance of Protein Prediction
What happens if proteins do not fold properly? This phenomenon is called protein misfolding. Certain proteins called chaperones assist in the folding process by allowing a protein trapped in an incorrect conformation to unfold and retry, but this safeguard is not completely foolproof. Protein misfolding is associated with the development of severe conditions such as Alzheimer’s disease, Parkinson’s disease, Huntington’s disease and many other degenerative disorders [2].
Protein structure prediction thus has important applications in medicinal chemistry as part of structure-based drug design. This field aims to predict the ways that different compounds, such as novel drugs, interact and bind with each protein (this topic is discussed in more depth in the author’s prior article “Molecular Docking: Bioinformatics in Drug Discovery”).
Since 1994, part of the scientific community dedicated to structure prediction has been participating in CASP, the Critical Assessment of protein Structure Prediction challenge, a biennial competition where teams from all around the globe submit their algorithms and strategies to tackle this computationally complex problem [3].
Each submission is tested in a double-blind fashion: the proteins that the software attempts to predict must have had their structures solved, or be about to have them solved, by experimental techniques, usually X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy. To make it fair, the structural information of each protein must never have been made public, and neither the organizers, the assessors nor the participants can know anything about it.
Until then, the different predictors, including the first published version of AlphaFold by DeepMind, had only been able to reach a prediction accuracy of ~60%. That changed with the most recent CASP challenge, held in 2020 [4].
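To give a sense of what a percentage accuracy can mean for a 3D structure: CASP assessments commonly summarize a prediction with the GDT_TS score, the average percentage of residues that land within 1, 2, 4 and 8 Å of their experimental positions. The sketch below is a simplified version that assumes the predicted and experimental structures have already been superimposed (the full procedure also searches over superpositions), and the deviations are made-up numbers. Whether the ~60% figure above refers to exactly this score is an assumption.

```python
def gdt_ts(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance cutoff.

    `deviations` holds one distance per residue (in angstroms) between its
    predicted and experimental position, after superposition.
    """
    if not deviations:
        raise ValueError("need at least one residue")
    n_residues = len(deviations)
    percentages = [
        100.0 * sum(d <= cutoff for d in deviations) / n_residues
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(percentages)


# Hypothetical per-residue deviations for a five-residue toy example.
example_deviations = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(example_deviations))  # 50.0
```

A perfect prediction scores 100, and a prediction with every residue more than 8 Å off scores 0.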
AlphaFold 2, the winning algorithm of CASP 14, showed a dramatic increase in prediction accuracy compared with both its first version at CASP 13 and its competitors. DeepMind’s first attempt made use of Artificial Intelligence (AI) algorithms that used the distances between pairs of amino acids in a protein to create a prediction, a logical step since amino-acid interactions seemed to be the main key to deciphering protein folding [5].
This approach focused only on amino-acid interactions, however, and did not take into account other environmental and physical elements that affect folding. AlphaFold 2, by contrast, uses these global structural constraints, giving rise to a different deep learning method and achieving a further 30% improvement over its predecessor’s accuracy.
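The pairwise-distance representation that the first AlphaFold relied on can be pictured as a matrix with one row and one column per amino acid. The sketch below only illustrates that data structure: it computes the matrix from known coordinates, whereas the network had to predict such distances from the sequence. The coordinates and the helper name `distance_map` are made up for this example.

```python
from math import dist


def distance_map(coords):
    """Symmetric matrix of Euclidean distances between residue positions."""
    return [[dist(a, b) for b in coords] for a in coords]


# Made-up positions (in angstroms), one 3D point per residue.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 12.0),
    (0.0, 0.0, 12.0),
]

dmap = distance_map(toy_coords)
for row in dmap:
    print([round(d, 1) for d in row])
```

Entry `dmap[i][j]` is the distance between residues `i` and `j`, so the diagonal is zero and the matrix is symmetric. Residues far apart in the sequence but close in the matrix are the kind of signal that helps pin down the fold.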
Many professionals in the field believed that this challenge was going to remain unsolved for another one or two decades. However, the remarkable reality is that DeepMind’s AlphaFold 2 might kickstart a protein folding revolution in the already flourishing fields of molecular and computational biology.
References
1. Anfinsen, C., 1973. Principles that Govern the Folding of Protein Chains. Science, 181(4096), pp.223-230.
2. Chaudhuri, T. and Paul, S., 2006. Protein-misfolding diseases and chaperone-based therapeutic approaches. FEBS Journal, 273(7), pp.1331-1349.
3. Moult, J., Pedersen, J., Judson, R. and Fidelis, K., 1995. A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Genetics, 23(3), pp.ii-iv.
4. Callaway, E., 2020. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature, 588(7837), pp.203-204.
5. Senior, A., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., Qin, C., Žídek, A., Nelson, A., Bridgland, A., Penedones, H., Petersen, S., Simonyan, K., Crossan, S., Kohli, P., Jones, D., Silver, D., Kavukcuoglu, K. and Hassabis, D., 2020. Improved protein structure prediction using potentials from deep learning. Nature, 577(7792), pp.706-710.
6. Predictioncenter.org, 2020. Critical Assessment of Techniques for Protein Structure Prediction.