
AlphaFold: How AI Solved Biology's 50-Year Grand Challenge

In 2020, DeepMind's AlphaFold predicted protein structures with atomic-level accuracy, solving a problem that had defeated structural biology for fifty years. The result: more than 200 million predicted protein structures freely available to every researcher on Earth, a Nobel Prize, and a template for using AI to compress decades of scientific progress.

AlphaFold: from amino acid sequence to 3D protein structure

In 1972, Christian Anfinsen accepted the Nobel Prize in Chemistry with a prediction. His work had established that the three-dimensional shape of a protein is determined by its sequence of amino acids, the linear chain of chemical building blocks that the gene encodes. The implication was bold: given a sequence, the structure should be computable. Nature does it spontaneously, often within milliseconds. Science should eventually be able to do it too.

It took forty-eight years.

The Grand Challenge

Proteins are the machines of life. They catalyse nearly every chemical reaction in the cell, carry oxygen through the blood, replicate DNA, fight pathogens, and transmit signals across nerve membranes. Their function is almost always inseparable from their shape: the precise, intricate three-dimensional fold that a linear chain of amino acids twists itself into, often within milliseconds of being manufactured by a ribosome.

Understanding that shape is not a curiosity. It is the basis of most modern medicine. Drug designers spend years trying to determine the shape of a target protein so they can engineer a molecule that fits into its active site like a key into a lock. Vaccine developers need to know the shape of a pathogen's surface proteins to design vaccines that prompt the immune system to make antibodies that recognise them. Researchers studying genetic disease need to understand how a mutation changes a protein's fold, and therefore its function.

The experimental methods for determining protein structure (X-ray crystallography, cryo-electron microscopy and NMR spectroscopy) are painstaking. A single protein structure can take a graduate student months or years of work. By 2020, the global scientific community had collectively solved roughly 170,000 protein structures in the six decades since the first one was determined. Meanwhile, roughly 200 million protein sequences were already known. At that pace, the gap would never close.

This was the protein folding problem. It was considered one of the great unsolved problems in biology. It had its own competitive benchmark — the Critical Assessment of Protein Structure Prediction (CASP) — that had been running since 1994, grading computational attempts every two years. Progress had been slow and incremental. Most experts thought a real solution was still decades away.

The 20 amino acids, the building blocks of every protein, grouped by chemical property (single-letter code, three-letter code, chemical property)

The Man Who Chose the Problem on Purpose

Demis Hassabis co-founded DeepMind in London in 2010 with a specific ambition: to build artificial general intelligence, and then use it to solve science's hardest problems. Protein folding was always on his list. He had trained in both computer science and cognitive neuroscience, and the intersection he cared about was not chatbots or recommendation engines; it was using machine learning to accelerate the rate at which science produces knowledge.

The moment he became certain protein folding was solvable came in March 2016, during AlphaGo's match in Seoul against Lee Sedol, one of the strongest Go players in the world. Hassabis watched the games. When AlphaGo won, he turned to his team and said, in a remark those present have since confirmed, "We can solve protein folding. I'm sure we can do that now."

The logic was not obvious. Go and protein folding appear to have nothing in common. But Hassabis understood something that most biologists did not: the techniques that had allowed a neural network to master Go — learning from vast libraries of game states, discovering patterns invisible to human players, training through self-play against evolving positions — could be applied to the evolutionary information encoded in protein sequences.

Every species that has ever lived has been a protein folding experiment. Evolution has been running protein structures through billions of years of selection, keeping the ones that work and discarding the rest. The entire history of that experiment is written in the genomes of living organisms, in the variations between related proteins across different species. If you knew which amino acids had changed together, and which had stayed fixed, across 3.5 billion years of evolution, you would know something deep about which residues touch each other in the folded structure: when two residues are in contact, a mutation in one is often matched by a compensating change in the other, because an uncompensated change can break the fold. This evolutionary co-variation was the hidden signal AlphaFold was built to read.
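The co-variation idea can be made concrete with a toy calculation. The sketch below is a minimal illustration, not how AlphaFold itself reads an alignment (AlphaFold learns these signals end to end inside a neural network): it scores every pair of columns in a small, made-up multiple sequence alignment by mutual information, so columns that change together score high and columns that vary independently score zero. The alignment and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information, in bits, between alignment columns i and j.

    High values mean the residues at positions i and j tend to change
    together across sequences: the co-variation signal.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i, p_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def rank_pairs(msa: list[str]) -> list[tuple[float, int, int]]:
    """Score every pair of columns and sort from most to least co-varying."""
    length = len(msa[0])
    scored = [(mutual_information(msa, i, j), i, j)
              for i, j in combinations(range(length), 2)]
    return sorted(scored, reverse=True)


# Made-up alignment of four related sequences. Positions 1 and 3 always
# change together (D with K, E with R), as a salt bridge between them might
# require; positions 0 and 2 vary independently; position 4 never changes.
toy_msa = ["ADGKL", "SDAKL", "AEARL", "SEGRL"]
print(rank_pairs(toy_msa)[0])  # → (1.0, 1, 3)
```

Raw mutual information is also inflated by indirect correlations and by shared ancestry between sequences, which is why co-evolution methods that predate AlphaFold applied corrections such as the average product correction, or fitted global statistical models such as direct coupling analysis.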

How AlphaFold Works

John Jumper, the DeepMind researcher who led the AlphaFold 2 development and shared the 2024 Nobel Prize in Chemistry with Hassabis, describes the core insight simply: "This process takes a year in the lab. The notion that we'll turn that work into a machine that gives you a really good answer in five minutes — that was the goal."

AlphaFold 2, presented at CASP14 in 2020 and published with open-source code in 2021, works in several stages. First, it builds a multiple sequence alignment: a comparison of the query protein against thousands of related sequences from other organisms, cataloguing which positions vary and which stay fixed across species. This is the evolutionary memory of the protein. Second, it feeds this alignment into a novel transformer architecture called the Evoformer, which refines a pairwise representation: for every pair of amino acid positions in the chain, it learns how those two residues relate in space, including a predicted distribution over the distance between them. Third, a structure module turns these learned representations into three-dimensional coordinates for the atoms of every residue. Finally, it outputs a confidence score for each residue, called pLDDT: a number from zero to one hundred indicating how certain the model is about that part of the prediction.
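The pairwise idea in the second stage can be illustrated without any neural network. One of the things AlphaFold 2's pair representation is trained to predict is a "distogram": for each pair of residues, a probability distribution over a set of distance bins. The sketch below computes the ground-truth version of that target from known coordinates, assigning each inter-residue distance to a bin and flagging pairs closer than 8 Å as contacts. The bin layout (64 bins between 2 and 22 Å) loosely follows the AlphaFold 2 paper and should be treated as an assumption, as should the 8 Å contact cutoff; the coordinates and function names are made up.

```python
import math

Coord = tuple[float, float, float]

# Assumed distogram layout (loosely following the AlphaFold 2 paper):
# 64 equal-width bins spanning 2-22 Å; anything outside the range is
# clamped into the first or last bin.
N_BINS = 64
MIN_DIST, MAX_DIST = 2.0, 22.0
CONTACT_CUTOFF = 8.0  # Å; a common contact definition, assumed here


def distance_matrix(coords: list[Coord]) -> list[list[float]]:
    """Euclidean distance (Å) between every pair of residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_bin(d: float) -> int:
    """Map a distance in Å to a distogram bin index in [0, N_BINS)."""
    width = (MAX_DIST - MIN_DIST) / N_BINS
    index = int((d - MIN_DIST) // width)
    return min(max(index, 0), N_BINS - 1)


def distogram_target(coords: list[Coord]) -> list[list[int]]:
    """Bin index for every residue pair: what a distogram head is trained toward."""
    return [[distance_bin(d) for d in row] for row in distance_matrix(coords)]


def contact_map(coords: list[Coord]) -> list[list[bool]]:
    """True where two different residues sit closer than CONTACT_CUTOFF."""
    dists = distance_matrix(coords)
    n = len(coords)
    return [[i != j and dists[i][j] < CONTACT_CUTOFF for j in range(n)]
            for i in range(n)]


# Made-up coordinates: one point per residue, 3.8 Å apart along the chain,
# with the last residue far away from the first three.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_map(toy_coords)[0])       # → [False, True, True, False]
print(distogram_target(toy_coords)[0])  # → [0, 5, 17, 57]
```

A trained model outputs a probability for every bin rather than a single index; a distogram loss compares those probabilities with targets like these.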

At CASP14 in December 2020, AlphaFold 2 achieved a median score of 92.4 on the Global Distance Test (GDT), a 0 to 100 measure of how closely predicted residue positions match the experimental structure. That was so far above the second-place competitors, which scored in the 70s, that the organizers described it as a solution to the problem. Many predictions were accurate enough to be competitive with experimentally determined structures. For single protein chains, the protein folding problem, as CASP had posed it since 1994, was effectively solved.
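GDT is the headline number in that result. In its GDT_TS form it is the average, over four distance cutoffs (1, 2, 4 and 8 Å), of the percentage of residues whose predicted Cα position lies within that cutoff of the experimental one. The sketch below computes that average for coordinates that are assumed to be already superimposed. The official scoring program (LGA) also searches over superpositions to maximise the count at each cutoff, so this simplified version will usually score a model somewhat lower than the official value. The coordinates are made up.

```python
import math

Coord = tuple[float, float, float]
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Å, the four GDT_TS thresholds


def gdt_ts(predicted: list[Coord], experimental: list[Coord]) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    Assumes the two Cα traces are already superimposed and that position i
    in one list corresponds to position i in the other. The official score
    also searches over superpositions, which this sketch does not do.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    n = len(deviations)
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= cutoff for d in deviations) / n for cutoff in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Made-up example: four residues whose predicted positions are off by
# 0.5, 1.5, 3.0 and 10.0 Å respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]
print(gdt_ts(predicted, experimental))  # → 56.25
```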

The Database That Changed Science Overnight

DeepMind could have kept AlphaFold proprietary. They chose not to. In July 2021, they released the AlphaFold Protein Structure Database in partnership with EMBL's European Bioinformatics Institute (EMBL-EBI), initially containing predicted structures for nearly all of the roughly 20,000 proteins encoded by the human genome, plus the proteomes of twenty additional model organisms. In 2022 the database expanded to more than 200 million structures, covering nearly every protein sequence catalogued in the UniProt database.

The effect on biology was immediate and global. Within two years, more than three million researchers in 190 countries were using the database. Over 35,000 scientific papers have cited AlphaFold. The structures are free, downloadable, and precomputed. A graduate student who would have spent a year solving one protein structure can now download a prediction, often a highly accurate one, in seconds.
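Every downloaded model carries its own confidence estimate, so "highly accurate" can be checked residue by residue. In the PDB-format files served by the AlphaFold Protein Structure Database, the per-residue confidence score (pLDDT) is stored in the B-factor column. The sketch below assumes that layout and the standard fixed-width PDB columns: it reads the Cα records of a file and sorts residues into the confidence bands the database uses when colouring models (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The function names and the three-residue excerpt are hypothetical; no network access is needed.

```python
def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Read per-residue pLDDT from the B-factor column of Cα ATOM records.

    Assumes an AlphaFold DB-style single-chain PDB file, where the B-factor
    field (columns 61-66) holds the residue's pLDDT.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name sits in columns 13-16, residue number in 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Map a pLDDT score to the band labels used for colouring models."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def ca_record(serial: int, res_name: str, res_seq: int, plddt: float,
              x: float = 0.0, y: float = 0.0, z: float = 0.0, chain: str = "A") -> str:
    """Hypothetical helper: build a minimal fixed-width PDB Cα line for the demo."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{plddt:>6.2f}")


excerpt = "\n".join([
    ca_record(2, "MET", 1, 95.5),
    ca_record(10, "ALA", 2, 72.3),
    ca_record(15, "GLY", 3, 41.0),
])
for res_seq, score in plddt_per_residue(excerpt).items():
    print(res_seq, score, confidence_band(score))
# → 1 95.5 very high
# → 2 72.3 confident
# → 3 41.0 very low
```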

The scope of what this enables is difficult to overstate. Drug designers can examine the binding pockets of disease targets they could never previously visualise. Researchers in low- and middle-income countries — who lack the expensive cryo-EM equipment required for experimental structure determination — can now conduct structural biology on a laptop. A researcher in Uganda, working on a potential breast cancer vaccine, described using AlphaFold to narrow 15,000 candidate protein sites to 15 in the time it would have previously taken to characterise a handful experimentally.

AlphaFold 3: Beyond Proteins

AlphaFold 2 predicted protein structures. AlphaFold 3, released in 2024, predicts the joint structures of complexes involving proteins, DNA, RNA and small molecules, including drug candidates. The architectural shift was significant: AlphaFold 3 replaces the Evoformer with a simplified trunk called the Pairformer, and replaces AlphaFold 2's structure module with a diffusion module, built on the same class of generative model that powers image generation systems such as DALL-E 2 and Stable Diffusion. Instead of computing atomic coordinates in a single pass, AlphaFold 3 starts from a cloud of randomly placed atoms and progressively refines it into a physically plausible molecular structure.
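The "start from noise and refine" loop can be sketched in a few lines. The toy below has the general shape of a deterministic diffusion sampler: begin with atom positions drawn from wide Gaussian noise, then repeatedly ask a denoiser for its best guess of the clean structure and step toward it while the noise level shrinks. In AlphaFold 3 the denoiser is a trained network conditioned on the sequence and the trunk's representations; here it is a hypothetical oracle that returns a fixed target, which shows the mechanics of the loop but nothing about how the real model learns. The noise schedule, step count and Euler update are illustrative assumptions, not AlphaFold 3's settings.

```python
import random

Coord = list[float]


def oracle_denoiser(noisy: list[Coord], sigma: float, target: list[Coord]) -> list[Coord]:
    """Hypothetical stand-in for a trained denoising network.

    A real model would predict clean coordinates from the noisy ones (and,
    in AlphaFold 3, from sequence-derived representations). This toy
    ignores its inputs and simply returns the known target structure.
    """
    return [list(atom) for atom in target]


def noise_schedule(sigma_max: float, sigma_min: float, steps: int) -> list[float]:
    """Geometrically decreasing noise levels, with a final 0.0 appended."""
    ratio = (sigma_min / sigma_max) ** (1 / (steps - 1))
    return [sigma_max * ratio**k for k in range(steps)] + [0.0]


def sample(target: list[Coord], steps: int = 20, sigma_max: float = 80.0,
           sigma_min: float = 0.5, seed: int = 0) -> list[Coord]:
    """Run a deterministic Euler sampler from pure noise down to zero noise."""
    rng = random.Random(seed)
    sigmas = noise_schedule(sigma_max, sigma_min, steps)
    # Start from a random cloud of atom positions at the highest noise level.
    x = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in target]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = oracle_denoiser(x, sigma, target)
        # Euler step on the probability-flow ODE: move x toward the denoised
        # guess in proportion to how much the noise level drops.
        x = [[xi + (sigma_next - sigma) * (xi - di) / sigma
              for xi, di in zip(atom, guess)]
             for atom, guess in zip(x, denoised)]
    return x


# Made-up three-atom "structure" to recover from noise.
target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]]
result = sample(target)
```

Because the oracle's guess is always perfect, the final step to zero noise lands on the target. With a learned denoiser, guesses at high noise are rough and improve as the noise shrinks, which is why samplers take many small steps.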

For drug discovery, this is the critical extension. Most drugs are not proteins; they are small organic molecules that bind to proteins. AlphaFold 2 told you the shape of the target. AlphaFold 3 predicts how the target and the drug fit together. Hassabis has described this as potentially compressing the early stages of drug discovery, which typically take five to ten years and cost hundreds of millions of dollars, by an order of magnitude. His commercial vehicle for this, Isomorphic Labs, is applying AlphaFold 3 to drug pipeline development in partnership with major pharmaceutical companies.

The Democratisation of Hard Science

One of Hassabis's consistent themes is democratisation. Experimental structural biology requires equipment costing millions of dollars, expertise that takes years to develop, and access to synchrotrons, of which there are only a few dozen worldwide, or to well-equipped cryo-EM facilities. AlphaFold's predictions, by contrast, can be searched and downloaded from a laptop with an internet connection. The knowledge embedded in 3.5 billion years of evolution, distilled into a neural network, is now accessible to any researcher anywhere in the world who can frame a scientific question.

This matters most at the frontier of neglected disease. Pathogens that cause the largest burden of disease globally — malaria, tuberculosis, schistosomiasis — have historically received the least structural biology attention because the commercial return does not justify the experimental investment. AlphaFold allows researchers in affected countries to work on these proteins directly, without needing to ship samples to a facility in Europe or the United States and wait months for results.

The Nobel and What It Means

In October 2024, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry to Demis Hassabis and John Jumper for AlphaFold, and to David Baker (University of Washington) for the related work of computational protein design — using similar techniques to design new proteins that don't exist in nature. The committee called it "a discovery that has fundamentally changed our understanding of the relationship between amino acid sequence and protein structure."

It is arguably the first Nobel Prize awarded primarily for a specific machine learning system (the Physics prize, announced the day before, honoured John Hopfield and Geoffrey Hinton for foundational work on artificial neural networks). That is worth pausing on. The prize is not for the hardware, not for the dataset, and not for the biology alone: it is for the architecture of a neural network and the insight that evolutionary information could be used to train it. A piece of software, trained on publicly available data, solved a fifty-year problem that had resisted generations of scientists.

Jumper has noted the strangeness of this recognition: "35,000 papers cite AlphaFold. That's the measure of it. Not that we built something impressive — but that it changed what other people could do."

What Comes Next

Hassabis has described the protein folding result not as an endpoint but as a template. The same framework — identify a grand challenge in science, find the data in which the answer is implicitly encoded, build the architecture to read it — is being applied to other problems: the prediction of gene regulatory networks, the modelling of whole-cell behaviour, the understanding of protein dynamics rather than just static structure.

His broader ambition is an AI system that can function as a scientific collaborator — not automating the craft of science but accelerating the rate of hypothesis generation and experimental design. "We want to compress decades of scientific progress into just a few years," he has said. Given that AlphaFold compressed fifty years of structural biology into a prediction that runs in five minutes, there is reason to take that ambition seriously.

The protein folding problem was supposed to be a long way off. It wasn't. That's the most important lesson AlphaFold teaches — not about proteins, but about the pace of change when the right algorithm meets the right data.

Based on interviews and lectures by Demis Hassabis (Google DeepMind) and John Jumper (Nobel Prize in Chemistry 2024), including their discussions of AlphaFold's development, the CASP14 results, and the AlphaFold protein structure database.
