It wouldn’t be far-fetched to say that humans are like machines. After all, we are packages of cells, trillions of them that perform various tasks the way machines do – from turbines that generate energy to transporters that walk along tracks pulling cargo, all of which are built from proteins.
The shapes of proteins are critical to their functions. These shapes are determined by the sequence of amino acids, drawn from a set of 20, that are chained together into protein molecules. It is easy to work out the sequence of any protein because this is determined by the DNA that codes for it. But determining the shape of a protein from its amino acid sequence is far more challenging. Until recently, this was done by humans in the lab using experimental methods such as X-ray crystallography, which involves examining the diffraction pattern formed when an X-ray beam is fired through a protein crystal. This work is time-consuming and error-prone. Brute-force computing based on physics alone isn’t an option because proteins are too complex. Instead, many researchers worldwide have turned to machine learning, where artificial intelligence (AI) systems are trained using data sets of known protein structures. One such AI group is DeepMind, based in London, UK.
In 2020, DeepMind revealed a path-breaking result it had achieved in a competition to predict protein shapes: its AlphaFold AI algorithm scored above 90 out of 100 for two-thirds of the proteins whose shapes had previously been determined using experimental methods but not yet published. This is an impressive result given that it took decades for scientists to unlock the structure of just 17 per cent of the proteins in the human body.
So how did the DeepMind team achieve this breakthrough? For each target protein, DeepMind’s algorithm looks for variants found in related species and feeds their sequences and structures into the AI system, along with the sequence of the target protein. The idea is to train the system to work out the shape of the target protein by looking at patterns linking sequence and structure.
Separate from the competition, Andrei Lupas at the Max Planck Institute for Developmental Biology in Germany had been trying to work out the structure of a particular protein for a decade when DeepMind offered to help. A few tweaks were needed to improve accuracy, but Lupas’s team had the final structure within half an hour of receiving AlphaFold’s prediction. “It’s astonishing,” he says. “It’s really astonishing.” Lupas thinks that for the next few years researchers will still need to do some experimental work to check shape predictions, but that they will eventually be able to rely on computation alone, allowing new, highly targeted drugs to be designed more efficiently.
The pace of progress since 2020 has been breathtaking. By July 2021, DeepMind had mapped the structure of 98.5 per cent of the 20,000 or so proteins in the human body (including proteins relevant to Covid-19). For 35.7 per cent of these, the algorithm was more than 90 per cent confident in the accuracy of its predicted shape. Demis Hassabis, chief executive and founder of DeepMind, says that the code for AlphaFold (which is a collection of 32 algorithms) has been made open source, and that it is now capable of solving protein shapes in minutes or, in some cases, seconds using hardware no more sophisticated than a standard graphics card. His excitement when asked to describe AlphaFold’s capabilities is palpable:
“It takes one [graphics processing unit] a few minutes to fold one protein, which of course would have taken years of experimental work. We’re just going to put this treasure trove of data out there. It’s a little bit mind blowing in a way because going from the breakthrough of creating a system that can do that to actually producing all the data has only been a matter of months. We hope it’s going to become a sort of standard tool that all biologists around the world use.”