Proteolabio: an AI compass for biomolecule design
By Guido Uguzzoni, Andrea Pagnani, and Jorge Fernandez-de-Cossio-Diaz, Politecnico di Torino.
Imagine you are near some mountains and want to reach the highest peak, but mist hides everything around you. What do you do? You would probably start walking in the direction where the ground rises most steeply. But the result depends on where you start: at the foot of Mount Fuji you will reach the highest peak, while in the Dolomites you may end up on the closest peak, surrounded by higher mountains.
That’s the situation in Protein Design when you use the traditional Directed Evolution approach, except that you are not moving in two dimensions, as on the Earth's surface, but in a huge multidimensional space.
But let’s take a step back: what is Protein Design? And what is Directed Evolution? And why are they important?
Proteins are the most versatile molecules we know. It is not by chance that most genes code for a protein. Inheritance can largely be reduced to transmitting the information needed to build these wonderful molecules and to regulate their expression: the difference between you, your parents, and a bat lies in the types of these molecules and in how they are regulated.
To visualize a protein, you can imagine a necklace made of 20 types of pearls: the amino acids. The miraculous thing about these chains is that, if you choose the sequence of pearls carefully, the chain folds spontaneously into a stable object. Nature offers millions of examples of such objects. Each of these peculiar chains not only folds by itself but, once folded, can perform one of a huge variety of functions.
Working out which chains are able to fold in this incredible way, and what shape the final object takes, is an old problem that has tantalized the scientific community for 50 years.
Interestingly, instead of solving it, Frances Arnold invented a way to circumvent the problem and obtain protein variants optimized for new “unnatural” tasks. Her ingenious idea was to mimic Nature and accelerate Darwinian selection in the lab. Randomly mutating a protein millions of times, selecting the variants that best perform a function, and repeating the process over many rounds yields variants that outperform the natural protein at the task. This method is called Directed Evolution, and it earned Frances Arnold the Nobel Prize in 2018.
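The mutate, select, repeat cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not a lab protocol: the fitness function (counting matches to a hidden target sequence), the target itself, the library size, and the choice to carry a single best variant from one round to the next are all assumptions made for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position changed."""
    pos = rng.randrange(len(seq))
    new_residue = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_residue + seq[pos + 1:]


def directed_evolution(start, fitness, rounds=20, library_size=50, seed=0):
    """Toy mutate-and-select loop (hypothetical parameters)."""
    rng = random.Random(seed)
    best = start
    for _ in range(rounds):
        # "Mutagenesis": build a library of single-point mutants of the current best.
        library = [mutate(best, rng) for _ in range(library_size)]
        # "Selection": keep the fittest variant; the parent stays in the pool,
        # so fitness can never decrease from one round to the next.
        best = max(library + [best], key=fitness)
    return best


# Hypothetical stand-in for a lab assay: number of positions matching a hidden target.
TARGET = "MKTAYIAKQR"


def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


start = "A" * len(TARGET)
evolved = directed_evolution(start, toy_fitness)
print(start, toy_fitness(start))
print(evolved, toy_fitness(evolved))
```

Because each round only looks at single mutations of the current best variant, the loop climbs whatever slope is nearest, which is exactly the walk in the mist from the opening analogy.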
The problem has huge technological implications. Being able to design molecules that perform a desired task can lead to new therapeutic solutions, replace long and polluting chains of chemical reactions, and much more.
The limitations of Directed Evolution stem from the astronomical number of possible variants of a protein (for a protein of 100 amino acids, it is higher than the number of atoms in the universe) and from the purely experimental nature of the approach: you cannot test even a small fraction of the possible variants. All you can test is the immediate surroundings of your starting point, improving it a little at a time until you reach the closest peak.
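The size of this space is easy to check: with 20 possible amino acids at each of 100 positions, there are 20^100 sequences. The short calculation below compares that with 10^80, a commonly quoted order-of-magnitude estimate for the number of atoms in the observable universe (the estimate is an assumption of this example, not a figure from the experiments).

```python
# Number of distinct chains of 100 positions with 20 amino acid choices each.
n_sequences = 20 ** 100

# Rough, commonly quoted estimate for atoms in the observable universe (assumption).
ATOMS_IN_UNIVERSE = 10 ** 80

digits = len(str(n_sequences))
print(f"20^100 has {digits} digits")    # 131 digits, i.e. about 1.3 x 10^130
print(n_sequences > ATOMS_IN_UNIVERSE)  # True
```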
Astonishingly, last year (2020) DeepMind, a well-known company that has championed AI innovations (see AlphaGo), solved the 50-year-old Protein Folding problem: its algorithm, AlphaFold 2, predicts the structure of a protein from its sequence with an accuracy that, for many proteins, is comparable to experimental error. Problem solved? Not exactly, because folding is only one part of the wider technological problem: now that we can predict the structure, how do we design a new protein that performs a different task?
The Proteolabio solution integrates experiments and Artificial Intelligence, combining the approach of Directed Evolution with that of AlphaFold. The idea is to feed all the information about what happens inside a Directed Evolution experiment to a machine learning algorithm, which generates a fitness map of all possible variants.
This in-silico method is used to propose new candidates to be tested in the next iteration of the experiment. Guided by an AI map that is updated as more information becomes available, the approach efficiently finds the way to the highest “peak”, in a virtuous interaction between lab experiment and AI guidance.
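To make the idea of such a loop concrete, here is a generic sketch of a model-guided design cycle. It is emphatically not the Proteolabio algorithm, whose details are not given here: the additive per-position model, the `assay` function standing in for the lab measurement, the hidden target, and every parameter are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_additive_model(measurements):
    """Score each (position, residue) by the mean measured fitness of variants carrying it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for seq, value in measurements.items():
        for pos, residue in enumerate(seq):
            totals[(pos, residue)] += value
            counts[(pos, residue)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def predict(model, seq):
    """Predicted fitness: sum of per-site scores; unseen sites get the average score."""
    baseline = sum(model.values()) / len(model)
    return sum(model.get((pos, res), baseline) for pos, res in enumerate(seq))


def propose(model, parents, rng, n_candidates=200, n_keep=10):
    """Generate single-point mutants in silico and keep those the model ranks highest."""
    candidates = set()
    for _ in range(n_candidates):
        parent = rng.choice(parents)
        pos = rng.randrange(len(parent))
        candidates.add(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    ranked = sorted(candidates, key=lambda s: (-predict(model, s), s))
    return ranked[:n_keep]


def guided_loop(start, assay, rounds=5, seed=0):
    """Alternate between 'experiment' (assay) and model update (hypothetical loop)."""
    rng = random.Random(seed)
    measurements = {start: assay(start)}
    for _ in range(rounds):
        model = fit_additive_model(measurements)      # update the map
        parents = sorted(measurements, key=measurements.get, reverse=True)[:5]
        for candidate in propose(model, parents, rng):
            measurements[candidate] = assay(candidate)  # test proposals "in the lab"
    return measurements


# Hypothetical assay: number of positions matching a hidden target sequence.
TARGET = "MKTAYIAKQR"


def assay(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


start = "A" * len(TARGET)
results = guided_loop(start, assay)
best = max(results, key=results.get)
print(len(results), "variants measured; best:", best, results[best])
```

The point of the sketch is the division of labour: the model ranks many candidates cheaply, and only the few most promising ones are measured before the map is refitted.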
The method is patented and relies on an innovative algorithm that makes the most of the rich information contained in the experiments.
The applications of this novel approach are numerous: from protein therapeutics, such as monoclonal antibodies, engineered CAR-T cells, and viral carriers, to artificial enzymes and their plethora of environmentally sustainable solutions in many fields, including detergents, food and beverages, biofuels, plastic degradation, and bioremediation.