The rapid growth of text-to-image AI models such as OpenAI’s DALL-E 2 (programs trained to generate a wide range of images) has excited creative industries from fashion to film by producing strange and surprising images. The technology behind these programs has also attracted the attention of biotech labs, which have begun using this type of generative AI, known as diffusion models, to design proteins that have not been observed in nature.
Recently, two labs independently announced programs that use diffusion models to generate designs for new proteins with greater precision than was previously possible. Generate Biomedicines, a Boston-based startup, has introduced a program called Chroma. Meanwhile, a team at the University of Washington led by David Baker has developed a similar program called RoseTTAFold Diffusion.
In a newly published preprint, Baker and his colleagues show that their model can generate precise designs for new proteins that can then be made in the lab. “We are making proteins that are not similar to existing proteins,” says Brian Tripp, one of RoseTTAFold Diffusion’s developers. These protein generators can be directed to produce designs for proteins with specific properties, such as shape, size, or function. In effect, they make it possible to design new proteins that perform a specific task.
Researchers hope that this work will lead to the development of new and more effective drugs. “We can discover in minutes what took evolution millions of years to achieve,” says Gevorg Grigorian, chief technical officer of Generate Biomedicines. “The important point of this work is to create proteins based on the properties we want,” says Ava Amini, a biophysicist at Microsoft Research in Cambridge, Massachusetts.
Symmetrical protein structures produced by Chroma.
Proteins are the basic building blocks of living organisms. In animals, they digest food, contract muscles, detect light, direct the immune system, and much more. Proteins also play a role in disease, which is why they are a main target for drugs and why many new drugs are themselves protein-based. “Nature uses proteins for everything,” Grigorian says. “The promise this holds for therapeutic interventions is really great.”
At present, however, drug designers must draw on a list of natural proteins. The goal of protein generation is to expand this list with an almost infinite reservoir of computer-designed proteins. Computational techniques for designing proteins are not new, but previous methods have been slow and not very effective at designing large proteins or protein complexes (molecular machines made up of several proteins linked together). Such proteins are often vital for treating diseases.
Protein structure produced by RoseTTAFold Diffusion (left) and the same structure created in the laboratory (right).
The two programs just announced are not the first applications of diffusion models to protein generation. Several studies published over the past few months by Amini and others have shown that diffusion models are a promising technique, but those studies were proofs of concept. Chroma and RoseTTAFold Diffusion are the first complete programs that can generate detailed designs for a wide variety of proteins.
Namrata Anand, who recently developed one of the first diffusion models for protein generation, says the importance of Chroma and RoseTTAFold Diffusion lies in the fact that they scaled up this technique, training it on more data and more computers. “It’s fair to say the model is more DALL-E-like because of the way it’s scaled up,” she says.
Diffusion models are neural networks that are trained to remove noise (random disturbances added to the data) from their inputs.
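The noise-adding and noise-removing idea can be illustrated with a toy example. What follows is only a minimal sketch, not the method used by Chroma or RoseTTAFold Diffusion: the five-number "backbone", the linear noise schedule, and the function names `alpha_bar`, `add_noise`, and `remove_noise` are all assumptions made up for illustration. In a real diffusion model a trained neural network would have to predict the noise; here the true noise is passed in directly as a stand-in for a perfect prediction.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Share of the original signal left after t noising steps.

    A toy linear schedule (an assumption, not taken from either program);
    it stays above zero for every t between 0 and num_steps.
    """
    return 1.0 - t / (num_steps + 1)


def add_noise(x0, t, num_steps, rng):
    """Forward process: blend the clean values with Gaussian noise."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps_hat, t, num_steps):
    """Reverse estimate: undo the blend, given a prediction of the noise."""
    a = alpha_bar(t, num_steps)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
clean = [0.0, 1.5, 3.0, 4.5, 6.0]  # hypothetical 1-D "backbone" coordinates
noisy, true_eps = add_noise(clean, t=8, num_steps=10, rng=rng)

# A trained network would have to guess the noise from `noisy` alone;
# passing in the true noise shows the best case such a network aims for.
recovered = remove_noise(noisy, true_eps, t=8, num_steps=10)

print("noisy:    ", [round(v, 2) for v in noisy])
print("recovered:", [round(v, 2) for v in recovered])
```

With a perfect noise prediction the original values come back exactly (up to rounding). A generator works by starting from pure noise and applying many such denoising steps, with the network's imperfect predictions steering the result toward something that resembles its training data.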
In Chroma, noise is added by unraveling the amino acid chains that make up a protein. Given a random clump of these chains, Chroma tries to fit them together to form a protein. Guided by specified constraints on what the result should look like, Chroma can generate new proteins with specific properties. Baker’s team takes a different approach, although the end results are similar: its diffusion model starts with an even more scrambled structure.
Another important difference is that RoseTTAFold Diffusion uses information about how the parts of a protein fit together, supplied by a separate neural network trained to predict protein structure (as DeepMind’s AlphaFold does). Generate Biomedicines and Baker’s team both present an impressive array of results. They can produce proteins with different degrees of symmetry, including circular, triangular, or hexagonal proteins. To demonstrate the capabilities of its program, Generate Biomedicines produced proteins in the shapes of the 26 letters of the Latin alphabet and the numerals zero to ten.
Most of the structures on display don’t actually serve a purpose, but because a protein’s function is determined by its shape, the ability to produce different structures on demand is very important. Of course, creating strange designs on a computer is different from turning those designs into real proteins. To test whether Chroma produced designs that could actually be built, Generate Biomedicines took the sequences for some of its designs (the amino acid strings that make up the proteins) and ran them through another AI program. That program predicted that 55% of the designs could form real, stable proteins.
Baker’s team conducted a similar experiment, but Baker and his colleagues went further than Generate Biomedicines in evaluating their model: they produced some of RoseTTAFold Diffusion’s designs in the lab. Generate Biomedicines says it is also conducting laboratory tests but is not yet ready to publish the results. “It’s more than a proof of concept,” says Tripp. “We use it to make really great proteins.”
Structure of RoseTTAFold Diffusion-generated protein binding to SARS-CoV-2 spike protein.
For Baker, the main result is the production of a new protein that binds to parathyroid hormone, which controls blood calcium levels. “We gave the hormone to the model and told it to make a protein that binds to it,” he says. When the team tested the new protein in the lab, they found that it bound to the hormone more tightly than any protein that could have been generated with other computational methods, and more tightly than existing drugs.
Grigorian acknowledges that inventing new proteins is only the first step. “We are a pharmaceutical company,” he says. “At the end of the day, what matters is, can we make effective drugs?” Protein-based drugs must be produced in large quantities, then tested in the laboratory and ultimately in humans. This may take several years. However, Grigorian thinks his company will find ways to speed up those stages as well. “The speed of scientific progress is insane,” says Baker. “But right now we are in the middle of what can be called a technical revolution.”