According to co-first author Gavin Rice, TomoTwin paves the way for the automated identification and localization of proteins directly within their cellular environment, expanding the potential of cryo-ET. Cryo-ET could reveal how biomolecules function inside the cell and thereby shed light on the basis of life and disease.
In a cryo-ET experiment, scientists use a transmission electron microscope to acquire tomograms, three-dimensional images of a cellular volume containing complex biomolecules.
To obtain a more detailed image of each distinct protein, they average as many copies of it as possible, much as photographers combine several shots of the same scene taken at different exposures to create a precisely exposed image.
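To see why averaging helps, here is a minimal sketch under strongly simplifying assumptions: each "protein copy" is a hypothetical 1D signal that is already perfectly aligned, and the only corruption is independent Gaussian noise. Real subtomogram averaging must also align the copies in 3D (rotations and shifts), which this toy example skips. Under these assumptions the residual noise shrinks roughly as σ/√N for N averaged copies.

```python
import math
import random


def noisy_copy(signal, sigma, rng):
    """One simulated observation: the true signal plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def average(copies):
    """Element-wise mean of copies that are assumed to be already aligned."""
    n = len(copies)
    return [sum(vals) / n for vals in zip(*copies)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


rng = random.Random(0)
# Hypothetical 1D "density profile" standing in for a protein.
signal = [math.sin(2 * math.pi * i / 64) for i in range(64)]
sigma = 2.0  # noise much stronger than the signal, mimicking low-dose imaging

for n in (1, 10, 100, 1000):
    copies = [noisy_copy(signal, sigma, rng) for _ in range(n)]
    err = rms_error(average(copies), signal)
    print(f"{n:5d} copies: RMS error {err:.3f} (expected ~{sigma / math.sqrt(n):.3f})")
```

The more copies that are found and averaged, the closer the average gets to the underlying signal, which is why reliable, high-throughput particle picking matters.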
Before averaging, however, each protein in the tomogram must be correctly identified and located. “Scientists can acquire hundreds of tomograms per day, but we lacked the tools to accurately identify the molecules contained within them,” Rice explains.
Until now, researchers have relied on error-prone algorithms that use templates of already-known molecular structures to search tomograms for matches. Identifying molecules manually ensures high-quality selection, but takes days to weeks per dataset.
Another option is supervised machine learning. However, these tools must be trained for each new protein on thousands of manually labeled examples, which are nearly impossible to obtain for small biomolecules in crowded cells.
The newly developed software TomoTwin overcomes many of these obstacles. It learns to map the molecules in a tomogram into a geometric space according to their shape: the system is rewarded for placing similarly shaped proteins close together and penalized when it does not.
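The reward-and-penalty scheme described above is the core idea of deep metric learning. A widely used form of it is the triplet margin loss, sketched below in plain Python. Treat it as an illustration of the general technique rather than TomoTwin's exact training code; the function names, the Euclidean distance and the margin value are assumptions for this example. An "anchor" and a "positive" are embeddings of the same kind of protein, a "negative" is an embedding of a different one.

```python
import math


def euclidean(a, b):
    """Distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalty is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Well-organized embedding: same-shape pair close, different shape far away.
good = triplet_loss(anchor=[0.0, 0.0], positive=[0.1, 0.0], negative=[3.0, 0.0])
# Poorly organized embedding: the different protein sits closer than the same one.
bad = triplet_loss(anchor=[0.0, 0.0], positive=[2.0, 0.0], negative=[0.5, 0.0])
print(good, bad)  # 0.0 2.5
```

During training, the network producing the embeddings would be updated to drive this loss toward zero over many triplets, pulling same-shape molecules together and pushing different ones apart.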
In the new map, scientists can isolate and precisely identify the various proteins and use this information to locate them within the cell. “One advantage of TomoTwin is that we provide a pre-trained picking model,” Rice explains.
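Once every candidate subvolume has an embedding, locating a protein of interest can be framed as a similarity search in that space. The sketch below assumes a hypothetical dictionary of precomputed embeddings keyed by voxel position, a hypothetical reference embedding for the target protein, cosine similarity as the metric and an arbitrary threshold of 0.8. It illustrates the idea only and is not TomoTwin's actual picking interface.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick(embeddings, reference, threshold=0.8):
    """Return (position, score) pairs whose embedding resembles the reference,
    sorted from most to least similar."""
    hits = []
    for position, emb in embeddings.items():
        score = cosine_similarity(emb, reference)
        if score >= threshold:
            hits.append((position, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical embeddings of subvolumes centred at (x, y, z) voxel positions.
embeddings = {
    (10, 20, 30): [0.9, 0.1, 0.0],
    (40, 22, 31): [0.85, 0.15, 0.05],
    (70, 60, 12): [0.0, 1.0, 0.0],
    (15, 80, 44): [0.1, 0.0, 0.95],
}
reference = [1.0, 0.1, 0.0]  # hypothetical embedding of the target protein

for position, score in pick(embeddings, reference):
    print(position, round(score, 3))
```

If subvolumes are extracted with an overlapping sliding window, one particle would typically produce several neighbouring hits, so a practical picker would also need a step such as non-maximum suppression to reduce each cluster of hits to a single position.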
Because no training step is needed, the software can even run on local computers: whereas processing a tomogram on a local computer typically takes 60 to 90 minutes, the runtime on the MPI supercomputer Raven drops to 15 minutes per tomogram.
With TomoTwin, researchers can pick the proteins in dozens of tomograms in the time it takes to do so manually for a single one. This increases data throughput and supplies more copies for averaging, which yields a better image.
Currently, the software can locate globular proteins or protein complexes larger than 150 kilodaltons in cells; in the future, the Raunser group plans to add membrane proteins, filamentous proteins, and smaller proteins.