When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find regularities in data sets, be it for stock market analysis, image and speech recognition, or the classification of cells. To reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is common to different types of cancer, consisting of a characteristic combination of genes. As the team reports in the journal Genome Biology, the algorithm also detected types of genes in this pattern that had never before been linked to cancer.
Machine learning essentially means that an algorithm uses training data to learn how to answer certain questions on its own. It does this by looking for patterns in the data that help it solve problems. After the training phase, the system can generalize from what it has learned in order to evaluate unknown data. “The big challenge was to get the right training data, in which experts had already clearly distinguished between ‘healthy’ and ‘cancerous’ cells,” says Jan Dohmen, the first author of the article.
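The idea of learning from labeled examples and then generalizing to unknown data can be illustrated with a very small sketch. This is not the ikarus code: the nearest-centroid method, the function names and the numbers are all assumptions chosen only to show the train-then-predict pattern described above.

```python
from statistics import mean

def train_centroids(samples, labels):
    """Learn one average profile (centroid) per label from labeled training data."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # Average each measurement (column) across the rows of this label.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Assign an unseen sample to the label whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

# Made-up training data: two measurements per cell, labeled in advance
# (standing in for the expert annotations mentioned in the article).
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["cancerous", "cancerous", "healthy", "healthy"]

model = train_centroids(train_x, train_y)

# A cell the model has never seen is classified from what was learned.
print(predict(model, [0.7, 0.3]))
```

The quality of such a model depends entirely on the labels in the training data, which is why obtaining reliably annotated cells was the main hurdle.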
Surprisingly high success rate
In addition, single-cell sequencing data sets are often noisy. This means that the information about the molecular characteristics of individual cells is not very precise, for example because a different number of genes is detected in each cell or because the samples are not always processed in the same way. Dohmen and his colleague Dr. Vedran Franke, who headed the study, report that they sifted through numerous publications and contacted a few research groups to obtain suitable data sets. Ultimately, the team used data from lung and colon cancer cells to train the algorithm before applying it to data sets from other tumor types.
During the training phase, ikarus had to find a list of characteristic genes and then use it to classify cells. “We tried out and refined different approaches,” says Dohmen. It was time-consuming work, according to all three scientists. “The key was for ikarus to ultimately use two lists: one for cancer genes and one for genes from other cells,” Franke explains. After the learning phase, the algorithm was able to reliably distinguish between healthy and tumor cells in other types of cancer as well, such as in tissue samples from patients with liver cancer or neuroblastoma. Its success rate was usually extremely high, which surprised even the research team. “We didn’t expect there to be a common signature that defines the tumor cells of different types of cancer so precisely,” says Akalin. “But we still can’t say whether the method works for all types of cancer,” Dohmen adds. To make ikarus a reliable tool for diagnosing cancer, the researchers now want to test it on additional types of tumors.
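How two gene lists can be turned into a decision about a single cell can be sketched roughly as follows. This is an illustration only and not the published ikarus method: the gene names are hypothetical placeholders, and the simple rule of comparing average expression across the two lists is an assumption made for the example.

```python
from statistics import mean

# Hypothetical placeholder gene lists; in practice such lists would be
# learned from annotated training data, not written by hand.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2", "GENE_N3"]

def signature_score(expression, genes):
    """Average expression of the listed genes; undetected genes count as 0."""
    return mean(expression.get(g, 0.0) for g in genes)

def classify_cell(expression):
    """Label a cell by whichever gene list has the higher average expression."""
    tumor = signature_score(expression, TUMOR_GENES)
    normal = signature_score(expression, NORMAL_GENES)
    return "tumor" if tumor > normal else "normal"

# Made-up expression values for two cells (gene name -> measured level).
cell_a = {"GENE_T1": 5.0, "GENE_T2": 3.0, "GENE_N1": 1.0}
cell_b = {"GENE_N1": 4.0, "GENE_N2": 2.0, "GENE_T3": 1.0}

print(classify_cell(cell_a))
print(classify_cell(cell_b))
```

Scoring against two lists rather than one gives each cell a point of comparison, so a cell with generally high or low expression is not mislabeled on that basis alone.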
AI as a fully automated diagnostic tool
The project aims to go far beyond classifying cells as “healthy” or “cancerous”. In early tests, ikarus has already shown that the method can also distinguish other cell types (and certain subtypes) from tumor cells. “We want to make the approach more comprehensive,” says Akalin, “by developing it further so that it can differentiate all possible cell types in a biopsy.”
In hospitals, pathologists usually examine tumor tissue samples only under the microscope to identify the different cell types. It is tedious, time-consuming work. With ikarus, this step could one day become a fully automated process. In addition, Akalin notes that the data can be used to draw conclusions about the immediate environment of the tumor, which can help doctors choose the best therapy. The makeup of the cancer tissue and its microenvironment often indicates whether a particular treatment or medication will be effective. AI can also be useful in developing new drugs. “ikarus allows us to identify genes that are potential drivers of cancer,” says Akalin. New therapeutic agents could then be used to target these molecular structures.
Home and office collaboration
A notable aspect of the publication is that it was prepared in its entirety during the COVID pandemic. The participants were not in their usual places at the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC. Instead, they were in home offices and communicated with each other only digitally. According to Franke, “the project shows that a digital structure can be created to facilitate scientific work under these conditions.”
Reference: Dohmen J, Baranovskii A, Ronen J, Uyar B, Franke V, Akalin A. Identifying tumor cells at the single-cell level using machine learning. Genome Biol. 2022;23(1):123. doi: 10.1186/s13059-022-02683-1