Researchers at the Paul Scherrer Institute PSI have developed an AI that could open up a new, cost-effective approach to identifying genetic perturbation patterns in cell images – potentially enabling the development of new drugs.
The therapeutic strategies adopted by modern medicine focus on the early detection and targeted manipulation of disease-related genes. Particularly for complex diseases such as cancer, neurodegenerative diseases such as Alzheimer’s, and the chronic inflammation associated with ageing, the challenge is not just to identify individual genes but also to understand their regulatory networks. It is becoming increasingly clear that the three-dimensional organisation of chromatin – the packaged form of DNA in the cell nucleus – plays a key role.
Until now, analysing such genetic changes has relied on complex and expensive gene expression or sequencing techniques. In a paper published in the journal Cell Systems, researchers led by G.V. Shivashankar – head of the Laboratory of Multiscale Bioimaging at the PSI Center for Life Sciences and Professor of Mechano-Genomics at ETH Zurich – and Caroline Uhler – Director of the Eric and Wendy Schmidt Center at the Broad Institute and Professor of Electrical Engineering and Computer Science at MIT – have now presented a new approach. Together with their student colleagues Daniel Paysan, Adityanarayanan Radhakrishnan and Xinyi Zhang, they developed an artificial intelligence (AI) system called Image2Reg. It may allow genetic perturbations and potential drug targets to be identified based solely on simple microscopic images of the cell nucleus, such as those obtained by staining with blue fluorescent Hoechst dyes. “The combination of imaging, machine learning and molecular networks may ultimately enable diagnostic and therapeutic access that is both rapid and cost-effective,” explains Shivashankar.
AI develops an “eye” for activity in chromatin …
In the first step of Image2Reg (which stands for Image to Regulation, i.e. from cell image to gene regulation), the researchers make use of the fact that the three-dimensional structure of the cell nucleus – the chromatin in which the DNA is packaged – is visibly altered when the activity of a particular gene is disrupted or experimentally overstimulated. The resulting images often display very subtle but systematic changes in the appearance of the chromatin.
The researchers then trained an adaptive algorithm – a convolutional neural network (CNN), a form of artificial intelligence specialising in image processing – to recognise these patterns. The AI developed an “eye” for the typical traces that a change in a gene’s activity leaves in the chromatin image.
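To give a feel for what the convolutional part of such a network does, here is a minimal sketch in plain Python. It is not the architecture used in Image2Reg: the filter is set by hand (in a trained CNN the filter weights are learned from data), and the two tiny "images" are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide a small filter over a grayscale image (valid cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


def relu_mean(feature_map):
    """Apply ReLU to every response, then average over the whole map."""
    values = [max(0.0, v) for row in feature_map for v in row]
    return sum(values) / len(values)


# Hand-set filter that responds to dark-to-bright steps from left to right.
# Assumption for illustration: a real CNN learns many such filters itself.
edge_filter = [[-1.0, 1.0],
               [-1.0, 1.0]]

# Two made-up 4x4 "nuclei": one uniform, one with a striped texture.
smooth = [[0.5] * 4 for _ in range(4)]
grainy = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]

smooth_score = relu_mean(conv2d(smooth, edge_filter))
grainy_score = relu_mean(conv2d(grainy, edge_filter))
print(smooth_score, grainy_score)
```

The uniform image produces no response, while the striped one does. A CNN stacks many learned filters of this kind, which is how it can pick up texture differences too subtle to spot by eye.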
… builds a network of cellular relationships …
In parallel with the image analysis, Image2Reg constructs a biological network specific to the type of cell in question, which describes how the genes inside the cell are related to each other. This network draws on two well-established sources of molecular biology data. The first is known protein-protein interactions – in other words, which proteins come into contact with each other and trigger biochemical processes. The second is gene expression data, drawn both from the analysis of individual cells and from classical bulk measurements, in which the average activity of a large number of cells is recorded. These data show which genes are active at the same time and might influence or regulate each other.
Finally, this information is summarised in the form of a mathematical model. The model processes the complex links between the genes and creates a numerical representation of each individual gene, reflecting its function, role and connectivity within the biological system. The aim is to create a kind of “rule book” for the cell, a description of which genes work together, regulate each other or are assigned to certain signalling pathways – regardless of how they manifest themselves in the cell image.
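The idea of turning a gene network into one numerical vector per gene can be sketched as follows. This is a deliberately simple stand-in, not the model from the paper: the gene names, interactions and correlation values are hypothetical, the rule "keep an interaction only if the two genes are also co-expressed" is an assumed way of combining the two data sources, and each gene is simply represented by its normalised row of connection weights.

```python
import math

# Hypothetical protein-protein interactions (illustration only).
ppi_edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]

# Hypothetical co-expression correlations for the same gene pairs.
coexpression = {("G1", "G2"): 0.9, ("G1", "G3"): 0.8, ("G2", "G3"): 0.85,
                ("G3", "G4"): 0.2, ("G4", "G5"): 0.7}


def build_network(edges, corr, min_corr=0.5):
    """Keep an interaction only if the two genes are also co-expressed."""
    network = {}
    for a, b in edges:
        weight = corr.get((a, b), 0.0)
        if weight >= min_corr:
            network.setdefault(a, {})[b] = weight
            network.setdefault(b, {})[a] = weight
    return network


def embed(network, genes):
    """Represent each gene by its unit-length row of connection weights."""
    vectors = {}
    for g in genes:
        row = [1.0 if h == g else network.get(g, {}).get(h, 0.0) for h in genes]
        norm = math.sqrt(sum(v * v for v in row))
        vectors[g] = [v / norm for v in row]
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


genes = ["G1", "G2", "G3", "G4", "G5"]
network = build_network(ppi_edges, coexpression)
vectors = embed(network, genes)

sim_same_module = cosine(vectors["G1"], vectors["G2"])
sim_other_module = cosine(vectors["G1"], vectors["G4"])
print(round(sim_same_module, 3), round(sim_other_module, 3))
```

Genes that share neighbours in the network end up with similar vectors, while genes in unconnected parts of the network do not. That is the property any such "rule book" representation needs, whatever model produces it.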
… and combines image and biology
In the final and decisive step, Image2Reg brings together the two spheres of understanding previously acquired – the image-based representation of the cell nucleus and the molecular knowledge of the gene network. This is done using a mathematical method known as a kernel machine, more specifically one based on the neural tangent kernel (NTK). This model learns how the numerical patterns that the neural network extracts from the cell images can be matched with the functional relationships between the genes in the biological network.
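How a kernel machine can link the two representations is sketched below as kernel ridge regression from image features to gene vectors. Several things here are assumptions for illustration: a Gaussian kernel is swapped in for the neural tangent kernel, the gene names and all numbers are made up, and picking the nearest gene vector as the final prediction is one plausible read-out rather than the authors' documented procedure.

```python
import math


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity of two feature vectors (stand-in for the NTK)."""
    return math.exp(-sq_dist(x, y) / (2 * bandwidth ** 2))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit(X, Y, ridge=1e-3):
    """Kernel ridge regression: solve (K + ridge * I) alpha = Y."""
    K = [[gaussian_kernel(a, b) + (ridge if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, Y)


def predict(x, X, alpha):
    """Map one image feature vector into the gene-vector space."""
    k = [gaussian_kernel(x, xi) for xi in X]
    return [sum(k[i] * alpha[i][j] for i in range(len(X)))
            for j in range(len(alpha[0]))]


def nearest_gene(vec, gene_vectors):
    return min(gene_vectors, key=lambda g: sq_dist(vec, gene_vectors[g]))


# Hypothetical gene vectors (e.g. from a network model) and image features.
gene_vectors = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0], "GENE_C": [-1.0, -1.0]}
training = [([0.9, 0.1, 0.0], "GENE_A"), ([1.0, 0.0, 0.1], "GENE_A"),
            ([0.0, 1.0, 0.1], "GENE_B"), ([0.1, 0.9, 0.0], "GENE_B"),
            ([0.0, 0.1, 1.0], "GENE_C"), ([0.1, 0.0, 0.9], "GENE_C")]

X = [features for features, _ in training]
Y = [gene_vectors[gene] for _, gene in training]
alpha = fit(X, Y)

predicted_a = nearest_gene(predict([0.95, 0.05, 0.05], X, alpha), gene_vectors)
predicted_c = nearest_gene(predict([0.05, 0.05, 0.95], X, alpha), gene_vectors)
print(predicted_a, predicted_c)
```

Because the model predicts a position in the gene-vector space rather than a fixed class label, its output can be compared with any gene that has a vector in the network.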
Correct predictions far above random level
Almost a million single-cell images were available to the researchers for training the AI – some came from untreated control cells, the others from cells in which a single gene had been deliberately overactivated.
In the subsequent test, the model was asked to identify which of 41 possible genes had been altered, based solely on the cell images. The AI was 26 percent accurate. In other words, in roughly one in four cells it correctly determined which gene had been modified. Based on chance alone (a 1-in-41 guess), its success rate would have been around two percent. The researchers see this as a clear indication that recognisable patterns exist in the structure of the cell, that these are linked to certain genes and that they can be identified using image analysis. “It builds a bridge between form and function, between image and biology,” explains Caroline Uhler.
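The comparison with chance can be checked with a few lines of arithmetic. The only assumption is that random guessing picks each of the 41 genes with equal probability.

```python
# Chance level versus reported accuracy for a 41-way identification task.
n_genes = 41               # number of candidate genes stated in the article
reported_accuracy = 0.26   # accuracy stated in the article

# Assumption: a random guess picks each gene with equal probability.
chance_accuracy = 1 / n_genes
fold_over_chance = reported_accuracy / chance_accuracy

print(f"chance level: {chance_accuracy:.1%}")
print(f"improvement over chance: {fold_over_chance:.1f}x")
```

Under that assumption the chance level is about 2.4 percent, so the reported accuracy is a little more than ten times what guessing would achieve.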
A tool for diagnosis and treatment
The Image2Reg process is still in its infancy. However, it opens the way to a range of practical applications of great medical relevance. Many diseases – such as cancer, Alzheimer’s or autoimmune disorders – arise from disturbances in gene regulation. The authors envision that Image2Reg could be used to identify which genes are affected by a disease or a therapy – without the need for elaborate experimental analyses. Ultimately, the hope is that this process will allow diseases to be identified in their early stages, long before classical markers such as proteins, RNA or known symptoms can be measured.
At a newly founded start-up company, the researchers want to use their findings to develop a therapy for fibrosis, which is currently incurable. In this clinical condition, the body replaces functional tissue with hard, scar-like tissue. In the long term, this reduces organ function. The lungs, liver and heart are frequently affected.
How Image2Reg works
1. Analysing the cell image
The AI identifies subtle changes in the structure of the chromatin using a convolutional neural network (CNN).
2. Building the network
At the same time, a network specific to the type of cell is created, based on known protein interactions and gene expression data. Each gene is assigned a mathematical representation of its function within the cell.
3. Integrating the two levels
The neural tangent kernel (NTK) merges the image and network data to make the gene activity visible.
Text: Werner Siefer