Artificial Intelligence can find Disease Related Genes

Researchers at Linkoping University have found that an artificial neural network has the ability to reveal patterns in very large amounts of gene expression data and discover groups of genes that are disease related. The findings give hope that this new method can eventually be applied to precision medicine and treatments that are individualized.

Scientists are creating maps of biological networks based on how different genes or proteins interact with each other. The team behind the new study used artificial intelligence (AI) to research whether it is possible to discover biological networks through deep learning in which entities which are known as artificial neural networks are trained through experimental data.

Artificial neural networks are excellent at learning how to find patterns in huge amounts of complex data. They are being used in applications such as image recognition. However until now, this machine learning method has seldom been used in biological research. For the first time, researchers have used deep learning to find genes that are disease related. This method is very powerful in analyzing enormous amounts of biological information or big data.

The team used a large database containing information about the expression patterns of 20,000 genes in a large amount of people. This information was unsorted meaning the team did not give the artificial neural network any information about which gene expression patterns were from people with diseases and which were from people that were healthy. The AI model was then trained to discover patterns of gene expression.

A challenge with machine learning is that it is impossible to see exactly how an artificial neural network will solve a task. AI is at times described as a black box meaning we only see the information that is put into the box and then the result it produces. The steps between cannot be seen.

Artificial neural networks consists of many layers in which information is processed mathematically. The network comprises an output layer and an input layer that delivers results of the information processing carried out by the system. Between the two layers are several hidden layers where calculations are carried out. When the team had trained the artificial neural network, they wondered if it was possible to lift the lid of the black box and then understand how it works. They wondered if the designs of the neural network and the familiar biological networks are similar.

When they analyzed the neural network it turned out that the first hidden layer to a large extent represented interactions between various proteins. Deeper within the model in contrast, on the third level they found groups of different types of cells. It was interesting to them that this type of biologically relevant grouping was automatically produced given that the network was started from unclassified gene expression data.

The team then investigated whether the gene expression model could be used to determine which particular gene expression patterns are normal and which are associated with disease. They were able to confirm that the model finds relevant patterns that agree well with biological mechanisms within the body. Since the model had been trained using unclassified data, it is possible that the artificial neural network had found totally new patterns.

The team plans to now investigate whether these previously unknown patterns are relevant from a biological perspective. They believe the key to progress in the field is to fully understand the neural network. This could teach many new things about biological contexts such as diseases wherein many factors interact. They believe that their method gives models which are easier to generalize and can be used for many different types of biological information.

The hope is that close collaboration with medical researchers will enable the team to apply the method developed in the study in precision medicine. It could be possible to determine which groups of patients should receive a particular type of medicine or identify patients who are most severely affected.

To view the original scientific study click below

Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder.