Metastatic Cancer Detection in Histopathological Image Scans

Category: IT Technology · Published: 5 years ago

Using Machine Learning to predict if Cancer is within Microscopic Images.

Feb 29 · 7 min read

The Terrible Tale of Cancer

Cancer continues to be one of the world’s deadliest diseases, killing around 10 million people a year. One of the reasons cancer is so deadly is that if it’s left unattended, even for a short period of time, it can spread through the body. Early diagnosis is the key to battling cancer, and machine learning is changing how early diagnosis is done.

Machine learning can pick up patterns that are hard for humans to see, including very abstract patterns in images. Using machine learning, a model can be trained to distinguish cancerous cells from non-cancerous cells in image scans. Convolutional Neural Networks are especially well suited to this problem, as they are designed to find relationships in spatially structured data like image scans.

Metastatic Cancer

Cancer usually starts in a primary site in the body and then spreads. Cancer that has spread from the primary site to other parts of the body is metastatic cancer. It’s essentially secondary cancer that is a direct result of primary cancer. So why are we looking for secondary cancer and not the root cause? Because it’s much easier to detect secondary cancer when scanning the whole body, and when someone has cancer, metastatic cancer is often present throughout their body in small quantities. Finding metastatic cancer acts as an indicator that someone has a serious case of cancer somewhere else in their body.

Specifically, metastatic cancer can be found in different organs in the body. We can examine histopathological image scans, which are microscopic images of organs and cells within the body. Histopathological image scans show the cells in a given portion of the body in great detail and can be used to examine different diseases. However, it’s hard to tell from these scans whether cancer is present, since most of us aren’t experts in cancer research. Machine learning can be used to address this, telling us whether metastatic cancer is present in a histopathological image scan.

Machine Learning + Convolutional Neural Networks

Machine learning is all about getting machines to learn how to perform a specific task. Instead of being given explicit instructions, a machine learns to create its own instructions. In reality, all these instructions are mathematical values. A machine learning model is just a differentiable function that takes in a predefined amount of input, uses variables (weights) to transform the input and produces an output. It’s like y = mx + b: x is the input of the model, m and b are the variables (weights) and y is the output. The learning algorithm has to optimize the m and b weights so that, for any given input, the model produces an output as close as possible to the desired one. How do we optimize our variables? There are essentially two steps: generating a loss and backpropagation.
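
To make the y = mx + b analogy concrete, here is a minimal sketch of such a model in Python. The function name `linear_model` and the weight values are made up for illustration; a real model has far more weights, but the idea is the same.

```python
def linear_model(x, m, b):
    """Return the model output y = m * x + b for a single input x."""
    return m * x + b


# Hypothetical weights, chosen only for illustration.
m, b = 2.0, 1.0

# The model turns an input into an output using its weights.
prediction = linear_model(3.0, m, b)
print(prediction)  # 2.0 * 3.0 + 1.0 = 7.0
```

Training means finding values of `m` and `b` that make outputs like this match the labelled data.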

The Loss Function

The loss function is a function that represents how badly the model performed. The higher the loss, the worse the model performs. We can generate the loss by using labelled data. We can give the model an input x and let it produce its own output based on its weights. Since we have labelled data, we know what the output of the model should be. We can compare the model’s output to the real output and see the difference. For example, the model’s output might be 3 and the real output might be 5. A loss for this example could be 2, which is how far apart the two outputs were. Ideally the model would produce a loss of 0, which means we need to minimize the loss function.
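
The example with outputs 3 and 5 can be written as a tiny loss function. This sketch uses the absolute difference, which is only one of many possible losses; the function names are illustrative.

```python
def absolute_error(predicted, target):
    """How far apart the model output and the real output are."""
    return abs(predicted - target)


def mean_absolute_error(predictions, targets):
    """Average the per-example losses over a set of labelled data."""
    errors = [absolute_error(p, t) for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)


# The example from the text: the model says 3, the label says 5.
print(absolute_error(3, 5))  # 2

# Over several examples, the loss is usually averaged.
print(mean_absolute_error([3, 5], [5, 5]))  # (2 + 0) / 2 = 1.0
```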

Backpropagation

Backpropagation is the process of going back through the model’s weights and changing them to minimize the loss function. It does this by calculating the gradient of the loss function with respect to each weight. The gradient is the instantaneous rate of change of the loss as a weight changes, so by moving each weight a small step in the direction opposite to its gradient, we move toward a point where the weights produce a lower loss. This is called gradient descent and is key in a lot of machine learning algorithms. We can repeat this process so that the loss keeps getting smaller and the model keeps getting better. Usually, a model will have thousands of weights, which lets it capture complex relationships between input and output.
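
As a rough sketch of gradient descent, the code below fits the two weights of y = mx + b to a handful of points. The data, the learning rate and the number of steps are assumptions picked for illustration, and the loss here is the mean squared error because its gradient is easy to write by hand.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m * x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the loss with respect to each weight.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        # Step each weight against its gradient to lower the loss.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Toy labelled data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m, b = fit_line(xs, ys)
print(round(m, 3), round(b, 3))  # close to 2.0 and 1.0
```

In a neural network the gradients are not written by hand; backpropagation computes them layer by layer, but the weight update is the same kind of step.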

Convolutional Neural Networks

Convolutional Neural Networks employ all the principles of machine learning mentioned above, yet incorporate a method that can analyze spatial data, such as images and audio. They use filters: small grids of weights that scan across the input data and, at each position, multiply each weight by the input value beneath it and add up the results. Since images are also just grids of values, convolutional filters can easily go over them. After a convolution, when the filter is done scanning the image, the results of the scan (called a feature map) are pooled together. Pooling represents the input data in a smaller, more concise way that keeps the important features. Essentially, convolutional neural networks extract the distinctive features of an image and put them all together.

For a more detailed explanation, check out this article I wrote explaining convolutional neural networks.
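
The scanning and pooling steps can be sketched in plain Python. The tiny image and the hand-picked filter below are made up for illustration; in a real network the filter weights are learned, and libraries do this far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position, multiply every kernel weight by the pixel
    beneath it and sum the products.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# A 5x5 image with a bright vertical stripe.
image = [[0, 0, 1, 1, 0] for _ in range(5)]

# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = convolve2d(image, vertical_edge)  # 4x4 feature map
pooled = max_pool2d(feature_map, 2)             # 2x2 summary
print(pooled)  # [[2.0, 0.0], [2.0, 0.0]]
```

The pooled output is much smaller than the image but still records where the left edge of the stripe was found.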

Metastatic Cancer Detection Model

The Data

The data used to create the model was the Histopathologic Cancer Detection labelled data on Kaggle. It comes with over 150,000 histopathologic image scans, each labelled with either a 1 or a 0. 0 indicates that no metastatic cancer is present in the image and 1 indicates that metastatic cancer is present. Each image is 96 pixels by 96 pixels with 3 colour channels (red, green, blue). We can represent each image as a 96x96x3 grid of values.
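
To picture the 96x96x3 grid, here is a small sketch that builds an empty image as nested lists and pairs it with a label. The dictionary layout is a made-up illustration, not the format of the Kaggle files.

```python
HEIGHT, WIDTH, CHANNELS = 96, 96, 3


def blank_image():
    """Build a HEIGHT x WIDTH x CHANNELS grid filled with zeros."""
    return [[[0 for _ in range(CHANNELS)] for _ in range(WIDTH)]
            for _ in range(HEIGHT)]


image = blank_image()
image[0][0] = [255, 0, 0]  # set the top-left pixel to pure red

# Number of values the model receives for a single image.
values_per_image = HEIGHT * WIDTH * CHANNELS
print(values_per_image)  # 27648

# Hypothetical labelled example: 1 means metastatic cancer is present.
example = {"pixels": image, "label": 1}
```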

The Model

The design of the model follows the principles of machine learning described above and is tailored toward the task at hand.

Convolutional layers were used for the feature extraction of the images, to generalize what a cancerous image looks like and what a non-cancerous image looks like. The layers are 2D convolutional layers, meaning that the filters slide along the two spatial dimensions (height and width) of each image, even though each image is stored as a 3D grid of values (height, width and colour channels).

The max pool layers take the feature maps created by the convolutional layers and generalize them. They do this by taking the maximum value in every 3x3 square of the feature map. This way, when the feature maps are given to the other parts of the neural network, it has an easier time figuring out whether an image contains cancer or not.

The dropout layers scattered throughout the network make sure that the model learns to perform the task and doesn’t just memorize the data. They do this by randomly switching off a fraction of the units during training, which can cost a little accuracy on the training data but helps the model generalize to a wide array of images.

Lastly, the dense layers make sense of the feature maps and actually figure out whether there is cancer or not. The last layer contains a single output node whose value represents the prediction: 0 meaning no cancer and 1 meaning cancer. The loss function compares this output with the label to calculate the loss, and backpropagation optimizes the weights in these layers along with the rest of the network.
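
Here is a minimal sketch of what a dropout layer does, assuming the common "inverted dropout" formulation in which the surviving values are scaled up so their expected total stays the same. The function and its parameters are illustrative, not the code used for this model.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Randomly zero a fraction `rate` of the values during training.

    Surviving values are divided by the keep probability so that the
    expected sum is unchanged. Outside training, nothing is dropped.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
values = [1.0, 2.0, 3.0, 4.0]

print(dropout(values, 0.5, training=True, rng=rng))   # some values zeroed
print(dropout(values, 0.5, training=False, rng=rng))  # unchanged
```

Because a different random subset is switched off on every pass, the network cannot rely on any single unit to memorize a training image.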

The model contains over 51 million parameters, which gives it the capacity to learn complex patterns, although a large parameter count alone does not guarantee high accuracy.
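
To give a feel for where millions of parameters come from, here is a sketch of how the weights in convolutional and dense layers are counted. The layer sizes below are hypothetical and are not the architecture of this model; they only show that dense layers tend to dominate the total.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus a bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Each unit has one weight per input plus a bias."""
    return (inputs + 1) * units


# Hypothetical layers, for illustration only.
first_conv = conv2d_params(3, 3, 3, 32)  # 3x3 filters over an RGB image
hidden = dense_params(1024, 256)         # flattened features -> 256 units
output = dense_params(256, 1)            # 256 units -> single output node

print(first_conv, hidden, output)  # 896 262400 257
```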

Loss Function + Optimization

The loss function used for this model was the categorical cross-entropy loss. This loss takes the logarithm of the probability the model assigns to each class, multiplies it by the real label and negates the result. It uses the log function because confident wrong predictions are punished very heavily, while the loss falls toward 0 as the predicted probability of the correct answer approaches 1.

The optimizer used for this model was the Adam optimizer. This optimizer applies the gradients computed by backpropagation, but adapts the step size for each weight using running averages of recent gradients and their squares, which smooths the updates and helps prevent overstepping the minimum. It’s a pretty standard choice of optimizer.
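
As a sketch of how a cross-entropy loss behaves, the code below uses the two-class (binary) form, which is what categorical cross-entropy reduces to when there are only two classes and the model outputs a single probability. Whether the original model was set up exactly this way is an assumption.

```python
import math


def binary_cross_entropy(label, predicted_prob, eps=1e-7):
    """Cross-entropy for one example with label 0 or 1.

    `predicted_prob` is the model's probability that the label is 1.
    It is clipped to avoid taking log(0).
    """
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# The image really contains cancer (label 1).
for prob in (0.1, 0.5, 0.9, 0.99):
    print(prob, round(binary_cross_entropy(1, prob), 4))
# The loss shrinks as the predicted probability approaches the label.
```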
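
For the curious, here is a sketch of a single Adam update for one weight, following the standard published update rule with its usual default constants. The toy loss (w - 3)^2 and the learning rate are assumptions for illustration; in practice a library applies this to every weight in the network.

```python
import math


def adam_step(weight, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (weight, m, v).

    m and v are running averages of the gradient and squared gradient;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    weight -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)

print(w)  # ends close to 3
```

Notice that the step is the gradient average divided by the square root of the squared-gradient average, so the size of each step depends on how consistent recent gradients have been rather than on their raw magnitude.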

Training

The model was trained for 10 epochs, meaning it went over the whole training set 10 times, performing backpropagation on batches of 100 images at a time. 20% of the images were held out for validation, to check that the model was actually learning. After a couple of hours of training, the model was finished.
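
The split and batching described above can be sketched as follows. The 1,000 stand-in image ids and the helper names are made up to keep the example small; only the 20% validation share, the batch size of 100 and the 10 epochs come from the setup described here.

```python
import random


def train_validation_split(items, validation_fraction, seed=0):
    """Shuffle the items and hold out a fraction for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


image_ids = list(range(1000))  # stand-in for the real image files
train_ids, val_ids = train_validation_split(image_ids, 0.2)

updates_per_epoch = sum(1 for _ in batches(train_ids, 100))
total_updates = updates_per_epoch * 10  # 10 epochs

print(len(train_ids), len(val_ids), updates_per_epoch, total_updates)
# 800 200 8 80
```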

The Results are in…

The model reached an accuracy of 80% at detecting cancer, which means that about 80% of its predictions are correct.

Overall, the model works well and shows that machine learning is a promising option for detecting cancer in image scans.
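
As a rough illustration of how an accuracy figure is computed, the sketch below thresholds predicted probabilities at 0.5 and compares them with the labels. The five predictions and labels are invented for the example and are not outputs of the model.

```python
def accuracy(predicted_probs, labels, threshold=0.5):
    """Fraction of examples where the thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in predicted_probs]
    correct = sum(1 for pred, label in zip(predictions, labels)
                  if pred == label)
    return correct / len(labels)


# Made-up model outputs and true labels for five images.
probs = [0.9, 0.2, 0.7, 0.4, 0.1]
labels = [1, 0, 1, 1, 0]

print(accuracy(probs, labels))  # 4 of 5 correct -> 0.8
```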
