Deep learning experiments on a medical dataset


The use of deep learning for medical applications has increased a lot in the last decade. Whether it's identifying diabetes using retinopathy, predicting pneumonia from chest X-rays, or counting cells and measuring organs using image segmentation, deep learning is being used everywhere. Datasets are being made freely available for practitioners to build models with.

In this article, you will learn about a bunch of experiments we conducted while working with brain CT scans. These experiments are available on Github as a sequence of notebooks.

[Image: the sequence of numbered notebooks in the repository]

The reason for numbering the notebooks in this way is to allow others to systematically go through them to see what steps we took, what intermediate results we got and what decisions we made along the way. Instead of providing a final polished notebook, we wanted to show all the work that goes into one project because that’s where the real learning lies.

The repository also includes useful links to blogs for domain knowledge, research papers and all our trained models. I recommend that everyone fork the repository, run every notebook themselves, and understand what every bit of code does and why it was written the way it was.

Let’s take a look at each notebook. All of these notebooks were run on Kaggle or Google Colaboratory.

00_setup.ipynb

This notebook contains the steps to download and unzip the data on Google Colaboratory. On Kaggle you can access the data directly.

01_data_cleaning.ipynb

For this project, we have used Jeremy Howard's clean dataset. He has a notebook on the steps he performed to clean the data, and we replicate some of those steps in our notebook. In deep learning, it is generally a good idea to clean the data quickly to enable rapid prototyping.

The data is available as DICOM files, the standard format for medical imaging data. Let's look at the contents of one of them.

[Image: contents of a sample DICOM file]

Along with the image, each file contains a bunch of machine-recorded metadata. We can use this metadata to get insights about our scans, so we save it in a data frame (since DICOMs are too slow to access and use).

[Image: head of the metadata data frame]

A useful tip for viewing the head of a data frame with a lot of columns is to transpose it.
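As a quick sketch (the frame here is a hypothetical stand-in; the real one is built from the DICOM metadata), transposing in pandas looks like this:

```python
import pandas as pd

# Hypothetical metadata frame with many columns, standing in for the real one.
meta = pd.DataFrame({f"col_{i}": range(4) for i in range(30)})

# head() of a 30-column frame wraps awkwardly on screen;
# transposing turns the columns into rows, which is much easier to scan.
print(meta.head(2).T.shape)  # (30, 2)
```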

Now let’s take a look at some of the images.

[Image: sample slices from a scan]

We see that we have images of various slices of a person's brain, from the top of the head right down to the teeth. This is how brain scans are performed: as a series of slices starting from the top and gradually going down. Sometimes the top or the bottom slice is completely black (a slice before or after the face). Such slices are useless for our model and should be removed.

There is a helpful column in the metadata, img_pct_window, which tells us the percentage of pixels in the brain window. If we plot this column we get the following graph:

[Plot: distribution of img_pct_window]

We see that a lot of images have hardly any pixels in the brain window, so we discard them. If we look at some of the images we are discarding,

[Image: examples of discarded slices]

we see that they're the early slices, or the later ones where the teeth start to appear.
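The filtering step can be sketched like this (the threshold and data here are illustrative, not the notebook's exact values):

```python
import pandas as pd

# Toy metadata: four slices with their fraction of pixels in the brain window.
df = pd.DataFrame({"fname": ["a", "b", "c", "d"],
                   "img_pct_window": [0.00, 0.01, 0.15, 0.30]})

# Keep only slices with a meaningful fraction of brain-window pixels.
kept = df[df["img_pct_window"] > 0.02]
print(list(kept["fname"]))  # ['c', 'd']
```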

The second step in our data cleaning process is fixing the RescaleIntercept column. Refer to Jeremy's notebook for more information about this. Finally, we center crop the images (to eliminate background) and resize them to (256, 256). Although higher-resolution images would give better accuracy, this size is enough for prototyping.
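The center crop can be sketched in NumPy (a simplified version; the notebook's actual implementation may differ):

```python
import numpy as np

def center_crop(img, size):
    """Crop a square of side `size` from the center of a 2-D image."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((512, 600))           # a slice larger than our target size
print(center_crop(img, 256).shape)   # (256, 256)
```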

02_data_exploration.ipynb

Now that we’ve cleaned our data, we can explore it a bit. We will be doing the modeling in 2 stages. In stage 1, we only predict whether a person has a bleed or not.

We start by checking for null values. The labels data frame has no null values, while the metadata data frame has some columns that are almost completely null. We can safely eliminate these columns.
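Dropping mostly-null columns can be sketched as follows (the 90% threshold and the toy frame are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [np.nan, np.nan, np.nan, 1.0],
                   "c": [np.nan] * 4})

# Fraction of nulls per column; keep columns that are not almost entirely null.
keep = df.columns[df.isnull().mean() < 0.9]
print(list(keep))  # ['a', 'b']
```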

We then move on to checking the target variable.

[Plot: distribution of the target variable]

We have a very balanced dataset. This is usually not the case with medical datasets, which tend to be imbalanced, with far fewer positive samples than negative ones.

Next, we check the count of each subcategory.

[Plot: count of each hemorrhage subcategory]

Subdural seems like the most common type of hemorrhage. One interesting column in the metadata is the “bits stored” column which indicates the number of bits used to store the data. It has two distinct values: 12 and 16.

[Plot: distribution of the "bits stored" column]

This might indicate that the data has come from 2 different organizations. In deep learning, it is generally advised to have data from the same distribution. Finally, we can view our images in various windows.

[Image: a scan viewed in various windows]

However, this is only for human perception. A neural network can process floating point data and does not require any windowing.
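For reference, applying a window (here the standard brain window, center 40 HU, width 80 HU) can be sketched as:

```python
import numpy as np

def apply_window(img_hu, center=40, width=80):
    """Map Hounsfield units into [0, 1] for display; values outside the window clip."""
    lo, hi = center - width / 2, center + width / 2
    return np.clip((img_hu - lo) / (hi - lo), 0, 1)

hu = np.array([-1000.0, 0.0, 40.0, 80.0, 500.0])
out = apply_window(hu)  # clips -1000 and 500; maps the window center 40 to 0.5
```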

03_data_augmentation.ipynb

The first step in training a good deep learning model is to get enough data. However, that's not always possible. What is possible, though, is to apply some transformations to the data.

What we do is, instead of feeding the model the same pictures every time, apply small random transformations (a bit of rotation, zoom, translation, etc.) that don't change what's inside the image (to the human eye) but do change its pixel values. Models trained with data augmentation then generalize better.

In this notebook, we try some of these transformations on our dataset and see if they make sense. Also note that these transformations are only applied on the training set. We apply very minimalistic transformations on the validation set.

Something like a flip

[Image: horizontally flipped slices]

or slight rotation

[Image: slightly rotated slices]

would make sense; however, some other transformations might not be useful at all because they wouldn't resemble the actual data.

[Image: transformations that don't resemble real scans]

Also note that we’ve already cropped and resized so we won’t be doing those.
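The horizontal flip mentioned above can be sketched in NumPy (a simplified stand-in for the library's transform pipeline):

```python
import numpy as np

def random_flip(img, rng):
    """Flip left-right with probability 0.5; a flipped brain slice still looks like a brain slice."""
    return np.fliplr(img) if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)
out = random_flip(img, rng)  # either the original or its mirror; shape is unchanged
```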

05_metadata.ipynb

Recent research has shown that metadata can prove useful in image classification. We can either average the predictions of separate models or feed both kinds of data to a single neural network.

To do this, we first try to classify the hemorrhages solely using the metadata to gauge its usefulness. We find that even with a robust model like a Random Forest, we get an accuracy of only 50%. Hence, we decide to discard the metadata and focus on the images themselves.
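The metadata experiment can be illustrated with synthetic data (a sketch, not the notebook's code): when features carry no signal about the label, a Random Forest scores near chance, matching the ~50% accuracy we observed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: features drawn independently of the binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
acc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(round(acc, 2))  # near 0.5: the features carry no information
```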

04_baseline_model.ipynb

For our baseline model, we start with a resnet18 backbone and attach a custom head to it. Transfer learning has shown good results when it comes to image classification: it gives better results, faster training, and significantly lower resource consumption.

However, for domains like medical imaging, there are no pretrained models available, so we make do with resnet. We train 2 models in this notebook, one with pretraining and one without. With both models, we are able to achieve an accuracy of about 89% in just 5 epochs.

[Image: training results]

The batch_size is set to 256. We use Leslie Smith’s learning rate finder to find a good learning rate during training.

[Plot: learning rate finder output]

In simple terms, we run a mock training on our data, varying the learning rate from a very low value up to a value as high as 10. We then record the loss and plot it against the learning rate, and select a learning rate where the loss has the steepest downward slope.
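The exponential sweep behind the finder can be sketched as (bounds and step count are illustrative):

```python
# The finder increases the learning rate exponentially over the mock training run,
# so every decade from lr_min to lr_max gets the same number of steps.
lr_min, lr_max, n_steps = 1e-7, 10.0, 100
lrs = [lr_min * (lr_max / lr_min) ** (i / (n_steps - 1)) for i in range(n_steps)]

print(round(lrs[0], 10), round(lrs[-1], 6))  # 1e-07 10.0
```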

In this case we can select something like 3e-3, since 1e-1 and 1e-2 are on the edge where the loss can shoot up easily. We can also use a slice for our learning rate, for example 1e-5 to 1e-3; then different learning rates are applied to different groups of our network.

For training, we initially freeze the pretrained part of the model and only train the new head. However, we do update the batchnorm layers of the resnet to maintain a mean of 0 and standard deviation of 1 in every layer.

A plot of the losses is shown below.

[Plot: training and validation losses]

We can then show some of the results.

[Image: sample predictions, with misclassified ones in red]

The red ones are the ones our model misclassified.

That will be it for now. Before I end this article, I would like to give one very important tip: don't set a random seed when training a model; let it train on different data every time. It will help you see how robust your model is.

This article and project are a work in progress; more notebooks will follow in the next stage. Till then, happy learning.

