Reading Color Blindness Charts: Deep Learning and Computer Vision

There are plenty of online tutorials where you can learn to train a neural network to classify handwritten digits using the MNIST dataset, or to tell the difference between cats and dogs. We humans are very good at these tasks and can easily match or beat the performance of a computer.

However, there are some cases where computers can actually help humans do something we have difficulty with. For instance, I have mild red-green color blindness, so charts such as these have usually been difficult, if not impossible, to read:

This is a 6 (I think?)

What if I could make the computer do this test for me, without me having to squint and inevitably get the question wrong anyway?

Well, this task seems simple. Let’s take some images, split them into training and test sets, train a convolutional neural network, and bam, we are finished. Except… there is no dataset. Online, I was able to find only 54 different images, which is not enough for a training set given that there are 9 classes (digits 1–9).

So what now? Well, we still have our good old MNIST dataset. We can use it to train a neural network that is excellent at classifying individual digits. With some OpenCV transformations, we can make our charts look similar to MNIST, which looks like this:

Handwritten 5

Let’s do it!

Training a Convolutional Neural Network on the MNIST Dataset

There are many tutorials on this, but I will nonetheless give a high-level overview of how this is done.

First, we will need Tensorflow installed, which is available using pip.

pip install tensorflow

Or if you have a GPU:

pip install tensorflow-gpu

Now we will create a mnist.py file and get our data:
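A sketch of what this step might look like with Keras (the variable names are my own):

import tensorflow as tf

# Download MNIST: 60,000 training and 10,000 test images of 28x28 grayscale digits
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values from 0-255 down to 0-1 and add a channel dimension for Conv2D
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0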

Next, we will set up our convolutional neural network using one Conv2D layer, followed by MaxPooling and Dropout. Then, the 2D output is flattened and put through a Dense layer with 128 units, followed by our classification layer with 10 classes (digits 0–9). The output will be a vector of length 10 indicating the prediction. For instance, a 2 will be predicted like this:

[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

since the 1 is at index 2. Here is the code:
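A Keras model matching this description might look like the following sketch; the filter count, kernel size, dropout rate, and hidden activations are assumptions, since only the layer types are described above:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # Conv2D
    layers.MaxPooling2D((2, 2)),                                            # MaxPooling
    layers.Dropout(0.25),                                                   # Dropout
    layers.Flatten(),                                                       # 2D -> 1D
    layers.Dense(128, activation='relu'),                                   # 128 units
    layers.Dense(10, activation='softmax'),  # one output per digit 0-9
])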

Now, we compile the model, fit it on our training data, evaluate it on our test data, and save it as an .h5 file in our directory:
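A sketch of this step (the epoch count and batch size are guesses):

from tensorflow.keras.utils import to_categorical

# One-hot encode the labels to match the 10-unit softmax output
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train_cat, epochs=5, batch_size=128)

loss, acc = model.evaluate(x_test, y_test_cat)
print('Test accuracy:', acc)

model.save('mnist.h5')  # saved next to the script so main.py can load it later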

When we run this code, we get a training accuracy of about 99% and a test set accuracy of:

Not bad! You should now see a mnist.h5 file in your directory. Let’s move on to the next step.

OpenCV Chart Processing

For this part, we will need a few more libraries:

pip install opencv-python
pip install imutils
pip install numpy
pip install scikit-learn
pip install scikit-image

We need to convert our charts to look somewhat like the MNIST dataset. At first, I thought, let’s just convert the image to grayscale. Well, then this happens:

What number is that? I have no idea. As you can see, color matters. We cannot just ignore it. Our goal is this:

So, after hours of experimentation, here is the processing we will need to do:

  1. Increase the contrast
  2. Apply median and Gaussian blurring
  3. Apply K-means color clustering
  4. Convert to grayscale
  5. Apply thresholding (this one will be tricky)
  6. More blurring and thresholding
  7. Morphology open, close, erosion
  8. Skeletonizing
  9. Dilation

Wow, that is a lot. Let’s get started. First, contrast. To be honest, I copied a function online that takes in an image and applies custom brightness and contrast transformations. I put this in a file ContrastBrightness.py and made it a class:
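A sketch of such a class, based on a widely shared OpenCV brightness/contrast snippet (the exact formula may differ from the original):

# ContrastBrightness.py
import cv2

class ContrastBrightness:
    def apply(self, image, brightness=0, contrast=0):
        # Brightness: shift every pixel value up or down by a constant
        if brightness != 0:
            shadow = max(brightness, 0)
            highlight = 255 if brightness > 0 else 255 + brightness
            alpha = (highlight - shadow) / 255
            image = cv2.addWeighted(image, alpha, image, 0, shadow)
        # Contrast: scale pixel values away from the midpoint (127)
        if contrast != 0:
            f = 131 * (contrast + 127) / (127 * (131 - contrast))
            image = cv2.addWeighted(image, f, image, 0, 127 * (1 - f))
        return image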

I did not look too deeply into how this works, but at a high level, increasing brightness adds a constant to the RGB channels of an image, while increasing contrast multiplies the values by some constant. We will only use the contrast feature.

Another complex part of our algorithm is clustering. Again, I made a file Clusterer.py and put the necessary code, which I found online, into it:
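A sketch of such a class using scikit-learn’s KMeans (the snippet the author used may be structured differently):

# Clusterer.py
import numpy as np
from sklearn.cluster import KMeans

class Clusterer:
    def cluster(self, image, n_clusters):
        h, w = image.shape[:2]
        # Flatten the image into a list of pixels, one BGR triple per row
        pixels = image.reshape(-1, 3).astype(np.float32)
        # Group the pixels into n_clusters colors
        kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(pixels)
        # Replace every pixel with its cluster's center color
        quantized = kmeans.cluster_centers_[kmeans.labels_]
        return quantized.reshape(h, w, 3).astype(np.uint8)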

This code takes an image and a number as input; that number determines how many color clusters we will use. Now, let’s make our last file, main.py. We will start with imports:
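A sketch of what the imports might look like (the exact list in the original may differ slightly):

# main.py
import os

import cv2
import numpy as np
from skimage.morphology import skeletonize
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import to_categorical

from ContrastBrightness import ContrastBrightness
from Clusterer import Clusterer

contraster = ContrastBrightness()
clusterer = Clusterer()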

Notice that we are importing the two classes that we just created. Now, please download the images in the charts directory from my GitHub.

These have all been sorted (with the help of people without color blindness) into appropriate folders.

Now, we will loop through all the images in our path and apply transformations 1–4. I commented my code pretty extensively:
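A sketch of this loop; the folder layout follows the charts directory, but the resize dimensions, contrast amount, blur kernel sizes, and cluster count are assumptions:

images, labels = [], []

# The charts are sorted into folders named by digit, e.g. charts/6/...
for digit in os.listdir('charts'):
    for filename in os.listdir(os.path.join('charts', digit)):
        image = cv2.imread(os.path.join('charts', digit, filename))
        image = cv2.resize(image, (100, 100))

        # Step 1: boost the contrast so the digit's colors separate from the dots
        image = contraster.apply(image, brightness=0, contrast=64)

        # Step 2: median and Gaussian blurring to smooth out the dot pattern
        image = cv2.medianBlur(image, 5)
        image = cv2.GaussianBlur(image, (5, 5), 0)

        # Step 3: K-means color clustering to flatten the image to a few colors
        image = clusterer.cluster(image, 4)

        # Step 4: convert to grayscale (steps 5-9 continue inside this loop)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)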

Here is what some of those images look like:

That is an obvious improvement. The digits are all clear. Two problems remain:

  1. They do not look very hand-written and are too thick
  2. They are not fully white on a black background

So, we will threshold. However, due to the varied coloring of the images, each one needs a different threshold to work, so we will automate the search for the perfect threshold. I noticed that a digit typically takes up 10–28% of the total image, measured in pixels.

Thus, we will threshold until we reach that percent white. First, we will define a function that tells us what percent of an input image is white:
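A sketch of such a helper:

def get_percent_white(image):
    # Count pixels at full white (255) and divide by the total pixel count
    return np.sum(image == 255) / image.size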

Now, we will start at a threshold of 0 and work up to 255 in increments of 10 until we are in the 0.1–0.28 zone (this code goes in our for loop):
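A sketch of that search, continuing inside the loop:

# Step 5: search for a threshold that leaves the digit (10-28% of the image) white
threshold = 0
while threshold <= 255:
    _, thresholded = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    if 0.1 <= get_percent_white(thresholded) <= 0.28:
        break
    threshold += 10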

Awesome! The finish is in sight. Now we get images like this (if we use the threshold we found):

Most images look pretty good! However, some are problematic. It turns out that some digits are darker than the background. Thus, thresholding makes them black, not white! The 0.1–0.28 zone is never reached.

We can check whether the thresholding succeeded by looking at the final value of the threshold variable. If it is 260, the while loop ended without finding a suitable threshold. For those images, we will use a separate procedure.

Essentially, we will

  1. Invert the images to make the inside bright compared to the background
  2. Convert to black and white
  3. Create a circular mask to mask out the background (which went from black to white when we inverted)

Here is the visual process:

The last step is the most difficult, so I commented it in my code. Here is our whole function for this:
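A sketch of that function; the fixed threshold value and the mask radius are assumptions:

def handle_dark_digit(gray):
    # Step 1: invert, so the digit becomes brighter than its surroundings
    inverted = cv2.bitwise_not(gray)

    # Step 2: convert to black and white
    _, bw = cv2.threshold(inverted, 127, 255, cv2.THRESH_BINARY)

    # Step 3: the area outside the chart's circle also turned white when we
    # inverted, so black out everything outside a centered circular mask
    mask = np.zeros(bw.shape, dtype=np.uint8)
    center = (bw.shape[1] // 2, bw.shape[0] // 2)
    radius = min(center) - 5  # slightly smaller than the chart itself
    cv2.circle(mask, center, radius, 255, -1)  # -1 thickness = filled circle
    return cv2.bitwise_and(bw, mask)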

We will adjust our code to use the new function and also do steps 6–7. These are all built-in OpenCV transformations, so nothing surprising here:
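A sketch, again inside the loop; the kernel sizes are assumptions:

# Fall back to the inverted procedure when no threshold was found
if threshold == 260:
    thresholded = handle_dark_digit(gray)

# Step 6: more blurring and thresholding to smooth the digit's outline
thresholded = cv2.medianBlur(thresholded, 5)
_, thresholded = cv2.threshold(thresholded, 127, 255, cv2.THRESH_BINARY)

# Step 7: open removes small specks, close fills small holes,
# and a light erosion thins the strokes
kernel = np.ones((3, 3), np.uint8)
thresholded = cv2.morphologyEx(thresholded, cv2.MORPH_OPEN, kernel)
thresholded = cv2.morphologyEx(thresholded, cv2.MORPH_CLOSE, kernel)
thresholded = cv2.erode(thresholded, kernel, iterations=1)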

Let’s see what the images look like!

Awesome! Clearly recognizable. Originally, our neural network’s accuracy was 11%, which is no better than randomly guessing one of the possible digits. If we stopped here, our accuracy would be about 63%, almost 6x better than random! However, we can do a little bit more.

We will skeletonize and dilate our images. This will give the digits a consistent stroke width and a more uniform look overall. Let’s do it:
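A sketch of steps 8–9, which also resizes each result to the 28x28 MNIST format and collects it with its label:

# Step 8: skeletonize, reducing every stroke to a one-pixel-wide centerline.
# skeletonize() expects a boolean image, so convert there and back.
skeleton = skeletonize(thresholded > 0).astype(np.uint8) * 255

# Step 9: dilate the skeleton back up to a consistent, pen-like stroke width
kernel = np.ones((3, 3), np.uint8)
final = cv2.dilate(skeleton, kernel, iterations=1)

# Shrink to 28x28 so the image matches MNIST, then collect it with its label
final = cv2.resize(final, (28, 28))
images.append(final)
labels.append(int(digit))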

As a reminder, this code is the last code we put inside the big for loop. Here is what everything looks like:

Yeah, maybe a little uneven, but so is handwriting; it should be no problem for the neural network. Now, we just reshape our list, load the model, and evaluate:
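A sketch of this final step; the per-digit classification report at the end is one way (using the scikit-learn we installed earlier) to get the recall numbers discussed below:

# This goes below the for loop
data = np.array(images).reshape(-1, 28, 28, 1) / 255.0  # scale like the training data
targets = to_categorical(np.array(labels), 10)

model = load_model('mnist.h5')
loss, accuracy = model.evaluate(data, targets)
print('Accuracy:', accuracy)

# Per-digit precision/recall for the discussion below
from sklearn.metrics import classification_report
predictions = np.argmax(model.predict(data), axis=1)
print(classification_report(np.array(labels), predictions))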

Put this code below your for loop. As a recap: we load our model, reshape the data, and print the accuracy after evaluating. The code may take a while to run, since all 54 images have to be transformed.

Results

After running the code, here is what gets printed for me:

Let’s take a look! We got an overall accuracy of… 78%! That is 7–8 times better than random, and probably much better than a person with moderate to severe color blindness could do. This is outstanding!

If we look at the recall (the ratio of correctly predicted positive observations to all observations in the actual class) for each digit, we see that we had great performance for 1–5 and 9. We had okay performance for 8, and our neural network really struggled with 6s and 7s.

This approach clearly has limitations, and the transformations I listed do not work for all possible color blindness images (there is actually one image in the dataset that they fail on after the thresholding step). Try printing all the processed 9s and you will see that, for one of them, the thresholding step produces a ratio between 0.1 and 0.28 only because the background becomes partly white. I did not try to find a solution for this because it affected just one image.

Conclusion

I hope this tutorial has shown how a dataset can be used to make predictions on a similar but different one. I also hope it helps beginners become more comfortable with OpenCV, Tensorflow, and Python in general.

To view the complete code and download the images and model, check out my GitHub.

