Monday, July 11, 2016

Self coloring books

Machine learning is eating software. Here at comma.ai we want to build the best machine learning. This makes us all work really hard, and sometimes we need some stress relief. Our art therapist suggested that we try adult coloring books to relax. They worked so well for us that we decided to share the love with the world and built commacoloring, comma.ai adult coloring books.

commacoloring was really well received and made it to the front page of Product Hunt. We got a lot of feedback from our users (we love users!). One requested feature was to automatically color the easy parts of the image, letting the user focus on the details. We used our self-driving car engineering skills to build a self-coloring book.

We call this new feature Suggestions. You can try it right now by clicking the "suggest" button!

The engineering

Note: you can skip this section without affecting your coloring experience, but if you are familiar with deep learning jargon, please read along.

To automate the coloring process we trained a deep neural network for pixel-level semantic parsing, i.e., a network that classifies (colors) each pixel using information from its surroundings. Given the state of the art, we knew the right approach would be a fully convolutional neural network. We started by trying an encoder-decoder architecture with 4 convolutions down and 4 deconvolutions up [1], with one output channel per class. This took too long to converge, though.
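To make "one output channel per class" concrete, here is a minimal NumPy sketch of the last step only: picking the most likely class for each pixel and painting it with that class's color. The palette, the class names, and the `color_pixels` helper are made up for illustration; they are not taken from commacoloring.

```python
import numpy as np

def color_pixels(scores, palette):
    """Turn per-class score maps into a colored image.

    scores:  float array (height, width, n_classes), one channel per class.
    palette: uint8 array (n_classes, 3), one RGB color per class.
    """
    labels = scores.argmax(axis=-1)   # most likely class for every pixel
    return labels, palette[labels]    # (H, W) class ids and (H, W, 3) colors

# Hypothetical 3-class palette: road, sky, car.
palette = np.array([[128, 64, 128], [70, 130, 180], [0, 0, 142]], dtype=np.uint8)

# Fake network output for a 2x2 image (assumed values, not real predictions).
scores = np.array([[[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
                   [[0.9, 0.05, 0.05], [0.3, 0.1, 0.6]]])

labels, colored = color_pixels(scores, palette)
print(labels)          # class id per pixel
print(colored.shape)   # one RGB triple per pixel
```

A real network would produce the `scores` array; everything after that is just an argmax and a table lookup.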

We later noticed that [2] claims that retraining the encoder network is not really necessary. They used a pre-trained VGG for dense classification at low resolution, then bilinear interpolation followed by Conditional Random Fields to upscale the result back to the desired size. Also, [3] stated that the job of the decoder/deconvolution network is mainly to upscale and smooth the segmented output image, so it can be a smaller network. Reddit brought our attention to ReSeg [4], which uses only the convolutional layers of VGG as the encoder.
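As a rough illustration of the bilinear upscaling step mentioned for [2] (the CRF refinement is left out), the sketch below enlarges a low-resolution score map with NumPy. The `bilinear_upscale` helper and its corner-aligned sampling grid are assumptions made for this example, not code from any of the cited papers.

```python
import numpy as np

def bilinear_upscale(score_map, factor):
    """Upscale a 2-D score map by an integer factor with bilinear interpolation."""
    h, w = score_map.shape
    # Target positions expressed in source pixel coordinates (corners aligned).
    rows = np.linspace(0, h - 1, h * factor)
    cols = np.linspace(0, w - 1, w * factor)
    # Interpolate along the width for each source row: (h, w * factor).
    wide = np.array([np.interp(cols, np.arange(w), row) for row in score_map])
    # Interpolate along the height for each new column: (h * factor, w * factor).
    return np.array([np.interp(rows, np.arange(h), col) for col in wide.T]).T

low_res = np.array([[0.0, 1.0],
                    [2.0, 3.0]])
high_res = bilinear_upscale(low_res, factor=4)
print(high_res.shape)
```

Interpolating the scores and only then taking the per-pixel argmax gives smoother class boundaries than enlarging the label map directly.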

Our final solution combined ideas from [3] and [4]: it used fixed VGG convolutional layers as the encoder and trained a simple deconvolutional network as the decoder. Each layer of our decoder used only 16 filters of 5x5 pixels with an upscaling stride of 2. We tried faster upscaling with stride 4, but the results didn't look sharp enough.
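To show what a 5x5 filter with an upscaling stride of 2 does to a feature map, here is a small NumPy sketch of a transposed convolution ("deconvolution"). It is simplified to a single channel instead of 16 filters, the `transposed_conv2d` helper is hypothetical, and stacking three layers is an arbitrary choice for the demo, since the number of decoder layers is not stated above.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Single-channel transposed convolution, cropped to (H * stride, W * stride).

    Every input pixel stamps a scaled copy of the kernel onto the output,
    with neighboring stamps placed `stride` pixels apart.
    """
    h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    pad = (k - stride) // 2   # crop the border so each layer exactly doubles the size
    return full[pad:pad + h * stride, pad:pad + w * stride]

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 4))   # stand-in for a low-resolution encoder output
for _ in range(3):                       # assumed depth, for illustration only
    kernel = rng.standard_normal((5, 5)) * 0.1
    features = transposed_conv2d(features, kernel, stride=2)
print(features.shape)  # (32, 32)
```

Each stride-2 layer doubles the resolution, so a stride-4 layer would reach the same size in half as many steps, but every input pixel would then have to fill a 4x4 output area from a single 5x5 stamp.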

In one of our experiments we reinitialized the VGG weights to random values and were still able to learn a successful decoder. We called this architecture Extreme Segmentation Network, since it resembles Extreme Learning Machines. Unfortunately, we were aware that the acronym would compete with that of Echo State Networks, so we decided to use the original VGG filters in production. Our final network is called Suggestions Network (SugNet). Some results are shown in Figures 1 and 2.
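For readers who have not met Extreme Learning Machines, the idea of "random fixed encoder, trained readout" can be sketched in a few lines of NumPy. This is a toy regression problem, not the segmentation network: the data, the layer sizes, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task standing in for "image in, labels out".
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# "Encoder": random weights that are never trained.
w_hidden = rng.standard_normal((1, 50))
b_hidden = rng.standard_normal(50)
hidden = np.tanh(x @ w_hidden + b_hidden)
hidden = np.hstack([hidden, np.ones((len(x), 1))])   # constant feature acts as a bias

# "Decoder": the only trained part, a linear readout fitted by least squares.
w_out, *_ = np.linalg.lstsq(hidden, y, rcond=None)

mse = float(np.mean((hidden @ w_out - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

The random features are never updated, yet the readout alone can fit the target, which is the same spirit as learning a decoder on top of randomly initialized VGG layers.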


Figure 1. Input image and self-colored Suggestions example.


Figure 2. Sample outputs of the segmentation network after 400 training epochs compared to human-colored images.

The whole method was implemented with Keras using the TensorFlow backend. The VGG image preprocessing used the Theano backend. At test time, using TensorFlow only, the results didn't match, and we doubted our engineering skills for a while before remembering that the two backends disagree on filter flipping: Theano implements true convolution, while TensorFlow implements correlation. Here is how to convert convolutional weights from Theano to TensorFlow. Keras didn't have a proper deconvolution layer, but we started working on a PR for that.
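The mismatch comes down to whether a framework flips each filter before sliding it over the input. A minimal NumPy sketch of the idea follows: correlating with a flipped kernel gives the same result as convolving with the original one, so converting weights between the two conventions means flipping every filter along both spatial axes. The `theano_to_tensorflow_kernel` helper is hypothetical, and the axis orders it assumes, (out, in, rows, cols) for Theano and (rows, cols, in, out) for TensorFlow, should be checked against the versions you use.

```python
import numpy as np

def correlate2d(image, kernel):
    """'Valid' cross-correlation: slide the kernel over the image as-is."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """'Valid' convolution: the same sliding window, but with the kernel flipped."""
    return correlate2d(image, kernel[::-1, ::-1])

def theano_to_tensorflow_kernel(kernel):
    """Hypothetical helper: flip both spatial axes, then reorder
    (out, in, rows, cols) -> (rows, cols, in, out)."""
    return kernel[:, :, ::-1, ::-1].transpose(2, 3, 1, 0)

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))

# Same weights, different conventions: the outputs disagree.
print(np.allclose(correlate2d(image, kernel), convolve2d(image, kernel)))
# Flipping the kernel makes the two conventions agree.
print(np.allclose(correlate2d(image, kernel[::-1, ::-1]), convolve2d(image, kernel)))
```

For symmetric filters the flip changes nothing, which is why this kind of bug can go unnoticed until real, asymmetric pre-trained weights are loaded.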

References:
[1] Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling". arXiv:1505.07293.
[2] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille "Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs". arXiv:1412.7062.
[3] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello "ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation". arXiv:1606.02147.
[4] Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville "ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation". arXiv:1511.07053.

We hope that Suggestions will inspire you to build even more fun apps with the open source commacoloring product. Let us know about all the amazing things you build with it.

24 comments:

  1. This comment has been removed by the author.

  2. The new features were all welcome! I'd suggest that a double click of the mouse should close the polygon (it is sometimes hard to catch the first point).
    From the NN perspective, maybe you can try the U-Net (http://arxiv.org/abs/1505.04597) as well.

  3. It would greatly help if you could add keyboard shortcuts for brushes and brightness.
    That could save a few kilometers of mouse travel, since the mouse would not have to run over the whole screen. They should be simple and accessible, with keys next to each other (maybe the numeric keypad?).

  4. Are you guys planning to open source the code for the net?

  5. George, please don't destroy the world with AI. Some of the stuff you're saying is really ignorant. I know you think it's cool and all, but it's going to end humanity before it ever really had a chance. Google/NSA/the public relations/advertising industries will completely own us. You should know better!

  6. This comment has been removed by the author.

  7. What could help on the infrastructure (as in civil engr infrastructure). Sim-city style: what could you add to road/streets/highways that would make your job easier or results better? Machine vision cues, conventions on striping, acoustics, anything... Even if applied to other vehicles, visible border marks.

    In a nutshell if you could influence civil projects to make them "friendly" (compatible) with your use cases and implementation methodologies - what would be on your wishlist or nice-to-have list. Callout game changers as needed (if UV striping or IR strip, or radio beacons, etc...)

    http://www.wsp-pb.com/

  8. Great project, can't wait for self-driving kits under $1,000.00. Keep me posted.

  9. Any way to contact you? I would like to be the installer of this product.
