
Hotdog or Not-Hotdog

Author: Lau Johansson
16th September 2020
Reading time: 5 min.

The wait is over...

Now you don't have to find out by yourself whether an object is a hotdog or not. The guys from HBO's Silicon Valley have made an app where you can take a picture of an object - then, it tells you if it is a hotdog or not a hotdog.


What is the point?

This seems like a simple problem for us humans. But how simple is it in terms of machine learning? We, at Explorifydata, will try to show how deep learning can solve such a binary classification problem.

Before you close the page or go looking for a more interesting post, consider that the same problem turns up in many other domains.

These problems share a common denominator: they are binary classification problems. Applying deep learning to images, such as hotdog images, is called image classification.

Our post emphasizes how neural networks can perform well on classification problems. This time we do NOT strive to make a "perfect" model. The goal is just to make a model that performs better than blind guessing.

From Silicon Valley to Explorifydata

The model

We have tried to replicate the work of Jian Yang with a deep learning approach. To classify hotdogs and not-hotdogs, a Convolutional Neural Network (CNN) has been implemented. It takes a hotdog image and splits it into three channels representing the red, green and blue colors of the image. The CNN works by sliding small filters over the image, which decompose it into feature maps at a smaller scale. Max pooling is another technique for downscaling images: a 2x2 window runs across the image and, at every stride, keeps only the highest pixel value.
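The 2x2 max-pooling step can be illustrated with a tiny example. This is a minimal PyTorch sketch; the 4x4 pixel values are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

# A tiny 4x4 "image" (1 sample, 1 channel) with made-up pixel values.
img = torch.tensor([[[[1., 3., 2., 0.],
                      [4., 6., 1., 2.],
                      [7., 2., 9., 4.],
                      [0., 5., 3., 8.]]]])

# A 2x2 window with stride 2 keeps the largest value in each block,
# halving the spatial resolution from 4x4 to 2x2.
pooled = F.max_pool2d(img, kernel_size=2, stride=2)
print(pooled)
# tensor([[[[6., 2.],
#           [7., 9.]]]])
```

Each output value is the maximum of one non-overlapping 2x2 block, so the image shrinks while the strongest activations survive.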

Finally, the decomposed image is fed to a fully connected layer, which represents the data as one vector. From this vector, the network decides whether the image contains a hotdog or a not-hotdog.
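The pipeline described above - convolution filters, max pooling and a final fully connected layer - might look roughly like this PyTorch sketch. The channel counts, layer sizes and 64x64 input resolution are our own assumptions; the post does not specify the actual architecture:

```python
import torch
import torch.nn as nn

class HotdogCNN(nn.Module):
    """A deliberately small CNN: conv -> pool -> conv -> pool -> fully connected."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 input channels: R, G, B
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                # represent the feature maps as one vector
            nn.Linear(32 * 16 * 16, 2),  # two classes: hotdog / not-hotdog
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = HotdogCNN()
logits = model(torch.randn(1, 3, 64, 64))  # one random 64x64 RGB "image"
print(logits.shape)  # torch.Size([1, 2])
```

The two output values are class scores; the larger one is the network's verdict.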

The data and augmentation

To slightly improve the network's performance we have used data augmentation. In short, data augmentation is a technique for increasing the size of a dataset by, for example, transforming the existing data. In this blog post we apply three transformations: rotation, horizontal flipping and center cropping. To introduce some randomness, each transformation is applied with a probability of 50%.

The results

Even though the network we have implemented is very simple, the model has an accuracy of almost 80%. How can we interpret this result? Put simply, accuracy is the fraction of images the model classified correctly.
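As a sanity check on what accuracy means, here is a minimal sketch; the labels are made up for illustration and are not our actual predictions:

```python
# Accuracy = correct predictions / total predictions.
predictions = ["hotdog", "hotdog", "not-hotdog", "hotdog", "not-hotdog"]
truth       = ["hotdog", "not-hotdog", "not-hotdog", "hotdog", "not-hotdog"]

correct = sum(p == t for p, t in zip(predictions, truth))
accuracy = correct / len(truth)
print(f"{accuracy:.0%}")  # 80%
```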

Evaluating a model is actually not that simple. One must also take a big dive into performance evaluation tools such as the confusion matrix, precision and the F1-score. These terms do not sound quite as exciting as a hotdog app - so they will not be examined here. But! We can look at some of our predictions! Take a look for yourself - did the model detect the hotdogs?

As you can see, the model fails on some images that are obviously not-hotdogs - at least, obvious to you as a human. The way a neural network "interprets" an image is not always straightforward. Who would have thought that the network would mistake the dog with the funny hair (bottom left corner) for a hotdog?

Can we do better? Here we will refer to a Towards Data Science post which explains how a pretrained InceptionV3 model, applied to the hotdog/not-hotdog dataset, gave an accuracy of 96%! That seems pretty sufficient from our point of view.

What can we learn?

The most important take-home from this post is: a very simple convolutional neural network can achieve high accuracy on binary (image) classification problems. To improve the performance of the model, the hotdog images can be rotated, flipped and cropped. An app that predicts hotdog or not-hotdog may seem silly at first - but by putting it into perspective with other problem domains, simple deep learning models could be a tool for solving many classification problems.