How To Label Where’s Wally/Waldo images using OpenCV and Deep Learning (YOLO v2)

A hands-on guide to learning how to use an object detection network

Arthur Fortes
7 min read · Feb 14, 2020
Where’s Wally puzzle example.

In this article, our goal is to recognize the famous character Wally or Waldo (yes, he has a few names around the world) in an image. Although there are already some consistent solutions, I decided to implement this challenge using an architecture called YOLO (You Only Look Once), a real-time object detection system that I am currently studying.

For this purpose, I used an Anaconda environment (Python 3.7) with the following Python libraries:

The experiments were performed on a machine with the following configuration: Ryzen 5 3400G (4 CPU cores), 16 GB of memory, NVIDIA GeForce GTX 1060, Ubuntu 18.04.

There are 3 steps to achieve our goal: first we build a dataset, then we train a YOLO model, and finally we create an output image highlighting Wally.

Step 1: Wally Dataset

Our dataset was composed of images that we found online and on Kaggle, extended with data augmentation (vertical flip, brightness modification) using Keras (ImageDataGenerator). As a second task, we labeled the resulting images, marking the location of Wally in each one. LabelImg was used for this purpose, since it is a great tool that lets you annotate images in Pascal VOC format (it generates XML files).
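The workflow above labels the images after augmenting them. If you would rather annotate first and augment afterwards, the bounding box has to follow the image: a brightness change leaves it untouched, but a vertical (upside-down) flip mirrors its y coordinates. The helper below is a minimal sketch of that arithmetic; the function name and the sample numbers are made up for illustration and are not part of the original project.

```python
def flip_box_vertically(box, image_height):
    """Mirror an (xmin, ymin, xmax, ymax) box for an image flipped upside down."""
    xmin, ymin, xmax, ymax = box
    # x is unchanged; y is mirrored, so the old bottom edge becomes the new top edge.
    return (xmin, image_height - ymax, xmax, image_height - ymin)


# Hypothetical box on a 600-pixel-high image.
flipped = flip_box_vertically((120, 40, 160, 100), image_height=600)
print(flipped)  # (120, 500, 160, 560)
```

Applying the function twice returns the original box, which is a quick sanity check for this kind of helper.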

Images from our dataset with annotation in LabelImg

As a result of this task, we built a dataset with 350 images.
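To get a feeling for what LabelImg saves, here is a small sketch that reads the bounding boxes back from a Pascal VOC file using only the Python standard library. The sample XML is hand-written in the usual VOC layout; its file name and coordinates are invented for illustration, and read_voc_boxes is a hypothetical helper, not something Darkflow provides.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC layout (values are made up).
SAMPLE_XML = """
<annotation>
    <filename>wally_001.jpg</filename>
    <size><width>1024</width><height>768</height><depth>3</depth></size>
    <object>
        <name>wally</name>
        <bndbox><xmin>412</xmin><ymin>230</ymin><xmax>448</xmax><ymax>290</ymax></bndbox>
    </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes


filename, boxes = read_voc_boxes(SAMPLE_XML)
print(filename, boxes)  # wally_001.jpg [('wally', 412, 230, 448, 290)]
```

A loop like this over the annotation folder is a cheap way to check that every file really contains a "wally" box before training.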

Step 2: Train the Deep Learning Architecture

To detect Wally, we used the YOLO algorithm, a deep learning object detection architecture based on convolutional neural networks.

YOLO, proposed by Joseph Redmon, Ali Farhadi, Ross Girshick and Santosh Divvala in 2015, is a single network trained end to end to perform a regression task that predicts both the bounding box and the class of each object.
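To make "regression on bounding boxes" a bit more concrete, the sketch below turns a corner-format box into the kind of numbers a YOLO-style network regresses: a normalized center, a normalized width and height, and the grid cell that contains the center. This is a simplified illustration under my own assumptions, not Darkflow's internal encoding (which also involves anchor boxes and cell-relative offsets). The grid size of 13 corresponds to YOLO v2 with a 416x416 input, and the function name and sample box are hypothetical.

```python
def to_yolo_target(box, image_width, image_height, grid_size=13):
    """Convert (xmin, ymin, xmax, ymax) into normalized (cx, cy, w, h) and its grid cell."""
    xmin, ymin, xmax, ymax = box
    # Center and size, expressed as fractions of the image dimensions.
    cx = (xmin + xmax) / 2 / image_width
    cy = (ymin + ymax) / 2 / image_height
    w = (xmax - xmin) / image_width
    h = (ymax - ymin) / image_height
    # The grid cell containing the center is the one responsible for this object.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return (cx, cy, w, h), (row, col)


# Made-up Wally box on a 1024x768 image.
target, cell = to_yolo_target((412, 230, 448, 290), image_width=1024, image_height=768)
print(target, cell)
```

Note how small the normalized width and height are: Wally occupies only a few percent of the image, which is part of what makes this puzzle hard for a detector.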

The paper is available at this link.

I used an open-source implementation, “Darkflow”, so you don’t need to worry about the details.

Installing Darkflow

I am currently updating the original version of Darkflow, upgrading dependencies and adding new features. For this reason, I strongly recommend that you use my GitHub repository to follow this tutorial. In the repository, you will find the complete installation guide and other important instructions. Here I will keep the instructions as simple as possible (using conda and pip).

conda install Cython
git clone https://github.com/arthurfortes/darkflow.git
cd darkflow
python setup.py build_ext --inplace
pip install .

Downloading Weights

To train the YOLO model, it is recommended that you start from the weights and configuration files of an already trained network. There are two ways to download these files. First, you can download them from the official YOLO project webpage. Second, you can download them from my weights repository.

Building and Training the Model

To organize our project, we need to follow this directory tree:

Wally_Project
|- built_graph
|- cfg
|---- yolov2-wally.cfg
|---- yolov2-wally.weights
|- ckpt
|- labels.txt

The files downloaded in the previous section must be placed in the cfg directory; the other directories may be empty. The labels.txt file must list, one per line, the labels that our model will predict. In our case, the file contains a single line: “wally”.
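If you prefer to create this layout from Python rather than by hand, a minimal sketch could look like the following. It only creates the empty folders and writes labels.txt; the .cfg and .weights files still have to be copied into cfg manually. The function name create_project_skeleton is my own invention for this example.

```python
from pathlib import Path


def create_project_skeleton(root, labels=("wally",)):
    """Create the project folders and a labels.txt with one label per line."""
    root = Path(root)
    for name in ("built_graph", "cfg", "ckpt"):
        (root / name).mkdir(parents=True, exist_ok=True)
    # One label per line; here a single class.
    (root / "labels.txt").write_text("\n".join(labels) + "\n")
    return root
```

Calling create_project_skeleton("Wally_Project") from the folder where you want the project to live produces the tree shown above, minus the two downloaded files.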

As you can see in the Darkflow repository, it is quite simple to build the model. First, create a train_model.py file and define an options object. Then, instantiate a TFNet object with those options.

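from darkflow.net.build import TFNet

options = {
    "model": "cfg/yolov2-wally.cfg",
    # Pre-trained weights for the first run. Use -1 here instead to resume
    # from the most recent checkpoint in ckpt/ (a dict keeps only one "load").
    "load": "cfg/yolov2-wally.weights",
    "batch": 8,
    "epoch": 1000,
    # Comment out the line below if you don't have a GPU.
    "gpu": 0.8,
    "train": True,
    "annotation": "data/total/annotations/",
    "dataset": "data/total/images/",
}

# Build the network from the options and start training.
tfnet = TFNet(options)
tfnet.train()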

The options dictionary specifies the model and its environment. These and other options are described below.

'imgdir': path to testing directory with images        
'binary': path to .weights directory
'config': path to .cfg directory
'dataset': path to dataset directory
'labels': path to labels file
'backup': path to backup folder
'summary': path to TensorBoard summaries directory
'annotation': path to annotation directory
'threshold': detection threshold
'model': configuration of choice
'trainer': training algorithm
'momentum': applicable for rmsprop and momentum optimizers
'verbalise': say out loud while building graph
'train': train the whole net
'load': how to initialize the net? Either from .weights or a checkpoint, or even from scratch
'savepb': save net and weight to a .pb file
'gpu': how much gpu (from 0.0 to 1.0)
'gpuName': GPU device name
'lr': learning rate
'keep': Number of most recent training results to save
'batch': batch size
'epoch': number of epoch
'save': save checkpoint every ? training examples
'demo': demo on webcam
'queue': process demo in batch
'json': Outputs bounding box information in json format
'saveVideo': Records video from input video or camera
'pbLoad': path to .pb protobuf file (metaLoad must also be specified)
'metaLoad': path to .meta file generated during --savepb that corresponds to .pb file

Since I suggested downloading yolo.weights from here, that is the file I specified in "load". If you have your own pre-trained weights, this is where you tell the model about them (for example, after you train on custom objects, your system will produce its own weight file).

That done, just run the script with this command:

python train_model.py

and wait a few minutes … hours … days …

Good time to have a coffee and see how the day is outside.

Important trick: I had a lot of difficulty when I started using YOLO. First, the models never performed as desired; second, the predictions were generally meaningless (sometimes none appeared at all). After a while, I discovered a trick that makes this model work: first train on just a few examples (3 to 5 images) for many epochs (I usually use 1000). Once that is done, train the YOLO model on the whole dataset.
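One way to prepare the small warm-up set for this trick is to copy a handful of image/annotation pairs into their own folder. The sketch below assumes each image has a Pascal VOC file with the same base name and an .xml extension; the function name, the output folder layout and the fixed random seed are my own choices for illustration.

```python
import random
import shutil
from pathlib import Path


def make_warmup_subset(images_dir, annotations_dir, out_dir, n=5, seed=0):
    """Copy n random image/annotation pairs into out_dir/images and out_dir/annotations."""
    images_dir, annotations_dir, out_dir = map(Path, (images_dir, annotations_dir, out_dir))
    # Keep only images that have a matching annotation file.
    pairs = []
    for img in sorted(images_dir.iterdir()):
        xml = annotations_dir / (img.stem + ".xml")
        if img.is_file() and xml.exists():
            pairs.append((img, xml))
    chosen = random.Random(seed).sample(pairs, min(n, len(pairs)))
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "annotations").mkdir(parents=True, exist_ok=True)
    for img, xml in chosen:
        shutil.copy2(img, out_dir / "images" / img.name)
        shutil.copy2(xml, out_dir / "annotations" / xml.name)
    return [img.name for img, _ in chosen]
```

For the first run you would point the "dataset" and "annotation" options at the new folders; for the second run, switch them back to the full dataset and set "load" to -1 so training continues from the last checkpoint.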

Step 3: It’s time to find Wally!

In this step, we need to create a script to find Wally, using our trained weights.

import cv2
from darkflow.net.build import TFNet

options = {"pbLoad": "yolov2-wally.pb",
           "metaLoad": "yolov2-wally.meta"}
yolo_sensor = TFNet(options)

To highlight Wally in the image, I developed a function using OpenCV to draw a rectangle around the object, as seen below.

def highlight_wally(img, predictions, padding=20):
    # Draw low-confidence boxes first so the most confident one ends up on top.
    predictions.sort(key=lambda x: x.get('confidence'))

    for pred in predictions:
        xtop = pred.get('topleft').get('x')
        ytop = pred.get('topleft').get('y')
        xbottom = pred.get('bottomright').get('x')
        ybottom = pred.get('bottomright').get('y')
        cv2.rectangle(img, (xtop-padding, ytop-padding), (xbottom+padding, ybottom+padding), (0, 0, 0), 3)
        font_scale = 2
        font = cv2.FONT_HERSHEY_PLAIN
        rectangle_bgr = (0, 0, 0)
        # scale to percent before rounding to avoid values like 56.99999999999999
        text = "Wally {}%".format(round(float(pred.get('confidence')) * 100))
        # get the width and height of the text box
        (text_width, text_height) = cv2.getTextSize(text, font, fontScale=font_scale, thickness=1)[0]
        # set the text start position
        text_offset_x = xtop
        text_offset_y = ytop - 30
        # make the coords of the box with a small padding of two pixels
        box_coords = ((text_offset_x, text_offset_y), (text_offset_x + text_width + 2, text_offset_y - text_height - 2))
        cv2.rectangle(img, box_coords[0], box_coords[1], rectangle_bgr, cv2.FILLED)
        cv2.putText(img, text, (text_offset_x, text_offset_y), font, fontScale=font_scale, color=(255, 255, 255), thickness=2)

    return img
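The function above draws every prediction it receives. Since a puzzle contains a single Wally, you may prefer to keep only the most confident detection. The helper below is a small sketch of that idea: it assumes each prediction is a dictionary with "label" and "confidence" keys, in the same layout used by highlight_wally. The name best_prediction, the 0.5 threshold and the sample values are hypothetical.

```python
def best_prediction(predictions, label="wally", threshold=0.5):
    """Return the most confident prediction for `label`, or None if none qualifies."""
    candidates = [
        p for p in predictions
        if p.get("label") == label and p.get("confidence", 0.0) >= threshold
    ]
    return max(candidates, key=lambda p: p["confidence"], default=None)


# Made-up predictions, only to show the expected dictionary layout.
sample = [
    {"label": "wally", "confidence": 0.31,
     "topleft": {"x": 10, "y": 12}, "bottomright": {"x": 40, "y": 60}},
    {"label": "wally", "confidence": 0.72,
     "topleft": {"x": 412, "y": 230}, "bottomright": {"x": 448, "y": 290}},
]
best = best_prediction(sample)
print(best["confidence"])  # 0.72
```

When the result is not None, passing [best] to highlight_wally draws just that one box.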

Finally, I built a prediction function that combines the YOLO v2 model with the OpenCV function:

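def yolo_predict(img, output_path=None):
    imgcv = cv2.imread(img)
    result = yolo_sensor.return_predict(imgcv)
    labeled_img = highlight_wally(imgcv, result)
    # Default to "<input name>_labeled.png" when no output path is given.
    if output_path is None:
        output_path = '{}_labeled.png'.format(img[:img.rfind('.')])
    cv2.imwrite(output_path, labeled_img)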

And the result was:

Wally was found with 72% certainty.
For this experiment we built a dataset with 350 images.

The trained weights are in my Google Drive.

You can find the whole code in my GitHub.

Final Remark

In this article, we created a model to find Wally in images using the YOLO architecture. YOLO is a relatively simple way to build object detectors that are both accurate and fast (if you are using a GPU).

It is difficult to create a robust solution that works in every Wally puzzle scenario. We could have improved our YOLO model to handle that, but it would need a lot more data, and annotating these images is very painful.

As a next step, I will try to repeat this experiment using the YOLO v3 model. I hope this article has been useful for you. Until next time!

Arthur Fortes

Data scientist and Python developer with experience in research and industrial projects. Innovation Enthusiast.