
To prepare an image database for training a neural network in object recognition, you first need to annotate the images in the database yourself. This means giving each image a label and a detection zone.

This tutorial follows on from the article Creating an image bank.

Preparation objective

The aim is to create datasets that can be used for training with TensorFlow, YOLO or Keras tools.

There are two ways to do this:

  • Use labelImg
  • Create a folder architecture and use a script (training with Keras only)

Preparing an image bank with labelImg

You can download and install labelImg:

  • Linux
python3 -m pip install labelImg
  • Windows

Follow the build instructions on GitHub. You can also find a pre-built executable, labelImg.exe.

Add a box and a label

Launch labelImg and select the folder using the “Open Dir” button.

For each image, surround the object to be detected and assign it a label using the “Create RectBox” button.

N.B.: avoid letting the box extend beyond the image edges when drawing it. Coordinates outside the image can cause problems during training.

Convert to PascalVOC format
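
In labelImg, make sure the format button on the left toolbar shows “PascalVOC” before saving (Ctrl+S). labelImg then writes one XML file per image. A saved annotation looks roughly like this (file name, class and coordinates here are hypothetical):

```xml
<annotation>
	<folder>train</folder>
	<filename>mug_001.png</filename>
	<path>/path/to/train/mug_001.png</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>640</width>
		<height>480</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>mug</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>100</xmin>
			<ymin>50</ymin>
			<xmax>300</xmax>
			<ymax>250</ymax>
		</bndbox>
	</object>
</annotation>
```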

Convert to YOLO format
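
With the format button switched to “YOLO”, labelImg writes one .txt file per image, plus a classes.txt listing the label names. Each line is `class_id x_center y_center width height`, with all coordinates normalized to the image size (values below are hypothetical, corresponding to a 200×200 px box centered in a 640×480 image region):

```text
0 0.310938 0.310417 0.312500 0.416667
```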

N.B.: You can save both formats in succession, or save in VOC and convert to YOLO using the convert_voc_to_yolo.py script.

Preparing an image bank with folder architecture

The idea is to place the images in sub-folders named after the class. For training, the image bank should contain between 1 and 3 folders: train, test, validation (the test and validation folders are optional, as they can be created from the first folder).

N.B.: this method requires only one object per image.

  • images
    • train
      • cats
      • dogs
    • validation
      • cats
      • dogs
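
If you only have a train folder, the validation folder can be generated from it by moving a random fraction of each class sub-folder. A minimal sketch (the function name `split_validation` and the 20% split are assumptions, not part of the original tutorial):

```python
import random
import shutil
from pathlib import Path

def split_validation(train_dir, val_dir, fraction=0.2, seed=42):
    """Move a random fraction of each class sub-folder from train/ to validation/."""
    random.seed(seed)
    for class_dir in Path(train_dir).iterdir():
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob('*.png'))
        n_val = int(len(images) * fraction)
        target = Path(val_dir) / class_dir.name   # keep the same class name
        target.mkdir(parents=True, exist_ok=True)
        for image in random.sample(images, n_val):
            shutil.move(str(image), str(target / image.name))
```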

To create files containing name and detection zone info from image folders, you can use the generate_voc_files.py script:

  • adapt the access paths to the various folders and the class names (the dirs and classes lists at the top of the script) to your own image bank

Class names will be defined by folder names, and the detection area by image size.


import glob
import os
import cv2
import xml.etree.ElementTree as ET
from os import getcwd
from os.path import join

dirs = ['train', 'test']	# image bank sub-folders to process
classes = ['mug']	# class names, matching the sub-folder names

def getImagesInDir(dir_path):
	image_list = []
	for filename in glob.glob(dir_path + '/**/*.png', recursive=True):
		image_list.append(filename)
	return image_list

def generate_voc(image_path):

	#get image data
	basename = os.path.basename(image_path)
	basename_no_ext = os.path.splitext(basename)[0]
	dirname = os.path.dirname(image_path)
	im = cv2.imread(image_path)
	img_h, img_w, img_d = im.shape

	root = ET.Element('annotation')
	folder = ET.SubElement(root, 'folder')
	folder.text = os.path.basename(dirname)

	filename = ET.SubElement(root, 'filename')
	filename.text = basename

	path = ET.SubElement(root, 'path')
	path.text = image_path

	source = ET.SubElement(root, 'source')
	database = ET.SubElement(source, 'database')
	database.text = 'Unknown'

	size = ET.SubElement(root, 'size')
	width = ET.SubElement(size, 'width')
	width.text = str(img_w)
	height = ET.SubElement(size, 'height')
	height.text = str(img_h)
	depth = ET.SubElement(size, 'depth')
	depth.text = str(img_d)

	segmented = ET.SubElement(root, 'segmented')
	segmented.text = '0'

	objec = ET.SubElement(root, 'object')
	name = ET.SubElement(objec, 'name')
	name.text = os.path.basename(dirname)	#class name = folder name
	pose = ET.SubElement(objec, 'pose')
	pose.text = 'Unspecified'
	truncated = ET.SubElement(objec, 'truncated')
	truncated.text = '0'
	difficult = ET.SubElement(objec, 'difficult')
	difficult.text = '0'

	bndbox = ET.SubElement(objec, 'bndbox')	#detection zone = whole image
	xmin = ET.SubElement(bndbox, 'xmin')
	xmin.text = '1'
	ymin = ET.SubElement(bndbox, 'ymin')
	ymin.text = '1'
	xmax = ET.SubElement(bndbox, 'xmax')
	xmax.text = str(img_w)
	ymax = ET.SubElement(bndbox, 'ymax')
	ymax.text = str(img_h)

	tree = ET.ElementTree(root)
	outxml = join(dirname, basename_no_ext + '.xml')
	tree.write(outxml)

	return outxml

def convert(size, box):
	dw = 1./(size[0])
	dh = 1./(size[1])
	x = (box[0] + box[1])/2.0 - 1
	y = (box[2] + box[3])/2.0 - 1
	w = box[1] - box[0]
	h = box[3] - box[2]
	x = x*dw
	w = w*dw
	y = y*dh
	h = h*dh
	return (x,y,w,h)

def convert_annotation(in_file):
	dirname = os.path.dirname(in_file)
	basename = os.path.basename(in_file)
	basename_no_ext = os.path.splitext(basename)[0]
	out_file = open(join(dirname, basename_no_ext + '.txt'), 'w')
	tree = ET.parse(in_file)
	root = tree.getroot()
	size = root.find('size')
	w = int(size.find('width').text)
	h = int(size.find('height').text)

	for obj in root.iter('object'):
		difficult = obj.find('difficult').text
		cls = obj.find('name').text
		if cls not in classes or int(difficult)==1:
			continue
		cls_id = classes.index(cls)
		xmlbox = obj.find('bndbox')
		b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
		bb = convert((w,h), b)
		out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
	out_file.close()

cwd = getcwd()

for dir_path in dirs:
	full_dir_path = join(cwd,dir_path)
	image_paths = getImagesInDir(full_dir_path)

	for image_path in image_paths:

		xml_path=generate_voc(image_path) #generate voc file
		convert_annotation(xml_path) #convert to yolo file

	print("Finished processing: " + dir_path)
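
The convert function in the script maps a VOC box (xmin, xmax, ymin, ymax) in pixels to YOLO's normalized (x_center, y_center, width, height). A quick standalone check with hypothetical values (the function is copied here so the snippet runs on its own):

```python
# convert() copied from the script above for a standalone check
def convert(size, box):
	dw = 1./(size[0])	# size = (image width, image height)
	dh = 1./(size[1])
	x = (box[0] + box[1])/2.0 - 1	# box center in pixels (VOC is 1-indexed)
	y = (box[2] + box[3])/2.0 - 1
	w = box[1] - box[0]
	h = box[3] - box[2]
	return (x*dw, y*dh, w*dw, h*dh)

# a 200x200 px box at (100, 50)-(300, 250) in a 640x480 image
print(convert((640, 480), (100.0, 300.0, 50.0, 250.0)))
# → roughly (0.311, 0.310, 0.3125, 0.4167)
```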

This method quickly produces a database that can be used for training (TensorFlow and YOLO), even though the detection zone is only approximate.

N.B.: Once the XML and TXT files have been created, you can reopen the image bank in labelImg to refine the labels and detection zones.

