Text recognition with Python

In this tutorial, we’ll look at how to recognize text from images using Python and Tesseract. Tesseract is a tool for recognizing characters, and therefore text, contained in an image (OCR, Optical Character Recognition).

Installing Tesseract

  • Under Linux

To install Tesseract, enter the following commands in a terminal:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
  • Under Windows

Download and run the installer for your OS.

Once installed, add C:\Program Files\Tesseract-OCR to your Path environment variable.

You can now run tesseract and test the result with the following command:

tesseract <path_to_image> <path_to_result_file> -l <language>

Example:

tesseract test.png result -l fra

Tesseract will recognize the text contained in the test.png image and write the raw text to the result.txt file.

N.B.: Tesseract may have difficulty with punctuation and text alignment.
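
If the layout confuses Tesseract, you can try forcing a page segmentation mode with the --psm option; for example, mode 6 treats the image as a single uniform block of text:

tesseract test.png result -l fra --psm 6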

Text recognition with Pytesseract

You can then install the pytesseract package:

pip install pytesseract
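
N.B.: Under Windows, if tesseract.exe is not on your Path, you can point pytesseract to it explicitly (adjust the path to your own installation):

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'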

The beauty of using Python, and OpenCV in particular, is that you can process images and embed the tool in larger software. Here’s a list of some of the advantages:

  • Detect text in a video
  • Process and filter images, for example to recover obstructed characters
  • Detect text in a PDF file
  • Write the results to a Word or Excel file

In the following script, we load the image with OpenCV and draw rectangles around the detected text. Position data is obtained using the image_to_data function. The raw text can also be obtained using the image_to_string function.

import pytesseract
from pytesseract import Output
import cv2
 
source = 'test.png'
img = cv2.imread(source)
text = pytesseract.image_to_string(img)
print(text)

d = pytesseract.image_to_data(img, output_type=Output.DICT)
 
NbBox = len(d['level'])
print ("Number of boxes: {}".format(NbBox))

for i in range(NbBox):
	(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
	# display rectangle
	cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
 
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
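
image_to_data returns one box per detected element (page, block, paragraph, line and word), each with a confidence score. To draw only reasonably confident word boxes, you can filter on the conf field; a sketch, where the threshold of 60 is an arbitrary choice:

for i in range(NbBox):
	if float(d['conf'][i]) > 60: #page and block boxes have conf -1 and are skipped
		(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
		cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)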

The script also works on photos of documents.
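
For photos, recognition often improves if you grayscale, denoise and binarize the image before passing it to pytesseract. A minimal preprocessing sketch with OpenCV (the parameters are starting points to tune; photo.jpg is a placeholder name):

import cv2
import pytesseract

img = cv2.imread('photo.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3) #light denoising
#Otsu's method picks the binarization threshold automatically
ret, binarized = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(pytesseract.image_to_string(binarized))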

Bonus: Text recognition from a PDF file with Python

Installing the pdf2image library

pip install pdf2image

pdf2image requires Poppler to be installed.

On Linux, this is straightforward:

sudo apt-get install poppler-utils

Under Windows:

  • Download the zip archive of the Poppler binaries
  • Extract the files wherever you want (e.g. C:\Users\ADMIN\Documents\poppler)
  • Add the bin folder to the Path environment variable (C:\Users\ADMIN\Documents\poppler\Library\bin)
  • Test with the command pdftoppm -h

Script to retrieve text from a PDF

from pdf2image import convert_from_path
import pytesseract

images = convert_from_path('invoice.pdf')

# get text
print("Number of pages: {}".format(len(images)))
for i,img in enumerate(images):
    print ("Page N°{}\n".format(i+1))
    print(pytesseract.image_to_string(img))
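
To keep the results, you can write each page’s text to a file instead of printing it; a simple sketch reusing the images list from the script above (result.txt is an arbitrary name):

with open('result.txt', 'w', encoding='utf-8') as f:
    for i,img in enumerate(images):
        f.write("Page N°{}\n".format(i+1))
        f.write(pytesseract.image_to_string(img))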

Script to display rectangles on a PDF

from pdf2image import convert_from_path
import pytesseract
from pytesseract import Output
import cv2
import numpy

images = convert_from_path('invoice.pdf')
for i,source in enumerate(images):
	print ("Page N°{}\n".format(i+1))
	
	#convert PIL to opencv
	pil_image = source.convert('RGB') 
	open_cv_image = numpy.array(pil_image) 
	# Convert RGB to BGR 
	img = open_cv_image[:, :, ::-1].copy() 
	#img = cv2.imread(source)

	d = pytesseract.image_to_data(img, output_type=Output.DICT)
	 
	NbBox = len(d['level'])
	print ("Number of boxes: {}".format(NbBox))

	for j in range(NbBox):
		(x, y, w, h) = (d['left'][j], d['top'][j], d['width'][j], d['height'][j])
		# display rectangle
		cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
	 
	cv2.imshow('img', img)
	cv2.waitKey(0)
	cv2.destroyAllWindows()

Applications

  • Reading scanned documents
  • Real-time text recognition from video

Displaying an OpenCV Image in a PyQt interface

For certain applications, you may find it useful to embed OpenCV in a PyQt interface. In this tutorial, we’ll look at how to correctly integrate and manage a video captured by OpenCV in a PyQt application.

N.B.: We use PySide, but conversion to PyQt is quite straightforward.

Prerequisites:

  • Installing Python
  • OpenCV installation (pip install opencv-python)
  • PySide or PyQt (pip install pyside6 or pip install PyQt5)

Code to capture a video with OpenCV

Here’s the basic code for displaying a webcam video with OpenCV:

import sys
import cv2

def main(args):

	cap = cv2.VideoCapture(0) #default camera

	while(True):
		ret, frame = cap.read()
		if ret:
			frame=cv2.resize(frame, (800, 600)) 
			cv2.imshow("Video",frame)
			
		if cv2.waitKey(1) & 0xFF == ord('q'): #press q to stop capturing
			break

	cap.release()
	cv2.destroyAllWindows()
	return 0

if __name__ == '__main__':
    
    sys.exit(main(sys.argv))
To use this capture loop in a PyQt application, we wrap it in a QThread subclass:

  • The run function contains the OpenCV code; it loops as long as the thread runs and is started by calling QThread.start.
  • The stop function is used to cleanly stop the thread.
  • The changePixmap signal lets the application know that a new image is available.

class Thread(QThread):
	changePixmap = pyqtSignal(QImage)

	def run(self):
		self.isRunning=True
		cap = cv2.VideoCapture(0)
		while self.isRunning:
			ret, frame = cap.read()
			if ret:
				rgbImage = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
				h, w, ch = rgbImage.shape
				bytesPerLine = ch * w
				convertToQtFormat = QImage(rgbImage.data, w, h, bytesPerLine, QImage.Format_RGB888)
				p = convertToQtFormat.scaled(640, 480, Qt.KeepAspectRatio)
				self.changePixmap.emit(p)
				
	def stop(self):
		self.isRunning=False
		self.quit()
		self.terminate()

Creating the PyQt application

For the application, we’ll create a QLabel in a simple QWidget that will contain the video image and instantiate the QThread. The video will be updated automatically using the setImage function, which is called when the changePixmap signal is received.

  • The setImage function:
	@pyqtSlot(QImage)
	def setImage(self, image):
		#update image
		self.label.setPixmap(QPixmap.fromImage(image))
  • The changePixmap signal connection:
		self.th.changePixmap.connect(self.setImage)
class VideoContainer(QWidget):
	def __init__(self):
		super().__init__()
		self.title = 'PySide Video'
		self.left = 100
		self.top = 100
		self.fwidth = 640
		self.fheight = 480
		self.initUI()

	@pyqtSlot(QImage)
	def setImage(self, image):
		#update image	
		self.label.setPixmap(QPixmap.fromImage(image)) 
	
	def initUI(self):
		self.setWindowTitle(self.title)
		self.setGeometry(self.left, self.top, self.fwidth, self.fheight)
		self.resize(1200, 800)
		
		# create a label
		self.label = QLabel(self)		
		self.label.resize(640, 480)
		self.th = Thread(self)
		self.th.changePixmap.connect(self.setImage)
		self.th.start()
		self.show()

Complete code for displaying a video in a PyQt window

import cv2
import sys
#from PyQt5.QtWidgets import  QWidget, QLabel, QApplication
#from PyQt5.QtCore import QThread, Qt, pyqtSignal, pyqtSlot
#from PyQt5.QtGui import QImage, QPixmap

from PySide6.QtWidgets import  QWidget, QLabel, QApplication
from PySide6.QtCore import QThread, Qt, Signal, Slot
from PySide6.QtGui import QImage, QPixmap
pyqtSignal = Signal
pyqtSlot = Slot

class Thread(QThread):
	changePixmap = pyqtSignal(QImage)

	def run(self):
		self.isRunning=True
		cap = cv2.VideoCapture(0)
		while self.isRunning:
			ret, frame = cap.read()
			if ret:
				# https://stackoverflow.com/a/55468544/6622587
				rgbImage = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
				h, w, ch = rgbImage.shape
				bytesPerLine = ch * w
				convertToQtFormat = QImage(rgbImage.data, w, h, bytesPerLine, QImage.Format_RGB888)
				p = convertToQtFormat.scaled(640, 480, Qt.KeepAspectRatio)
				self.changePixmap.emit(p)
				
	def stop(self):
		self.isRunning=False
		self.quit()
		self.terminate()

class VideoContainer(QWidget):
	def __init__(self):
		super().__init__()
		self.title = 'Video'
		self.left = 100
		self.top = 100
		self.fwidth = 640
		self.fheight = 480
		self.initUI()

	@pyqtSlot(QImage)
	def setImage(self, image):
		#update image	
		self.label.setPixmap(QPixmap.fromImage(image)) 
	
	def initUI(self):
		self.setWindowTitle(self.title)
		self.setGeometry(self.left, self.top, self.fwidth, self.fheight)
		self.resize(1200, 800)
		
		# create a label
		self.label = QLabel(self)		
		self.label.resize(640, 480)
		self.th = Thread(self)
		self.th.changePixmap.connect(self.setImage)
		self.th.start()
		self.show()

if __name__ == '__main__':
	app = QApplication(sys.argv)
	ex = VideoContainer()
	sys.exit(app.exec())

A “Video” window appears, containing the image from the webcam.

Bonus: improved closing interface

The code works well and may be sufficient, but there are a few problems with this implementation:

  • The application cannot be closed with Ctrl+C (KeyboardInterrupt)
  • When you close the window, the QThread does not stop
  • If you resize the window, the video size does not change

In order to close the application with Ctrl+C, you can use the interrupt signal. To do this, simply add the following code before creating the application (cleaner methods exist):

import signal #close signal with Ctrl+C
signal.signal(signal.SIGINT, signal.SIG_DFL)

To terminate the QThread when the window is closed, you can use the application’s aboutToQuit signal to call the QThread’s stop function:

app.aboutToQuit.connect(ex.th.stop) #stop qthread when closing window
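
Another option is to override the widget’s closeEvent so that the thread is stopped whenever its window is closed; a minimal sketch of a method to add to VideoContainer:

	def closeEvent(self, event):
		#stop the capture thread before the window closes
		self.th.stop()
		event.accept()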

Finally, to resize the video with the window at each refresh, we use the size of the window to compute the size of the image and the position of the label, so that the label stays centred and the video keeps its proportions.

	@pyqtSlot(QImage)
	def setImage(self, image):
		#resize image with window and center
		imWidth=self.width()-2*self.padding
		imHeight=self.height()-2*self.padding
		image = image.scaled(imWidth, imHeight, Qt.KeepAspectRatio) # remove Qt.KeepAspectRatio if not needed
		self.label.resize(image.width(), image.height()) #(640, 480)
		self.label.move((self.width()-image.width())//2, (self.height()-image.height())//2)
			
		#update image	
		self.label.setPixmap(QPixmap.fromImage(image)) 

Here is the complete code with these enhancements:

import cv2
import sys
#from PyQt5.QtWidgets import  QWidget, QLabel, QApplication
#from PyQt5.QtCore import QThread, Qt, pyqtSignal, pyqtSlot
#from PyQt5.QtGui import QImage, QPixmap

from PySide6.QtWidgets import  QWidget, QLabel, QApplication
from PySide6.QtCore import QThread, Qt, Signal, Slot
from PySide6.QtGui import QImage, QPixmap
pyqtSignal = Signal
pyqtSlot = Slot

class Thread(QThread):
	changePixmap = pyqtSignal(QImage)

	def run(self):
		self.isRunning=True
		cap = cv2.VideoCapture(0)
		while self.isRunning:
			ret, frame = cap.read()
			if ret:
				# https://stackoverflow.com/a/55468544/6622587
				rgbImage = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
				h, w, ch = rgbImage.shape
				bytesPerLine = ch * w
				convertToQtFormat = QImage(rgbImage.data, w, h, bytesPerLine, QImage.Format_RGB888)
				p = convertToQtFormat.scaled(640, 480, Qt.KeepAspectRatio)
				self.changePixmap.emit(p)
				
	def stop(self):
		self.isRunning=False
		self.quit()
		self.terminate()

class VideoContainer(QWidget):
	def __init__(self):
		super().__init__()
		self.title = 'PySide Video'
		self.left = 100
		self.top = 100
		self.fwidth = 640
		self.fheight = 480
		self.padding = 10
		self.initUI()

	@pyqtSlot(QImage)
	def setImage(self, image):
		#resize image with window and center
		imWidth=self.width()-2*self.padding
		imHeight=self.height()-2*self.padding
		image = image.scaled(imWidth, imHeight, Qt.KeepAspectRatio) # remove Qt.KeepAspectRatio if not needed
		self.label.resize(image.width(), image.height()) #(640, 480)
		self.label.move((self.width()-image.width())//2, (self.height()-image.height())//2)
			
		#update image	
		self.label.setPixmap(QPixmap.fromImage(image)) 
		
	def initUI(self):
		self.setWindowTitle(self.title)
		self.setGeometry(self.left, self.top, self.fwidth, self.fheight)
		self.resize(1200, 800)
		
		# create a label
		self.label = QLabel(self)		
		self.label.resize(self.width()-2*self.padding,self.height()-2*self.padding) #(640, 480)
		self.th = Thread(self)
		self.th.changePixmap.connect(self.setImage)
		self.th.start()
		self.show()

import signal #close signal with Ctrl+C
signal.signal(signal.SIGINT, signal.SIG_DFL)

if __name__ == '__main__':
	app = QApplication(sys.argv)
	ex = VideoContainer()
	app.aboutToQuit.connect(ex.th.stop) #stop the qthread when closing the window

	sys.exit(app.exec())

Shape and color recognition with Python

The OpenCV library is used for image processing, in particular shape and color recognition. The library has acquisition functions and image processing algorithms that make image recognition fairly straightforward, without the need for artificial intelligence. This is what we’ll be looking at in this tutorial.

This tutorial can be applied to any computer with a Python installation, OpenCV and a camera; in particular, a Raspberry Pi.

Hardware

  • Computer with a Python 3 installation

Preparing the working environment

To implement the shape recognition script, we install the OpenCV, numpy and imutils modules to manipulate and process the images.

pip3 install opencv-python numpy imutils

Color detection uses the webcolors and scipy (KDTree) modules:

pip3 install webcolors scipy

To test and validate the algorithm, we create an image containing objects of different shapes and colors. You can create your own image with Paint, or use this one:

Operating principle

In the following code, we’ll create a shape detection class that will allow us to select a shape based on the number of contours found. Then we’ll define a function to retrieve the name of the color based on its RGB code. Finally, we’ll use OpenCV to load, filter and mask the image in order to detect the shapes and colors contained in the image.

Complete code for simple shape and color recognition

Create the ObjectDetection.py Python file in the same folder as the image you wish to analyze.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# import the necessary packages
import cv2
import numpy as np
import imutils

#colors
from webcolors import rgb_to_name,CSS3_HEX_TO_NAMES,hex_to_rgb #python3 -m pip install webcolors
from scipy.spatial import KDTree

def convert_rgb_to_names(rgb_tuple):
    # a dictionary of all the hex and their respective names in css3
    css3_db = CSS3_HEX_TO_NAMES
    names = []
    rgb_values = []    
    for color_hex, color_name in css3_db.items():
        names.append(color_name)
        rgb_values.append(hex_to_rgb(color_hex))
    
    kdt_db = KDTree(rgb_values)    
    distance, index = kdt_db.query(rgb_tuple)
    return names[index]
    
class ShapeDetector:
	def __init__(self):
		pass
	def detect(self, c):
		# initialize the shape name and approximate the contour
		shape = "unidentified"
		peri = cv2.arcLength(c, True)
		approx = cv2.approxPolyDP(c, 0.04 * peri, True)

		# if the shape is a triangle, it will have 3 vertices
		if len(approx) == 3:
			shape = "triangle"
		# if the shape has 4 vertices, it is either a square or
		# a rectangle
		elif len(approx) == 4:
			# compute the bounding box of the contour and use the
			# bounding box to compute the aspect ratio
			(x, y, w, h) = cv2.boundingRect(approx)
			ar = w / float(h)
			# a square will have an aspect ratio that is approximately
			# equal to one, otherwise, the shape is a rectangle
			shape = "square" if ar &gt;= 0.95 and ar &lt;= 1.05 else "rectangle"
		# if the shape is a pentagon, it will have 5 vertices
		elif len(approx) == 5:
			shape = "pentagon"
		elif len(approx) == 6:
			shape = "hexagon"
		elif len(approx) == 10 or len(approx) == 12:
			shape = "star"
		# otherwise, we assume the shape is a circle
		else:
			shape = "circle"
		# return the name of the shape
		return shape	

if __name__ == '__main__':
	# load the image and resize it to a smaller factor so that
	# the shapes can be approximated better
	image = cv2.imread('python_shapes_detection_base.PNG')
	resized = imutils.resize(image, width=300)
	ratio = image.shape[0] / float(resized.shape[0])
	
	# convert the resized image to grayscale, blur it slightly,
	# and threshold it
	gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
	blurred = cv2.GaussianBlur(gray, (5, 5), 0)
	thresh = cv2.threshold(blurred, 60, 255, cv2.THRESH_BINARY)[1]
	
	# find contours in the thresholded image and initialize the
	# shape detector
	cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
		cv2.CHAIN_APPROX_SIMPLE)
	cnts = imutils.grab_contours(cnts)
	sd = ShapeDetector()

	# loop over the contours
	for c in cnts:
		# compute the center of the contour
		M = cv2.moments(c)
		if M["m00"] == 0:
			continue #skip degenerate contours to avoid division by zero
		cX = int((M["m10"] / M["m00"]) * ratio)
		cY = int((M["m01"] / M["m00"]) * ratio)
		
		#detect shape from contour
		shape = sd.detect(c)
		
		# resize the contour
		c = c.astype("float")
		c *= ratio
		c = c.astype("int")
		cv2.drawContours(image, [c], -1, (0, 255, 0), 2)
		
		#build a mask of the current contour
		mask = np.zeros(image.shape[:2], np.uint8)
		cv2.drawContours(mask, [c], -1, 255, -1)
		
		#Convert to RGB and get color name
		imgRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
		mean=cv2.mean(imgRGB, mask=mask)[:3]
		named_color = convert_rgb_to_names(mean)
		
		#get complementary color for text
		mean2 = (255-mean[0],255-mean[1],255-mean[2])
		
		#display shape name and color
		objLbl=shape+" {}".format(named_color)
		textSize = cv2.getTextSize(objLbl,cv2.FONT_HERSHEY_SIMPLEX,0.5,2)[0]
		cv2.putText(image, objLbl, (int(cX-textSize[0]/2),int(cY+textSize[1]/2)), cv2.FONT_HERSHEY_SIMPLEX,0.5,mean2, 2)
		
		#show image
		cv2.imshow("Image", image)
		#cv2.waitKey(0)
	cv2.waitKey(0)		

Results

To launch the script, you can either run it from your code editor (such as Geany) or issue the following command in a command terminal opened in your working folder.

python3 ObjectDetection.py

Once the code has been executed, the image is displayed with each shape outlined in green and, at its centre, text giving the shape’s name and color.

N.B.: This algorithm won’t work for all shapes. To detect other shapes, you must either adapt the detect function of the ShapeDetector class to identify all possible cases, or use artificial intelligence.
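
As an illustration, the vertex count can be combined with a circularity measure (4πA/P², close to 1 for a circle and much lower for spiky shapes) to separate circles from many-pointed stars. A sketch of such an extension (detect_extended is a hypothetical helper, not part of the original class):

import cv2
import numpy as np

def detect_extended(c):
	#approximate the contour as in ShapeDetector.detect
	peri = cv2.arcLength(c, True)
	approx = cv2.approxPolyDP(c, 0.04 * peri, True)
	area = cv2.contourArea(c)
	#circularity is ~1.0 for a circle and much lower for spiky shapes
	circularity = 4 * np.pi * area / (peri * peri) if peri > 0 else 0
	if len(approx) > 6:
		return "circle" if circularity > 0.8 else "star"
	names = {3: "triangle", 4: "quadrilateral", 5: "pentagon", 6: "hexagon"}
	return names.get(len(approx), "unidentified")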


Create your image bank with Python

To train a neural network in object detection and recognition, you need an image bank to work with, and the more data, the better the training. In our case, we want to train a neural network to recognize a particular object, so we’ll write a Python script that downloads a large number of images from Google and places them in a folder.

Configuring Python 3

Install the Selenium and OpenCV (optional) libraries:

python3 -m pip install selenium
python3 -m pip install opencv-python

Then download the geckodriver file.

N.B.: We use the OpenCV library only to check that the downloaded images can actually be opened, so as not to clutter up the folder unnecessarily.

Image download Python script

This script launches a search on Google Image and saves the images found in the folder specified for the image bank.

N.B.: Don’t forget to specify the path to the geckodriver GECKOPATH file, the path to the destination folder and the keywords for Google search.

import sys
import os
import time 

#Imports Packages
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException,WebDriverException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

import cv2

########################################################################
GECKOPATH = "PATH-TO-GECKODRIVER"
parent_dir = "PATH-TO-FOLDER"
search='coffee mug'
########################################################################

# path 
folderName=search.replace(" ","_")
directory = os.path.join(parent_dir, folderName,'img') 
   
# Create the directory 
try: 
	if not os.path.exists(directory):
		os.makedirs(directory) #os.mkdir(directory)
except OSError as error: 
	print("ERROR : {}".format(error)) 


os.environ["PATH"] += os.pathsep + GECKOPATH #let Selenium find geckodriver
#Opens up the web driver and goes to Google Images
browser = webdriver.Firefox()

#load google image
browser.get('https://www.google.ca/imghp?hl=en')

delay = 10 # seconds
try:
	btnId="L2AGLb"
	myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, btnId))) #wait for the consent button to load
	elm = browser.find_element(By.ID, btnId)
	elm.click()
	print("Popup is passed!")
except TimeoutException as e:
	print("Loading took too much time!")



# get and fill search bar
box = browser.find_element(By.XPATH, '//*[@id="sbtc"]/div/div[2]/input')
box.send_keys(search)
box.send_keys(Keys.ENTER)


#Keep scrolling down the page until it cannot scroll any further
last_height = browser.execute_script('return document.body.scrollHeight')
while True:
	browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
	time.sleep(5)
	new_height = browser.execute_script('return document.body.scrollHeight')
	try:
		browser.find_element(By.XPATH, '//*[@id="islmp"]/div/div/div/div/div[5]/input').click() #click "Show more results" if present
		time.sleep(5)
	except:
		print("button not found")
		
	if new_height == last_height:
		break
	last_height = new_height
	
imgList=[]
for i in range(1, 1000):
	try:
		browser.find_element_by_xpath('//*[@id="islrg"]/div[1]/div['+str(i)+']/a[1]/div[1]/img').screenshot(directory+'\{}.png'.format(i))
		imgList.add(directory+'\{}.png'.format(i))
		
	except:
		pass
browser.quit()
 
#Test images with OpenCV
for img in imgList:
	if cv2.imread(img) is None: #imread returns None for unreadable files instead of raising
		os.remove(img)
		print("remove {}".format(img))

BONUS: Managing a popup

In the code, I’ve added a block to handle the popup that appears when the web page opens. It waits for the button with the right identifier to load before clicking it.

delay = 10 # seconds
try:
	btnId="L2AGLb"
	myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, btnId))) #wait for the consent button to load
	elm = browser.find_element(By.ID, btnId)
	elm.click()
	print("Popup is passed!")
except TimeoutException as e:
	print("Loading took too much time!")

 

You now have an image bank that you can use for visual recognition, for example, or for image processing.
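
Before training, it can also help to normalize the downloaded images to a single size. A minimal sketch with OpenCV, reusing the directory variable from the script above (the 224x224 target size is an arbitrary choice):

import os
import cv2

for name in os.listdir(directory):
	path = os.path.join(directory, name)
	img = cv2.imread(path)
	if img is None:
		continue #skip files OpenCV cannot read
	cv2.imwrite(path, cv2.resize(img, (224, 224)))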

Applications

  • Develop image processing algorithms
  • Training neural networks for object detection and recognition

Line detection with Python and OpenCV

An interesting application in robotics is pattern recognition. In this tutorial, we’ll use the OpenCV library in Python to detect a cable and locate its centre. To achieve this line tracking, we’ll perform image processing with OpenCV.

You can then build a control loop that keeps the centre of the line in the middle of the camera image and thus follows the trajectory. We chose to do this tutorial on the Raspberry Pi, as its purpose is to perform image processing for a robot using the Pi Camera.

Configuration

  • Install OpenCV on your Raspberry Pi
  • A photo of a cable or black line (to work on the same example, download the photo below; it was taken with the Pi Camera)

Code

To start with, so that you can reuse the line or cable detection on a video (a succession of images), we will implement a class. This class takes the path of the image as a parameter; if the image is in the same folder as the code, its name is enough (for example: "cam.jpg").

Then save the Python code below in a file named suivi_ligne.py.

# -*- coding: utf-8 -*-
"""
@author: AranaCorp
"""
import cv2
import time
import numpy as np
import matplotlib.pyplot as plt


class LineTracking():
	"""
	Class that processes the image, delimits a contour and finds the centre
	of the detected shape.
	"""
	def __init__(self,img_file):
		"""The constructor."""
		self.img = cv2.imread(img_file)
		self.img_inter = self.img
		self.img_final = self.img
		self.centroids = []
		self.mean_centroids = [0,0]

	def processing(self):
		"""Image processing method."""
		#self.img=cv2.resize(self.img,(int(self.img.shape[1]*0.2),int(self.img.shape[0]*0.2))) #resize the original image
		print(self.img.shape)
		#self.img = self.img[199:391, 149:505] #crop the image to exclude the outer areas and gain precision later on
		gray = cv2.cvtColor(self.img, cv2.COLOR_BGR2GRAY) #convert the image to grayscale
		blur = cv2.GaussianBlur(gray,(5,5),0) #blur the image
		ret,thresh = cv2.threshold(blur,60,255,cv2.THRESH_BINARY_INV) #binarize the image

		self.img_inter=thresh
		"""An opening removes all the elements smaller than the structuring element (or pattern).
		A closing fills the holes smaller than the structuring element."""
		kernel_open = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5)) #structuring element for the opening
		kernel_close = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(10,10)) #structuring element for the closing

		thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel_open) #apply an opening with the pattern
		thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_close) #apply a closing with the pattern

		connectivity = 8

		output = cv2.connectedComponentsWithStats(thresh, connectivity, cv2.CV_32S) #delimits the shape(s)
		num_labels = output[0]
		labels = output[1]
		stats = output[2]
		self.centroids = output[3] #gives the centres of the shape(s) in the image

		for c in self.centroids :
			"""Averages the centres of the shape: on the test image there are
			   two centres that are very close and the mean of the two is fine.
			   In the general case you may want to change this.
			"""
			self.mean_centroids[0] += c[0]/len(self.centroids)
			self.mean_centroids[1] += c[1]/len(self.centroids)

		self.img_final = cv2.cvtColor(thresh, cv2.COLOR_GRAY2BGR)

		#the commented line below draws a red square at the mean centre of the shape
		#self.img_final[int(self.mean_centroids[1])-10 : int(self.mean_centroids[1])+20, int(self.mean_centroids[0])-10 : int(self.mean_centroids[0])+20] = [0,0,255]
		for c in self.centroids :
			self.img_final[int(c[1])-5 : int(c[1])+10, int(c[0])-5 : int(c[0])+10] = [0,255,0]

Finally, create a new Python script, for example test_tracking.py:

import cv2
from suivi_ligne import LineTracking

if __name__ == '__main__' :
	test = LineTracking('cam.png') #create a LineTracking object from the class defined above (.png or .jpg)
	test.processing() #run the image processing
	while True :
		cv2.imshow('image',test.img) #display the original image
		#cv2.imshow('process',test.img_inter ) #display the intermediate processing image
		cv2.imshow('cable',test.img_final) #display the processed image
		key = cv2.waitKey(1)
		if key == ord(' '): #press the space bar to close the windows
			break
	cv2.destroyAllWindows()

You now have all the code you need to test your image processing. Issue the command:

python3 test_tracking.py

Result

Two windows open with the original image and the processed image. You can see that a green square marks the position of the cable. This point can be used to direct a robot or a mobile camera.

To stop the display press the space bar.

To conclude, the processing has so far been applied to a single image. You can now implement it in a processing loop for a video, as sketched below.
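
A possible sketch of such a loop, reusing the LineTracking object on webcam frames (mean_centroids is reset before each frame, since processing accumulates it):

import cv2
from suivi_ligne import LineTracking

tracker = LineTracking('cam.png') #the initial image only serves to build the object
cap = cv2.VideoCapture(0)
while True:
	ret, frame = cap.read()
	if not ret:
		break
	tracker.img = frame
	tracker.mean_centroids = [0, 0] #reset the running mean accumulated by processing
	tracker.processing()
	cv2.imshow('cable', tracker.img_final)
	if cv2.waitKey(1) == ord(' '): #press space to stop
		break
cap.release()
cv2.destroyAllWindows()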

Applications

You can now use the LineTracking class in your main file that opens the Raspberry Pi camera. For more information on how to install a Pi Camera on a Raspberry Pi, you can follow our tutorial.
