Text recognition with Python

In this tutorial, we’ll look at how to recognize text from images using Python and Tesseract. Tesseract is a tool for recognizing characters, and therefore text, contained in an image (OCR, Optical Character Recognition).

Installing Tesseract

  • Under Linux

To install Tesseract, enter the following commands in a terminal:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
  • Under Windows

You can download and run the installer for your OS.

Once installed, add C:\Program Files\Tesseract-OCR to your Path environment variable.
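
You can check the installation from a new terminal:

tesseract --version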

You can now run Tesseract and test the result with the following command:

tesseract <path_to_image> <path_to_result_file> -l <language>

Example:

tesseract test.png result -l fra

Tesseract will recognize the text contained in the test.png image and write the raw text to the result.txt file.
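
Tesseract can also produce other output formats by appending a config name to the command; for example, the pdf config writes a searchable PDF (result.pdf):

tesseract test.png result -l fra pdf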

N.B.: Tesseract may have difficulty with punctuation and text alignment.

Text recognition with Pytesseract

You can then install the pytesseract package:

pip install pytesseract
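
N.B.: On Windows, if the Tesseract executable is not on your Path, you can point pytesseract to it explicitly with its tesseract_cmd attribute:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'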

The beauty of using Python, and OpenCV in particular, is that you can process images and integrate the tool into larger software. Here are some of the advantages:

  • Text detection in video
  • Image processing and filtering, for example for obstructed characters
  • Text detection in PDF files
  • Writing the results to a Word or Excel file (see the sketch after this list)
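
For the last point, here is a minimal sketch using Python's standard csv module: it writes the word-level boxes returned by image_to_data to a result.csv file that Excel can open (test.png is the example image used in this tutorial):

import csv
import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('test.png')
d = pytesseract.image_to_data(img, output_type=Output.DICT)

with open('result.csv', 'w', newline='') as f:
	writer = csv.writer(f)
	writer.writerow(['text', 'conf', 'left', 'top', 'width', 'height'])
	for i in range(len(d['text'])):
		if d['text'][i].strip():  # skip empty entries
			writer.writerow([d['text'][i], d['conf'][i], d['left'][i], d['top'][i], d['width'][i], d['height'][i]])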

In the following script, we load the image with OpenCV and draw rectangles around the detected text. Position data is obtained using the image_to_data function. The raw text can also be obtained using the image_to_string function.

from PIL import Image
import pytesseract
from pytesseract import Output
import cv2
 
source = 'test.png'
img = cv2.imread(source)
text=pytesseract.image_to_string(img)
print(text)

d = pytesseract.image_to_data(img, output_type=Output.DICT)
 
NbBox = len(d['level'])
print ("Number of boxes: {}".format(NbBox))

for i in range(NbBox):
	(x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
	# display rectangle
	cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
 
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The script also works on document photos.
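
Results on photos often improve with a little preprocessing before calling pytesseract. Here is a minimal sketch using grayscale conversion and Otsu thresholding, both standard OpenCV calls:

import cv2
import pytesseract

img = cv2.imread('test.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu's method picks the binarization threshold automatically
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(pytesseract.image_to_string(thresh))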

Bonus: Text recognition from a PDF file with Python

Installing the pdf2image library

pip install pdf2image

pdf2image requires poppler to be installed

This is quite simple on Linux:

sudo apt-get install poppler-utils

Under Windows

  • Download the zip archive
  • Extract the files wherever you want (e.g. C:\Users\ADMIN\Documents\poppler)
  • Add the bin folder to the Path environment variable (C:\Users\ADMIN\Documents\poppler\Library\bin)
  • Test with the command pdftoppm -h

Script to retrieve text from a PDF

from pdf2image import convert_from_path, convert_from_bytes
from PIL import Image
import pytesseract
from pytesseract import Output

images = convert_from_path('invoice.pdf')

# get text
print("Number of pages: {}".format(len(images)))
for i,img in enumerate(images):
    print ("Page N°{}\n".format(i+1))
    print(pytesseract.image_to_string(img))
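
To keep the extracted text, you can write each page to its own file. A minimal variant of the script above:

from pdf2image import convert_from_path
import pytesseract

images = convert_from_path('invoice.pdf')
for i, img in enumerate(images):
    # one text file per page: page_1.txt, page_2.txt, ...
    with open('page_{}.txt'.format(i+1), 'w', encoding='utf-8') as f:
        f.write(pytesseract.image_to_string(img))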

Script to display rectangles on a PDF

from pdf2image import convert_from_path, convert_from_bytes
from PIL import Image
import pytesseract
from pytesseract import Output
import cv2
import numpy

images = convert_from_path('invoice.pdf')
for i,source in enumerate(images):
	print ("Page N°{}\n".format(i+1))
	
	#convert PIL to opencv
	pil_image = source.convert('RGB') 
	open_cv_image = numpy.array(pil_image) 
	# Convert RGB to BGR 
	img = open_cv_image[:, :, ::-1].copy() 
	#img = cv2.imread(source)

	d = pytesseract.image_to_data(img, output_type=Output.DICT)
	 
	NbBox = len(d['level'])
	print ("Number of boxes: {}".format(NbBox))

	for j in range(NbBox):
		(x, y, w, h) = (d['left'][j], d['top'][j], d['width'][j], d['height'][j])
		# display rectangle
		cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
	 
	cv2.imshow('img', img)
	cv2.waitKey(0)
	cv2.destroyAllWindows()
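
If you're running without a display (for example over SSH), you can save the annotated pages instead of displaying them by replacing the cv2.imshow block with:

	cv2.imwrite('page_{}.png'.format(i+1), img)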

Applications

  • Reading scanned documents
  • Real-time text recognition from video

Create your image bank with Python

To train a neural network for object detection and recognition, you need an image bank to work with. We'll see how to download a large number of images from Google using Python. Training a neural network requires a large amount of data: the more data, the better the training. In our case, we want to train a neural network to recognize a particular object. To do this, we'll create a Python script that downloads files from the Internet and places them in a folder.

Configuring Python3

Install the Selenium and, optionally, OpenCV libraries:

python3 -m pip install selenium
python3 -m pip install opencv-python

Download the geckodriver file.

N.B.: We use the OpenCV library only to check that the downloaded images can be opened and used, so as not to clutter up the folder unnecessarily.

Image download Python script

This script launches a search on Google Images and saves the images it finds in the folder specified for the image bank.

N.B.: Don't forget to specify the path to geckodriver (GECKOPATH), the path to the destination folder, and the keywords for the Google search.

import sys
import os
import time 

#Imports Packages
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException,WebDriverException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

import cv2

########################################################################
GECKOPATH = "PATH-TO-GECKODRIVER"
parent_dir = "PATH-TO-FOLDER"
search='coffee mug'
########################################################################

# path 
folderName=search.replace(" ","_")
directory = os.path.join(parent_dir, folderName,'img') 
   
# Create the directory 
try: 
	if not os.path.exists(directory):
		os.makedirs(directory) #os.mkdir(directory)
except OSError as error: 
	print("ERROR : {}".format(error)) 


sys.path.append(GECKOPATH)  
#Opens up web driver and goes to Google Images
browser = webdriver.Firefox()

#load google image
browser.get('https://www.google.ca/imghp?hl=en')

delay = 10 # seconds
try:
	btnId="L2AGLb"
	myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, btnId)))
	elm = browser.find_element(By.ID, btnId)
	elm.click()
	print("Popup is passed!")
except TimeoutException as e:
	print("Loading took too much time!")



# get and fill search bar
box = browser.find_element(By.XPATH, '//*[@id="sbtc"]/div/div[2]/input')
box.send_keys(search)
box.send_keys(Keys.ENTER)


#Will keep scrolling down the webpage until it cannot scroll no more
last_height = browser.execute_script('return document.body.scrollHeight')
while True:
	browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
	time.sleep(5)
	new_height = browser.execute_script('return document.body.scrollHeight')
	try:
		browser.find_element(By.XPATH, '//*[@id="islmp"]/div/div/div/div/div[5]/input').click()
		time.sleep(5)
	except:
		print("button not found")
		pass
		
	if new_height == last_height:
		break
	last_height = new_height
	
imgList=[]
for i in range(1, 1000):
	try:
		imgPath = os.path.join(directory, '{}.png'.format(i))
		browser.find_element(By.XPATH, '//*[@id="islrg"]/div[1]/div['+str(i)+']/a[1]/div[1]/img').screenshot(imgPath)
		imgList.append(imgPath)
		
	except:
		pass
browser.quit()
 
#Test images with OpenCV
for img in imgList:
	# cv2.imread does not raise on unreadable files; it returns None
	if cv2.imread(img) is None:
		os.remove(img)
		print("remove {}".format(img))

BONUS: Managing a popup

In the code, I've added a snippet to handle the popup that appears when the web page opens. It waits for the button with the right identifier to load before clicking it.

delay = 10 # seconds
try:
	btnId="L2AGLb"
	myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, btnId)))
	elm = browser.find_element(By.ID, btnId)
	elm.click()
	print("Popup is passed!")
except TimeoutException as e:
	print("Loading took too much time!")

 

You now have an image bank that you can use for visual recognition, for example, or for image processing.

Applications

  • Develop image processing algorithms
  • Training neural networks for object detection and recognition

eSpeak voice synthesizer on Raspberry Pi

You can turn your Raspberry Pi into an intelligent assistant by using a voice synthesizer like eSpeak. Thanks to this tutorial, you’ll be able to make your robot or application talk.

Hardware

  • Raspberry Pi 3 with Raspberry Pi OS
  • Internet connection and remote access

Check Audio devices

First, let's check the available audio devices:

aplay -l

Select audio device

By default, sound is output via the jack or HDMI port. If you wish to connect headphones, you need to select the corresponding output on the Raspberry Pi GUI or with sudo raspi-config.
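
You can also select the output from the command line. On Raspberry Pi models using the bcm2835 audio driver, numid=3 routes the audio (0 = auto, 1 = jack, 2 = HDMI); this depends on your OS version, so treat it as a hint:

amixer cset numid=3 1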

Check audio

Let's check that the audio is working properly and that the audio devices are recognized.

aplay /usr/share/sounds/alsa/*

Installing the eSpeak speech synthesizer

sudo apt-get install espeak

To test the espeak installation, simply enter the following command:

espeak "Hello World"

You can also install the python package and use it directly in a script.

sudo apt-get install python3-espeak
from espeak import espeak
espeak.synth("hello world")

To obtain a list of available voices, enter the command:

espeak --voices
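
eSpeak also accepts options to tune the output: -s sets the speed in words per minute, -p the pitch (0 to 99) and -a the amplitude. For example:

espeak -v fr -s 130 -p 60 "bonjour tout le monde"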

Using eSpeak in a Python script with Subprocess

You can run shell commands from a Python script using the subprocess library.

from time import sleep
import subprocess

def say(something, voice='fr+f1'):
	print("espeak -v {} {}".format(voice,something))
	subprocess.call(['espeak', '-v%s' % (voice), something])
    
text=[u"bonjour",u"aurevoir",u"a bientot", u"sa va", u"merci"]
textf=u"bienvenu admine, comment allez-vous aujourd'hui?"

for t in text:
	say(t)
	sleep(0.5)

say(textf)

Install pyttsx3

To use the voice synthesizer with Python, we’ll use the pyttsx3 package.

To install the pyttsx3 Python package, enter the following command:

pip3 install pyttsx3

Choose the voice you want

import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')

for i,voice in enumerate(voices):
	print("----------------{}".format(i))
	print(voice.id)
	engine.setProperty('voice', voice.id)  # changes the voice
	print(voice.age)
	print(voice.gender)
	print(voice.languages)

Text-to-speech code

#!/usr/bin/python3.4
# -*-coding:Utf-8 -*

from time import sleep
import pyttsx3 as pyttsx
engine = pyttsx.init()
voices = engine.getProperty('voices')

print(voices)

engine.setProperty('voice', voices[0].id)  # changes the voice
	
#engine.setProperty('voice', voices[14].id)  # changes the voice
voiceNum=0
#Fr=[0 1 6 7 8 14] #6 = AC
"""
for voice in voices:
	print(voiceNum)
	voiceNum=voiceNum+1
	print(voice.id)
	engine.setProperty('voice', voice.id)  # changes the voice
	print(voice.age)
	print(voice.gender)
	print(voice.languages)
	print(voice.name)
"""

text=[u"bonjour",u"aurevoir",u"a bientot", u"sa va", u"merci"]
textf=u"bienvenu admine, comment allez-vous aujourd'hui?"

#rate = engine.getProperty('rate')
engine.setProperty('rate', 120)

#volume = engine.getProperty('volume')
#engine.setProperty('volume', volume)

for t in text:
	engine.say(t)
	sleep(0.5)

engine.say(textf)


engine.runAndWait()

You should hear your Raspberry Pi speak!

Add more voices

You can create and add your own voice, taking inspiration from existing voices and reading the documentation carefully. This will enable you to concentrate on certain phrases and pronunciations. Then, once you’ve mastered the settings, you’ll eventually have a voice that suits you.

Improving results with MBROLA voices

You can use MBROLA, a voice synthesizer somewhat more powerful than eSpeak, and install new voices for it. In our case, we're installing a French female voice. You can find a suitable voice in the list of available voices.

Check for mbrola; the following command will list the available voices:

apt-cache search mbrola
mkdir espeak
cd espeak
wget https://raspberry-pi.fr/download/espeak/mbrola3.0.1h_armhf.deb -O mbrola.deb
sudo dpkg -i mbrola.deb
rm mbrola.deb

Once you have installed mbrola, you can install new voices:

sudo apt-get install mbrola-fr4
espeak -v mb-fr4 "bonjour admin, comment allez-vous?"

N.B.: To install the mbrola package, I had to update the OS to version 11 (Bullseye). You can check the OS version with the command cat /etc/os-release.

Note that MBROLA voices cannot be used with pyttsx3, so we use subprocess instead:

from time import sleep
import subprocess

def say(something, voice='mb-fr4'):
	print("espeak -v {} {}".format(voice,something))
	subprocess.call(['espeak', '-v%s' % (voice), something])
    
text=[u"bonjour",u"aurevoir",u"a bientot", u"sa va", u"merci"]
textf=u"bienvenu admine, comment allez-vous aujourd'hui?"

for t in text:
	say(t)
	sleep(0.5)

say(textf)

HDMI audio output

If you want to use the HDMI audio output and there is no sound, you may need to modify the config.txt file:

sudo nano /boot/config.txt

Uncomment the line hdmi_drive=2.

Object recognition with Python

In this tutorial, we’ll look at how to perform object recognition with Python, using a neural network pre-trained with deep learning.

We saw in a previous tutorial how to recognize simple shapes using computer vision. That method only works for certain predefined simple shapes. If you want to recognize a wider variety of objects, the easiest way is to use artificial intelligence.

Hardware

  • A computer with a Python3 installation
  • A camera

Principle

Artificial intelligence is a field of computer science in which the program itself learns to perform certain tasks, visual recognition in particular. In this tutorial, we'll use a trained neural network to recognize particular shapes.

You need a lot of data to train a neural network properly. It has been shown that learning is faster on a neural network trained for something else. For example, a neural network trained to recognize dogs will train more easily to recognize cats.

Python configuration

If you don't already have it, download and install Python 3.

You can then install the necessary libraries: OpenCV, numpy and imutils.

pip3 install opencv-python numpy imutils

or

python3 -m pip install opencv-python numpy imutils

Download MobileNet-SSD

Place the model files in a folder and create the file ObjectRecognition.py.

Python script for Object recognition

First, we create a video stream (vs) using the imutils library, which will retrieve the images from the camera.

vs = VideoStream(src=0, resolution=(1600, 1200)).start()

We initialize a neural network with the MobileNet-SSD (net) parameters using the OpenCV library.

net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

We will then create a loop which, at each iteration, will read the camera image and pass it to the input of the neural network for object detection and recognition.

while True:
	# Get video stream. max width 800 pixels 
	frame = vs.read()
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

	# Feed input to neural network 
	net.setInput(blob)
	detections = net.forward()

Finally, the code displays the detection box and the probability of recognition on the image.

label = "{}: {:.2f}%".format(CLASSES[idx],confidence * 100)
cv2.rectangle(frame, (startX, startY), (endX, endY),COLORS[idx], 2)
y = startY - 15 if startY - 15 > 15 else startY + 15
cv2.putText(frame, label, (startX, y),cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

Here is the complete script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  ObjectRecognition.py
#  Description:
#		Use ModelNet-SSD model to detect objects
#
#  www.aranacorp.com

# import packages
import sys
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

# Arguments construction
if len(sys.argv)==1:
	args={
	"prototxt":"MobileNetSSD_deploy.prototxt.txt",
	"model":"MobileNetSSD_deploy.caffemodel",
	"confidence":0.2,
	}
else:
	#launch from the terminal
	#python3 ObjectRecognition.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel
	ap = argparse.ArgumentParser()
	ap.add_argument("-p", "--prototxt", required=True,
		help="path to Caffe 'deploy' prototxt file")
	ap.add_argument("-m", "--model", required=True,
		help="path to Caffe pre-trained model")
	ap.add_argument("-c", "--confidence", type=float, default=0.2,
		help="minimum probability to filter weak detections")
	args = vars(ap.parse_args())



# MobileNet SSD object list init
CLASSES = ["arriere-plan", "avion", "velo", "oiseau", "bateau",
	"bouteille", "autobus", "voiture", "chat", "chaise", "vache", "table",
	"chien", "cheval", "moto", "personne", "plante en pot", "mouton",
	"sofa", "train", "moniteur"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# Load model file
print("Load Neural Network...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

if __name__ == '__main__':

	# Camera initialisation
	print("Start Camera...")
	vs = VideoStream(src=0, resolution=(1600, 1200)).start()
	#vs = VideoStream(usePiCamera=True, resolution=(1600, 1200)).start()
	#vc = cv2.VideoCapture('./img/Splash - 23011.mp4') #from video

	time.sleep(2.0)
	fps = FPS().start()
	
	#Main loop
	while True:
		# Get video stream. max width 800 pixels 
		frame = vs.read()
		#frame= cv2.imread('./img/two-boats.jpg') #from image file
		#ret, frame=vc.read() #from video or ip cam
		frame = imutils.resize(frame, width=800)

		# Create blob from image
		(h, w) = frame.shape[:2]
		blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

		# Feed input to neural network 
		net.setInput(blob)
		detections = net.forward()

		# Detection loop
		for i in np.arange(0, detections.shape[2]):
			# Compute Object detection probability
			confidence = detections[0, 0, i, 2]
			
			# Suppress low probability
			if confidence > args["confidence"]:
				# Get index and position of detected object
				idx = int(detections[0, 0, i, 1])
				box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
				(startX, startY, endX, endY) = box.astype("int")

				# Create box and label
				label = "{}: {:.2f}%".format(CLASSES[idx],
					confidence * 100)
				cv2.rectangle(frame, (startX, startY), (endX, endY),
					COLORS[idx], 2)
				y = startY - 15 if startY - 15 > 15 else startY + 15
				cv2.putText(frame, label, (startX, y),
					cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
					
				# save the detected image
				cv2.imwrite("detection.png", frame)
				
				
		# Show video frame
		cv2.imshow("Frame", frame)
		key = cv2.waitKey(1) & 0xFF

		# Exit script with letter q
		if key == ord("q"):
			break

		# FPS update 
		fps.update()

	# Stops fps and display info
	fps.stop()
	print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
	print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

	cv2.destroyAllWindows()
	vs.stop()
	#vc.release() # only needed when reading from a video file or IP camera

Image sources for object detection

You can use this script with different image sources. To do so, slightly adapt the previous code to change the "frame" variable containing the image to be analyzed.

  • Your computer's webcam

vs = VideoStream(src=0, resolution=(1600, 1200)).start()
while True:
	frame = vs.read()

  • An IP camera

vc = cv2.VideoCapture('rtsp://user:password@ipaddress:rtspPort')
while True:
	ret, frame=vc.read() #from ip cam

  • Raspberry Pi Picam

vs = VideoStream(usePiCamera=True, resolution=(1600, 1200)).start()
while True:
	frame = vs.read()

  • A video file

vc = cv2.VideoCapture('./img/Splash - 23011.mp4') #from video
while True:
	ret, frame=vc.read() #from video

  • An image file

frame= cv2.imread('./img/two-boats.jpg') 

Results

In this example, we send the neural network an image of two boats which are correctly recognised. To obtain slightly different results, you can modify the confidence parameter to avoid false positives.
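
For example, using the script's own command-line arguments, you can raise the threshold from the default 0.2 to 0.5:

python3 ObjectRecognition.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel --confidence 0.5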

You can test this code with your webcam or with photos, for example, to see how the model and object recognition perform.

Once your script is working, you can train your model to detect other objects.

Packages and Templates

In this tutorial, we used the pre-trained MobileNet-SSD model. Note that there are other recognition models, such as those trained on the COCO dataset, and other visual recognition libraries such as ImageAI.

Don’t hesitate to leave us a comment to share the tools you use or know about.
