Projets:Machine a lire IA
Description du projet
Le but est de créer une petite machine à lire portable capable d’acquérir le texte à partir d'une capture d'image et de le lire au moyen d’une synthèse vocale.
Cahier des charges
Pour une première version le texte est dactylographié .
L’utilisateur doit pouvoir :
- Déclencher l’acquisition de la page
- Lancer la lecture et la mettre en pause
- Régler la langue de l’OCR Français ou Anglais (pour une première version)
- Régler le niveau sonore du texte lu
- Régler le débit de la synthèse vocale
- Copier le texte lu dans le presse-papier
- L’envoyer par mail automatiquement à une adresse prédéfinie
- Atteindre une fenêtre de configuration si nécessaire
La solution doit être compacte basée sur une petite caméra reliée à un Raspberry PI 3 ou 4. Muni d’une batterie.
Les message audio seront émis :
- Soit par un petit HP intégré au boitier
- Soit via une oreillette sans fil
En option :
- il sera aussi possible de prévoir un support pour fixer la solution sur un support de bureau.
- La solution pourra permettre de lire une plaque de rue en condition de plein jour ou un nom de salle de réunion en intérieur
Analyse de l'existant
Équipe (Porteur de projet et contributeurs)
- Porteurs du projet : François LB
- Concepteurs/contributeurs : Mickaël Le Cabellec
- Animateur (coordinateur du projet) :
- Fabmanager référent :
- Responsable de documentation
Matériel nécessaire
Outils nécessaires
Coût
Délai estimé
Fichiers source
- Loading the necessary packages
import cv2
import numpy as np
import pytesseract
from imutils.object_detection import non_max_suppression
from matplotlib import pyplot as plt
- Creating argument dictionary for the default arguments needed in the code.
args = {"image": "../input/text-detection/example-images/Example-images/ex24.jpg",
"east": "../input/text-detection/east_text_detection.pb", "min_confidence": 0.5, "width": 320, "height": 320}
- Give location of the image to be read.
- "Example-images/ex24.jpg" image is being loaded here.
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
if vc.isOpened(): # try to get the first frame
rval, frame = vc.read()
else:
rval = False
while rval:
cv2.imshow("preview", frame)
rval, frame = vc.read()
key = cv2.waitKey(20)
if key == 27: # exit on ESC
break
cv2.destroyWindow("preview")
args['image'] = "../input/text-detection/example-images/Example-images/ex24.jpg"
image = cv2.imread(args['image'])
- Saving a original image and shape
orig = image.copy()
(origH, origW) = image.shape[:2]
- set the new height and width to default 320 by using args #dictionary.
(newW, newH) = (args["width"], args["height"])
- Calculate the ratio between original and new image for both height and weight.
- This ratio will be used to translate bounding box location on the original image.
rW = origW / float(newW)
rH = origH / float(newH)
- resize the original image to new dimensions
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]
- construct a blob from the image to forward pass it to EAST model
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
(123.68, 116.78, 103.94), swapRB=True, crop=False)
- load the pre-trained EAST model for text detection
net = cv2.dnn.readNet(args["east"])
- We would like to get two outputs from the EAST model.
- 1. Probabilty scores for the region whether that contains text or not.
- 2. Geometry of the text -- Coordinates of the bounding box detecting a text
- The following two layer need to pulled from EAST model for achieving this.
layerNames = [
"feature_fusion/Conv_7/Sigmoid",
"feature_fusion/concat_3"]
- Forward pass the blob from the image to get the desired output layers
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
- Returns a bounding box and probability score if it is more than minimum confidence
def predictions(prob_score, geo):
(numR, numC) = prob_score.shape[2:4]
boxes = []
confidence_val = []
# loop over rows
for y in range(0, numR):
scoresData = prob_score[0, 0, y]
x0 = geo[0, 0, y]
x1 = geo[0, 1, y]
x2 = geo[0, 2, y]
x3 = geo[0, 3, y]
anglesData = geo[0, 4, y]
# loop over the number of columns
for i in range(0, numC):
if scoresData[i] < args["min_confidence"]:
continue
(offX, offY) = (i * 4.0, y * 4.0)
# extracting the rotation angle for the prediction and computing the sine and cosine
angle = anglesData[i]
cos = np.cos(angle)
sin = np.sin(angle)
# using the geo volume to get the dimensions of the bounding box
h = x0[i] + x2[i]
w = x1[i] + x3[i]
# compute start and end for the text pred bbox
endX = int(offX + (cos * x1[i]) + (sin * x2[i]))
endY = int(offY - (sin * x1[i]) + (cos * x2[i]))
startX = int(endX - w)
startY = int(endY - h)
boxes.append((startX, startY, endX, endY))
confidence_val.append(scoresData[i])
# return bounding boxes and associated confidence_val
return (boxes, confidence_val)
- Find predictions and apply non-maxima suppression
(boxes, confidence_val) = predictions(scores, geometry)
boxes = non_max_suppression(np.array(boxes), probs=confidence_val)
for (startX, startY, endX, endY) in boxes:
# scale the coordinates based on the respective ratios in order to reflect bounding box on the original image
startX = int(startX * rW)
startY = int(startY * rH)
endX = int(endX * rW)
endY = int(endY * rH)
# extract the region of interest
r = orig[startY:endY, startX:endX]
# configuration setting to convert image to string.
configuration = ("-l eng --oem 1 --psm 8")
##This will recognize the text from the image of bounding box
text = pytesseract.image_to_string(r, config=configuration)
# append bbox coordinate and associated text to the list of results
results.append(((startX, startY, endX, endY), text))
- Display the image with bounding box and recognized text
orig_image = orig.copy()
- Moving over the results and display on the image
for ((start_X, start_Y, end_X, end_Y), text) in results:
# display the text detected by Tesseract
print("{}\n".format(text))
# Displaying text
text = "".join([x if ord(x) < 128 else "" for x in text]).strip()
cv2.rectangle(orig_image, (start_X, start_Y), (end_X, end_Y),
(0, 0, 255), 2)
cv2.putText(orig_image, text, (start_X, start_Y - 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
plt.imshow(orig_image)
plt.title('Output')
plt.show()