Initial Release
This is a demo of the OCR functionality of the pero-ocr library.
It runs in Python inside the IRIS application server.
This is an example of input data:
This is the result of the OCR:
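In this example you have the following information:
- TextEquiv tag: the recognized text of a line (in its Unicode child)
- conf attribute of the TextEquiv tag: the confidence score of the recognition
- Coords tag: the polygon (points attribute) locating the element in the image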
Creator: Pero OCR
Created: 2022-12-13T08:47:12.207893+00:00
LastChange: 2022-12-13T08:47:12.207893+00:00
Recognized text lines:
  IN
  CONGRESS, JULY 4, 1776.
  Dhe unaniwons Declaratton of te Heten maiss States of TNmerica
  hen n lí loune z human venl, i kemu nematy k mpeopě toíohohhehttcal bandí uhích have connechdí tem vith ancthet, andíl
  o hi ſhwes f he eail, fie rehatal andequal flohon & ufch lhe laav . kalut and Aloil ped entilt ttem, a dant rafech to the ofunin o manknd tequies fhat thep
  imuiaa
  Qlver
  Vbalřew/
  17.
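The following is a minimal sketch of reading the TextEquiv text, its conf attribute and the Coords polygon from a result file, using only the standard library. It assumes the output follows the PAGE XML layout (TextLine elements with Coords and TextEquiv/Unicode children). It matches tags by local name, so the namespace version does not matter. `read_text_lines` and the `SAMPLE` document are hypothetical: the sample is invented for illustration and is not real output of this demo.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text: str) -> list[dict]:
    """Collect id, text, confidence and polygon for every TextLine element.

    Only direct children of TextLine are inspected, so word-level
    TextEquiv elements (if any) are ignored.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        line = {"id": elem.get("id"), "text": None, "conf": None, "points": None}
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                line["points"] = child.get("points")
            elif name == "TextEquiv":
                conf = child.get("conf")
                line["conf"] = float(conf) if conf is not None else None
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        line["text"] = sub.text
        lines.append(line)
    return lines


# Hypothetical, hand-written sample (not produced by the demo).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="r1-l1">
        <Coords points="5,5 95,5 95,20 5,20"/>
        <TextEquiv conf="0.93">
          <Unicode>Hello world</Unicode>
        </TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

for line in read_text_lines(SAMPLE):
    print(line["id"], line["conf"], line["text"])
```

In practice you would pass the contents of a file from misc/out, for example `read_text_lines(Path("misc/out/<name>.xml").read_text())`.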
Clone the repository:
git clone https://github.com/grongierisc/iris-pero-ocr
/!\ This demo requires the models to be installed /!\
To install the models, download them from the release page and put them in the misc/pero-ocr-fix-computation-on-cpu folder of the project:
https://github.com/grongierisc/iris-pero-ocr/releases/download/v1.0.0/OCR_350000.pt.cpu
https://github.com/grongierisc/iris-pero-ocr/releases/download/v1.0.0/ParseNet_296000.pt.cpu
/!\ Both models are required /!\
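You can script the download with a small helper like the one below. This is a sketch, not a script shipped with the project. The helper names `model_targets` and `download_models` are hypothetical. It assumes the script runs from the repository root and that each release asset is saved under the file name taken from its URL. Files that are already present are skipped.

```python
import urllib.request
from pathlib import Path

# Release assets listed above; both are required.
MODEL_URLS = [
    "https://github.com/grongierisc/iris-pero-ocr/releases/download/v1.0.0/OCR_350000.pt.cpu",
    "https://github.com/grongierisc/iris-pero-ocr/releases/download/v1.0.0/ParseNet_296000.pt.cpu",
]

# Folder the demo expects the models in (relative to the repository root).
MODEL_DIR = Path("misc") / "pero-ocr-fix-computation-on-cpu"


def model_targets(model_dir: Path = MODEL_DIR) -> list[tuple[str, Path]]:
    """Pair each release URL with its destination path (file name taken from the URL)."""
    return [(url, model_dir / url.rsplit("/", 1)[-1]) for url in MODEL_URLS]


def download_models(model_dir: Path = MODEL_DIR) -> None:
    """Download both models into model_dir, skipping files that already exist."""
    model_dir.mkdir(parents=True, exist_ok=True)
    for url, target in model_targets(model_dir):
        if target.exists():
            print(f"already present: {target}")
            continue
        print(f"downloading {url} -> {target}")
        # urlretrieve follows the GitHub release redirect and writes the file.
        urllib.request.urlretrieve(url, target)
```

Call `download_models()` from the repository root to fetch both files.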
This is the expected misc folder structure:
misc
├── config_file.ini
├── in
├── out
└── pero-ocr-fix-computation-on-cpu
    ├── OCR_350000.pt.cpu
    ├── ParseNet_296000.pt.cpu
    └── ocr_engine.json
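Before you start the containers, you can check this layout with a short script. The sketch below is a hypothetical helper, `missing_entries`, that is not part of the project. It assumes the entries are exactly those in the tree above, with in and out as folders, and it reports whatever is absent.

```python
from pathlib import Path

# Entries taken from the expected layout above.
REQUIRED_FILES = [
    "config_file.ini",
    "pero-ocr-fix-computation-on-cpu/OCR_350000.pt.cpu",
    "pero-ocr-fix-computation-on-cpu/ParseNet_296000.pt.cpu",
    "pero-ocr-fix-computation-on-cpu/ocr_engine.json",
]
REQUIRED_DIRS = ["in", "out"]


def missing_entries(misc: Path = Path("misc")) -> list[str]:
    """Return the expected files and folders that are absent under `misc`."""
    missing = [f for f in REQUIRED_FILES if not (misc / f).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (misc / d).is_dir()]
    return missing


if missing_entries():
    print("missing:", ", ".join(missing_entries()))
```

An empty result means both models and the configuration files are in place.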
Then start the demo:
docker-compose up
Copy any sample image from the samples folder into the misc/in folder, and it will be processed by the OCR.
The results will be in the misc/out folder.
You will find the XML files containing the results and the images showing the detected text.
You can monitor the progress in the logs here: http
Log in with the username _SYSTEM and the password SYS.
The OCR runs as a Business Service that parses every file in the misc/in folder and puts the results in a message queue.
The message queue is consumed by a Business Operation that writes the results to the misc/out folder.
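The sketch below models the Business Service / Business Operation split in plain Python, using `queue.Queue` as the message queue. It is only an analogy. It does not use the IRIS interoperability API, and it is not the code in src/python/pero-ocr. `OcrResult`, `run_ocr`, `service_poll` and `operation_drain` are hypothetical names, and `run_ocr` is a placeholder that returns a dummy XML string instead of calling pero-ocr.

```python
import queue
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OcrResult:
    """Hypothetical message passed from the service to the operation."""
    source: Path
    page_xml: str


def run_ocr(image: Path) -> str:
    """Placeholder for the pero-ocr call; returns a dummy PAGE-like XML string."""
    return f'<PcGts><Page imageFilename="{image.name}"/></PcGts>'


def service_poll(in_dir: Path, messages: "queue.Queue[OcrResult]") -> int:
    """Business Service role: OCR every file in in_dir and enqueue one message per file."""
    count = 0
    for image in sorted(p for p in in_dir.iterdir() if p.is_file()):
        messages.put(OcrResult(image, run_ocr(image)))
        count += 1
    return count


def operation_drain(messages: "queue.Queue[OcrResult]", out_dir: Path) -> list[Path]:
    """Business Operation role: write each queued result to out_dir as <stem>.xml."""
    written = []
    while not messages.empty():
        result = messages.get()
        target = out_dir / (result.source.stem + ".xml")
        target.write_text(result.page_xml, encoding="utf-8")
        written.append(target)
    return written
```

Decoupling the two roles through a queue means the component that writes the results never needs to know how the input folder is scanned. In the demo, IRIS provides the queue and the scheduling.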
Code is in the src/python/pero-ocr folder.