InterSystems IRIS Interoperability OCR Service
This is an InterSystems IRIS Interoperability OCR service that extracts text from images and PDFs. The file is sent in a multipart request, either from an HTML form or from any HTTP client.
What the service does
This application receives an HTTP multipart request with a file, extracts the text using Tesseract OCR, and returns the result.
Prerequisites
Make sure you have Git and Docker Desktop installed.
Installation: Docker
Clone or git pull the repo into any local directory:
$ git clone https://github.com/yurimarx/ocr-service.git
Open the terminal in this directory and run:
$ docker-compose build
Run the IRIS container with your project:
$ docker-compose up -d
OCR and NLP working together:
How to Run the OCR Production
- Open the production.
- Set the host destination folder for the uploaded files.
- Start the production.
- Open Postman, or create a form that sends a multipart request, and POST to localhost:9980/ with a form-data file attribute. Use an image or a PDF that contains an image.
- See the text returned. The first version supports English and Portuguese only.
- Send 2 or 3 files with some text.
- Go to the NLP Domain Explorer.
- Analyze the texts and enjoy!
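If you prefer the command line to Postman, the upload step above can be sketched with curl. This is a minimal sketch, not part of the project: the helper name `ocr_upload` is hypothetical, and the form field name `file` is an assumption, so change it if your production expects a different attribute name. The address localhost:9980/ is the one given in the steps above.

```shell
# Hypothetical helper: POST one file to the OCR production as multipart/form-data.
# Assumptions: the form field is named "file" and the service listens on
# http://localhost:9980/ (pass another URL as the second argument if needed).
ocr_upload() {
  file_path="$1"
  url="${2:-http://localhost:9980/}"

  # Fail early with a clear message instead of letting curl report a read error.
  if [ ! -f "$file_path" ]; then
    echo "ocr_upload: file not found: $file_path" >&2
    return 1
  fi

  # -F builds a multipart/form-data body and implies POST;
  # the @ prefix tells curl to send the contents of the file.
  curl --silent --show-error --fail -F "file=@${file_path}" "$url"
}

# Example usage (the file names are placeholders):
#   ocr_upload ./sample.png
#   for f in ./doc1.png ./doc2.pdf ./doc3.png; do ocr_upload "$f"; echo; done
```

The loop in the usage comment covers the "send 2 or 3 files" step, so the NLP Domain Explorer has several texts to analyze.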