The NLP text index is now built automatically, with no manual step (thanks to DeBoe for the tip).
This is an InterSystems IRIS Interoperability OCR service that extracts text from images and PDFs. The file is sent in a multipart request, either from a form or from any HTTP client.
The application receives an HTTP multipart request containing a file, extracts the text using Tesseract OCR, and returns the result.
Make sure you have git and Docker Desktop installed.
Clone or git pull the repo into any local directory:
$ git clone https://github.com/yurimarx/ocr-service.git
Open the terminal in this directory and run:
$ docker-compose build
$ docker-compose up -d
Open the production.
Set the host destination folder for the uploaded files.
Start the production.
Now open Postman, or create a form that sends a multipart request, and POST to localhost:9980/ with the file attached as a form-data attribute. Use an image, or a PDF with an image inside.
See the text returned. The first version supports English and Portuguese only.
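If you prefer the command line to Postman, the same request can be sketched with curl. This is a minimal sketch, not part of the repo: the helper name `ocr_upload`, the `OCR_URL` variable, and the form field name `file` are assumptions, so adjust the field name if the production expects a different one. Only the address localhost:9980/ comes from the steps above.

```shell
# Address of the OCR service from the steps above; override via the environment if needed.
OCR_URL="${OCR_URL:-http://localhost:9980/}"

# Hypothetical helper: upload one image or PDF as multipart form-data.
# The field name "file" is an assumption, not confirmed by the project.
ocr_upload() {
  # --form makes curl send a multipart/form-data POST;
  # the "@" prefix tells curl to read the content from the given path.
  curl --silent --show-error --form "file=@$1" "$OCR_URL"
}

# Usage (with the production running):
#   ocr_upload ./sample.png
```

The extracted text should be printed to the terminal as the response body. Calling the helper once per file is also a quick way to send the two or three documents needed for the NLP step.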
Send 2 or 3 files with some text
Go to the NLP Domain Explorer
Analyze the texts and enjoy!
The NLP text index is now built automatically, with no manual step (thanks to DeBoe for the tip)
NLP support and PEX support
Fix instructions and short description
Initial Release