
OCR Service

This application is not supported by InterSystems Corporation. Please be notified that you use it at your own risk.
OCR Interoperability Service. This app takes an image or PDF, extracts its text with Tesseract OCR, sends the text to the NLP index, and returns it to the user. It uses a PEX Business Operation in ObjectScript that calls a Java class, which in turn uses Tesseract. The extracted text is also available for NLP analysis. Enjoy OCR + NLP!

What's new in this version

The NLP text index is now built automatically, with no manual step (thanks to DeBoe for the tip).

InterSystems IRIS Interoperability OCR Service

This is an InterSystems IRIS Interoperability OCR service that extracts text from images and PDFs. The file is sent in a multipart request, either from an HTML form or from any other HTTP client.

What the service does

This application receives an HTTP multipart request containing a file, extracts the text with Tesseract OCR, and returns the result.

Prerequisites

Make sure you have git and Docker Desktop installed.

Installation: Docker

Clone or git pull the repo into any local directory:

$ git clone https://github.com/yurimarx/ocr-service.git

Open the terminal in this directory and build the image:

$ docker-compose build

Run the IRIS container with your project:

$ docker-compose up -d
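
The container can take a little while to start accepting requests after `docker-compose up -d` returns. As a minimal sketch, assuming the service is published on localhost port 9980 (the address used in the run steps), a small script can poll until the port answers. The function name `wait_for_port` and its defaults are illustrative and not part of the project.

```python
import socket
import time


def wait_for_port(host="localhost", port=9980, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Assumption: port 9980 on localhost is where the OCR service listens.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port is open,
            # not that the production itself has been started.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A `True` result means only that something is listening on the port; the production still has to be started as described in the run steps.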

OCR and NLP working together:

[Image: OCR and NLP in action]

How to Run the OCR Production

  1. Open the production.

  2. Set the host destination folder for the uploaded files.

     [Image: folder]

  3. Start the production.

  4. Open Postman, or create a form that sends a multipart request, and POST to localhost:9980/ with a form-data file attribute. Use an image, or a PDF that contains an image.

     [Image: postman]

  5. See the returned text. The first version supports English and Portuguese only.

  6. Send 2 or 3 files with some text.

  7. Go to the NLP Domain Explorer.

  8. Analyze the texts and enjoy!
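
The upload step can also be scripted instead of using Postman. The sketch below builds a multipart/form-data body with the Python standard library and POSTs it to localhost:9980/. The form field name `file` is an assumption (the steps say only "a form-data file attribute"), and the helper names `build_multipart` and `send_for_ocr` are hypothetical, so adjust them to match the production's configuration.

```python
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def send_for_ocr(path, url="http://localhost:9980/", field_name="file"):
    """POST the file at `path` to the OCR service and return the response text.

    Assumptions: the service listens at `url` and reads the upload from a
    form field called `field_name`; neither is confirmed beyond the README.
    """
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            field_name, os.path.basename(path), fh.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the production running, calling `send_for_ocr("scan.png")` on a local image (the file name here is only an example) should return whatever text the service extracts. Sending two or three files this way gives the NLP Domain Explorer something to analyze.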

Version: 1.0.3 (20 Nov, 2020)
Category: Integration
Works with: InterSystems IRIS
First published: 13 Nov, 2020