InterSystems IRIS NLP Website Analyzer
This is an InterSystems IRIS NLP Website Analyzer. It uses a crawler to extract the HTML content of a website and its linked pages, then uses IRIS NLP to analyze that content.
What the app does
This application receives a URL, uses a crawler to extract the website content, and analyzes it using NLP.
Website-Analyzer - IRIS NLP and Crawler4J in action!
Website-Analyzer - IRIS BI in action!
Prerequisites
Make sure you have git and Docker Desktop installed.
Installation: Docker
Clone/git pull the repo into any local directory:
$ git clone https://github.com/yurimarx/website-analyzer.git
Open the terminal in this directory and run:
$ docker-compose build
Run the IRIS container with your project:
$ docker-compose up -d
How to Run the Production
- Open the production.
- Set Depth and TotalPages on the crawler. Depth is how many levels of subpages (links followed from the start page) will be crawled, and TotalPages is the maximum number of pages that will be processed. Tip: for a fast initial test, start with Depth 0 and 5 pages.
- Start the production.
- Open Postman, or use a browser, and send a GET request to http://localhost:9980/?Website=https://www.intersystems.com/. To analyze another website, replace https://www.intersystems.com/ with its URL (e.g., https://yoursite.com).
- Go to the NLP Domain Explorer.
- Go to the BI User Portal.
- Analyze the texts and enjoy!
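To get a feel for how Depth and TotalPages bound a crawl, here is a minimal breadth-first sketch over an in-memory link graph. It assumes Depth behaves like Crawler4J's maximum crawl depth (the start page is depth 0, so Depth 0 fetches only the start page) and TotalPages like its maximum number of pages to fetch. The site map, the function name `crawl_order`, and these semantics are illustrative assumptions, not the app's actual code, which runs inside the IRIS production.

```python
from collections import deque


def crawl_order(links: dict[str, list[str]], start: str, depth: int, total_pages: int) -> list[str]:
    """Return pages in the order a depth- and page-limited BFS crawl fetches them.

    links: hypothetical in-memory site map (page -> pages it links to).
    depth: assumed maximum link distance from `start` (0 = only the start page).
    total_pages: assumed maximum number of pages fetched.
    """
    fetched: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # (page, distance from the start page)
    while queue and len(fetched) < total_pages:
        page, level = queue.popleft()
        fetched.append(page)
        if level >= depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:  # schedule each page only once
                seen.add(target)
                queue.append((target, level + 1))
    return fetched


# Hypothetical site map used only for illustration.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/iris", "/about"],
    "/about": ["/careers"],
}

print(crawl_order(site, "/", depth=0, total_pages=5))  # ['/']
print(crawl_order(site, "/", depth=1, total_pages=5))  # ['/', '/products', '/about']
print(crawl_order(site, "/", depth=2, total_pages=4))  # ['/', '/products', '/about', '/products/iris']
```

In this sketch, whichever limit is hit first ends the crawl: Depth caps how far from the start page the crawler goes, while TotalPages caps how much work is done overall, which is why small values of both make a quick first test.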
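The GET request described above can also be sent from a script instead of Postman or a browser. This is a minimal sketch using only the Python standard library, assuming the production listens on localhost:9980 and reads the target site from the `Website` query parameter. The helper names `build_request_url` and `analyze_website` are hypothetical, and because the response format is not documented here, the sketch returns the body as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint from the steps above; adjust it if your container maps a different port.
SERVICE_URL = "http://localhost:9980/"


def build_request_url(website: str, service_url: str = SERVICE_URL) -> str:
    """Build the GET URL, percent-encoding the target site as the Website parameter."""
    return service_url + "?" + urlencode({"Website": website})


def analyze_website(website: str, timeout: float = 600.0) -> str:
    """Send the GET request that asks the production to crawl and analyze `website`.

    Hypothetical helper: the long timeout assumes the crawl may take a while,
    and the response is returned as raw text because its format is not documented.
    """
    with urlopen(build_request_url(website), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_request_url("https://www.intersystems.com/"))
    # Requires the production to be running:
    # print(analyze_website("https://www.intersystems.com/"))
```

Percent-encoding the `Website` value keeps a target URL that has its own query string (for example `?a=1&b=2`) from being split into extra parameters; for simple URLs the unencoded form shown in the steps should behave the same.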