
companies-search

This application is not supported by InterSystems Corporation. Please be notified that you use it at your own risk.
Find a company based on Glassdoor with Vector Search and GPT.

What's new in this version

Initial Release

Companies Search

Companies Search is a REST API built on InterSystems Vector Search that retrieves companies from a vector database using keywords. The vector database was built by parsing a CSV dataset obtained from Kaggle and converting its data to the VECTOR type.

The API can locate companies that have semantic similarity with the provided keywords, ensuring relevant and accurate results.
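
As a rough illustration of what "semantic similarity" means here, the sketch below ranks a few toy vectors against a query vector by cosine similarity. The company names, the 3-dimensional vectors, and the helper names are all made up for the example; the project itself relies on InterSystems Vector Search inside IRIS, and its actual query is not shown in this README.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_matches(query_vec, reviews, limit=3):
    """Return the names of the `limit` entries most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in reviews]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


# Toy data (hypothetical): real embeddings have hundreds of dimensions.
REVIEWS = [
    ("Company A", [0.9, 0.1, 0.0]),
    ("Company B", [0.1, 0.9, 0.2]),
    ("Company C", [0.8, 0.3, 0.1]),
]
QUERY = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    print(top_matches(QUERY, REVIEWS, limit=2))
```

Entries whose vectors point in nearly the same direction as the query rank first, which is why a search for one phrase can surface reviews that use different but related wording.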

Once the companies are located, the API utilizes ChatGPT for IRIS to process the data and generate informative summaries. These summaries provide users with an overview of the companies returned by the vector search. Additionally, users have the option to request summaries focusing on “positive” or “negative” aspects of the companies.

Essentially, this platform offers users the ability to quickly obtain initial impressions of companies, whether for potential collaboration or service provision purposes.


Installation

  1. Clone the repo:

    git clone https://github.com/juliocmomente/companies-search.git

  2. Companies Search requires an OpenAI API key. To obtain one, go to: OpenAI Key

  3. Click the “Create new secret key” button, fill in the optional field, and click the “Create secret key” button.

  4. Set the OPENAI_API_KEY property in the .env file to your secret key.

  5. Start the Docker containers:

    docker-compose up
    

    Note: The Dockerfile contains all the configuration needed to start the project.

    The first run will probably take around 25 minutes to download all dependencies and create the vectorized database.

Video demo

Watch the video demo to see how the API works, or continue reading the documentation below:
Video Demo

Solution

An API with a single endpoint has been developed:

  • GET, designed to retrieve companies stored in the vector database and return them in JSON format.

The dataset was obtained from Kaggle and is stored in a single table called GlassdoorReview.

Finally, a service class named InitService was created, which includes a method developed in Embedded Python to read the lines from the CSV file, insert them, and transform them into VECTOR type.
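
The loading step described above can be sketched roughly as follows. This is not the InitService code: the column names (`firm`, `pros`, `cons`), the `embed` placeholder, and the helper names are assumptions made for illustration, and a real pipeline would call an embedding model instead. The sketch only shows the shape of the work: read each CSV row, turn the review text into a vector, and serialize it as a comma-separated string, which is the kind of input a function such as IRIS SQL's TO_VECTOR is generally given.

```python
import csv
import io


def embed(text):
    """Placeholder only (hypothetical): a real pipeline would call an embedding model."""
    return [float(len(text)), float(len(text.split()))]


def to_vector_literal(vector):
    """Serialize a list of floats as a comma-separated string."""
    return ",".join(str(x) for x in vector)


def prepare_rows(csv_text):
    """Read CSV text and attach a serialized vector for each review column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        rows.append({
            "company": record["firm"],  # assumed column name
            "pros_vector": to_vector_literal(embed(record["pros"])),
            "cons_vector": to_vector_literal(embed(record["cons"])),
        })
    return rows


# Tiny in-memory sample standing in for the Kaggle file.
SAMPLE_CSV = "firm,pros,cons\nAcme,flexible hours,long meetings\n"

if __name__ == "__main__":
    for row in prepare_rows(SAMPLE_CSV):
        print(row)
```

Keeping separate vectors for the pros and cons text is one plausible way to support searching either side of a review; the README does not say how the table is actually laid out.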

Below is a detailed explanation of the modeling used in the project.

Architectural modeling

[Image: architectural model diagram]

Sequence diagram

[Image: sequence diagram]

Logic modeling

[Image: logical model diagram]

Data input

The API can be accessed via the following address:

http://localhost:52773/api/companies/search

To perform the company search, this endpoint accepts up to 3 parameters to facilitate the search:

  • key: This parameter is the main search filter. Enter a word or sentence to search for in company reviews (pros or cons). The search engine is based on vector search.

    You can search for characteristics you want to find in a company. For example, with flexible hours as the key, the API searches for companies whose reviews describe that way of working.

    If the API finds no matching companies, it returns the message: No records found.

    e.g.:

    http://localhost:52773/api/companies/search?key=flexible%20hours
    

    Note: If this parameter is not provided, the API does not perform any search and returns the following error: No records returned.

  • numberOfResults: This parameter is optional and specifies the number of companies to retrieve. If it is not provided, the default is 3. The maximum is 10, because larger values caused performance problems.
    If you request more than 10, the API returns the error message: To many request

    e.g.:

    http://localhost:52773/api/companies/search?key=flexible%20hours&numberOfResults=5
    
  • pros: This parameter is optional and is of boolean type. It tells the API whether to locate companies based on Pros or Cons, that is, positive or negative aspects. Based on this parameter, ChatGPT compiles the corresponding summary.

    e.g.:

    http://localhost:52773/api/companies/search?key=flexible%20hours&numberOfResults=10&pros=0
    

    Note: If this parameter is not provided, it defaults to 1 (true), that is, the API looks only for positive points.
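
The three parameters described above can be assembled on the client side with a small helper like the one below. This is only a sketch: `build_search_url` and the constant names are hypothetical, and the checks simply mirror the rules in this section (key is required, numberOfResults defaults to 3 and is capped at 10, pros defaults to true). The server remains the authority on validation.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:52773/api/companies/search"
DEFAULT_RESULTS = 3   # default stated in this section
MAX_RESULTS = 10      # limit stated in this section


def build_search_url(key, number_of_results=DEFAULT_RESULTS, pros=True):
    """Build the GET URL for a company search (hypothetical helper)."""
    if not key or not key.strip():
        raise ValueError("key is required")
    if number_of_results > MAX_RESULTS:
        raise ValueError("numberOfResults must not exceed %d" % MAX_RESULTS)
    params = {
        "key": key,
        "numberOfResults": number_of_results,
        "pros": 1 if pros else 0,
    }
    # quote_via=quote encodes spaces as %20, matching the example URLs above.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("flexible hours", number_of_results=10, pros=False))
```

Rejecting an empty key or an oversized numberOfResults before sending the request avoids a round trip that the API would answer with an error anyway.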

Data output:

A successful request returns a JSON array in the following format:

[
    {
        "company": "Google",
        "location": "USA",
        "overallRating": 3,
        "summarize": "Summary provided by Chat GPT for the company Google"
    },
    {
        "company": "IBM",
        "location": "USA",
        "overallRating": 4,
        "summarize": "Summary provided by Chat GPT for the company IBM"
    }
]
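
A client could map this response onto typed objects along the lines of the sketch below. The `CompanyResult` class and the helper names are hypothetical; only the four JSON field names come from the format shown above. Error responses are not handled, since their format is not described here.

```python
import json
from dataclasses import dataclass


@dataclass
class CompanyResult:
    company: str
    location: str
    overall_rating: float
    summary: str


def parse_results(body):
    """Convert the JSON array returned by the API into CompanyResult objects."""
    return [
        CompanyResult(
            company=item["company"],
            location=item["location"],
            overall_rating=item["overallRating"],
            summary=item["summarize"],
        )
        for item in json.loads(body)
    ]


def best_rated(results):
    """Return the result with the highest overall rating, or None if empty."""
    return max(results, key=lambda r: r.overall_rating) if results else None


# Sample body copied from the response format above.
SAMPLE_BODY = """[
    {"company": "Google", "location": "USA", "overallRating": 3,
     "summarize": "Summary provided by Chat GPT for the company Google"},
    {"company": "IBM", "location": "USA", "overallRating": 4,
     "summarize": "Summary provided by Chat GPT for the company IBM"}
]"""

if __name__ == "__main__":
    results = parse_results(SAMPLE_BODY)
    print(best_rated(results).company)
```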

Technologies

Technologies applied in the project:

  • Vector Search: Used as a company search engine, using words that share similar semantic meaning.

  • Large Language Model (LLM), ChatGPT: Used to prepare a summary of the companies returned by the vector search.

  • Embedded Python: Used to create all application scripts, such as:

    • Assembling the vector database by inserting VECTOR type data.
    • REST API to receive input data from the external environment and process it with vector search and ChatGPT.
    • Integration script between the vector search results and ChatGPT.
  • Docker container: Used to create the IRIS environment and application, so that a single command, docker-compose up, makes the entire project ready for use.

  • InterSystems IRIS: Used for creating the vector database and structuring the REST API.

  • ObjectScript: Used to create the REST API communication, using the native extension %CSP.REST.

Architecture

The project was developed using a layered architecture to separate responsibilities for code, database and business rules.

Packages:

  • data: Structuring persistence
  • rest: Mapping and structure of the REST API.
    • companies: Contains the application’s only endpoint, /search, for searching for companies.
  • service: Business rules and validations, as well as vector database searches.


Install: zpm install glassdoor-reviews
Version: 1.0.0 (17 May, 2024)
Category: Technology Example
Works with: InterSystems IRIS, InterSystems Vector Search
First published: 11 May, 2024