With the rise of generative AI (GenAI), we believe users should be able to access unstructured data far more easily. Most people receive more emails than they can keep track of. In investment and trading firms, for example, professionals need to make quick decisions that draw on as much information as possible. Similarly, senior employees at a startup who work across many teams and disciplines can find it hard to organize all the emails they receive. GenAI can help with these common problems and make people's work easier and more organized. However, GenAI models can hallucinate, and that is where Retrieval-Augmented Generation (RAG) combined with hybrid search comes in. This is what inspired us to build WALL-M (Work Assistant LL-M).
Lars Quaedvlieg
Arvind Menon
Alejandro Hernandez Cano
Somesh Mehra
This project was completed for the HackUPC 2024 hackathon in Barcelona! We used the Vector Search capability of the InterSystems IRIS Data Platform to tackle question answering with semantic search while trying to prevent model hallucinations.
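IRIS Vector Search ranks stored embeddings by how similar they are to an embedding of the question. The idea can be sketched in plain Python, assuming cosine similarity as the metric and toy hand-written vectors in place of real model embeddings. This is not the IRIS API and not necessarily how WALL-M scores its results; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
    emails = {
        "q3-earnings": [0.9, 0.1, 0.0],
        "team-offsite": [0.0, 0.2, 0.9],
        "trading-desk": [0.8, 0.3, 0.1],
    }
    print(top_k([1.0, 0.2, 0.0], emails, k=2))
```

In a database-backed setup the same ranking is computed by the database over stored vectors, so only the top results need to be passed on to the LLM.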
The repository contains the complete question-answering platform, which you can set up with the steps below. Note, however, that you currently need an OpenAI API key and an AI21 Labs API key to use the models. In the future, we hope to extend the platform to support local LLMs instead of commercial solutions, and to integrate a direct connection to Outlook.
Clone the repo
git clone git@github.com:lars-quaedvlieg/WALL-M.git
Change your directory to WALL-M
cd WALL-M
Install IRIS Community Edition in a container. This publishes ports 1972 and 52773 on your machine for the IRIS database system:
docker run -d --name iris-comm -p 1972:1972 -p 52773:52773 -e IRIS_PASSWORD=demo -e IRIS_USERNAME=demo intersystemsdc/iris-community:latest
Note: After running the above command, you can access the System Management Portal at http://localhost:52773/csp/sys/UtilHome.csp. You may need to configure your web server separately when using another product edition.
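Before launching the application, it can help to confirm that the container is actually listening. The hypothetical helper `is_port_open` below (not part of the repository) simply tries a TCP connection to the two ports published by the `docker run` command: 1972 for the IRIS superserver and 52773 for the web server that hosts the Management Portal.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Ports published by the `docker run` command above.
    for name, port in [("IRIS superserver", 1972), ("Management Portal (web)", 52773)]:
        status = "open" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

An open port only shows that something is listening; IRIS may still need a few seconds after the container starts before it accepts logins.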
Create a Python environment and activate it (with conda, venv, or however you prefer). For example:
conda:
conda create --name wall-m python=3.10
conda activate wall-m
or
venv (Windows):
python -m venv wall-m
.\wall-m\Scripts\activate
or
venv (Unix):
python -m venv wall-m
source ./wall-m/bin/activate
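To double-check that the environment is active before installing packages, a hypothetical helper can inspect the running interpreter. It relies on two standard signals: a venv makes `sys.prefix` differ from `sys.base_prefix`, and conda sets the `CONDA_DEFAULT_ENV` variable on activation. Treating Python 3.10 (the version pinned in the conda example) as the minimum is an assumption.

```python
import os
import sys
from typing import Optional, Tuple


def active_environment() -> Optional[str]:
    """Best-effort name of the active environment, or None if none is detected."""
    # A venv created with `python -m venv` makes sys.prefix differ from sys.base_prefix.
    if sys.prefix != sys.base_prefix:
        return os.path.basename(sys.prefix)
    # conda environments are full installs, so detect them via the variable conda sets.
    return os.environ.get("CONDA_DEFAULT_ENV")


def python_version_ok(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """True if the running interpreter is at least `minimum` (assumed 3.10 here)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Active environment:", active_environment() or "none detected")
    print("Python version OK:", python_version_ok())
```

With the commands above, a correctly activated environment should report `wall-m`; conda's `base` environment is also reported, so check the name rather than just its presence.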
Install the required packages:
pip install -r requirements.txt
Make sure to obtain an OpenAI API key and an AI21 Labs API key. Then create a .env file in the root of the repository to store the keys as follows:
OPENAI_API_KEY=xxxxxxxxx
AI21_API_KEY=xxxxxxxxx
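A quick way to catch a missing or unfilled key is to parse the `.env` file and check both names. The sketch below uses only the standard library and hypothetical helpers `parse_dotenv` and `missing_keys`; it is not necessarily how the application itself reads the keys. It also flags values that are still the `xxxxxxxxx` placeholder shown above.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Keys named in this README; both are needed to use the models.
REQUIRED_KEYS = ("OPENAI_API_KEY", "AI21_API_KEY")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="abc".
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Required keys that are absent, empty, or still the all-'x' placeholder."""
    return [k for k in required if not values.get(k) or set(values[k]) == {"x"}]


if __name__ == "__main__":
    env_file = Path(".env")
    problems = missing_keys(parse_dotenv(env_file.read_text(encoding="utf-8")))
    print("All keys set." if not problems else "Missing or placeholder: " + ", ".join(problems))
```

Keep the `.env` file out of version control, since it contains secrets.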
The application in this repository is built with Taipy. To run it, open a terminal in the root folder of the repository and run:
python src/core/main.py
Once the platform is running, go to http://127.0.0.1:5000. There, select a data directory containing JSON files with e-mail descriptions (we hope to replace this with direct authentication to Outlook in the future). The code to obtain these JSON files is also included in the repository; see the instructions below. Alternatively, you can use the example synthetic data in the data/emails folder. These files are used to create a database table in IRIS, which can then be queried using Retrieval-Augmented Generation (RAG) and large language models (LLMs).
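The data flow described above can be sketched with two hypothetical helpers. `load_email_files` reads every JSON file in a data directory, assuming each file holds one e-mail object or a list of them (the actual schema is not specified here). `build_rag_prompt` places the retrieved e-mails before the question and tells the model to answer only from them, a common way to reduce hallucinations. The exact prompt WALL-M sends to its models is not shown in this README.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_email_files(directory: Path) -> List[Dict[str, Any]]:
    """Load e-mail records from every *.json file in `directory`, in file-name order.

    Assumption: a file holds either one e-mail object or a list of them.
    """
    records: List[Dict[str, Any]] = []
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        records.extend(data if isinstance(data, list) else [data])
    return records


def build_rag_prompt(question: str, retrieved: List[Dict[str, Any]]) -> str:
    """Put the retrieved e-mails (numbered) before the question so the LLM answers from them."""
    context = "\n\n".join(
        f"[{i}] {json.dumps(email, ensure_ascii=False, sort_keys=True)}"
        for i, email in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the e-mails below. "
        "If they do not contain the answer, say so.\n\n"
        f"E-mails:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    emails = load_email_files(Path("data/emails"))
    # A real pipeline would pass only the top-ranked results from the vector search.
    print(build_rag_prompt("What are my open action items?", emails[:3]))
```

Numbering the context e-mails makes it easy to ask the model to cite which e-mail an answer came from.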
To scrape your own e-mails, you need a Windows machine. Install the required packages by running:
pip install -r requirements_outlook.txt
The scraper reads e-mails from an Outlook account, so you need to be signed in to that account in the Outlook desktop application for Windows. Then run the following command to scrape your e-mails:
python src/outlook/scrape_emails.py --email [YOUR_EMAIL]
This saves the e-mails to the data directory as JSON files containing the e-mail descriptions. These files can then be used to create a database table in IRIS.
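The output side of this step, one JSON file per e-mail, can be sketched as follows, assuming each scraped e-mail is already a Python dict. The `subject` field and the `0007_quarterly_review.json` naming scheme are hypothetical and not taken from `scrape_emails.py`; the Windows-only Outlook access itself is not reproduced here.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict, Iterable


def safe_filename(index: int, subject: str, max_len: int = 50) -> str:
    """Build a filesystem-safe name such as '0007_quarterly_review.json'."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", subject).lower()[:max_len].strip("_")
    return f"{index:04d}_{slug or 'no_subject'}.json"


def save_emails(emails: Iterable[Dict[str, Any]], directory: Path) -> int:
    """Write each e-mail dict to its own JSON file in `directory`; return how many were written."""
    directory.mkdir(parents=True, exist_ok=True)
    written = 0
    for written, email in enumerate(emails, start=1):
        # "subject" is a hypothetical field name used only to make readable file names.
        name = safe_filename(written, str(email.get("subject", "")))
        (directory / name).write_text(
            json.dumps(email, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return written
```

Prefixing names with a zero-padded index keeps files unique even when several e-mails share a subject, and keeps them in scrape order when sorted.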
To access the IRIS Management Portal, go to http://localhost:52773/csp/sys/UtilHome.csp and log in with username demo and password demo (or whatever credentials you configured).