This document describes Klìnic, a project developed during the HackUPC 2024 hackathon.
Biomedical research is hard, but we can help. Klìnic is an integrated platform for exploring clinical trial trends within a specific domain, helping researchers design new experiments and analyze past failures.
The idea is to help clinicians and researchers easily get an overview of the clinical trial landscape in a given field. They only need to enter a general description of a disease, such as "A disease that affects young patients, generally male Caucasians". We find the diseases whose descriptions are most similar to that statement by comparing embeddings of those descriptions. We then use a knowledge graph of the relationships between diseases to find further diseases closely related to these matches. This acts as a form of data augmentation: it surfaces more clinical trials related to the diseases the user is interested in. Finally, we use a language model to summarize these clinical trials and extract numerical data from them.
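The embedding-similarity step amounts to ranking diseases by how close their description embedding is to the embedding of the user's query. The sketch below shows that ranking with cosine similarity in plain Python. It is a minimal illustration, assuming the embeddings have already been produced by some embedding model (not shown); the helper names and the toy three-dimensional vectors are hypothetical, and the project itself reports using IRIS vector search for this kind of lookup rather than Python code like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, disease_vecs, k=3):
    """Rank diseases by similarity of their description embedding to the query embedding.

    disease_vecs maps a disease name to its (hypothetical, precomputed) embedding.
    Returns the k best (name, score) pairs, highest score first.
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in disease_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from an embedding model.
    diseases = {
        "disease_a": [1.0, 0.0, 0.0],
        "disease_b": [0.7, 0.7, 0.0],
        "disease_c": [0.0, 0.0, 1.0],
    }
    query = [0.9, 0.1, 0.0]
    for name, score in top_k_similar(query, diseases, k=2):
        print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length, so two descriptions count as similar when their embeddings point in the same direction, which is the usual choice for text embeddings.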
Our team is made up of students from different backgrounds, including computer science, mathematics, and biomedicine. We wanted to build a tool that lets clinicians and researchers easily get an overview of the clinical trial landscape in a given field, which we believe could help them design new experiments and analyze past failures.
The platform surfaces clinical trial trends in a given domain (for example, diseases that affect young females). The user only has to enter a general description of a disease, such as "A disease that affects young patients, generally females, showing symptoms of fatigue and muscle pain".
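The language-model step, which summarizes the retrieved trials and extracts numerical data from them, could be structured as a prompt that asks for a short summary plus machine-readable numbers, followed by parsing of the reply. This is a minimal sketch under stated assumptions: the prompt wording, the JSON keys `summary` and `enrollment_per_trial`, and the helper names are hypothetical, and the call to the language model itself is left out.

```python
import json


def build_trial_prompt(trial_texts):
    """Build a prompt asking a language model to summarise clinical trials
    and report numerical data as JSON (prompt wording is hypothetical)."""
    lines = [
        "Summarise the clinical trials below in at most three sentences.",
        'Reply with a JSON object with keys "summary" (string) and',
        '"enrollment_per_trial" (list of integers, one per trial, in order).',
        "",
    ]
    # Number the trials so the model can keep its numbers in the same order.
    for i, text in enumerate(trial_texts, start=1):
        lines.append(f"Trial {i}: {text}")
    return "\n".join(lines)


def parse_model_reply(reply):
    """Pull the JSON object out of the model's reply, tolerating extra text around it."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])
```

Asking for JSON makes the extracted numbers easy to aggregate, but model replies are not guaranteed to be valid JSON, so the parser raises `ValueError` (or `json.JSONDecodeError`, which is a subclass of it) when no object can be recovered.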
We wrote the entire frontend (using Streamlit) and most of the backend in Python, with some backend components in MATLAB. Our system first preprocesses the data that is then loaded into IRIS. The first component is our knowledge graph, which holds information about the relationships between different diseases. To build it, we downloaded the MedGen dataset and trained an embedding model. We took the same approach for clinical trials (source) to represent their relationships.
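The way a knowledge graph widens the initial set of matching diseases can be illustrated with a plain breadth-first traversal. This is a simplified stand-in, not the project's method: the text above describes training an embedding model on the MedGen data, whereas this sketch just collects every disease within a fixed number of hops of the initial matches. The adjacency dictionary, the disease identifiers, and the `expand_diseases` name are hypothetical.

```python
from collections import deque


def expand_diseases(graph, seeds, max_hops=1):
    """Return the seed diseases plus every disease reachable within max_hops edges.

    graph is a hypothetical adjacency mapping: disease -> iterable of related diseases.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            # Do not follow edges beyond the hop limit.
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen
```

Each extra hop pulls in more loosely related diseases, and therefore more trials, so the hop limit trades recall against relevance.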
The heart of our logic comprises the following nine steps:
Our setup builds on the demo provided by InterSystems. As the steps above show, we rely heavily on IRIS vector search to compute similarities.
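For the IRIS vector-search step, a query that ranks stored embeddings by similarity to a query embedding might be built as follows. This is a sketch under assumptions: it relies on the `TO_VECTOR` and `VECTOR_COSINE` SQL functions available in recent IRIS versions, while the table and column names, the helper names, and the `?` placeholder style are hypothetical and should be checked against the InterSystems demo and the database driver actually in use.

```python
def to_vector_literal(embedding):
    """Serialise an embedding as a comma-separated string for TO_VECTOR."""
    return ",".join(repr(float(x)) for x in embedding)


def similarity_query(table, vector_column, k):
    """Build an IRIS SQL query returning the k rows whose stored vector is most
    similar (by cosine) to a query vector supplied as the single parameter.

    table and vector_column are interpolated directly, so they must be trusted
    identifiers, never user input; the query vector itself stays a parameter.
    """
    return (
        f"SELECT TOP {int(k)} * FROM {table} "
        f"ORDER BY VECTOR_COSINE({vector_column}, TO_VECTOR(?, DOUBLE)) DESC"
    )
```

With a DB-API cursor connected to IRIS, the two helpers would be combined roughly as `cursor.execute(similarity_query("diseases", "description_vector", 5), [to_vector_literal(query_embedding)])`, where `diseases`, `description_vector`, and `query_embedding` are placeholders.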
This project was built at the HackUPC 2024 hackathon: https://devpost.com/software/x-grmvsx
https://huggingface.co/spaces/klinic-hackupc/klinic/tree/main
https://www.youtube.com/live/iS6RO9GHTs0?si=SNjv341-GSMpkPd0&t=1293
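To run the project locally, first fetch the files stored with Git LFS:
git lfs fetch --all
git lfs checkout
Then start the services defined in the Docker Compose file:
docker-compose up -d
Set your OpenAI API key:
OPENAI_API_KEY=<your-openai-api-key>
Finally, launch the Streamlit app:
streamlit run app.py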