Home Applications thalamus-og

thalamus-og

This application is not supported by InterSystems Corporation. Please be notified that you use it at your own risk.
0
0 reviews
0
Awards
9
Views
0
IPM installs
0
0
Details
Releases
Reviews
Issues
This app has a demo View
Thalamus revolutionizes LLM routing

What's new in this version

Initial Release


Logo

Thalamus

The Brain Behind Sustainable AI
Ballot »

Audrey Chen · Cindy Yang · Rohan Kalahasty · Siddhant Sood

Introduction

The deployment of large language models (LLMs) has significantly advanced natural language processing applications. However, the high computational costs associated with running these models—both economically and environmentally—pose substantial challenges. Specifically:

  • Environmental Impact: Millions of tons of CO₂ are emitted due to the energy-intensive inference processes of powerful LLMs.

  • Economic Costs: Billions of dollars are wasted on processing simple queries with unnecessarily large models.

Traditional LLM routers attempt to mitigate these issues by directing simple queries to lightweight models. While this approach reduces costs and carbon footprint, it often results in decreased performance and user satisfaction.

How Thalamus Works

Thalamus revolutionizes LLM routing by introducing a sophisticated lightweight layer that intelligently designs multi-agent systems in real-time to handle queries efficiently without compromising performance.

Step 1: Model Fine-Tuning with QLoRA

  • Dataset Preparation: Utilized a subset of data from the LMSYS Arena, focusing on information relevant to our models.
  • Training Objective: Fine-tuned the model using QLoRA (Quantized Low-Rank Adaptation) to predict the most suitable LLM for a given prompt.
  • Outcome: Enabled Thalamus to learn the mapping between different embedding clusters and the preferred LLMs for those clusters, enhancing its decision-making capability.

Step 2: Standardized Router Output

  • Challenge: The desired router output was too complex for existing lightweight models like Phi-3-mini to handle directly.
  • Solution: Created a comprehensive dataset of 1,000 query-output pairs generated using GPT-4o, a robust out-of-the-box router.
  • Fine-Tuning: Trained Phi-3-mini on this dataset using negative log-likelihood loss to ensure accurate and reliable routing decisions.
  • Optimization: Applied 4-bit quantization to compress the model, making it lightweight enough to run directly in the browser without sacrificing performance.

Step 3: Modeling and Evaluation

  • Human Eval Performance: Assessed the ability of Thalamus to maintain high performance relative to larger models.
  • Cost Efficiency: Measured the reduction in financial expenditure by using Thalamus compared to larger LLMs.
  • Carbon Footprint: Evaluated the decrease in CO₂ emissions resulting from Thalamus’s optimized routing.

Performance Results:

  • Relative to GPT-4o:
    • Cost: 73.10% decrease
    • Performance: 0.90% increase
    • Carbon Footprint: 65.38% decrease
  • Relative to Claude:
    • Cost: 68.30% decrease
    • Performance: 1.21% decrease
    • Carbon Footprint: 50.41% decrease

The carbon footprint was loosely modeled via the approach from this paper: https://openreview.net/forum?id=aIok3ZD9to

These results demonstrate that Thalamus significantly reduces costs and environmental impact while maintaining, and in some cases even enhancing, performance.

User Experience and Visualization

Thalamus is seamlessly integrated into standard LLM chatbot interfaces, providing users with an intuitive experience.

Additionally, it offers:

  • Decision Visualization: Displays insights into Thalamus’s decision-making process, helping users understand which model is being selected and why.
  • Multi-Agent Interaction: Showcases the dynamic performance and interaction of multiple agents in real-time, enhancing transparency and trust in the system.

Conclusion

Thalamus presents an LLM layer that allows companies to save millions while simultaneously saving environmental sustainability. By intelligently directing queries to the most appropriate model and leveraging lightweight architectures, Thalamus ensures efficient and effective language processing suitable for a wide range of applications.

(back to top)

Made with
Version
1.0.019 Sep, 2024
Ideas portal
https://ballot.hackmit.org/project/potjl-vinac-wzepq-pkcfj
Category
Solutions
Works with
InterSystems IRISInterSystems Vector Search
First published
19 Sep, 2024