Why Does Model Routing Not Work? How EmaFusion is Revolutionizing the Art and Science of Combining Public and Private Models
November 26, 2024, 15 min read time

Published by Souvik Sen in Agentic AI


KEY TAKEAWAYS:

  • Enterprises have a variety of tasks that can be solved via intelligent automation, but no single public LLM can achieve the accuracy needed in specific and dynamic enterprise contexts.
  • While using multiple models and applying model selection is the right way forward, traditional and ML-based methods of model selection do not reliably deliver high accuracy at low costs and latency.
  • EmaFusion™ intelligently combines the best public and private models to achieve the highest accuracy at the lowest costs and latency, while also preserving the security of sensitive enterprise data.

Enterprises face a daunting challenge when it comes to using AI, given how rapidly it is evolving. The key question is: How best can you leverage Large Language Models (LLMs) for various enterprise tasks, while achieving high accuracy at the lowest possible costs and latency?

With over 100 LLMs available today, each offering distinct strengths in accuracy, latency, and cost across different tasks, the natural inclination might be to hand-pick the best single model for each job. But this approach is fundamentally flawed and inefficient.

Agentic tasks in an enterprise range from highly simple to highly complex. For example, a straightforward task might involve generating routine email responses, or summarizing meeting notes. In contrast, complex tasks could be providing nuanced customer support, synthesizing vast amounts of enterprise knowledge, or conducting predictive analyses based on intricate data sets. The diversity and complexity of these tasks make it impossible for any single LLM to meet enterprise-grade requirements efficiently and securely.

The idea of selecting a "best" model assumes that one LLM can sufficiently balance the trade-offs between accuracy, latency, and cost across all types of tasks. But this is rarely the case. A model optimized for accuracy may be too slow for real-time applications, while a model that excels in speed may lack the depth needed for more sophisticated queries. Moreover, models that are cost-effective at scale might compromise on performance, leading to inadequate results for critical tasks.

An approach that combines multiple models for each task is thus essential. Enterprises need to move beyond the idea of a one-size-fits-all LLM and embrace using multiple models, each tailored to specific aspects of their agentic workflows.

Consider this: a lightweight model handles initial customer interactions or routine tasks, while a more powerful, resource-intensive model is used for in-depth analysis and complex decision-making processes. Such a strategic combination of models would ensure that each task is handled by the most appropriate tool, optimizing for accuracy, latency, and cost in a way that no single model can achieve.
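
To make the idea concrete, here is a minimal sketch of such a tiered setup. The call_model() helper, the model names, and the keyword heuristic are hypothetical placeholders; a production system would use a learned complexity signal rather than simple rules.

```python
# Illustrative sketch only: a cheap model handles routine requests and escalates
# complex ones to a stronger model. All names here are hypothetical.

def call_model(model_name: str, prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned response for illustration."""
    return f"[{model_name}] response to: {prompt[:40]}"

def is_complex(prompt: str) -> bool:
    """Naive complexity heuristic; a real system would use a learned classifier."""
    return len(prompt.split()) > 50 or "analysis" in prompt.lower()

def handle_request(prompt: str) -> str:
    if is_complex(prompt):
        return call_model("large-reasoning-model", prompt)  # accurate but slow and costly
    return call_model("small-fast-model", prompt)           # cheap, low latency
```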

But choosing the right combination of models in this manner, for a wide range of enterprise tasks, is no simple feat either. We explain why common approaches to model selection are inadequate for the enterprise, and how that shaped our vision to build EmaFusion™—our proprietary model that achieves the highest accuracy at the lowest costs and latency, in a manner that is secure, flexible, and efficient across all enterprise tasks.

Why Model Selection or Model Routing is Inadequate

In their quest to leverage large language models (LLMs) for agentic tasks, enterprises have turned towards various model selection strategies hoping to identify the "best" model for each task. But both traditional and machine learning-based approaches to model selection have proven inadequate for the complex and dynamic nature of enterprise environments.

ML Expert Manual Configuration-Based Model Selection

One common approach involves having a machine learning expert painstakingly analyze each agentic task to identify the most suitable LLM. Through extensive trial and error, they configure a routing framework that hardcodes which task should be directed to which model.
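
In practice, this often amounts to a hardcoded lookup table, sketched below with hypothetical task types and model names.

```python
# Illustrative sketch of hand-configured routing: an ML expert hardcodes a
# task-to-model map after offline trial and error. All identifiers are hypothetical.

ROUTING_TABLE = {
    "email_reply":         "small-fast-model",
    "meeting_summary":     "small-fast-model",
    "customer_report":     "large-reasoning-model",
    "predictive_analysis": "large-reasoning-model",
}

def route(task_type: str) -> str:
    # Any task type not anticipated at configuration time has no route;
    # adding a new model or task means re-running the manual evaluation.
    return ROUTING_TABLE.get(task_type, "default-fallback-model")
```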

While this method may seem thorough, it has significant drawbacks:

Inflexibility: Once a configuration is established, incorporating new models into the framework becomes challenging. The entire system may need to be re-evaluated and reconfigured, which is not only time-consuming but also limits the ability to quickly take advantage of advancements in LLM technology.

Resource-Intensive: This approach requires the expertise of seasoned machine learning professionals who must continuously monitor and adjust the system. The trial-and-error nature of the process is labor-intensive, costly, and prone to human error.

Static Solutions in a Dynamic Environment: Enterprises are dynamic, with constantly evolving tasks and requirements. A rigid configuration may not adapt well to changing needs, leading to sub-optimal performance as tasks evolve or new tasks emerge.

Scalability Issues: As the number of tasks or models increases, the complexity of managing the routing framework grows exponentially, making it difficult to maintain efficiency across the system.

Machine Learning-Based Model Selection

In these solutions, a machine learning model analyzes the prompt body of the agent's task and attempts to route it to a single LLM. While this method appears to address the limitations of the manual approach above, it suffers from its own issues in enterprise environments.
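
A minimal sketch of this pattern, using a toy scikit-learn classifier and made-up model names purely for illustration:

```python
# Illustrative sketch of ML-based model selection: a text classifier trained on
# benchmark-style prompts predicts a single target model for each incoming prompt.
# The training examples and model names are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set standing in for public benchmark data.
prompts = [
    "Summarize this meeting transcript",
    "Write a polite reply to this email",
    "Analyze quarterly revenue trends and forecast next quarter",
    "Generate a detailed customer risk report",
]
target_model = ["small-fast-model", "small-fast-model",
                "large-reasoning-model", "large-reasoning-model"]

router = make_pipeline(TfidfVectorizer(), LogisticRegression())
router.fit(prompts, target_model)

# The router always picks exactly one model, even for prompts that bundle
# several sub-tasks with different requirements.
print(router.predict(["Summarize these notes and forecast churn risk"])[0])
```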

Misalignment with Enterprise Needs: These models are typically trained on benchmark datasets available on the internet, which do not reflect the specific and nuanced tasks of an enterprise setting. The model selection process is thus not well-tuned to unique enterprise environments, leading to poor performance.

Inadequacy in Handling Complex Prompts: In enterprises, agentic tasks are often complex, with prompts that may encapsulate multiple atomic tasks. For example, a single prompt might require generating a detailed customer report while simultaneously updating internal records and providing recommendations for the next steps. Existing machine learning-based selection models struggle with such complexity, often routing these prompts to high-capacity models that, while powerful, may be unnecessarily expensive and slow.

Over-Reliance on Expensive Models: Due to their inability to decompose complex prompts into manageable sub-tasks, these systems tend to over-rely on the most capable—and typically most expensive—models, leading to inflated costs and increased latency.

Training Data Limitations: The training data for these selection models often lacks the diversity and specificity required to accurately route enterprise-level tasks. This results in models that are not only biased towards general-purpose tasks but also fail to optimize for the diverse needs of an enterprise, where specialized knowledge and context are critical.

Both these model selection strategies, while valuable in certain contexts, fall short when applied to enterprise agentic tasks. They either lack the flexibility to adapt to new models and tasks or are constrained by training data and complexity, leading to inefficiencies and higher costs. There is a need for a more sophisticated approach, one that can effectively combine multiple models to meet the diverse and evolving demands of modern enterprises.

Model Merging

Model merging, the process of combining parameters or knowledge from multiple pre-trained models into a single unified model, has been explored in academic research as a method to leverage diverse model capabilities. Techniques such as parameter averaging, knowledge distillation, and weight interpolation have been proposed to merge models trained on different datasets or tasks.
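
As a rough illustration, weight interpolation between two architecturally identical open models can be sketched as below, assuming PyTorch and state dicts with matching keys and shapes. This is precisely the level of access that closed-source, API-only models do not provide.

```python
# Illustrative sketch of parameter-wise weight interpolation between two
# compatible open models. Toy state dicts stand in for real checkpoints.

import torch

def interpolate_weights(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Return a parameter-wise linear interpolation of two compatible state dicts."""
    return {k: alpha * state_a[k] + (1.0 - alpha) * state_b[k] for k in state_a}

# Toy "models": two tiny state dicts with matching shapes.
model_a = {"linear.weight": torch.randn(4, 4), "linear.bias": torch.randn(4)}
model_b = {"linear.weight": torch.randn(4, 4), "linear.bias": torch.randn(4)}
merged = interpolate_weights(model_a, model_b, alpha=0.5)
```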

These approaches, however, remain largely theoretical and impractical for real-world use cases, particularly when attempting to merge closed-source and open-source models. Closed-source models often have proprietary architectures, undisclosed training methodologies, and inaccessible weights, making direct parameter alignment or knowledge integration infeasible. In addition, legal and licensing constraints further complicate the practical application of such merging efforts.

The fundamental incompatibilities between open and closed-source models—ranging from architecture differences to training data biases—highlight that model merging is primarily an academic exercise with limited utility in real-world scenarios. Practical approaches such as ensembling, Retrieval-Augmented Generation (RAG), or fine-tuning individual models for specific tasks are far more effective and implementable in production environments.

EmaFusion™: Inspired by how humans solve complex problems

The intuition behind EmaFusion™ lies in the analogy of assembling a team of human experts to tackle a complex problem, where each expert brings unique strengths, perspectives, and domain-specific knowledge. Instead of relying on a single generalist model, EmaFusion™ integrates outputs from a diverse set of foundation models, smaller models, and domain-specific models, enabling a collective intelligence that surpasses the capabilities of any individual component.

This approach is inherently collaborative—each model contributes its specialized understanding to produce a well-rounded, nuanced solution. Unlike model routing, which dynamically selects a single model based on the input, EmaFusion™ leverages the collective output of all models, allowing it to synthesize diverse insights and subtleties that routing to any single model would miss.

This ensures robustness, mitigates biases inherent to any one model, and enables a broader coverage of tasks and domains, making it especially valuable for complex and multidisciplinary challenges.
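
The contrast with routing can be sketched generically: instead of sending the prompt to one model, every candidate answers and a scoring step selects or blends the results. The toy code below illustrates output-level fusion in general; it is not EmaFusion™'s proprietary mechanism, and all names and the scoring heuristic are hypothetical.

```python
# Generic illustration of output-level fusion: collect one candidate answer per
# model, score the candidates, and return the best one. A real system would use
# a learned judge rather than the naive overlap heuristic shown here.

from typing import Callable

def fuse(prompt: str,
         models: dict[str, Callable[[str], str]],
         score: Callable[[str, str], float]) -> str:
    """Query every model, then return the highest-scoring candidate answer."""
    candidates = {name: generate(prompt) for name, generate in models.items()}
    return max(candidates.values(), key=lambda answer: score(prompt, answer))

# Toy stand-ins for real models and a real learned scorer.
models = {
    "domain-model":  lambda p: f"domain answer to: {p}",
    "general-model": lambda p: f"general answer to: {p}",
}
naive_score = lambda prompt, answer: len(set(prompt.split()) & set(answer.split()))
print(fuse("summarize the contract risks", models, naive_score))
```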


Advantages of EmaFusion™

Unlike traditional model selection strategies that fall short in the enterprise context, EmaFusion™ introduces a dynamic model fusion architecture that optimizes the use of all available large language models (LLMs), as well as many domain-specific and private models, to deliver superior results.

Dynamic Model Fusion for Optimal Performance

EmaFusion™ is the first architecture designed to combine multiple LLMs dynamically, ensuring that each agentic task is executed with minimal trade-offs. By leveraging the strengths of various models, EmaFusion™ intelligently balances accuracy, cost, and latency, providing a tailored solution for every task. This approach is especially critical in enterprise settings, where tasks can range from simple query processing to complex, multi-faceted operations.
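
One simplified way to picture the trade-off being balanced is a utility score over accuracy, cost, and latency. The weights and per-model statistics below are invented for illustration and do not represent EmaFusion™'s actual objective.

```python
# Hypothetical sketch of the accuracy/cost/latency trade-off a fusion layer
# has to optimize. All numbers and model names are made up.

def utility(expected_accuracy: float, cost_usd: float, latency_s: float,
            w_acc: float = 1.0, w_cost: float = 0.2, w_lat: float = 0.05) -> float:
    """Higher is better: reward expected accuracy, penalize cost and latency."""
    return w_acc * expected_accuracy - w_cost * cost_usd - w_lat * latency_s

candidates = {
    "small-fast-model":      utility(0.78, cost_usd=0.002, latency_s=0.4),
    "large-reasoning-model": utility(0.93, cost_usd=0.050, latency_s=3.0),
}
print(max(candidates, key=candidates.get))
```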

Automatic Synthesis of Training Data

One of the core innovations of EmaFusion™ is its ability to automatically synthesize training data. Starting with a small number of seed prompt templates—representative examples of high-level enterprise tasks—EmaFusion™ generates a comprehensive dataset that captures the nuances and diversity of real-world scenarios. This data serves as the foundation for training its novel two-tier Fusion network architecture, enabling EmaFusion™ to understand and predict the best model combinations for any given task.
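
A toy sketch of the seed-template idea follows: a handful of hypothetical templates are expanded into many concrete prompts by filling slots with enterprise-flavored values. A real pipeline would go much further, for example using LLMs to paraphrase and diversify the generated prompts.

```python
# Illustrative sketch of expanding seed prompt templates into a larger synthetic
# training set. Templates, slots, and values are hypothetical.

from itertools import product

seed_templates = [
    "Summarize the {doc_type} for the {team} team",
    "Draft a {tone} reply to this {doc_type}",
]
slot_values = {
    "doc_type": ["contract", "support ticket", "quarterly report"],
    "team": ["sales", "legal", "finance"],
    "tone": ["formal", "brief", "apologetic"],
}

def expand(template: str) -> list[str]:
    """Fill every combination of slot values that appears in the template."""
    slots = [s for s in slot_values if "{" + s + "}" in template]
    combos = product(*(slot_values[s] for s in slots))
    return [template.format(**dict(zip(slots, combo))) for combo in combos]

synthetic_prompts = [p for t in seed_templates for p in expand(t)]
print(len(synthetic_prompts), synthetic_prompts[0])
```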

Two-Tier Fusion Network Architecture

The first tier of the Fusion network focuses on model prediction. It learns to anticipate which models are best suited for a specific agentic task without needing to run the task through the models themselves. This predictive capability allows EmaFusion™ to quickly and efficiently identify the optimal models, reducing the need for resource-intensive trial and error.
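
Conceptually, this first tier can be pictured as a learned predictor that maps a prompt to an expected quality score per candidate model, so a shortlist can be formed without calling any of them. The sketch below uses a toy scikit-learn regressor with invented scores; it illustrates the idea, not Ema's implementation.

```python
# Hypothetical sketch: predict per-model quality from the prompt alone,
# then shortlist the promising candidates. Training data is invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

candidate_models = ["small-fast-model", "domain-finance-model", "large-reasoning-model"]

# Toy training data: prompts paired with observed quality scores per candidate model.
train_prompts = [
    "Draft a short reply to this email",
    "Reconcile these invoices and flag anomalies",
    "Write a multi-step analysis of churn drivers",
]
train_scores = [
    [0.90, 0.60, 0.92],
    [0.50, 0.95, 0.90],
    [0.40, 0.55, 0.93],
]

vectorizer = TfidfVectorizer().fit(train_prompts)
predictor = MultiOutputRegressor(Ridge()).fit(
    vectorizer.transform(train_prompts), train_scores)

scores = predictor.predict(vectorizer.transform(["Flag anomalies in these expense reports"]))[0]
shortlist = [m for m, s in zip(candidate_models, scores) if s > 0.6]
print(shortlist)
```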

The second tier is where EmaFusion™ truly shines: it learns how to combine the selected models in the most effective way. For example, consider a complex task like generating a detailed customer report that requires extracting information from multiple sources, summarizing key points, and providing actionable insights. EmaFusion™ might combine a model known for its data extraction accuracy with another model that excels in summarization, and yet another that specializes in generating actionable insights. The result is a cohesive output that is not only accurate but also delivered with optimal efficiency.
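
A highly simplified, hypothetical sketch of that kind of composition for the customer-report example is below. The decomposition, model names, and call_model() helper are placeholders and do not reflect EmaFusion™'s actual architecture.

```python
# Hypothetical sketch: decompose a complex request into sub-tasks, send each
# to a model suited to it, and compose the pieces into one answer.

def call_model(model_name: str, prompt: str) -> str:
    """Stand-in for an LLM call."""
    return f"[{model_name}] {prompt}"

def build_customer_report(sources: list[str]) -> str:
    extracted = [call_model("extraction-model", f"extract key facts from: {s}")
                 for s in sources]
    summary = call_model("summarization-model",
                         "summarize these facts: " + " | ".join(extracted))
    insights = call_model("insights-model",
                          "recommend next steps given: " + summary)
    return summary + "\n\nRecommended next steps:\n" + insights

print(build_customer_report(["CRM notes", "support tickets", "usage logs"]))
```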

Flexibility to Use Any Model

EmaFusion™ is designed to be highly flexible, with an architecture that can incorporate any model, whether it’s an open-source LLM, an API-only LLM, or an LLM fine-tuned with custom training data, ensuring that enterprises are not locked into any specific set of models. Instead, with EmaFusion™, they can continuously integrate the latest advancements in LLM technology into their workflows without disrupting operations. EmaFusion™ adapts to these changes seamlessly, so that enterprises always have access to the most effective tools for their needs.
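
One common way to achieve this kind of flexibility is to hide every backend behind a single minimal interface, as in the hypothetical registry sketch below; the backends and names are placeholders, not a description of EmaFusion™'s internals.

```python
# Illustrative sketch: open-source checkpoints, API-only models, and private
# fine-tunes are all registered behind one callable interface, so backends can
# be added or swapped without touching the rest of the pipeline.

from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {}

def register_model(name: str, generate: Callable[[str], str]) -> None:
    """Add or replace a backend at runtime."""
    MODEL_REGISTRY[name] = generate

# Registering three different kinds of backends behind the same interface.
register_model("open-source-model", lambda p: f"local inference for: {p}")
register_model("api-only-model",    lambda p: f"hosted API response for: {p}")
register_model("fine-tuned-model",  lambda p: f"private fine-tune response for: {p}")

print(MODEL_REGISTRY["api-only-model"]("draft a renewal email"))
```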

EmaFusion™: Redefining Enterprise AI

The range and dynamism of enterprise environments demand technology and automation that go beyond what any single LLM or model can provide. Traditional model routing between LLMs likewise falls short of enterprise needs, where tasks range from simple to complex and are best addressed by a strategic combination of public and private models that optimizes for accuracy, cost, and latency.

EmaFusion™ transcends the limitations of traditional model selection by offering a dynamic, intelligent fusion of models that can meet the evolving demands of enterprise agentic tasks. Its innovative architecture provides a powerful, flexible, efficient and secure solution for enterprises to harness the full potential of LLMs without being locked into any single one of them, or constrained by the typical trade-offs in accuracy, cost, and latency.