Energy-Aware Query Routing for RAG
If different RAG configurations suit different queries, a router should be able to select the cheapest acceptable one per query—saving energy without sacrificing quality. This project formalizes that intuition as a cheapest-acceptable optimization problem and systematically evaluates 8 routing strategies. The central finding is negative but structurally informative: pre-execution features cannot predict module utility because the signal needed for routing is produced by the very computation routing aims to skip.
Problem Formulation
Given a set of 8 configurations with known energy costs, the routing objective is:
Oracle(q) = argmin over c of energy(c), subject to quality(c, q) ≥ threshold
Here c ranges over the 8 configurations and q is the query being routed.
The oracle router—which has access to all post-execution quality scores—achieves 75% energy savings over always-on (C7, all modules enabled). However, restricting the oracle to the non-HyDE configuration space reduces savings to 19%, confirming that most of the optimization opportunity concentrates in a single module toggle.
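The cheapest-acceptable rule and the savings calculation can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the configuration names other than C7, the energy values, the quality scores, the threshold, and the choice to fall back to always-on when no configuration passes are all assumptions made for the example.

```python
from typing import Dict, List


def oracle_route(
    energy: Dict[str, float],
    quality: Dict[str, float],
    threshold: float,
    fallback: str,
) -> str:
    """Return the lowest-energy config whose post-execution quality passes.

    `quality` holds the scores for one query across all configs, which is
    exactly the information a real router does not have before execution.
    """
    acceptable = [c for c in energy if quality[c] >= threshold]
    if not acceptable:
        # Assumption: when nothing passes, run the fallback config.
        return fallback
    return min(acceptable, key=lambda c: energy[c])


def energy_savings(energy: Dict[str, float], chosen: List[str], always_on: str) -> float:
    """Fraction of energy saved relative to running `always_on` on every query."""
    baseline = energy[always_on] * len(chosen)
    spent = sum(energy[c] for c in chosen)
    return 1.0 - spent / baseline


# Illustrative numbers only (not measurements from the project).
ENERGY = {"C0": 1.0, "C3": 2.0, "C7": 4.0}
QUALITY_PER_QUERY = [
    {"C0": 0.90, "C3": 0.90, "C7": 0.90},  # cheapest config already passes
    {"C0": 0.50, "C3": 0.80, "C7": 0.85},  # needs a mid-cost config
]
THRESHOLD = 0.75

choices = [oracle_route(ENERGY, q, THRESHOLD, fallback="C7") for q in QUALITY_PER_QUERY]
savings = energy_savings(ENERGY, choices, always_on="C7")
```

Restricting the oracle to a subset of configurations, as in the non-HyDE comparison, amounts to passing a smaller `energy` dictionary to the same function.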
Figure: Query routing decision flow: pre-execution features inform per-query configuration selection
Figure: Module activation decision space across the 8 RAG configurations
Routing Experiments
Eight routing strategies were evaluated, spanning feature engineering approaches, classifier architectures, and training formulations:
| Strategy | Approach | Config Accuracy |
|----------|----------|----------------|
| S1: Multi-class | Direct 8-class classification | 0.39 |
| S2: Binary per-module | Independent toggle prediction | 0.38 |
| S3: Cluster + select | Energy-cluster then refine | 0.40 |
| S4: Regression | Predict quality, select cheapest passing | 0.38 |
| S5: Utility scoring | Energy-weighted quality objective | 0.39 |
| S6: Learned utility | End-to-end utility prediction | 0.39 |
| S7: Cascading | Sequential module decisions | 0.38 |
| S8: Cheapest baseline | Always select lowest-energy config | 0.40 |
All 8 strategies converge to 0.38–0.40 configuration accuracy, so none of the seven learned routers (S1–S7) shows a statistically meaningful improvement over the cheapest-config baseline (S8). The 60.5 percentage point gap between oracle accuracy (100%) and achieved accuracy (39.5%) is not a modeling failure but a structural property of the problem.
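To make the metric concrete, the sketch below scores a router by exact match against the oracle's per-query choice and evaluates the always-cheapest baseline the same way. It assumes that "configuration accuracy" means exact agreement with the oracle selection, which is consistent with the oracle scoring 100% but is not spelled out in the text. The label list is toy data, not the project's.

```python
from typing import Sequence


def config_accuracy(predicted: Sequence[str], oracle: Sequence[str]) -> float:
    """Share of queries where the router picked the oracle's configuration."""
    if len(predicted) != len(oracle):
        raise ValueError("predicted and oracle must have the same length")
    if not oracle:
        raise ValueError("need at least one query")
    matches = sum(p == o for p, o in zip(predicted, oracle))
    return matches / len(oracle)


def cheapest_baseline(n_queries: int, cheapest: str) -> list:
    """Zero-computation router: the same lowest-energy config for every query."""
    return [cheapest] * n_queries


# Toy oracle labels (hypothetical config names apart from C7).
oracle_labels = ["C0", "C0", "C3", "C7", "C0"]
baseline_acc = config_accuracy(cheapest_baseline(len(oracle_labels), "C0"), oracle_labels)

# Gap reported in the text: oracle accuracy minus achieved accuracy, in points.
gap_pp = 100.0 - 39.5
```

Under this definition the baseline's accuracy is simply the fraction of queries whose oracle choice is the cheapest configuration, which is why a learned router has to beat that share to add any value.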
Figure: Quality-energy trade-off across configurations: the narrow quality spread explains why pre-execution routing signals are insufficient
The Information Gap
The structural ceiling arises from a fundamental asymmetry: module-activation routing is harder than the model routing problem studied in prior work.
| Dimension | Model Routing | Module-Activation Routing |
|-----------|--------------|--------------------------|
| Quality gaps between options | 10–30 pp | 2–4 pp |
| Pre-execution signal strength | Strong | Weak |
| Decision reversibility | Final | Compositional |
In model routing, a complex query visibly differs from a simple one—the quality gap between a small and large LLM is 10–30 percentage points, producing clear routing signal. In module-activation routing, quality differences between configurations are only 2–4 points. The features available before execution (query length, embedding statistics, complexity proxies) cannot distinguish queries that benefit from HyDE or reranking from those that do not, because that distinction depends on retrieval results and generation quality that have not yet been computed.
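A small sketch helps show what "available before execution" means in practice. The feature names and the specific statistics below are hypothetical stand-ins for the categories named above (query length, embedding statistics, complexity proxies); the project's actual feature set is not given here.

```python
import math
import statistics
from typing import Dict, Sequence


def pre_execution_features(query: str, embedding: Sequence[float]) -> Dict[str, float]:
    """Features computable from the query alone, before retrieval or generation.

    Assumes `embedding` is a non-empty query embedding from any encoder.
    """
    tokens = query.split()
    distinct = {t.lower() for t in tokens}
    return {
        # Query length
        "n_chars": float(len(query)),
        "n_tokens": float(len(tokens)),
        # Crude complexity proxy: lexical diversity of the query
        "type_token_ratio": len(distinct) / len(tokens) if tokens else 0.0,
        # Embedding statistics
        "emb_mean": statistics.fmean(embedding),
        "emb_std": statistics.pstdev(embedding),
        "emb_l2_norm": math.sqrt(sum(x * x for x in embedding)),
    }


features = pre_execution_features("what is rag", [3.0, 4.0])
```

Every input to this function is fixed before any module runs. Nothing in it depends on the retrieved passages or the generated answer, which is where the evidence for or against HyDE and reranking actually appears.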
Key Findings
- Oracle ceiling: 75% energy savings possible with perfect foreknowledge, but only 19% in the non-HyDE configuration space
- Routing convergence: All 8 strategies achieve 0.38–0.40 accuracy regardless of approach, architecture, or training objective
- Structural gap: The 60.5pp oracle-to-achieved gap reflects a fundamental information asymmetry, not insufficient modeling
- Cheapest-config dominance: A zero-computation baseline matches all learned routers—routing adds complexity without benefit
- Implication for RAG design: Energy optimization in modular RAG should focus on module-level efficiency (making HyDE cheaper) rather than per-query activation decisions
Outcomes
This project contributes a negative result with constructive implications. The systematic failure of 8 diverse routing strategies, combined with the information-gap analysis, establishes that per-query module activation is structurally harder than adjacent routing problems. For practitioners, the takeaway is direct: energy-aware RAG optimization should prioritize reducing module cost (cheaper HyDE, efficient reranking) over predicting module necessity. The cheapest acceptable configuration—selected statically, not per-query—remains the Pareto-optimal strategy when prediction accuracy cannot exceed the structural ceiling.
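Static selection can be sketched as a one-time choice made on held-out data. The example assumes that a configuration counts as acceptable when its mean quality over a validation set meets the threshold; that criterion, the config names other than C7, and all numbers are illustrative assumptions rather than the project's procedure.

```python
from statistics import fmean
from typing import Dict, Sequence


def select_static_config(
    energy: Dict[str, float],
    validation_quality: Dict[str, Sequence[float]],
    threshold: float,
    fallback: str,
) -> str:
    """Pick one config for all queries: the cheapest whose mean quality passes."""
    passing = [
        config
        for config, scores in validation_quality.items()
        if fmean(scores) >= threshold
    ]
    if not passing:
        # Assumption: fall back to the always-on config if nothing passes.
        return fallback
    return min(passing, key=lambda c: energy[c])


# Illustrative values only.
ENERGY_COST = {"C0": 1.0, "C3": 2.0, "C7": 4.0}
VALIDATION_QUALITY = {
    "C0": [0.70, 0.72],
    "C3": [0.78, 0.80],
    "C7": [0.80, 0.82],
}

static_choice = select_static_config(ENERGY_COST, VALIDATION_QUALITY, 0.75, fallback="C7")
strict_choice = select_static_config(ENERGY_COST, VALIDATION_QUALITY, 0.90, fallback="C7")
```

Because the choice is made once, it incurs no per-query inference cost and cannot be hurt by a router's misclassifications.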