Energy-Aware Query Routing for RAG

If different RAG configurations suit different queries, a router should be able to select the cheapest acceptable one per query—saving energy without sacrificing quality. This project formalizes that intuition as a cheapest-acceptable optimization problem and systematically evaluates 8 routing strategies. The central finding is negative but structurally informative: pre-execution features cannot predict module utility because the signal needed for routing is produced by the very computation routing aims to skip.

Problem Formulation

Given a set of 8 configurations with known energy costs, the routing objective is:

Oracle(q) = argmin over configurations c of energy(c), subject to quality(c, q) ≥ threshold

The oracle router—which has access to all post-execution quality scores—achieves 75% energy savings over always-on (C7, all modules enabled). However, restricting the oracle to the non-HyDE configuration space reduces savings to 19%, confirming that most of the optimization opportunity concentrates in a single module toggle.
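The oracle selection rule can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the configuration names other than C7, the energy and quality numbers, the threshold, and the choice to return `None` when no configuration passes are all assumptions made for the example.

```python
from typing import Dict, Optional


def oracle_route(energy: Dict[str, float],
                 quality: Dict[str, float],
                 threshold: float) -> Optional[str]:
    """Pick the lowest-energy configuration whose observed quality passes.

    `quality` holds post-execution scores for ONE query, which is exactly
    the information a real router does not have before running anything.
    """
    acceptable = [c for c in energy if quality[c] >= threshold]
    if not acceptable:
        return None  # assumption: the source does not say what happens here
    return min(acceptable, key=lambda c: energy[c])


def energy_savings(chosen_energy: float, baseline_energy: float) -> float:
    """Fractional savings relative to the always-on baseline."""
    return 1.0 - chosen_energy / baseline_energy


# Illustrative numbers only (not measurements from the project).
energy = {"C0": 1.0, "C3": 2.5, "C7": 4.0}      # C7 = all modules enabled
quality = {"C0": 0.60, "C3": 0.72, "C7": 0.75}  # scores for a single query
choice = oracle_route(energy, quality, threshold=0.70)
savings = energy_savings(energy[choice], energy["C7"])
```

With these made-up values the oracle skips C0 (quality below threshold) and selects C3, the cheapest configuration that still passes.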

Figure: Router Architecture. Query routing decision flow: pre-execution features inform per-query configuration selection.

Figure: Module Decision Space. Module activation decision space across the 8 RAG configurations.

Routing Experiments

Eight routing strategies were evaluated, spanning feature engineering approaches, classifier architectures, and training formulations:

Strategy	Approach	Config Accuracy
S1: Multi-class	Direct 8-class classification	0.39
S2: Binary per-module	Independent toggle prediction	0.38
S3: Cluster + select	Energy-cluster then refine	0.40
S4: Regression	Predict quality, select cheapest passing	0.38
S5: Utility scoring	Energy-weighted quality objective	0.39
S6: Learned utility	End-to-end utility prediction	0.39
S7: Cascading	Sequential module decisions	0.38
S8: Cheapest baseline	Always select lowest-energy config	0.40

All 8 strategies converge to 0.38–0.40 configuration accuracy—no statistically meaningful improvement over the cheapest-config baseline (S8). The 60.5 percentage point gap between oracle accuracy (100%) and achieved accuracy (39.5%) is not a modeling failure but a structural property of the problem.
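A short sketch makes the accuracy metric concrete. It assumes that configuration accuracy means the fraction of queries on which a router picks the same configuration as the oracle; this reading is consistent with the oracle scoring 100% but is not spelled out in the text. The function names, configuration names other than C7, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def config_accuracy(predicted: Sequence[str], oracle: Sequence[str]) -> float:
    """Fraction of queries where the router matches the oracle's choice."""
    if len(predicted) != len(oracle) or not oracle:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == o for p, o in zip(predicted, oracle)) / len(oracle)


def cheapest_baseline(energy: Dict[str, float], n_queries: int) -> List[str]:
    """S8-style baseline: always return the lowest-energy configuration."""
    cheapest = min(energy, key=lambda c: energy[c])
    return [cheapest] * n_queries


# Illustrative data only: oracle labels for five queries.
energy = {"C0": 1.0, "C3": 2.5, "C7": 4.0}
oracle_labels = ["C0", "C3", "C0", "C7", "C3"]
baseline_labels = cheapest_baseline(energy, len(oracle_labels))
baseline_accuracy = config_accuracy(baseline_labels, oracle_labels)
```

Under this reading, a constant router scores exactly the share of queries whose oracle label is the configuration it always picks, so a learned router only adds value if it beats that share.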

Figure: Quality vs Energy. Quality-energy trade-off across configurations: the narrow quality spread explains why pre-execution routing signals are insufficient.

The Information Gap

The structural ceiling arises from a fundamental asymmetry: module-activation routing is harder than the model routing problem studied in prior work.

Dimension	Model Routing	Module-Activation Routing
Quality gaps between options	10–30 pp	2–4 pp
Pre-execution signal strength	Strong	Weak
Decision reversibility	Final	Compositional

In model routing, a complex query visibly differs from a simple one: the quality gap between a small and a large LLM is 10–30 percentage points, which produces a clear routing signal. In module-activation routing, quality differences between configurations are only 2–4 percentage points. The features available before execution (query length, embedding statistics, complexity proxies) cannot distinguish queries that benefit from HyDE or reranking from those that do not, because that distinction depends on retrieval results and generation quality that have not yet been computed.
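To make "pre-execution features" concrete, the sketch below computes a few features of the kinds named above from the query text and a query embedding alone. The specific features, their names, and the whitespace tokenization are assumptions for illustration; the project's actual feature set is not listed here.

```python
import statistics
from typing import Dict, Sequence


def pre_execution_features(query: str,
                           embedding: Sequence[float]) -> Dict[str, float]:
    """Features computable before any retrieval or generation runs.

    `embedding` is assumed to be a non-empty query embedding produced by
    whatever encoder the pipeline already uses.
    """
    tokens = query.split()  # crude whitespace tokenization (assumption)
    token_lengths = [len(t) for t in tokens]
    return {
        "n_chars": float(len(query)),
        "n_tokens": float(len(tokens)),
        # Mean token length as a rough complexity proxy.
        "mean_token_len": statistics.fmean(token_lengths) if tokens else 0.0,
        # Simple summary statistics of the embedding vector.
        "emb_mean": statistics.fmean(embedding),
        "emb_std": statistics.pstdev(embedding),
    }


features = pre_execution_features("what is RAG", [0.0, 2.0])
```

Nothing in this feature vector depends on which documents were retrieved or how good the generated answer was, and that is the information the routing decision would need.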

Key Findings

Oracle ceiling: 75% energy savings possible with perfect foreknowledge, but only 19% in the non-HyDE configuration space
Routing convergence: All 8 strategies achieve 0.38–0.40 accuracy regardless of approach, architecture, or training objective
Structural gap: The 60.5 pp oracle-to-achieved gap reflects a fundamental information asymmetry, not insufficient modeling
Cheapest-config dominance: A zero-computation baseline matches all learned routers—routing adds complexity without benefit
Implication for RAG design: Energy optimization in modular RAG should focus on module-level efficiency (making HyDE cheaper) rather than per-query activation decisions

Outcomes

This project contributes a negative result with constructive implications. The systematic failure of 8 diverse routing strategies, combined with the information-gap analysis, establishes that per-query module activation is structurally harder than adjacent routing problems. For practitioners, the takeaway is direct: energy-aware RAG optimization should prioritize reducing module cost (cheaper HyDE, efficient reranking) over predicting module necessity. The cheapest acceptable configuration—selected statically, not per-query—remains the Pareto-optimal strategy when prediction accuracy cannot exceed the structural ceiling.
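Static selection can be sketched as choosing one configuration for the whole workload from validation data. The acceptance rule used here (mean quality across validation queries at or above a threshold) is an assumption, since the text does not define how "acceptable" is aggregated; all names and numbers are illustrative.

```python
from statistics import fmean
from typing import Dict, Optional, Sequence


def static_cheapest_acceptable(energy: Dict[str, float],
                               quality_by_config: Dict[str, Sequence[float]],
                               threshold: float) -> Optional[str]:
    """Choose ONE configuration for all queries.

    A configuration counts as acceptable when its mean validation quality
    meets the threshold (assumed aggregation rule).
    """
    mean_quality = {c: fmean(scores) for c, scores in quality_by_config.items()}
    acceptable = [c for c in energy if mean_quality[c] >= threshold]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: energy[c])


# Illustrative validation scores for three queries per configuration.
energy = {"C0": 1.0, "C3": 2.5, "C7": 4.0}
quality_by_config = {
    "C0": [0.66, 0.70, 0.68],
    "C3": [0.71, 0.73, 0.72],
    "C7": [0.74, 0.75, 0.76],
}
static_choice = static_cheapest_acceptable(energy, quality_by_config, 0.70)
```

The choice is made once, offline, so it adds no per-query computation at serving time.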