Multi-Year Energy Asset Planning Platform

Energy infrastructure planning must answer three critical questions: what to build, when to build it, and how to operate it as technology costs, demand, and renewable availability evolve over decades. Traditional approaches rely on manually defined scenarios. Planners compare predefined portfolios one by one and risk missing better options in the vast combinatorial space of asset mixes and timing. This platform replaces scenario enumeration with direct optimization: a mixed-integer linear programming (MILP) framework that automatically finds cost-optimal asset schedules over 30-year horizons, explicitly tracking commissioning, operation, and retirement. It is complemented by a machine learning module for renewable generation forecasting, which links strategic planning to operational reality. The work is the second stage (VT2) of an MSE project that evolved from VT1's LP-based scenario comparison. VT2 validates lifecycle-aware optimization on a 5-bus network with 11 generation assets, 4 storage units, and 3× demand growth, solved in under 10 minutes on commodity hardware.

Technical Approach

The platform combines two complementary modules for energy system planning: MILP optimization for multi-year asset scheduling and XGBoost forecasting for operational renewable prediction.

MILP Optimization Framework

Binary Decision Variables form the core of the lifecycle-aware optimization. Two binary variables per asset per year model commissioning and operational status explicitly. The optimizer chooses when to commission each asset (build decisions). Operational status is not a free choice: a constraint sets it to the sum of all unexpired builds, that is, builds made within the asset's lifespan up to the current year.
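The sketch below shows how operational status follows from build decisions. It assumes yearly binary builds indexed from year 0 and a single fixed lifetime per asset. The name `operational_status` is hypothetical. In the MILP, this relationship is a linear equality constraint, not a value computed after the fact.

```python
def operational_status(builds: list[int], lifetime: int) -> list[int]:
    """Return u[t] = sum of builds[tau] for tau in (t - lifetime, t].

    builds[t] is 1 if the asset is commissioned at the start of year t.
    A build in year tau keeps the asset operational in years tau .. tau + lifetime - 1.
    """
    if lifetime < 1:
        raise ValueError("lifetime must be at least one year")
    status = []
    for t in range(len(builds)):
        window_start = max(0, t - lifetime + 1)  # oldest build that is still unexpired
        status.append(sum(builds[window_start : t + 1]))
    return status


# Example: a 10-year storage unit built in year 0 and replaced in year 10
builds = [0] * 30
builds[0] = builds[10] = 1
u = operational_status(builds, lifetime=10)
# u is 1 for years 0-19 and 0 for years 20-29
```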

Lifecycle Management couples operational status with asset retirement through lifetime-aware constraints. Each asset has a defined lifespan (4-30 years in the candidate pool), with operational status determined by unexpired commissioning decisions. An "at-most-one-build-per-lifetime" constraint prevents overlapping installations within the same lifetime window. This formulation explicitly models asset aging, replacement cycles, and capacity evolution as load grows from baseline to 3× demand by year 30 (3% annual growth years 1-15, 5% thereafter).
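The lifecycle rules above can be written as linear constraints. The notation is an assumed formulation consistent with the description, not necessarily the project's exact model:

- $b_{a,t} \in \{0,1\}$ is the commissioning decision and $u_{a,t} \in \{0,1\}$ the operational status of asset $a$ in year $t$.
- $L_a$ is the asset's lifetime and $T$ the horizon.
- $p_{a,t,h}$ is dispatch in hour $h$, and $\bar{P}_a$ is rated capacity.
- The dispatch coupling in the third line is an assumed form.

$$
\begin{aligned}
u_{a,t} &= \sum_{\tau=\max(1,\,t-L_a+1)}^{t} b_{a,\tau} && \forall a,\, t \\
\sum_{\tau=t}^{\min(T,\,t+L_a-1)} b_{a,\tau} &\le 1 && \forall a,\, t \\
0 \le p_{a,t,h} &\le \bar{P}_a\, u_{a,t} && \forall a,\, t,\, h
\end{aligned}
$$

The second line is the at-most-one-build-per-lifetime rule. It keeps any two builds of the same asset at least $L_a$ years apart. As a result, the sum in the first line never exceeds 1, so $u_{a,t}$ stays binary without a separate bound.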

Annualized Cost Modeling enables fair economic comparison across assets with different lifespans. The Capital Recovery Factor (CRF) converts upfront capital costs into equivalent annual payments:

$$\mathrm{CRF} = \frac{i(1+i)^n}{(1+i)^n - 1}$$

where i is the discount rate (5-10%) and n is the asset lifetime in years. This annualization spreads capital costs evenly across operational years. As a result, an asset commissioned late in the horizon is charged only for the years it actually operates within it. This avoids end-of-horizon distortions and enables direct comparison of short-lived storage (10-12 years) against long-lived thermal generators (25-30 years).

Implementation example:

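```python
def compute_crf(lifetime: int, discount_rate: float) -> float:
    # Capital Recovery Factor: upfront cost -> equal annual payments over the lifetime
    i, n = discount_rate, lifetime
    if i == 0:
        return 1 / n  # zero discount rate: straight-line annualization
    return (i * (1 + i) ** n) / ((1 + i) ** n - 1)

capex, lifetime, discount_rate = 1_000_000, 25, 0.07  # illustrative inputs, not project data
annual_cost = capex * compute_crf(lifetime, discount_rate)
```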
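The sketch below illustrates the end-of-horizon point. It compares how much of an asset's capital cost lands inside a 30-year planning window under two accounting choices: a lump sum versus CRF-based annual payments. The 7% discount rate, 25-year lifetime, and commissioning year 26 are illustrative assumptions. `capital_charged_in_horizon` is a hypothetical helper, and `compute_crf` is repeated so the block runs on its own.

```python
def compute_crf(lifetime: int, discount_rate: float) -> float:
    i, n = discount_rate, lifetime
    if i == 0:
        return 1 / n
    return (i * (1 + i) ** n) / ((1 + i) ** n - 1)


def capital_charged_in_horizon(capex: float, lifetime: int, discount_rate: float,
                               build_year: int, horizon: int = 30) -> tuple[float, float]:
    """Undiscounted capital booked within years build_year..horizon (1-indexed)."""
    years_in_horizon = min(lifetime, horizon - build_year + 1)
    lump_sum = capex  # full CAPEX booked in the build year
    annualized = capex * compute_crf(lifetime, discount_rate) * years_in_horizon
    return lump_sum, annualized


lump, annual = capital_charged_in_horizon(1_000_000, 25, 0.07, build_year=26)
# The lump sum books all 1,000,000; annualization books only 5 of 25 payments (about 429,000)
```

Under lump-sum accounting, a build in year 26 carries its full CAPEX even though it serves only 5 years of the horizon. This biases the optimizer against late investments.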

Vectorized Constraint Formulation achieved a 50-100× computational speedup over VT1's loop-based LP implementation. Constraints for 10,000+ optimization variables (binary commissioning decisions, continuous dispatch variables, storage state-of-charge tracking) are built with NumPy array operations and passed to the IBM CPLEX solver. Representative-week aggregation (one week per season) reduces temporal resolution while preserving seasonal dispatch patterns. This enables multi-decade planning horizons with solve times under 10 minutes on an M4 MacBook Pro.
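The sketch below illustrates the loop-versus-array contrast; it is not the project's actual constraint builder. It builds the lifetime-window matrix that maps yearly build decisions to operational status in two ways: with nested Python loops and with NumPy broadcasting. The function names are hypothetical.

```python
import numpy as np


def window_matrix_loop(horizon: int, lifetime: int) -> np.ndarray:
    """A[t, tau] = 1 if a build in year tau is still operational in year t."""
    A = np.zeros((horizon, horizon), dtype=np.int8)
    for t in range(horizon):
        for tau in range(horizon):
            if 0 <= t - tau < lifetime:
                A[t, tau] = 1
    return A


def window_matrix_vectorized(horizon: int, lifetime: int) -> np.ndarray:
    """Same matrix, built with one broadcasted comparison instead of nested loops."""
    years = np.arange(horizon)
    age = years[:, None] - years[None, :]  # age[t, tau] = t - tau
    return ((age >= 0) & (age < lifetime)).astype(np.int8)


A = window_matrix_vectorized(30, 10)
builds = np.zeros(30, dtype=np.int8)
builds[[0, 10]] = 1
status = A @ builds  # operational status for every year in one matrix product
```

In a solver model, each row of this matrix becomes the coefficient row of one status constraint. Building it as an array avoids a Python-level call per coefficient.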

ML Forecasting Module

XGBoost Architecture provides day-ahead photovoltaic generation forecasting through gradient boosted decision trees. Operational forecasting principles ensure no data leakage—only lagged, cyclical, and weather features available prior to forecast time are used. The model predicts 7-day horizons to validate multi-day accuracy, with Mean Absolute Error (MAE) as the primary metric.
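The sketch below shows the leakage-safe evaluation setup described above. It assumes hourly data in a plain list and a 7-day (168-hour) test window at the end of the series. `chronological_split` and `mae` are hypothetical helper names. The split follows time order and is never shuffled, so no test hour can inform training.

```python
def chronological_split(values: list[float], test_hours: int = 168):
    """Split an hourly series so every test hour comes after every training hour."""
    if not 0 < test_hours < len(values):
        raise ValueError("test_hours must be between 1 and len(values) - 1")
    return values[:-test_hours], values[-test_hours:]


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean Absolute Error, the primary forecasting metric."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```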

Feature Engineering Evolution progressed through iterative improvements, each tested in isolation:

Baseline (Time features only): Hour-of-day, day-of-week encoding → R² = 0.095, inadequate for operational use. PV generation exhibits strong autocorrelation that pure time features cannot capture.

+ Lag features: Previous hour (lag1), previous day (lag24), previous week (lag168) generation values → R² = 0.973, MAE = 0.011. Autocorrelation dominates predictive power—today's generation strongly predicts tomorrow's under stable weather.

+ Weather integration: Plane-of-Array (POA) clear-sky index quantifies cloud cover impact on solar irradiance → MAE = 0.010, 8-12% improvement on non-standard days. Weather features add robustness for outlier events (sudden cloud cover, weather fronts) rather than raw accuracy.
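The three feature steps above can be sketched in plain Python. Several details are assumptions, because the source names only lag1, lag24, lag168, hour_sin, and the POA clear-sky index:

- The function names are hypothetical.
- The hour_cos companion is added by this sketch.
- The night cutoff and clip limit in the clear-sky index are assumed thresholds.

The clear-sky POA values would come from an irradiance model.

```python
import math


def cyclical(value: float, period: float) -> tuple[float, float]:
    """Encode a periodic value on the unit circle (e.g. hour_sin, hour_cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def clear_sky_index(poa: float, poa_clear_sky: float,
                    min_clear_sky: float = 10.0, max_index: float = 1.5) -> float:
    """Measured / clear-sky plane-of-array irradiance (both in W/m^2).

    Hours with near-zero clear-sky irradiance (night, dawn, dusk) map to 0.0,
    and the ratio is clipped to [0, max_index]; both thresholds are assumptions.
    """
    if poa_clear_sky < min_clear_sky:
        return 0.0
    return min(max(poa / poa_clear_sky, 0.0), max_index)


def build_features(pv: list[float], poa: list[float], poa_clear: list[float],
                   hours: list[int], lags: tuple[int, ...] = (1, 24, 168)):
    """Return (rows, targets); row t uses only pv values from hours before t.

    Rows without a full lag history are dropped. The weather inputs for hour t
    are assumed to come from a forecast issued before t, not from measurements.
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(pv)):
        hour_sin, hour_cos = cyclical(hours[t], 24)
        row = {f"lag{lag}": pv[t - lag] for lag in lags}
        row.update(hour_sin=hour_sin, hour_cos=hour_cos,
                   clear_sky_index=clear_sky_index(poa[t], poa_clear[t]))
        rows.append(row)
        targets.append(pv[t])
    return rows, targets


# Toy two-week series: a normalized diurnal ramp under a constant half-cloudy sky
hours = [h % 24 for h in range(24 * 14)]
pv = [h / 23 for h in hours]
X, y = build_features(pv, poa=[400.0] * len(pv), poa_clear=[800.0] * len(pv), hours=hours)
```

The cyclical encoding keeps 23:00 next to 00:00, which a raw 0-23 hour integer does not.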

Feature Selection via Bayesian optimization revealed 95% of predictive power concentrates in 3 features: electricity_lag1, electricity_lag24, hour_sin (cyclic encoding of hour). Recursive multi-step prediction failed due to error accumulation (output flattened after day 3), confirming direct day-ahead prediction as optimal strategy.
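The sketch below shows the mechanism behind the recursive failure. It uses a toy one-step model instead of XGBoost; `recursive_forecast` and the lambda model are hypothetical. After the first step, predictions replace observations in the lag inputs, so any bias compounds from one simulated day to the next.

```python
from typing import Callable, Sequence


def recursive_forecast(predict: Callable[[float, float, int], float],
                       history: Sequence[float], start_hour: int,
                       steps: int) -> list[float]:
    """Roll a one-step model forward, feeding its own outputs back in as lags.

    predict(lag1, lag24, hour) returns the next hour's value. After one step,
    lag1 is a prediction; after 24 steps, lag24 is a prediction too.
    """
    if len(history) < 24:
        raise ValueError("need at least 24 hours of history")
    buffer = list(history)
    forecast = []
    for k in range(steps):
        hour = (start_hour + k) % 24
        y_hat = predict(buffer[-1], buffer[-24], hour)
        buffer.append(y_hat)  # the prediction becomes a future lag input
        forecast.append(y_hat)
    return forecast


# Toy model that under-predicts the same hour yesterday by 10%
profile = [max(0.0, 1 - abs(h - 12) / 6) for h in range(24)]  # peak 1.0 at noon
out = recursive_forecast(lambda lag1, lag24, hour: 0.9 * lag24, profile, 0, 72)
# Day-3 noon value: 0.9 ** 3, about 0.729 of the observed peak
```

A direct day-ahead model avoids this loop by predicting each target hour from observed inputs only.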

[Figure: Feature importance analysis] Lagged generation values dominate predictive power, with hour_sin capturing diurnal cycles. Weather features (POA clear-sky) provide edge-case robustness.

Key technical insight: While lagged features deliver strong baseline performance through autocorrelation, physically motivated weather features (POA clear-sky index derived from irradiance models) reduce error by 8-12% on outlier days—demonstrating the value of domain knowledge in ML pipelines. The incremental MAE improvement appears marginal in aggregate metrics but proves critical for non-linear weather transitions that stress grid operations.

Results & Validation

Multi-Year Asset Deployment

The 30-year planning horizon automatically discovered optimal commissioning schedules for all candidate assets (11 generators + 4 storage units) on a 5-bus network. Demand grows 3× from baseline through compound annual rates (3% years 1-15, 5% years 16-30), imposing continuous pressure to expand capacity and replace aging assets.
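The sketch below computes the demand trajectory. It assumes year-1 demand equals the baseline, with growth of 3% in years 2-15 and 5% in years 16-30. Under this convention, the year-30 multiplier is about 3.14. Applying 15 full years at each rate instead gives about 3.24, so either reading matches the stated roughly 3× growth. The function name is hypothetical.

```python
def demand_multiplier(year: int, split_year: int = 15,
                      early_rate: float = 0.03, late_rate: float = 0.05) -> float:
    """Demand in `year` (1-indexed) relative to year 1.

    Assumes growth is applied between consecutive years: early_rate for
    years 2..split_year and late_rate afterwards.
    """
    multiplier = 1.0
    for y in range(2, year + 1):
        multiplier *= 1 + (early_rate if y <= split_year else late_rate)
    return multiplier


trajectory = [demand_multiplier(y) for y in range(1, 31)]
# trajectory[-1] is about 3.14, consistent with the roughly 3x figure
```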

[Figure: Asset timeline, 30-year infrastructure roadmap] Storage is deployed immediately (year 1), renewables are expanded progressively, and thermal capacity is deferred until year 19 as demand triples. Horizontal bars span asset operational lifetimes (4-30 years), with replacements scheduled at end of life.

Analysis: The optimizer strongly prefers early deployment of zero-fuel-cost renewable assets (solar, wind), despite their shorter lifespans (25 years) compared to thermal generators (30 years). All 4 storage units are commissioned in year 1, indicating economic value beyond operational flexibility. Storage enables renewable integration and peak shaving, and that value outweighs the cost of 10-12 year replacement cycles. Thermal Generator 2 is deferred to year 19, reflecting the fuel cost penalty in the objective function. By year 30, existing assets run at maximum utilization before new capacity is commissioned. This is consistent with cost minimization under growing load: the optimizer exhausts installed capacity before committing new capital.

Generation Mix Evolution

Seasonal dispatch patterns reveal how the system adapts to load growth and renewable penetration over three decades. Winter and summer representative weeks demonstrate contrasting operational regimes.

Year 1 vs. Year 30 Comparison:

[Figure: Winter generation mix, Year 1] Thermal-dominated operation with nascent storage support. Renewables provide supplemental capacity during peak daylight hours.

[Figure: Winter generation mix, Year 30] Thermal remains the primary baseload source as demand triples, with an expanded storage role managing ramps and renewable variability. Storage capacity increased through replacements (10-12 year cycles).

Winter analysis: Thermal generation dominates both years due to lower solar irradiance and higher heating loads. By year 30, storage exhibits pronounced charge/discharge cycling to smooth renewable ramps and defer thermal peaking. Load growth (3× baseline) drives thermal capacity to maximum sustained output, with renewables filling mid-day valleys.

[Figure: Summer generation mix, Year 1] Balanced mix with solar contributing during mid-day peaks. Storage cycling is limited due to lower renewable penetration.

[Figure: Summer generation mix, Year 30] Solar dominates daylight hours, with deep storage cycling. "Duck curve" mitigation is evident: storage charges during the midday surplus and discharges during the evening ramp. Thermal is reduced to a residual/peaking role.

Summer analysis: By year 30, solar generation dominates mid-day operations, creating pronounced surplus that charges storage to full capacity. Evening demand ramps trigger deep discharge cycles, mitigating the "duck curve" challenge. Thermal generators transition from baseload to peaking/residual role, operating at minimum stable levels during high-solar periods. The optimizer automatically discovered this seasonal differentiation through cost minimization—zero-fuel renewables displace thermal during high-irradiance months.
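The displacement effect can be seen in a simplified single-hour merit-order dispatch, which stacks units by marginal cost. This is a swapped-in heuristic for intuition only, not the platform's MILP. It ignores storage, network limits, minimum stable levels, and investment decisions. Unit names, capacities, and costs below are hypothetical.

```python
def merit_order_dispatch(units: dict[str, tuple[float, float]], load: float) -> dict[str, float]:
    """Dispatch units cheapest-first; units maps name -> (available_mw, marginal_cost)."""
    dispatch = {}
    remaining = load
    for name, (available, _cost) in sorted(units.items(), key=lambda kv: kv[1][1]):
        output = min(available, max(remaining, 0.0))
        dispatch[name] = output
        remaining -= output
    if remaining > 1e-9:
        raise ValueError(f"unserved load: {remaining:.1f} MW")
    return dispatch


# Hypothetical summer noon: abundant zero-fuel solar pushes thermal down the stack
noon = merit_order_dispatch(
    {"solar": (120.0, 0.0), "wind": (20.0, 0.0), "thermal": (100.0, 60.0)}, load=150.0)
# Hypothetical evening: no solar, so thermal covers the residual load
evening = merit_order_dispatch(
    {"solar": (0.0, 0.0), "wind": (20.0, 0.0), "thermal": (100.0, 60.0)}, load=110.0)
```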

Forecasting Performance

The day-ahead PV forecasting model was validated over a 7-day test window (2024-01-01 to 2024-01-07), demonstrating accuracy suitable for renewable integration planning.

[Figure: Forecasting performance] Day-ahead PV forecasting accuracy: weather-enhanced XGBoost achieves MAE = 0.010 (normalized units), an 8-12% improvement over the lag-only baseline on non-standard days. Error increases gradually beyond day 3 due to propagation of weather forecast uncertainty.

Model Progression:

| Feature Set | MAE | R² | Key Insight |
|-------------|-----|-----|-------------|
| Time only | 0.070 | 0.095 | Inadequate: misses autocorrelation |
| Time + lags | 0.011 | 0.973 | Strong baseline via lag1/lag24 |
| + Weather (POA) | 0.010 | 0.970 | Marginal aggregate gain, significant outlier-day improvement |

Analysis: Lagged generation features (lag1, lag24) capture PV autocorrelation under stable weather, delivering R² = 0.973—sufficient for most operational scenarios. Weather features (POA clear-sky index) provide robustness for weather transitions where autocorrelation breaks down. Feature selection via Bayesian optimization confirmed 95% of predictive power concentrates in 3 features: electricity_lag1, electricity_lag24, hour_sin. The incremental weather benefit (8-12% MAE reduction on outliers) proves critical for grid stress events—sudden cloud cover or weather fronts that invalidate persistence assumptions.

[Figure: Weather feature impact] The POA clear-sky index (plane-of-array irradiance normalized by the clear-sky baseline) quantifies the impact of cloud cover, improving forecast robustness on non-standard days.

Outcomes

The platform delivers four key capabilities for energy infrastructure planning:

Unified Planning Tool: Co-optimizes investment timing and operational dispatch over 30-year horizons in a single MILP formulation. Binary variables for commissioning decisions interact with continuous dispatch variables through lifetime-aware constraints, eliminating the artificial separation between strategic planning and operational scheduling that characterizes traditional scenario-based approaches.

Automatic Asset Selection: Eliminates manual scenario enumeration—the optimizer discovers cost-optimal portfolios from the full candidate pool (11 generators + 4 storage in this validation) without analyst-defined cases. Traditional methods require defining dozens of "what-if" scenarios; MILP collapses this combinatorial space into direct optimization, surfacing solutions potentially missed by intuition-driven scenario selection.

Lifecycle Awareness: Explicit retirement and replacement scheduling aligned with technical lifespans (4-30 years) and load evolution. The annualized cost formulation (CRF) distributes capital expenditure across operational years, enabling fair comparison between short-lived storage (10-12 years) and long-lived thermal generators (30 years) while avoiding end-of-horizon distortions common in NPV-only approaches.

Operational Integration: The ML forecasting module establishes a baseline forecast error (day-ahead PV MAE = 0.010). Future stochastic planning extensions can use this baseline to bring renewable uncertainty into investment optimization, enabling robust asset sizing under renewable variability.
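A rough count puts the Automatic Asset Selection point in numbers. It considers only each asset's first commissioning year (or never building it) and ignores replacements, sizing, and feasibility. Even this crude count gives far more schedules than could ever be enumerated. This is illustrative arithmetic, not a result from the project.

```python
assets = 11 + 4              # candidate generators + storage units in the validation case
options_per_asset = 30 + 1   # first build in one of 30 years, or never
schedules = options_per_asset ** assets
print(f"{schedules:.2e} first-build schedules")
```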

Technical Validation

  • Planning horizon: 30 years with compound load growth (3% → 5% annual, final 3× demand)
  • Network scale: 5-bus grid, 11 generation assets (thermal/solar/wind), 4 storage units
  • Optimization variables: 10,000+ (binary commissioning + continuous dispatch + storage SoC)
  • Solve time: <10 minutes (M4 MacBook Pro, IBM CPLEX)
  • Computational speedup: 50-100× versus VT1's loop-based LP formulation
  • Forecasting accuracy: MAE = 0.010, R² = 0.970 with weather features (lag-only: MAE = 0.011, R² = 0.973), 7-day day-ahead PV test window

Key Insights

Economic preference for renewables: Zero-fuel-cost solar and wind deploy early despite shorter lifespans (25 years) than thermal (30 years). Their CAPEX disadvantage is outweighed by operational savings. This shows the value of annualized cost modeling, which captures lifecycle economics rather than favoring low upfront capital.

Storage as infrastructure: All 4 storage units commissioned year 1, demonstrating strategic value beyond operational flexibility. Storage enables renewable integration (absorbing mid-day solar surplus) and peak-shaving (deferring thermal expansion), with economic return justifying 10-12 year replacement cycles throughout the horizon.

Lagged features dominate ML accuracy: 95% of PV forecasting predictive power concentrates in 3 features (lag1, lag24, hour_sin). Weather features improve robustness on outlier days (8-12% MAE reduction) but offer marginal aggregate gains—confirming autocorrelation as dominant driver under stable conditions, with physics-based features crucial for weather transitions.

Scalability vs. tractability: Representative-week aggregation (1 week per season) enables multi-decade horizons with <10 min solve times but limits rare-event modeling (extreme weather, grid stress). Future work must balance temporal detail with computational tractability for large networks (100+ buses).

Limitations & Future Work

The current implementation demonstrates proof-of-concept lifecycle optimization but requires several extensions for production deployment:

Representative weeks aggregate temporal detail, capturing seasonal patterns but missing rare stress events (extreme weather, multi-day renewable lulls, grid contingencies). Full 8760-hour resolution becomes computationally intractable beyond 5-10 year horizons—future work must explore temporal clustering methods that preserve extremes while maintaining tractability.
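Simple arithmetic shows the scale of the temporal reduction. This sketch weights each representative hour by the number of real hours it stands for (about 13 per season week). That weighting is an assumed scheme; the source does not state how weeks are weighted.

```python
HOURS_PER_WEEK = 168
HOURS_PER_YEAR = 8760
horizon_years = 30

rep_hours_per_year = 4 * HOURS_PER_WEEK            # one representative week per season
weight = HOURS_PER_YEAR / rep_hours_per_year       # real hours represented by each modeled hour
rep_periods = horizon_years * rep_hours_per_year   # hourly periods per asset over the horizon
full_periods = horizon_years * HOURS_PER_YEAR      # same count at full 8760-hour resolution

print(f"{rep_hours_per_year} h/yr modeled, weight {weight:.2f}, "
      f"{rep_periods:,} vs {full_periods:,} periods over {horizon_years} years")
```

The roughly 13-fold reduction per asset and variable type keeps the 30-year MILP tractable. It is also why a single extreme week can fall outside the sample.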

Deterministic optimization assumes perfect foresight of load growth and renewable availability. Real planning requires stochastic MILP formulations that optimize under uncertainty, incorporating forecast error distributions, demand variability, and technology cost projections. The forecasting module provides groundwork (its MAE of 0.010 gives a baseline error estimate), but integration into investment optimization remains future work.
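A first step toward a stochastic formulation is turning the point forecast into scenarios. The sketch below assumes hour-independent Laplace errors whose scale equals the reported MAE; for a Laplace(0, b) variable, the mean absolute value is exactly b. The error model, seed, toy profile, and function name are all assumptions, not part of the project. Real forecast errors are correlated across hours, so this is a starting point only.

```python
import numpy as np


def sample_pv_scenarios(point_forecast: np.ndarray, mae: float,
                        n_scenarios: int, seed: int = 0) -> np.ndarray:
    """Perturb a normalized PV point forecast with Laplace(0, mae) errors per hour.

    Returns an (n_scenarios, hours) array clipped to the normalized range [0, 1].
    Night hours (forecast 0) also receive noise; clipping keeps them non-negative.
    """
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=mae, size=(n_scenarios, point_forecast.size))
    return np.clip(point_forecast[None, :] + noise, 0.0, 1.0)


forecast = np.clip(np.sin(np.linspace(0, np.pi, 24)), 0, 1)  # toy daylight profile
scenarios = sample_pv_scenarios(forecast, mae=0.010, n_scenarios=100)
```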

The forecasting module is not yet integrated into the investment optimization loop. The current workflow runs the MILP with deterministic renewable profiles, then validates forecasts separately. True operational-strategic integration requires embedding forecast error into asset sizing decisions (for example, "How much storage is needed given 10% PV forecast uncertainty?"), which is a stochastic programming extension.

Binary annual investments assume assets commission at year-start, ignoring gradual capacity additions common in utility-scale projects (phased wind farm build-out, modular storage expansion). Relaxing to partial-year commissioning or continuous capacity variables may better represent incremental investments while increasing problem complexity.

Future directions: Stochastic MILP for renewable/demand uncertainty propagation; integrated forecast-planning loop quantifying forecast error economic impact; ramp constraints and maintenance scheduling (currently ignored); scalability testing on 100+ bus networks; validation on real utility planning cases with historical load/renewable data.

The project demonstrates that energy infrastructure decisions benefit from simultaneous lifecycle and operational modeling—enabling planners to evaluate portfolios based on long-term economics and short-term operational constraints rather than optimizing each dimension independently.