Day-Ahead PV Generation Forecasting
Accurate renewable generation forecasting is essential for grid operations and infrastructure planning. This module provides day-ahead photovoltaic (PV) generation prediction using gradient boosted decision trees, developed as part of a broader energy asset planning platform. The approach emphasizes operational forecasting principles—ensuring no data leakage by restricting features to information available prior to forecast time—and demonstrates that a small set of physically motivated features delivers robust predictions across both stable and volatile weather conditions.
Technical Approach
XGBoost Architecture
The model uses XGBoost gradient boosted decision trees for day-ahead PV generation forecasting. To avoid data leakage, it uses only lagged, cyclical, and weather features that are available before forecast time. Forecasts are evaluated over a 7-day test horizon to check multi-day accuracy, with Mean Absolute Error (MAE) as the primary metric.
Feature Engineering Evolution
Feature engineering progressed through iterative improvements, each tested in isolation:
Baseline (Time features only): Hour-of-day, day-of-week encoding → R² = 0.095, inadequate for operational use. PV generation exhibits strong autocorrelation that pure time features cannot capture.
+ Lag features: Previous hour (lag1), previous day (lag24), previous week (lag168) generation values → R² = 0.973, MAE = 0.011. Autocorrelation dominates predictive power—today's generation strongly predicts tomorrow's under stable weather.
+ Weather integration: The Plane-of-Array (POA) clear-sky index quantifies the impact of cloud cover on solar irradiance → MAE = 0.010 overall, with an 8–12% MAE reduction on non-standard days. Weather features add robustness for outlier events (sudden cloud cover, weather fronts) rather than raw aggregate accuracy.
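The lag features in the second step can be sketched as follows. This is a minimal pure-Python illustration rather than the project's pipeline: it assumes an hourly series held in a list, the function name `make_lag_rows` is hypothetical, and the column name `electricity_lag168` is extrapolated from the `electricity_lag1` / `electricity_lag24` naming used elsewhere in this write-up. With pandas, the same shifts are usually written with `Series.shift`.

```python
def make_lag_rows(values, lags=(1, 24, 168)):
    """Build one feature row per time step from an hourly generation series.

    Assumption: every lagged value has already been observed when the
    forecast for step t is issued; otherwise that lag would leak data.
    """
    max_lag = max(lags)
    rows = []
    # Start at max_lag so that every requested lag exists for each row.
    for t in range(max_lag, len(values)):
        row = {f"electricity_lag{lag}": values[t - lag] for lag in lags}
        row["target"] = values[t]
        rows.append(row)
    return rows
```

The first `max(lags)` observations produce no rows, so a one-week lag costs 168 hours of training data at the start of the series.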
Feature Selection
Bayesian optimization revealed that 95% of predictive power concentrates in three features: electricity_lag1, electricity_lag24, and hour_sin (a cyclic encoding of the hour). Recursive multi-step prediction failed due to error accumulation (output flattened after day 3), which supports direct day-ahead prediction as the preferred strategy.
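A cyclic encoding such as hour_sin maps the hour of day onto a circle so that 23:00 and 00:00 end up close together. The sketch below shows the usual sine/cosine pair. The cosine component and both function names are assumptions for illustration, since only hour_sin is named in this write-up.

```python
import math

def encode_hour(hour):
    """Return (hour_sin, hour_cos) for an hour of day in 0..23."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return math.sin(angle), math.cos(angle)

def circular_distance(hour_a, hour_b):
    """Euclidean distance between two hours in the encoded space."""
    sin_a, cos_a = encode_hour(hour_a)
    sin_b, cos_b = encode_hour(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)
```

Note that the sine alone is ambiguous (for example, 01:00 and 11:00 share the same value), which is why the pair is often used together. A tree model that also sees lagged generation may not need the cosine term.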
Feature importance analysis: Lagged generation values dominate predictive power, with hour_sin capturing diurnal cycles. Weather features (POA clear-sky) provide edge-case robustness.
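One way to arrive at a statement like "95% of predictive power in 3 features" is to rank importance scores and accumulate their share. The sketch below assumes a plain dictionary of non-negative importance scores (for example, gain values exported from a trained model). The function name is hypothetical, and this is not necessarily the procedure used in this project.

```python
def top_features_for_share(importances, share=0.95):
    """Return the smallest top-ranked feature set reaching `share` of total importance.

    `importances` maps feature name -> non-negative importance score.
    Returns (selected_names, achieved_share).
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score
        if cumulative / total >= share:
            break
    return selected, cumulative / total
```

The result depends on which importance measure is used (gain, split count, or permutation importance can rank features differently), so the measure should be reported alongside the share.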
Key technical insight: While lagged features deliver strong baseline performance through autocorrelation, physically motivated weather features (POA clear-sky index derived from irradiance models) reduce error by 8–12% on outlier days—demonstrating the value of domain knowledge in ML pipelines. The incremental MAE improvement appears marginal in aggregate metrics but proves critical for non-linear weather transitions that stress grid operations.
Results & Validation
Day-ahead PV forecasting, validated across a 7-day test horizon (2024-01-01 to 2024-01-07), demonstrates operational accuracy for renewable integration planning.
Day-ahead PV forecasting accuracy: Weather-enhanced XGBoost achieves MAE = 0.010 (normalized units), with an 8–12% improvement over the lag-only baseline on non-standard days. Error increases gradually beyond day 3 as weather forecast uncertainty propagates.
Model Progression
| Feature Set | MAE (normalized units) | R² | Key Insight |
|---|---|---|---|
| Time only | 0.070 | 0.095 | Inadequate—misses autocorrelation |
| Time + lags | 0.011 | 0.973 | Strong baseline via lag1/lag24 |
| + Weather (POA) | 0.010 | 0.970 | Marginal aggregate gain, significant outlier-day improvement |
Lagged generation features (lag1, lag24) capture PV autocorrelation under stable weather, delivering R² = 0.973—sufficient for most operational scenarios. Weather features (POA clear-sky index) provide robustness for weather transitions where autocorrelation breaks down. The incremental weather benefit (8–12% MAE reduction on outliers) proves critical for grid stress events—sudden cloud cover or weather fronts that invalidate persistence assumptions.
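In the table, MAE falls from 0.011 to 0.010 when weather features are added, while R² moves from 0.973 to 0.970. The two metrics can move in different directions because MAE weights errors linearly and R² is based on squared errors. The sketch below gives the standard definitions in plain Python. It is illustrative and does not reproduce the project's evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target series")
    return 1.0 - ss_res / ss_tot
```

A forecast with many small errors can therefore score a higher R² than one that is exact most of the time but has a single large miss, even though the second has the lower MAE.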
Weather feature integration: POA clear-sky index (plane-of-array irradiance normalized by clear-sky baseline) quantifies cloud cover impact, improving forecast robustness on non-standard days.
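The normalization described above can be sketched as a guarded ratio. The night-time threshold, the upper clip, and the choice to return 0.0 when the clear-sky value is near zero are all assumptions made for this illustration, as is the function name. Irradiance is assumed to be in W/m².

```python
def clear_sky_index(poa_measured, poa_clear_sky, min_clear_sky=10.0, max_index=1.5):
    """Ratio of measured (or forecast) POA irradiance to modelled clear-sky POA.

    Values near 1 indicate clear conditions; values near 0 indicate heavy cloud.
    """
    # Around sunrise, sunset and at night the clear-sky value is close to zero,
    # so the ratio is numerically unstable. Assumption: report 0.0 in that case.
    if poa_clear_sky < min_clear_sky:
        return 0.0
    ratio = poa_measured / poa_clear_sky
    # Assumption: clip to a plausible range; brief cloud-edge enhancement can
    # push the raw ratio slightly above 1.
    return min(max(ratio, 0.0), max_index)
```

In a leakage-free setup the numerator would come from a weather forecast issued before forecast time, not from measurements taken during the target day.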
Outcomes
Operations-ready forecasting: Day-ahead PV prediction with MAE = 0.010 establishes baseline accuracy for renewable integration planning. The model respects the operational constraint of no future data leakage, a prerequisite for deployment in real-time grid management systems.
Domain-driven feature engineering: Iterative progression from time-only (R² = 0.095) through lag features (R² = 0.973) to weather-enhanced (MAE = 0.010) demonstrates systematic model improvement. Each feature set addition was tested in isolation to quantify marginal contribution.
Uncertainty quantification groundwork: Forecast error characterization (MAE by day, feature importance ranking) provides a foundation for stochastic planning extensions, enabling future integration into investment optimization where forecast uncertainty drives asset sizing decisions.
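The "MAE by day" characterization can be sketched by slicing the test horizon into day-sized blocks. The example assumes hourly resolution (24 steps per day) and aligned lists of observations and forecasts. The function name is hypothetical.

```python
def mae_by_day(y_true, y_pred, steps_per_day=24):
    """Return one MAE value per consecutive block of `steps_per_day` steps."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    daily_mae = []
    for start in range(0, len(y_true), steps_per_day):
        end = start + steps_per_day
        errors = [abs(t - p) for t, p in zip(y_true[start:end], y_pred[start:end])]
        daily_mae.append(sum(errors) / len(errors))
    return daily_mae
```

For PV, night-time hours contribute near-zero errors and pull the daily MAE down, so a daylight-only variant may be more informative when comparing days.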
Key Insights
Lagged features dominate accuracy: 95% of predictive power concentrates in 3 features (lag1, lag24, hour_sin). Autocorrelation is the dominant driver under stable conditions, making simple persistence-based approaches surprisingly competitive.
Weather features provide robustness, not accuracy: POA clear-sky index improves MAE by 8–12% on outlier days but offers marginal aggregate gains. The value is in tail-risk reduction—exactly the scenarios that stress grid operations and drive infrastructure sizing decisions.
Recursive prediction fails: Multi-step recursive forecasting (using predicted outputs as inputs) accumulates error rapidly, with predictions flattening after day 3. Direct day-ahead prediction is the preferred strategy for operational use.
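The structure of a recursive loop helps explain why errors compound: after the first step the lag-1 input is itself a prediction, and after 24 steps the lag-24 input is too. The sketch below assumes a hypothetical one-step model exposed as a callable `predict_one(lag1, lag24, hour_sin)`. It is not the project's XGBoost model.

```python
import math

def recursive_forecast(predict_one, history, start_hour, horizon):
    """Roll a one-step model forward, feeding its own outputs back as lags.

    `history` holds at least 24 observed hourly values ending just before
    the first forecast step. `start_hour` is the hour of day of that step.
    """
    if len(history) < 24:
        raise ValueError("need at least 24 observed values for the lag-24 input")
    buffer = list(history)
    for step in range(horizon):
        hour = (start_hour + step) % 24
        hour_sin = math.sin(2.0 * math.pi * hour / 24.0)
        # buffer[-1] and buffer[-24] are observations at first, then predictions.
        prediction = predict_one(buffer[-1], buffer[-24], hour_sin)
        buffer.append(prediction)
    return buffer[len(history):]
```

With a toy model that slightly underestimates persistence, such as `lambda lag1, lag24, hour_sin: 0.9 * lag1`, the output decays toward zero over a few days, which illustrates how a small bias compounds. A direct strategy instead predicts each target hour from inputs that were actually observed.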
Limitations & Future Work
Not yet integrated into investment optimization. Current workflow runs MILP planning with deterministic renewable profiles, then validates forecasts separately. True operational-strategic integration requires embedding forecast error into asset sizing decisions—quantifying how forecast uncertainty affects optimal storage capacity and renewable mix.
Single-site validation. The model was trained and tested on one PV installation. Generalization across sites with different climates, panel orientations, and shading conditions requires transfer learning or site-specific retraining.
Weather forecast dependency. The POA clear-sky index relies on irradiance model accuracy. In operational deployment, weather forecast errors propagate through the feature pipeline—degrading prediction quality beyond the 3-day horizon where numerical weather prediction accuracy drops.
Future directions: Integration into stochastic MILP for forecast-error-aware asset sizing; probabilistic forecasting (quantile regression, conformal prediction) for uncertainty bounds; multi-site generalization; extension to wind generation forecasting with turbine-specific power curves.
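As a rough indication of what the conformal prediction direction could look like, the sketch below computes a split-conformal interval from absolute residuals on a held-out calibration set. It is not part of the current module. The function names and the zero lower bound (generation cannot be negative) are assumptions, and the coverage guarantee relies on exchangeability, which time series generally violate.

```python
import math

def conformal_half_width(calibration_residuals, alpha=0.1):
    """Half-width of a split-conformal interval with target coverage 1 - alpha."""
    n = len(calibration_residuals)
    if n == 0:
        raise ValueError("calibration set is empty")
    scores = sorted(abs(r) for r in calibration_residuals)
    # Finite-sample rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return scores[rank - 1]

def prediction_interval(point_forecast, half_width, lower_bound=0.0):
    """Symmetric interval around the point forecast, floored at lower_bound."""
    return max(point_forecast - half_width, lower_bound), point_forecast + half_width
```

Because PV errors are far larger at midday than at night, a single half-width for all hours is likely too wide at night and too narrow at noon. Calibrating per hour of day would be one refinement.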