Open Access Article

A Hybrid Methodology Using Machine Learning Techniques and Feature Engineering Applied to Time Series for Medium- and Long-Term Energy Market Price Forecasting

Flávia Pessoa Monteiro ¹,*, Suzane Monteiro, Carlos Rodrigues ³, Josivan Reis ¹, Ubiratan Bezerra, Maria Emília Tostes and Frederico A. F. Almeida ⁴

Oriximiná Campus, Federal University of Western Pará, Oriximiná 68270-000, Brazil

Capitão Poço Campus, Federal Rural University of the Amazon, Capitão Poço 68650-000, Brazil

Faculty of Electrical and Biomedical Engineering, Federal University of Para, Belém 66075-110, Brazil

⁴ Eletrobras, Rio de Janeiro 20091-005, Brazil

* Author to whom correspondence should be addressed.

Energies 2025, 18(6), 1387; https://doi.org/10.3390/en18061387

Submission received: 6 February 2025 / Revised: 25 February 2025 / Accepted: 28 February 2025 / Published: 11 March 2025

(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)


Figure 1. Electricity market products and their temporal segmentation across different time horizons. The figure categorizes market mechanisms into Reserves, Energy, Capacity, and New Capacity, spanning short-term (minutes to 24 hours), medium-term (months to years), and long-term (up to 35 years) periods. Dark blue arrows indicate markets present in all structures, while orange arrows represent those operating only in specific contexts. The red arrow (System Operations Delivery) highlights ongoing system stability efforts, and the large red timeline arrow illustrates the transition from real-time balancing to long-term contractual agreements. Source: Authors, adapted from [32].

Figure 2. Systematization of spot price forecasting methodologies: Electricity Price Forecasting. Source: Authors, adapted from [33].

Figure 3. Hybrid Method for Long-Term Projection. Source: Authors.

Figure 4. Historical LPC Series Normalized Using the MinMax Technique, Decomposed into Trend + Seasonal + Residual. Source: Authors.

Figure 5. Input Attributes Ranked by Importance Using the XGBoost Technique, Created for the Feature Engineering Process. Source: Authors.

Figure 6. Forecasting Components of Long-Term Energy Prices for 2024: (a) Seasonality, (b) Trend, and (c) Residuals.

Figure 7. LPC Projection for the Period January–December 2024 in BRL/MWh (a) by week and (b) by month.

Figure 8. Comparative Performance of SARIMAX and LSTM Models in Normalized Data Projection.

Figure 9. (a) Trend Component Projection with SARIMAX, Normalized by MinMax, for the 10-Year Period; (b) Reconstructed LPC Projection with the SARIMAX Model Only for the 10-Year Period in BRL/MWh; (c) Trend Component Projection with the Hybrid SARIMAX (blue) + LSTM (green) Model, Normalized by MinMax, for the 10-Year Period. Source: Authors.

Figure 10. Residual Projection for the 10-Year Period with Percentage Increases of 5%, 15%, and 30% for (a) Volatility, (b) 5-Month Moving Average, and (c) 12-Month Moving Average. Source: Authors.

Figure 11. Forecast of the Forward Energy Curve in the Long-Term Horizon from 2024 to 2034 with a 95% Confidence Interval (blue shading). Source: Authors.


Abstract

In the electricity market, the prices at which contracts are negotiated between generators/traders and buyers are of particular relevance, because accurate contract modeling increases financial returns and business sustainability for the participating agents and encourages investment in sectors specialized in price forecasting and risk analysis. This paper presents a methodology, applied in experiments on energy forward curve scenarios, that combines Long Short-Term Memory (LSTM), Extreme Gradient Boosting (XGBoost), Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX), and Feature Engineering to generate a 10-year projection of the Conventional Long-Term Price. Validation showed the model to be effective, with errors of only 4.5% by Root Mean Square Error (RMSE) and slightly less than 2% by Mean Absolute Error (MAE), for a time series spanning 7 January 2012 to 31 August 2024 in the Brazilian energy market.

Keywords:

energy market forecasting; machine learning; time series decomposition; hybrid modeling; long-term energy price projection; feature engineering; LSTM; SARIMAX; XGBoost

1. Introduction

The growing concern over energy efficiency and sustainability highlights the need to develop energy-saving approaches that make decision-making in the management of the electric sector more effective. Solutions that make it possible to understand and anticipate consumption patterns, as well as project future demand, are fundamental to optimizing resource allocation and reducing operational costs. In this context, establishing advanced methods of analysis and forecasting (statistical techniques, machine learning, and computational simulations) [1,2] can enable a more rational and sustainable use of the available infrastructure, promoting greater reliability of supply and minimizing waste.

In Brazil, power generation companies need to estimate as precisely as possible how much energy they will have available, taking into account climatic factors and the country's predominantly, though not exclusively, hydropower-based energy matrix. It is equally crucial to determine how much energy consumers will require in the future over at least three horizons: the short, medium, and long term.

Each forecasting horizon has its own particularities and complexities, as well as specific costs. For an energy distribution company, the key factor is to estimate as accurately as possible the required amount of energy to avoid last-minute purchases or the risk of selling surplus energy at a price lower than its acquisition cost. Moreover, there are penalties associated with forecasting errors.

Thus, the issue of negotiating contract prices between generators/traders and buyers becomes particularly relevant, as it is a key factor in defining better-structured contracts.

In Brazil, energy prices are published weekly by the Câmara de Comercialização de Energia (CCEE) through the execution of the computational models NEWAVE, DECOMP, and DESSEM. Their primary output is the Preço de Liquidação das Diferenças (PLD), the official reference price for the energy market. The PLD is used to settle surpluses and deficits among market participants in both free and regulated contracting environments. However, the variables considered in the PLD calculation, which primarily aim to reflect the system’s operational cost over short-, medium-, and long-term horizons, do not accurately match market expectations and do not account for the risk premium inherent in the energy trading process.

This decoupling is more pronounced for the medium- and long-term horizons, which are less susceptible to typical operational variables and more influenced by risk variables associated with the business [3,4,5,6,7,8,9,10].

Given this decoupling for the medium- and long-term horizons, it is necessary to establish methodologies that generate price curve projections reflecting not only the grid’s intrinsic operational aspects but also the risks associated with trading, with the goal of optimizing the modeling of contracts between market participants.

In this context, the concept of forward curves is introduced as a tool capable of capturing market expectations—such as risk premium, opportunity cost, liquidity, market concentration, and volatility—thereby providing a more precise reflection of energy trading prices [11,12,13,14,15,16].

Thus, this article presents a new methodology for forecasting energy market prices in the medium and long term, using data science techniques and feature engineering methods, adapted to the specificities of the Brazilian energy market.

2. Brazilian Energy Market: Free (ACL) and Regulated (ACR) Contracting Environments

In the Brazilian energy market, following the enactment of Law No. 10.848/2004, bilateral energy purchase and sale transactions are conducted within two distinct contractual environments: the free contracting environment (ACL), in which contracts are freely negotiated among the involved parties in accordance with the regulations established by the Câmara de Comercialização de Energia (CCEE); and the regulated contracting environment (ACR), which is subdivided into Contracts for the Commercialization of Regulated Energy (CCEAR), Physical Guarantee Quota Contracts (CCGFs), Nuclear Energy Quota Contracts (CCENs), Itaipu Contracts, PROINFA Contracts, Adjustment Auction Contracts, and Regulated Bilateral Contracts [17].

In general, the ACR is designed to meet the energy demand of captive consumers, including residential users, service providers, and small industries. In this context, energy distributors acquire power through a centralized pool system, where procurement occurs via energy auctions in which generators offering the lowest prices are awarded the contracts.

Through CCEARs, distributors are required to contract 100% of their forecasted energy demand, ensuring both supply and capacity availability. The terms and conditions governing CCEARs, along with the bilateral guarantee mechanisms designed to ensure compliance with contractual obligations, are established by ANEEL within its regulatory authority to issue energy auction notices, in accordance with the provisions of Law No. 10.848/2004 and Decree No. 5.163/2004 [18].

This regulatory framework is characterized by a government-administered auction system, in which long-term bilateral contracts are formalized between generators and energy distributors, with the latter assuming the credit risk associated with captive consumers. There are several types of energy auctions, with the most significant being the Existing Energy Auctions, which facilitate the acquisition of power from commercially operating plants, and the New Energy Auctions, intended to promote the development and construction of new generation projects. These auctions predominate in the current market scenario, with long-term contracts extending up to 30 years [19].

On the other hand, the ACL is composed of large-scale consumers, with energy trading carried out bilaterally among generators, traders, free consumers, and special consumers. In this structure, contractual clauses—such as prices, energy volume, contract registration requirements, and financial guarantees—are freely negotiated between the parties. Consequently, the role of the CCEE is limited to recording the contract durations and energy volumes, while the financial terms remain confidential between the contracting entities [20].

Unlike the ACR, the ACL operates as an unregulated trading platform without a centralized entity responsible for overseeing bilateral contracts. However, measures have been taken to establish standardized mechanisms for contract negotiation through private entities, with the Brazilian Energy Trading Exchange (BBCE) being the most prominent [21].

Nevertheless, the volume of contracts executed through this platform remains limited compared to direct transactions between traders and consumers, which are not subject to public disclosure. This opacity in market transactions hinders participants’ ability to discern prevailing energy price trends, underscoring the need to develop robust forecasting methodologies and analytical tools aimed at enhancing the predictability of energy trading prices.

The evolving dynamics of the Brazilian energy market suggest a sustained expansion of the ACL and a corresponding contraction of the ACR, driven by the progressive migration of companies to the free contracting environment. According to the Brazilian Electric Energy Statistical Yearbook [22], based on 2022 data, energy consumption in the ACL accounted for 39.66% of Brazil's total electricity demand, highlighting the continuous entry of new market participants and increased liquidity in energy contracts.

3. State of the Art

Several studies have been conducted in recent years to understand and model forward price curves for electricity in medium- and long-term horizons. One of the seminal works in this field [23] proposes an equilibrium model for the energy market, allowing the system to be analyzed from a supply perspective through a defined number of agents. The study examined the U.S. energy market, focusing on spot and forward rates using a discrete-time model. The authors identified a declining risk premium in the forward curve as more speculators entered the market.

Following a similar approach, reference [24] proposed constructing a forward price curve for the Nordic energy market, based on forward contract prices available for different dates within the same market. The authors noted that even in centrally dispatched electricity systems—such as in the Nordic market—it is necessary to incorporate variables reflecting market participants’ risk aversion. Consequently, the construction of forward curves must be intrinsically linked to financial models.

Focusing on the Brazilian market, reference [25] aimed to analyze market risk management in energy trading in Brazil by conducting a survey of key players in the electricity sector, including generators, traders, and consumers. The study found that companies more heavily focused on power generation displayed a higher degree of risk aversion, while trading companies tended to have greater risk exposure. Nevertheless, the sector as a whole has been improving its methodologies, as deficiencies in risk management have led to financial losses. The study also noted that although many companies still use the Settlement Price for Differences (PLD) as an input for risk management, there is a growing reliance on forward curves as a strategy for implementing value-at-risk (VaR) in energy trading.

In a related study, reference [26] applied Monte Carlo Simulation combined with Geometric Brownian Motion and Cholesky Decomposition—stochastic procedures widely used in financial and stock markets—to simulate stochastic future price curves for the Brazilian energy market, focusing on the Southeast/Midwest submarket, which accounts for the largest share of energy trading in the country. The results demonstrated robustness even during periods of high volatility in recent years, including climate, economic, and health crises that significantly impacted price fluctuations.

Exploring new methodologies for price curve formation, reference [27] focused on applying machine learning techniques to forecast monthly energy prices over a five-year horizon in California. The study utilized historical data spanning sixteen years, analyzing seventy-eight variables to determine their correlation and influence on medium- and long-term energy prices.

In another study, reference [28] implemented a semiparametric model to estimate the prices of daily duration contracts, referred to as elementary contracts, whose portfolio constitutes the delivery period of contracts observed in the market. This methodology combines the determination of the smallest components forming the delivery period of an electricity swap through a semiparametric structural model, the reduction in residual dimensionality using Principal Component Analysis (PCA), and the subsequent estimation of a time series model based on the resulting components.

The authors of reference [29] proposed a decision support system designed to optimize energy trading strategies. This system comprises integrated modules for clustering generation and price scenarios, calculating the marginal cost of expansion, pricing contracts, performing static balancing, projecting tariffs, and optimizing portfolios. In this manner, the ideal trading strategy is structured from generation forecasting to the final definition of tradable contracts.

The authors of reference [30] compared four predictive models: SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), and CNN-LSTM (a Convolutional Neural Network combined with an LSTM). Weekly forecasts were carried out, and the results indicated that GRU and SARIMAX achieved higher accuracy than the other models tested.

With respect to risk management, reference [31] proposed a methodology for pricing derivative contracts—specifically quantity-based collar-type contracts (with floor, cap, and premium parameters) indexed to the PLD. This methodology establishes indifference curves for different parameter combinations, enabling the development of trading strategies for both traders and consumers, taking into account the risk profiles of both parties. To validate the methodology, the study derived three indifference curves, providing traders with a decision-support tool to aid in contractual negotiations.

When addressing price forecasting in the electricity market, two key terms frequently appear in the specialized literature: Electricity Price Forecasting (EPF) and forward curve modeling. Although both terms are related—since they involve mechanisms for forecasting electricity prices—they differ significantly in two respects:

The existence of risk premiums that vary according to the contract’s duration.
The high degree of uncertainty associated with long-term spot price forecasts [32].

Accordingly, it is important to highlight the different products and their respective time scales within the electricity market environment, as illustrated in Figure 1. In long-term markets, structures related to availability contracts for new generation units, such as Power Purchase Agreements (PPAs), are generally present. In the medium-term horizon, capacity contracts for existing plants are established, in addition to forward contracts for future energy delivery. In the short-term market, operations take place in the Day-Ahead Market (DAM) and the Intraday/Real-Time Market (IDM or RTM), where transactions of electricity and ancillary services occur.

Thus, considering the different products and the various horizons until contract maturity, fundamental distinctions can be drawn between the two central concepts in price forecasting: Electricity Price Forecasting (EPF) and forward curves.

Electricity Price Forecasting primarily refers to forecasting the spot price (i.e., the short-term market price) of electricity at a given point in time. In the Brazilian market, this price stems from the sequence of calculations performed by the NEWAVE, DECOMP, and DESSEM models, which are intrinsically linked to the system's operational and technical fundamentals.

At the global level, the literature presents a wide range of methodologies and techniques for price determination, which can be classified into different approaches, as illustrated in Figure 2: multi-agent models, fundamental models, reduced-form models, statistical models, computational intelligence methods, and hybrid models.

Forward curves refer to curves that record the price of contracts in which energy trading is initiated prior to physical delivery and consumption, typically conducted through organized exchanges, energy markets, or even bilateral contracts between parties, generally focusing on medium- and long-term markets. Owing to this characteristic, they incorporate intrinsic market aspects into their construction, such as the risk associated with purchasing or selling a specific quantity of energy within a given time horizon [34].

In recent years, there has been a significant increase in the use of hybrid approaches for time series forecasting, which combine statistical methods and computational intelligence techniques to address linear and nonlinear factors simultaneously. Several studies (e.g., [35,36]) report that pairing linear models such as ARIMA with nonlinear models such as artificial neural networks (ANNs) is a common strategy. Meanwhile, reference [37] proposes a hybrid ETS-ANN model that combines exponential smoothing with an ANN to capture both linear and nonlinear patterns, delivering substantial improvements in accuracy while also accounting for seasonality and more complex data patterns.

Moreover, the application of feature engineering methods [38] has expanded these models’ capacity to identify relevant variables, reflecting domain-specific characteristics. In this context, the adoption of hybrid methodologies not only boosts forecasting capability for longer horizons but also provides greater robustness in volatile scenarios or when data are limited. These developments underscore the importance of exploring combinations of techniques, as proposed in this study, which integrates SARIMAX, LSTM, and XGBoost into a single approach for energy market price forecasting.

The proposed methodology was grounded in the reviewed literature to identify the most suitable techniques for solving medium- and long-term energy price forecasting problems. Hence, to model this new hybrid approach, various neural network techniques were employed, along with statistical learning methods and multiple data preprocessing techniques [39,40,41,42,43,44,45]. All the techniques tested are presented in Table 1.

Identified Research Gaps

Throughout the aforementioned studies, several approaches have been proposed for forecasting energy prices and forward curves. However, certain research challenges and gaps remain that motivate further investigation, including:

Low robustness in long-term horizons: Many methods (e.g., ARIMA, SARIMA) exhibit acceptable accuracy in short- and medium-term forecasts, but struggle to capture structural variations when projections extend over multiple years.
Limited capacity to handle high volatility: Particularly in energy markets, climatic, economic, and political factors can generate abrupt, non-stationary fluctuations, challenging models that assume more stable patterns.
Insufficient treatment of exogenous variables: Although some models incorporate exogenous regressors (as SARIMAX does), many techniques do not fully exploit external data sources, such as macroeconomic indicators, climatological data, or risk indicators, which limits predictive accuracy.
Excessive reliance on manual feature engineering: In several studies, attribute extraction and selection are performed intensively and manually, requiring extensive specialized knowledge. There is still room for automated or hybrid methods (e.g., approaches based on Transformers or attention mechanisms) that reduce this need.
Lack of flexibility in hybrid modeling: A large portion of the studies focuses on combining only two techniques (for example, ARIMA + Neural Networks), without exploring more comprehensive solutions that include advanced decompositions (such as wavelets) or multiple architectures (RNNs, CNNs, Transformers) within a single workflow.
Scarcity of validations using real long-term data: Many projects analyze only limited periods or synthetic scenarios, making it difficult to verify the effectiveness of the methodologies in practical contexts over medium to long durations.

Need for solutions specific to the Brazilian electricity market: There remains a dearth of literature addressing approaches that fully adapt to the peculiarities of the Brazilian electricity market, characterized by a strong presence of hydropower generation, a regulated auction model, and the weekly publication of prices by CCEE/DCIDE since 2012. These unique aspects, combined with the growing demand for long-term projections (due to the migration of consumers to the free contracting environment), require approaches that take into account hydrological seasonality, regulatory volatility, and the sporadic fluctuations typical of the country.

4. Hybrid Model for Long-Term Forward Energy Curve Projection

The methodology applied in experiments on forward energy curve scenarios employs a set of techniques, including Long Short-Term Memory (LSTM), Extreme Gradient Boosting (XGBoost), Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX), and Feature Engineering, to generate a 10-year projection of LPC (Conventional Long-Term Price).

After testing various combinations of the techniques listed in Table 1, the model with the lowest error was defined and is illustrated in Figure 3, where the SARIMAX, LSTM, and XGBoost methods were employed at each stage of the hybrid model. Additionally, the MinMax Scaler method was used for data preprocessing, scaling numerical features within a specific range, typically between 0 and 1, as expressed in Equation (1).

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
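A minimal NumPy sketch of Equation (1) and its inverse is shown below. Fitting the bounds on the training split only is an assumption (the paper does not state where x_min and x_max were computed), chosen here to avoid leaking test information; the inverse transform is what converts normalized projections back to BRL/MWh. The function names are hypothetical.

```python
import numpy as np


def minmax_fit(x):
    """Learn the (min, max) bounds from the training data only."""
    x = np.asarray(x, dtype=float)
    return float(x.min()), float(x.max())


def minmax_transform(x, x_min, x_max):
    """Equation (1): rescale values using the training-set bounds."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


def minmax_inverse(x_scaled, x_min, x_max):
    """Invert Equation (1) to return projections to the original units."""
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min


# Hypothetical weekly prices in BRL/MWh, for illustration only.
train = np.array([150.0, 180.0, 210.0, 250.0])
lo, hi = minmax_fit(train)
scaled = minmax_transform(train, lo, hi)    # values in [0, 1]
restored = minmax_inverse(scaled, lo, hi)   # back to BRL/MWh
```

Note that values outside the training range map outside [0, 1] (e.g., 300 BRL/MWh maps to 1.5 here), which matters when a long-horizon projection drifts beyond historical levels.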

The data acquisition was based on the DCIDE database, using the weekly time series of LPC for the period from January 2012 to December 2024. The weekly LPC data from January 2012 to December 2023 were split into 80% for training and 20% for testing the model. Meanwhile, the data from 1 January 2024 to 31 December 2024 were used for the validation phase.
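The split described above can be sketched as a chronological (unshuffled) partition. The function name, the exact weekly time stamps, and the placeholder values are assumptions for illustration; only the 80/20 ratio and the 2024 validation year come from the text.

```python
import numpy as np


def chronological_split(dates, values, holdout_start, train_frac=0.8):
    """Split a time series without shuffling.

    Observations before `holdout_start` are divided into a training block
    (the first `train_frac`) and a test block (the remainder); observations
    on or after `holdout_start` form the validation block.
    """
    dates = np.asarray(dates)
    values = np.asarray(values, dtype=float)
    in_sample = dates < holdout_start
    dev_dates, dev_values = dates[in_sample], values[in_sample]
    cut = int(len(dev_values) * train_frac)
    return (
        (dev_dates[:cut], dev_values[:cut]),       # training
        (dev_dates[cut:], dev_values[cut:]),       # testing
        (dates[~in_sample], values[~in_sample]),   # validation (2024)
    )


# Weekly time stamps from January 2012 through December 2024 (placeholder values).
weeks = np.arange(np.datetime64("2012-01-07"), np.datetime64("2025-01-01"),
                  np.timedelta64(7, "D"))
prices = np.linspace(0.0, 1.0, len(weeks))
train, test, valid = chronological_split(weeks, prices, np.datetime64("2024-01-01"))
```

Keeping the order intact ensures that every test and validation week lies strictly after the data used for fitting.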

For each component, during both the training and testing stages, a cross-validation technique was employed. Furthermore, for the validation process, step-by-step predictions were generated until the end of the available historical series.
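The paper does not name the cross-validation scheme. For time series, a forward-chaining (expanding-window) scheme is a common choice, so the sketch below assumes it; it reproduces the default fold layout of scikit-learn's `TimeSeriesSplit` in plain Python. The function name is hypothetical.

```python
def expanding_window_folds(n_obs, n_splits=5):
    """Yield (train_idx, test_idx) pairs for forward-chaining cross-validation.

    Each fold trains on all observations up to a cut-off and tests on the
    block that immediately follows, so no fold ever trains on future data.
    """
    test_size = n_obs // (n_splits + 1)
    first_cut = n_obs - n_splits * test_size
    for k in range(n_splits):
        cut = first_cut + k * test_size
        yield list(range(cut)), list(range(cut, cut + test_size))


# Example: 120 observations split into 5 folds.
folds = list(expanding_window_folds(120, n_splits=5))
```

Unlike shuffled k-fold, every test block here is strictly later than its training block, which mirrors how the model is used in practice.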

Another important aspect of the methodology is the decomposition of the input variable into seasonality, trend, and residual components, allowing for an analysis of the patterns and behavior of the time series. This step is essential because LPC is a complex variable influenced by multiple endogenous and exogenous factors related to energy companies. Figure 4 illustrates an example of the time series decomposition process.

Seasonality is the least challenging of the three components to analyze, as it captures regular, recurring patterns that appear at fixed time intervals, often reflecting seasonal variations in the dataset. However, although it may be the most predictable component, it is by no means simple: climate fluctuations throughout the year significantly affect energy prices, adding complexity to the forecasting task.

The trend component represents the long-term movement or direction of the data, capturing the implicit and persistent behavior of the time series. Whether increasing, decreasing, or remaining stable over time, the trend reflects variations in energy consumption, cultural and economic factors, and other influences, while disregarding short-term fluctuations.

The residuals, or residual component, can be interpreted as the error component or irregularities within the data, capturing random fluctuations or noise that cannot be explained by trend or seasonality. In energy price forecasting, the residual component represents unexplained variability and the influence of unpredictable or external factors.
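The paper does not state which decomposition algorithm produced Figure 4. As a stand-in, the sketch below implements classical additive decomposition (the same idea behind statsmodels' `seasonal_decompose` with `model="additive"`) in NumPy. A period of 52 for weekly data is an assumption, and the function name is hypothetical. The test also checks the recomposition trend + seasonality + residual used later in the method.

```python
import numpy as np


def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Trend: centred moving average of length `period` (a 2 x period MA when the
    period is even). Seasonal: mean detrended value at each position in the
    cycle, re-centred to sum to zero. Residual: whatever is left over.
    Trend and residual are NaN at the edges, where the window does not fit.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")

    detrended = y - trend
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.resize(cycle, n)  # tile the cycle across the whole series
    residual = y - trend - seasonal
    return trend, seasonal, residual


# Synthetic weekly series: linear trend plus an annual sine wave.
t = np.arange(52 * 6)
y = 0.002 * t + 0.1 * np.sin(2 * np.pi * t / 52)
trend, seasonal, residual = additive_decompose(y, period=52)
```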

To project the trend component, a combination of SARIMAX and LSTM was employed. SARIMAX was fitted on data from January 2012 to 31 December 2023 and generated a three-year projection covering the 36 months of 2024, 2025, and 2026. The LSTM model then took as input the same historical period together with the SARIMAX three-year projection, i.e., a series extending from January 2012 to December 2026, and produced a further seven-year LPC projection, yielding a total forecast horizon of 10 years.
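The data flow of this two-stage trend projection can be sketched generically: the second model receives the history plus the first model's forecast. The function `chained_forecast` and the two toy models are hypothetical, deliberately simple stand-ins (a least-squares line and a drift model, not SARIMAX or LSTM) so that the example runs without statsmodels or a deep-learning framework. The 36 + 84 monthly steps follow the horizons in the text.

```python
import numpy as np


def chained_forecast(history, stage1, h1, stage2, h2):
    """Two-stage horizon extension.

    stage1 forecasts h1 steps from the observed history; stage2 receives the
    history plus stage1's forecast and extends it by h2 further steps.
    Returns the combined (h1 + h2)-step projection.
    """
    history = np.asarray(history, dtype=float)
    first = np.asarray(stage1(history, h1), dtype=float)
    extended = np.concatenate([history, first])
    second = np.asarray(stage2(extended, h2), dtype=float)
    return np.concatenate([first, second])


# Toy stand-ins (NOT SARIMAX or LSTM).
def linear_trend(y, h):
    """Extrapolate a least-squares straight line h steps ahead."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return intercept + slope * np.arange(len(y), len(y) + h)


def drift(y, h):
    """Continue the average step between the first and last observations."""
    step = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + step * np.arange(1, h + 1)


monthly_history = np.linspace(0.2, 0.5, 144)   # 12 years of normalized monthly trend
projection = chained_forecast(monthly_history, linear_trend, 36, drift, 84)  # 10 years
```

One consequence of this design is that any bias in the first-stage forecast becomes part of the second stage's input, so errors can compound over the later years.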

For the 10-year projection of seasonality, only SARIMAX was used. The seasonality forecast was also used as an input to the 10-year residual projection, which was modeled with XGBoost. The feature importance measure derived from XGBoost justified the application of feature engineering to the residual component, as shown in Figure 5.

In the feature engineering stage, several transformations were performed to enrich the dataset and improve model performance. First, temporal information was extracted, such as the day of the week, month, holidays, and other markers that help capture seasonality and recurring events, and window-based statistics (averages, standard deviations, and variations) were computed to identify short-term dynamics. In parallel, Fourier terms were added to represent the periodic components of the series, enabling efficient modeling of seasonal cycles and other fluctuations even when multiple frequencies are present. Finally, differencing (such as first differencing) was applied to remove trends and stabilize the series, bringing it closer to stationarity and allowing the model to focus on genuine variations in the data rather than long-term effects.
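Two of these transformations can be sketched in a few lines. The annual period of 52.18 weeks (365.25 / 7) and the number of harmonics are assumptions for illustration; the paper does not report the Fourier configuration, and the function names are hypothetical.

```python
import numpy as np


def fourier_terms(n_obs, period, order):
    """Return an (n_obs, 2 * order) matrix of sin/cos pairs for harmonics 1..order."""
    t = np.arange(n_obs)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


def first_difference(y):
    """y'_t = y_t - y_{t-1}; the first value is undefined (NaN)."""
    y = np.asarray(y, dtype=float)
    return np.r_[np.nan, np.diff(y)]


X_fourier = fourier_terms(n_obs=104, period=52.18, order=3)  # about two years of weeks
dy = first_difference([10.0, 12.0, 11.0, 15.0])
```

Because the Fourier columns depend only on the time index, they can be generated for any future week, which makes them usable as inputs over a 10-year horizon.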

The most relevant attributes were selected through quantitative analysis based on feature importance, an intrinsic property of tree-based regressors such as XGBoost. The key input attributes for the residual component identified through feature engineering included:

Lags 4 through 13;
Volatility (3 weeks);
Moving averages (5 and 12 weeks);
One-hot encoding of the month;
Projected seasonality (output from SARIMAX).
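The attribute list above can be turned into a feature matrix as sketched below. The names `lag_4`, `lag_11`, `rolling_mean_5`, and `month_5` follow the labels discussed for Figure 5; `volatility_3`, `rolling_mean_12`, `seasonality_proj`, and the helper functions are hypothetical. Two choices are assumptions not stated in the paper: the rolling windows end at t-1 (so the current residual never leaks into its own features), and volatility is the sample standard deviation over the window.

```python
import numpy as np


def lag(y, k):
    """Lag-k copy of y: out[t] = y[t - k]; the first k entries are NaN."""
    out = np.full(len(y), np.nan)
    out[k:] = y[:-k]
    return out


def trailing(y, window, fn):
    """Apply fn to the `window` values strictly before t (y[t] itself is excluded)."""
    out = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        out[t] = fn(y[t - window:t])
    return out


def residual_features(resid, months, seasonal_proj):
    """Build the residual-model inputs; returns (names, matrix)."""
    resid = np.asarray(resid, dtype=float)
    months = np.asarray(months)
    feats = {f"lag_{k}": lag(resid, k) for k in range(4, 14)}           # lag_4 ... lag_13
    feats["volatility_3"] = trailing(resid, 3, lambda w: np.std(w, ddof=1))
    feats["rolling_mean_5"] = trailing(resid, 5, np.mean)
    feats["rolling_mean_12"] = trailing(resid, 12, np.mean)
    for m in range(1, 13):                                               # one-hot month
        feats[f"month_{m}"] = (months == m).astype(float)
    feats["seasonality_proj"] = np.asarray(seasonal_proj, dtype=float)  # SARIMAX output
    names = list(feats)
    return names, np.column_stack([feats[n] for n in names])


# Synthetic inputs, for illustration only.
rng = np.random.default_rng(0)
resid = rng.normal(size=60)
months = (np.arange(60) // 4) % 12 + 1   # crude 4-weeks-per-month mapping
names, X = residual_features(resid, months, seasonal_proj=np.zeros(60))
```

Rows containing NaN (the first 13 weeks here, because of lag_13) would normally be dropped before training.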

After each component of the time series has been projected, the series is recomposed additively as trend + seasonality + residual.

The 5-period moving average (rolling_mean_5) stood out for capturing short-term patterns, indicating its relevance in detecting recent fluctuations in the data. This attribute had the highest relative importance among all analyzed features, exceeding 30%.

The 4-period lag (lag_4) represents the influence of data from four weeks earlier, highlighting the importance of recent events in the behavior of the residuals. This attribute accounted for approximately 20% of the relative importance for the residual component.

Next, the 11-period lag (lag_11) and Month 5 (month_5, i.e., May) tied for third place in importance for the residual component, each contributing approximately 6%. The importance of lag_11 points to longer-term temporal dependencies, suggesting that more distant past events still affect residual patterns, while month_5 indicates that events occurring in May are also influential.

The fifth most important attribute for the residual component, with around 4% relevance, was the seasonality projection, obtained using the SARIMAX technique previously discussed. Lastly, volatility, with just over 3% importance, was identified as another key feature.
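The percentages quoted above are relative importances, i.e., each feature's raw score divided by the total. A minimal sketch of that normalization followed by a threshold-based selection is given below. The function name, the 3% cut-off, and the raw scores are hypothetical; in XGBoost, raw scores of this kind can be obtained with `Booster.get_score(importance_type="gain")`, but which importance type the authors used is not stated.

```python
def rank_importance(raw_scores, min_share=0.03):
    """Normalize raw importance scores to shares of the total, rank them,
    and keep only the features whose share is at least `min_share`."""
    total = sum(raw_scores.values())
    shares = {name: score / total for name, score in raw_scores.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [(name, share) for name, share in ranked if share >= min_share]


# Made-up raw scores for generic features, for illustration only.
raw = {"feature_a": 8.0, "feature_b": 6.0, "feature_c": 3.0,
       "feature_d": 2.5, "feature_e": 0.5}
selected = rank_importance(raw)
```

The threshold is a design choice: a higher cut-off yields a leaner model but risks discarding weak yet genuine signals such as low-share seasonal indicators.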

Volatility measures the degree of variation or dispersion of values around a mean, quantifying the level of uncertainty or fluctuations in the data over time. Within a residual time series, volatility represents the noise level or unpredictable variations that cannot be explained by trend or seasonality components.

Regarding the forecasting model architecture, Table 2, Table 3 and Table 4 present the architectural details of each model used:

LSTM network configuration for each layer used in trend projection;
Parameter settings of the XGBoost method for residual projection;
SARIMAX configurations for the first three years of trend projection and seasonality modeling.

Although the focus is on the Brazilian electricity market, the same methodological framework—based on time series decomposition (trend, seasonality, and residuals), hybrid forecasting techniques, and targeted feature engineering—can be extended to other domains dealing with multiple seasonalities, exogenous shocks, and extended forecasting horizons. To adapt our approach to other datasets, it is essential to consider the adopted temporal granularity. Since the model was developed using weekly data, all the utilized variables (including those in the residual component projection, such as moving averages and volatility indices) are also calculated on this time scale. Consequently, if another context requires daily, monthly, or annual data, it will be necessary to adjust both the decomposition method (for example, redefining the seasonal components) and the computation of input variables, which may require moving averages or volatility indicators specific to the new frequency.

5. Results

This section presents the performance of the proposed methodology. It covers (1) the validation phase, based on the available time series data; (2) the validation of the hybrid model for trend component projection; (3) the use of the residual component and its input attributes as a tool for refining the forecast; and (4) the 10-year projection of the Conventional Long-Term Price.

5.1. Validation Phase of the Hybrid Methodology

It is important to highlight that, due to the limited availability of input data, it was not possible to validate the model over the entire projection period. Conventional Long-Term energy pricing data are only available from January 2012 to December 2024, which provides just 13 years of history. Reserving 10 of those years for testing would leave only about three years for training a long-term (10-year) forecasting model, which is highly problematic. The available dataset therefore does not allow full validation over the proposed projection period.

To demonstrate the effectiveness of the proposed hybrid methodology in forecasting the input variable over a 10-year horizon, the validation phase for the first year (2024) is described below. This validation was conducted for the seasonality, trend, and residual components, as illustrated in Figure 6.

Figure 6a presents the seasonality component, projected exclusively using the SARIMAX technique. The blue curve represents the mean forecast. The green curve represents the actual values of the seasonal component obtained from the decomposition of the long-term energy price, as provided by DCIDE in the Conventional Long-Term Price dataset. The two curves show closely matching seasonal patterns with minimal deviation, and the actual values remain within the 95% confidence interval throughout the projection. This indicates that the annual behavior of energy prices is well captured.

Figure 6b presents the trend component, projected using the hybrid SARIMAX and LSTM model. As in Figure 6a, the blue curve represents the mean forecast and the green curve the actual trend values derived from the long-term energy price decomposition, shown together with the 95% confidence interval. The normalized data show a slight upward trend between January and December 2024. The blue and green curves behave similarly, and the actual values remain within the confidence interval throughout the period.

Figure 6c illustrates the residual component, which represents the variations captured by neither seasonality nor trend. The residuals contain complex, nonlinear patterns, which were modeled using XGBoost. As in the other panels, the green curve represents the actual data. The residual was the only component for which the actual values fell outside the 95% confidence interval for most of the period. However, the methodology successfully captured the overall upward and downward movements of the actual data throughout the year.

The performance evaluation of the hybrid energy price forecasting model for medium- and long-term projections is conducted using the MAE, RMSE, and MSE metrics, which are widely used in time series analysis.

The Mean Absolute Error (MAE) calculates the average of absolute deviations between actual and predicted values, making it useful for quantifying the overall accuracy of projections without disproportionately penalizing larger deviations.

The Root Mean Square Error (RMSE) assigns greater weight to larger errors because deviations are squared before averaging. This makes it sensitive to significant discrepancies between actual and predicted values. This characteristic makes RMSE an essential metric in applications where large discrepancies can compromise predictive analysis.

The Mean Squared Error (MSE) measures the mean of the squared differences between predicted and observed values, and RMSE is its square root. MSE therefore penalizes large errors in the same way, but it is expressed in squared units rather than in the units of the data. A higher MSE indicates that the model is less precise where deviations are larger, which is a crucial consideration for long-term projections.
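The three metrics can be written in a few lines of Python. This is a generic sketch, not the evaluation code used in the study. The example compares two forecasts with the same MAE to show how RMSE reacts to a single large error.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE, in the data's units."""
    return math.sqrt(mse(actual, predicted))


actual = [1.0, 2.0, 3.0, 4.0]
spread = [1.5, 2.5, 3.5, 4.5]  # four errors of 0.5
spike = [1.0, 2.0, 3.0, 6.0]   # one error of 2.0
print(mae(actual, spread), rmse(actual, spread))  # 0.5 0.5
print(mae(actual, spike), rmse(actual, spike))    # 0.5 1.0
```

Both forecasts have the same MAE, but the RMSE doubles when the total error is concentrated in one step. This is why the two metrics are reported together.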

Table 5 shows that XGBoost achieves the lowest errors across all metrics, which suggests higher forecast accuracy than the other methods. SVR ranks second; its errors are slightly higher than those of XGBoost but still lower than those of LSTM and SARIMAX. LSTM shows intermediate performance, while SARIMAX records the worst results. This indicates that, for the given dataset and forecasting horizon, the machine learning-based models (XGBoost and SVR) were more effective than the purely statistical approach.

Table 6 presents the performance evaluation metrics for the normalized projection components: trend, residual, and seasonality, among which seasonality demonstrated the best results.

With MAE on the order of 10⁻⁴, RMSE on the order of 10⁻³, and MSE on the order of 10⁻⁶, SARIMAX yielded the lowest errors for the seasonality component. The residual was the next best-performing component: feature engineering combined with XGBoost produced MAE and RMSE values on the order of 10⁻² and an MSE of 0.000278. The trend component performed similarly, with errors of the same order of magnitude as those of the residual component.

With the projections of the three components completed, the next step is recomposition. The components are summed to recover the full input variable, the Conventional Long-Term Price (LPC). Figure 7 presents the recomposed LPC for the one-year projection and the validation of the proposed hybrid methodology.
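The recomposition step can be sketched as an element-wise sum, assuming an additive decomposition as implied by the summation of components described for Figure 9b. The sketch also assumes that all three components have already been returned to the original BRL/MWh scale, so any normalization has been inverted. `recompose` is a hypothetical helper name.

```python
def recompose(trend, seasonal, residual):
    """Additive recomposition: y_t = T_t + S_t + R_t at every time step.

    Assumes all three components are already in the original units
    (e.g. BRL/MWh), i.e. any MinMax normalization has been inverted.
    """
    if not (len(trend) == len(seasonal) == len(residual)):
        raise ValueError("components must have the same length")
    return [t + s + r for t, s, r in zip(trend, seasonal, residual)]
```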

Figure 7a displays the forecast curves (in blue) and the actual LPC prices (in green) for the period from January to December 2024, expressed in BRL/MWh. It can be observed that the forecast closely followed the actual values for most of the series, with minor discrepancies, particularly in February, March, and the last quarter. The forecast error for this period was only 3.04%.

Figure 7b illustrates the accumulated monthly values, providing insight into monthly trends and forecast errors. This view makes the behavior described above clearer: the largest discrepancies occurred in February, March, and April, as well as in October and November.

The performance evaluation metrics indicate adequate accuracy for the Conventional Long-Term Price projection: MAE = 0.03863, RMSE = 0.04580, and MSE = 0.00209.

5.2. Validation of the Hybrid Model for the Trend Component in the Long-Term Projection Horizon

Among the techniques tested for long-term energy price forecasting, SARIMAX and the LSTM recurrent neural network yielded the best performance, although their accuracy differed.

Because input data are scarce, the comparative study between SARIMAX and LSTM was conducted over the same validation period as the hybrid methodology, from January to December 2024.

As shown in Figure 8, for the first year of the projection, the SARIMAX model (red curve) closely followed the actual data (blue curve) and performed significantly better than LSTM (green curve). Note that the y-axis shows data normalized with MinMax scaling.
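The sketch below shows MinMax scaling and its inverse. The inverse is needed to express normalized forecasts in BRL/MWh again. Here the minimum and maximum are assumed to be fitted on the training window only, which is a common practice to avoid leaking information from the test period. Forecast values outside the training range then map outside [0, 1]. The helper names are hypothetical.

```python
def minmax_fit(train):
    """Return (min, max) of the training window."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("a constant series cannot be MinMax-scaled")
    return lo, hi


def minmax_transform(values, lo, hi):
    """Map values to [0, 1] relative to the training range."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map normalized values (e.g. forecasts) back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```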

Following this, the analysis proceeded with the technique that showed the best results for medium- and long-term forecasting, which aligns with the projection window proposed in this study. Figure 9 illustrates the 10-year projection result using only SARIMAX, focusing on the trend component, which is the primary subject of this analysis.

As observed in Figure 9a, the SARIMAX technique could not sustain a plausible projection over the entire 10-year horizon. A noticeable decline began around the third year (approximately month 36), which indicates that SARIMAX alone is not sufficient for long-term forecasts.

Figure 9b presents the 10-year projection of the reconstructed LPC variable, where the Trend (SARIMAX), Seasonality (SARIMAX), and Residual (XGBoost) components were summed. As expected, the prominent downward trend observed in Figure 9a persisted in the full LPC projection beyond the third year.

In Figure 9c, the red plus marker highlights the transition point between the SARIMAX model (blue) and the LSTM model (green) predictions. This point indicates the shift in modeling approach, where SARIMAX captures the initial trend, and LSTM takes over the subsequent projection. The choice of this junction point can impact the continuity and accuracy of the forecast over the projected period.
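The junction between the two models can be sketched as a simple concatenation. The first three years of the trend horizon are taken from SARIMAX and the remaining seven from the LSTM, matching the split described in this section. The helper names and the 52-weeks-per-year approximation are assumptions. `junction_gap` reports the jump at the transition point, which is one simple way to check continuity there.

```python
WEEKS_PER_YEAR = 52  # approximation: one weekly price per week


def stitch_trend(sarimax_fc, lstm_fc, sarimax_years=3, total_years=10):
    """Join two trend forecasts at a junction point.

    The first `sarimax_years` of the horizon come from SARIMAX and the
    remainder from the LSTM. Returns (stitched_series, junction_index),
    where junction_index is the first step taken from the LSTM forecast.
    """
    junction = sarimax_years * WEEKS_PER_YEAR
    horizon = total_years * WEEKS_PER_YEAR
    if len(sarimax_fc) < junction:
        raise ValueError("SARIMAX forecast ends before the junction point")
    if len(lstm_fc) != horizon - junction:
        raise ValueError("LSTM forecast must cover the rest of the horizon")
    return list(sarimax_fc[:junction]) + list(lstm_fc), junction


def junction_gap(series, junction):
    """Jump between the last SARIMAX step and the first LSTM step."""
    return series[junction] - series[junction - 1]
```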

As demonstrated, SARIMAX alone does not provide a reliable 10-year forecast. The next step was therefore to integrate the second-best technique among those tested, as listed in Table 1, namely the LSTM recurrent neural network. A 10-year projection incorporating this approach is presented in Section 5.4.

5.3. Residual Component as a Tool for Adjustments in the Projection

Another important analysis concerns the impact of the residual component on the projection. By manipulating input variables such as volatility, the 5-month moving average, and the 12-month moving average, it is possible to create different projection scenarios. Each parameter plays a specific role in how short- and long-term fluctuations and patterns are interpreted.

Figure 10 illustrates the 10-year projection of the residual component, incorporating percentage variations of 5%, 15%, and 30% in three input attributes: volatility, 5-month moving average, and 12-month moving average. In all three graphs, the blue curves represent a 5% increase, the green curves represent a 15% increase, and the red curves represent a 30% increase.
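Scenarios of this kind might be generated as follows. In a copy of the feature rows, one attribute at a time is scaled up by 5%, 15%, or 30%. Each copy would then be passed to the trained XGBoost model (not shown) to obtain a residual projection. The column names `volatility`, `ma_5m`, and `ma_12m` and the function `make_scenarios` are hypothetical.

```python
def make_scenarios(rows, feature, increases=(0.05, 0.15, 0.30)):
    """Copy feature rows with one attribute scaled up by each increase.

    rows: list of dicts (one per future week); feature: the column to
    perturb, e.g. the hypothetical "volatility", "ma_5m" or "ma_12m".
    Returns {increase: perturbed_rows}; the input rows are not modified.
    """
    scenarios = {}
    for inc in increases:
        perturbed = []
        for row in rows:
            new_row = dict(row)  # shallow copy keeps the baseline intact
            if new_row.get(feature) is not None:
                new_row[feature] = new_row[feature] * (1.0 + inc)
            perturbed.append(new_row)
        scenarios[inc] = perturbed
    return scenarios
```

Every perturbed copy is scored by the same residual model. As a result, differences between the resulting curves can only come from the perturbed attribute.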

Figure 10a depicts volatility, which measures the variability or dispersion of values around a mean, quantifying the degree of uncertainty or fluctuations in the data over time. Notably, the 15% increase (green curve) had a more significant impact compared to the other two variations, while the 5% (blue) and 30% (red) variations exhibited similar behaviors. The 5% increase (blue curve) resulted in stronger short-term variations, particularly within the first three years of the projection, whereas the 30% increase (red curve) did not show significant impacts at any period.

An increase in volatility may indicate external factors, such as economic events, market changes, unexpected seasonalities, or exogenous shocks. For example, an economic crisis may increase systemic uncertainty, while climate variations could introduce unexpected fluctuations in environmental data, commodity prices, or energy consumption patterns.

Figure 10b illustrates the impact of the 5-month moving average, a smoothing technique that calculates the mean of the last five months at each point. This attribute is particularly useful for capturing short-term trends and recent fluctuations in the data. Here, the 30% increase (red curve) produced larger deviations in the projection than the other variations, both in the initial years and toward the end of the projection period.

Similarly to the 5-month moving average, the 12-month moving average is also relevant as it smooths variations over a full year, capturing long-term patterns and annual trends. This approach is crucial for identifying structural movements in time series, such as overall growth or decline.

Increasing the 12-month moving average may simulate a scenario of strong economic growth, allowing for the assessment of optimistic forecasts or system responses under structural improvements. Figure 10c shows that the 5% (blue) and 15% (green) increases had a greater impact within the first three years of the projection, with fluctuations re-emerging in the last two years.
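The two moving averages can be sketched as trailing means over a weekly series. The text specifies the windows in months while the data are weekly, so the window lengths in weeks (22 for five months, 52 for twelve months) are an assumed conversion.

```python
# Assumed conversion of month-based windows to weekly observations.
MA_5M_WEEKS = 22   # about 5 * 52 / 12 weeks
MA_12M_WEEKS = 52  # one year of weekly prices


def trailing_mean(values, window):
    """Trailing moving average; None until `window` observations exist.

    The value at index i averages values[i - window + 1 .. i]. When used as
    a model input for week i, it would typically be shifted by one step so
    that only information available before week i is used.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out
```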

Thus, an increase in volatility can be interpreted as a means to simulate pessimistic scenarios, while an increase in the 5-month moving average evaluates short-term behaviors, and the 12-month moving average focuses on long-term trends. Modifying these attributes influences the residual component, capturing effects not accounted for by the trend and seasonality components. This approach presents an opportunity to generate realistic, pessimistic, and optimistic forecasting scenarios.

5.4. 10-Year Projection of the Conventional Long-Term Price

To demonstrate that the proposed hybrid methodology is capable of producing long-term forecasts, Figure 11 presents the LPC projection for the 10-year period from 2024 to 2034, including a 95% confidence interval. Additionally, the one-year validation period for 2024 is shown, with actual data incorporated into the green curve.

As input parameters, the following values were used: volatility (10%), 5-month moving average (30%), and 12-month moving average (30%). These values represent a scenario in which external factors negatively affect the data, simulating economic downturns, market shifts, or unexpected seasonal variations over both short- and long-term horizons.

A slight decline is observed toward the end of the third year, as expected from the SARIMAX behavior discussed in Section 5.2. Starting in January 2027, the last seven years of the trend component are projected using the LSTM technique. This produces moderate growth until mid-2029, followed by seasonal upward and downward patterns.

Although it is not possible to validate the entire 10-year projection, the graph suggests that the hybrid methodology remains stable, providing a viable long-term forecast within the stipulated time frame.

It is essential to highlight that, for methodology validation purposes, the available dataset does not allow for a full 10-year validation due to the limited number of historical samples. Therefore, based on the MAE, RMSE, and MSE validation results for the analyzed period, it is recommended to conduct updates at least semi-annually. This approach enables users to continuously monitor the forecast, update the model when possible, and ensure its robustness and reliability over time.

The combination of SARIMAX, LSTM, and XGBoost proved to be the most effective in our energy price forecasting experiments due to the complementary properties of each method. SARIMAX effectively handles seasonalities and linear trends, while also incorporating exogenous factors relevant to the electric sector, such as climatic or macroeconomic indices. LSTM, on the other hand, can capture long-term dependencies and highly nonlinear relationships—an essential feature in markets subject to supply and demand shocks that may persist for months. Finally, XGBoost is employed to refine the projection on the portion of residuals not explained by the previous models, benefiting from variables generated through feature engineering (such as moving window volatility, specific lags, etc.) and efficiently exploring nonlinear interactions with both speed and accuracy.

6. Final Considerations

This paper presented a new methodology for medium- and long-term energy market price forecasting, utilizing data science techniques and feature engineering tailored to the specific characteristics of the Brazilian energy market.

A weekly time series was used due to the Brazilian market’s pricing structure, where energy prices are published weekly by the Câmara de Comercialização de Energia (CCEE) and DCIDE, covering a time window from January 2012 to December 2024.

The proposed methodology was applied to forward energy curve scenario experiments by decomposing the input variable LPC into three components: trend, seasonality, and residual. A set of techniques was applied to each component.

For the trend component, the methodology first utilized SARIMAX for a three-year projection, which then served as input for a Long Short-Term Memory (LSTM) recurrent neural network, generating an additional seven-year forecast, resulting in a 10-year projection horizon—a period considered long-term in Brazilian energy pricing scenarios.
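A one-step model is commonly used to extend a forecast through a recursive roll-forward, in which each prediction is fed back as input. The sketch below assumes that the seed window ends with the SARIMAX trend projection. `one_step` is a placeholder for the trained LSTM, and the function name is hypothetical. The text does not state whether the study used recursive or direct multi-step prediction, so this only illustrates the general idea.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    seed: Sequence[float],
    one_step: Callable[[List[float]], float],
    steps: int,
    lookback: int,
) -> List[float]:
    """Extend a series by feeding each one-step prediction back as input.

    seed: history ending with the SARIMAX trend projection (assumed).
    one_step: placeholder for a trained model mapping the last `lookback`
    values to the next value (the LSTM in the hybrid scheme).
    """
    if len(seed) < lookback:
        raise ValueError("seed is shorter than the lookback window")
    window = list(seed[-lookback:])
    predictions: List[float] = []
    for _ in range(steps):
        nxt = one_step(window)
        predictions.append(nxt)
        window = window[1:] + [nxt]  # slide forward, dropping the oldest value
    return predictions
```

For a seven-year extension on weekly data, `steps` would be roughly 364 (7 × 52). In this scheme, errors compound because later predictions depend on earlier ones.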

For the seasonality component, SARIMAX was employed. After feature engineering, the seasonality projection was also used as an input feature for the XGBoost model of the residual component, alongside other attributes such as 3-month volatility, one-hot encoded months, and lagged values. The selected attributes were determined from the feature importance metric generated by XGBoost.

Due to the limited time series window available, there was insufficient data for a full 10-year validation. However, the model validation proved effective, yielding an error of only 0.03863 (MAE), 0.04580 (RMSE), and 0.00209 (MSE). To overcome this limitation, as new market data emerge, we recommend recalibrating the model every six months, as this interval strikes a balance between two opposing needs: (i) promptly incorporating market information, and (ii) avoiding computational overload and frequent instabilities. However, it is important to note that the exact choice of this period may vary based on factors such as the pace of market changes, the availability of high-quality data, and the user’s computational resources. For cases of greater volatility, for instance, it might be desirable to update the model quarterly or even monthly, provided that the new sample set is sufficiently representative to enhance the model’s predictive capability. Thus, each application should define the ideal retraining frequency based on robustness tests and the specific dynamics of the sector in question.
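The recalibration policy can be framed as an expanding-window walk-forward scheme. In the sketch below, `step` sets the retraining interval in weekly observations: about 26 for semi-annual, 13 for quarterly, or 4 for monthly updates. The function name and index conventions are assumptions.

```python
def retraining_schedule(n_obs, initial_train, step):
    """Expanding-window walk-forward splits.

    Returns (train_end, test_end) index pairs: fit on observations
    [0, train_end), evaluate on [train_end, test_end), then extend the
    training set by `step` observations and refit.
    """
    if initial_train <= 0 or step <= 0:
        raise ValueError("initial_train and step must be positive")
    splits = []
    train_end = initial_train
    while train_end < n_obs:
        test_end = min(train_end + step, n_obs)
        splits.append((train_end, test_end))
        train_end = test_end
    return splits
```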

Although the present research focuses on methods such as SARIMAX, LSTM, and XGBoost, there has been a significant increase in the use of Transformer-based architectures, such as the Temporal Fusion Transformer and Informer, for time series forecasting. These models stand out for their ability to capture long-range patterns and handle multiple variables, integrating attention mechanisms for more accurate projections. Simultaneously, wavelet transforms allow for the decomposition of signals into different frequency components, revealing seasonal variations and hidden trends, which can improve accuracy by reducing noise and facilitating the extraction of relevant features. Moreover, these attention mechanisms can sometimes eliminate the need for manual feature engineering. Finally, future research will explore how these approaches can further enhance time series forecasting outcomes.

We acknowledge the importance of a more comprehensive comparison of the proposed hybrid methodology with other classic and recent techniques for electricity price forecasting, going beyond the residual component alone. In future work, we intend to develop a specific study that compares our hybrid model with widely used methods such as ARIMA, Exponential Smoothing, and various machine learning approaches. This quantitative and qualitative comparison will provide an even more robust view of the advantages and limitations of each approach, broadening the discussion on effective solutions for electricity price forecasting over medium- and long-term horizons.

The relevance of this study lies in the development of a methodology capable of delivering long-term energy price projections, a topic that remains underrepresented in both academic and market research. Long-term energy price forecasts are still largely based on market sentiment, relying on the experience and intuition of forecasting professionals, who require robust and reliable tools to guide their analyses, given the significant financial and operational consequences of their predictions. Additionally, the methodology enables scenario creation, allowing market participants to test different strategic outlooks.

It is crucial to highlight that, in medium- and long-term energy projections, financial savings increase as forecasting errors decrease. The greater the model's accuracy, the better the energy pricing process. This enables sales that are more closely aligned with real market conditions and mitigates additional costs stemming from calculation errors or penalties imposed on energy market participants.

Although the results obtained are promising, there are risks associated with practical deployment, especially in an environment of constant regulatory and economic change. One of the main challenges lies in the potential occurrence of behavioral deviations or abrupt regime shifts not captured by historical data, which may reduce the model’s accuracy when new scenarios emerge. Additionally, overfitting issues can arise in non-stationary time series, as complex models may excessively adapt to patterns in the training period and lose their generalizability over time.

With these precautions in mind, the approach presented here can serve as a starting point for long-term energy trading scenarios, offering a flexible framework that can be customized to the characteristics of other markets and sectors. However, any real-world application should be accompanied by periodic analyses and continuous adjustments, as energy prices are subject to various exogenous and endogenous factors that could suddenly alter market dynamics.

Author Contributions

Conceptualization, F.P.M., S.M. and U.B.; Methodology, F.P.M., S.M., U.B. and M.E.T.; Software, F.P.M.; Validation, F.P.M., C.R., J.R. and U.B.; Formal analysis, F.P.M. and S.M.; Investigation, F.P.M., S.M. and C.R.; Resources, F.P.M.; Data curation, F.P.M. and J.R.; Writing–original draft, F.P.M. and S.M.; Writing–review & editing, F.P.M., S.M., U.B. and M.E.T.; Visualization, F.P.M.; Supervision, U.B., M.E.T. and F.A.F.A.; Project administration, F.A.F.A.; Funding acquisition, F.A.F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CENTRAIS ELÉTRICAS BRASILEIRAS S.A. ELETROBRAS under grant number PD-00394-2112/2021. The APC was funded by the same company.

Data Availability Statement

The data used as input for the model were made available by the consulting firm DCIDE on its website (https://www.dcide.com.br/).

Conflicts of Interest

All authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACL	Free Contracting Environment
ACR	Regulated Contracting Environment
ANEEL	Brazilian Electricity Regulatory Agency
BBCE	Brazilian Energy Trading Exchange
CCEE	Electric Energy Commercialization Chamber
CCEAR	Regulated Energy Commercialization Contracts
CCEN	Nuclear Energy Quota Contracts
CCGF	Physical Guarantee Quota Contracts
DAM	Day-Ahead Market
DCIDE	Energy Market Intelligence Database
EPF	Electricity Price Forecasting
ETS	Exponential Smoothing State Space Model
GARCH	Generalized Autoregressive Conditional Heteroskedasticity
GRU	Gated Recurrent Unit
IDM	Intraday Market
LPC	Conventional Long-Term Price
LSTM	Long Short-Term Memory
MA	Moving Average
MAE	Mean Absolute Error
MSE	Mean Squared Error
NEWAVE	Brazilian Long-Term Hydrothermal Dispatch Model
PCA	Principal Component Analysis
PLD	Settlement Price of Differences
PROINFA	Brazilian Incentive Program for Alternative Energy Sources
RMSE	Root Mean Square Error
RNN	Recurrent Neural Networks
RTM	Real-Time Market
SARIMAX	Seasonal AutoRegressive Integrated Moving Average with eXogenous Regressors
STL	Seasonal-Trend Decomposition Using LOESS
SVR	Support Vector Regression
XGBoost	Extreme Gradient Boosting

References

Bastos, J.P.; Cunha, G.R.A.; Barroso, L.A. Uma Metodologia Para A Separação Da Comercialização De Energia E Lastro No Brasil Através Da Captura Do Valor Econômico Da Escassez No Mercado De Eletricidade. In Proceedings of the XXIV Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Curitiba, Paraná, Brazil, 22–25 October 2017. [Google Scholar]
Marques, N.L.; Brandão, L.E.T.; Gomes, L.L.; Rosa, V.C.V. Modelo De Apoio À Tomada De Decisão Sobre Operações De Hedge Na Comercialização De Energia Elétrica. In Proceedings of the XXVI Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Rio de Janeiro, Brazil, 24–27 October 2022. [Google Scholar]
Felizatti, H.L.; Hansen, P.M.; Hotta, L.K.; Herencia, M.E.Z. Curva Forward no Mercado de Energia Elétrica Brasileiro: Construção, Modelagem, Previsão e Simulação. 2019. Available online: https://dcide.com.br/wp-content/uploads/2019/06/%C3%9Altimo-Captura-e-Modelagem-de-pre%C3%A7os-no-mercado-de-energia.pdf (accessed on 6 February 2025).
Pappas, F.; Santos, M.L.L. Poder de mercado na formação de preços via oferta: Análise de fatores de influência e métricas. In Proceedings of the XXV Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Belo Horizonte, Brazil, 10–13 November 2019. [Google Scholar]
Castro, N.; Brandão, R.; Hubner, N.; Dantas, G.; Rosental, R. A formação do preço da energia elétrica: Experiências internacionais e o modelo brasileiro. In Texto de Discussão do Setor Elétrico n° 62; GESEL-UFRJ: Rio de Janeiro, Brazil, 2014. [Google Scholar]
Cheng, F.; Fan, T.; Fan, D.; Li, S. The prediction of oil price turning points with log-periodic power law and multi-population genetic algorithm. Energy Econ. 2018, 72, 341–355. [Google Scholar] [CrossRef]
Xue, W.; Li, C.; Mao, X.; Li, X.; Zhao, L.; Zhao, X. Medium and Long Term Load Forecasting of Regional Power Grid in the Context of Economic Transition. In Proceedings of the 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 20–22 October 2018. [Google Scholar] [CrossRef]
Porto, T.; Sacchi, R.; Relatório do Grupo Temático–Mecanismos de Formação de Preço. Grupo de Trabalho Modernização do Setor Elétrico, Ministério de Minas e Energia. 2019. Available online: https://www.gov.br/mme/pt-br/assuntos/secretarias/secretaria-executiva/modernizacao-do-setor-eletrico/arquivos/pasta-geral-publicada/formacao-de-precos.pdf (accessed on 6 February 2025).
Zitzler, E.; Thiele, L. An Evolutionary Algorithm for Multiobjective Optimization: The Strength Pareto Approach. In Technical Report 43, Computer Engineering and Communication Networks Lab (TIK); Swiss Federal Institute of Technology (ETH): Zurich, Switzerland, 1998. [Google Scholar]
Camargo, L.A.S.; Guarnier, E.; Ramos, D.S. Estratégia Ótima De Contratação Para Consumidores Livres, Como Trade Contratação Imediata E Postergação De Decisão, Ponderando Incertezas Nos Preços De Curto Prazo E Na Precificação De Contratos Bilaterais. In Proceedings of the XXIII Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Curitiba, Paraná, Brazil, 22–25 October 2017. [Google Scholar]
Castro, N.; Câmara, L.; Castro, B. Expansão do Mercado Livre e as Distribuidoras de Energia Elétrica; GESEL-UFRJ: Rio de Janeiro, Brazil, 2020. [Google Scholar]
Gomes, L.C.S.; Pereira, R.B.; Ekel, P.; Conceição, G.L.; Santos, A.A.M.; Andrade, A.S.; Filho, L.B.G.; Lima, M.O.; Fonseca, R.M.; Mendonça, M.O.; et al. Avaliação De Risco De Exposição Ao Mercado De Curto Prazo Na Aquisição De Novos Ativos De Energia. In Proceedings of the XXVI Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Rio de Janeiro, Brazil, 15–18 May 2022. [Google Scholar]
Manita, S.; Hourly Electricity Consumption and Production. Kaggle Dataset. 2023. Available online: https://www.kaggle.com/datasets/stefancomanita/hourly-electricity-consumption-and-production?resource=download (accessed on 6 February 2025).
Santos, R.R.; Figer, V.; Transição Energética: Brasil Dribla a Crise Hídrica, mas Permanece em Alerta em Relação a Conta de luz. Portal FGV. 2022. Available online: https://portal.fgv.br/artigos/transicao-energica-brasil-dribla-crise-hidrica-mas-permanece-alerta-relacao-conta-luz (accessed on 6 February 2025).
CCEE–Câmara de Comercialização de Energia. Regras de Comercialização, Módulo 05–Contratos. 2023. Available online: https://www.ccee.org.br/ (accessed on 6 February 2025).
Castro, N.J.; Brandão, R. Mercado Elétrico e Risco Financeiro; PUBL!T Soluções Editoriais: Rio de Janeiro, Brazil, 2021. [Google Scholar]
Castro, N.J.; Brandão, R.; Machado, A.; Gomes, V. Contribuições para o aperfeiçoamento do mercado atacadista de energia brasileiro. In Texto de Discussão do Setor Elétrico n° 77; GESEL-UFRJ: Rio de Janeiro, Brazil, 2017. [Google Scholar]
Castro, N.J.; Brandão, R.; Machado, A.; Gomes, V. Reflexões sobre o mercado brasileiro de energia elétrica no atacado e a crise financeira recente. In Texto de Discussão do Setor Elétrico n° 74; GESEL-UFRJ:: Rio de Janeiro, Brazil, 2017. [Google Scholar]
Brandão, R. Novos Desenhos de Mercado para a Comercialização de Energia Elétrica; Fundação Coge: Rio de Janeiro, Brazil, 2019; Available online: https://gesel.ie.ufrj.br/wp-content/uploads/2019/09/49_Roberto-Brandao.pdf (accessed on 6 February 2025).
EPE. Anuário Estatístico de Energia Elétrica 2023; EPE-Empresa de Pesquisa Energética: Rio de Janeiro, Brazil, 2023. Available online: https://www.epe.gov.br/sites-pt/publicacoes-dados-abertos/publicacoes/PublicacoesArquivos/publicacao-160/topico-168/anuario-factsheet.pdf (accessed on 6 February 2025).
Bessembinder, H.E.; Lemmon, M. Equilibrium pricing and optimal hedging in electricity forward markets. J. Financ. 2002, 57, 1347–1382. [Google Scholar] [CrossRef]
Fleten, S.E.; Lemming, J. Constructing forward price curves in electricity markets. Energy Econ. 2003, 25, 409–424. [Google Scholar] [CrossRef]
Hansen P., M.; Cabral, R.S.; Felizzati, H.L.; Rosa, L.F.S.C.; Sacchi, R.; Maciel, D.; Barros, L.A. Gestão De Risco Na Comercialização De Energia: Situação Atual E Proposta De Melhores Práticas. In Proceedings of the XVIII Seminário de Planejamento Econômico-Financeiro do Setor Elétrico, Rio de Janeiro, Brazil, 10–12 September 2015. [Google Scholar]
Amadeu, J.R.A. Simulação Estocástica de Preços no Mercado Brasileiro de Energia Elétrica. Master’s Dissertation, Universidade Estadual de Campinas, São Paulo, Brazil, 2020. [Google Scholar]
Yousefi, A.; Sianaki, O.A.; Sharafi, D. Long-Term Electricity Price Forecast Using Machine Learning Techniques. In Proceedings of the IEEE PES Innovative Smart Grid Technologies Asia, Bucharest, Romania, 29 September–2 October 2019. [Google Scholar]
Monteiro, M.D.; de Aguiar, A.S.; Chávarry, I.S.S.M.; Freire, A.C.B.; Fernandes, C.A.C.; Valladão, D.M.; de Moraes, G.M.B. Ferramenta Computacional Para Modelagem E Previsão Probabilística De Curvas Forward De Eletricidade. In Proceedings of the XXVI Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Rio de Janeiro, Brazil, 15–18 May 2022. [Google Scholar]
Balan, M.H.; Rosa, P.S.; Fenili, M.; Camargo, L.A.S.; Biase, R.G.; Ramos, D.S.; Dias, M.M. Sistema Integrado De Apoio À Decisão Para Definição Da Estratégia Ótima De Comercialização De Energia Elétrica De Um Agente Gerador. In Proceedings of the XXVI Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Rio de Janeiro, Brazil, 24–27 October 2022. [Google Scholar]
Mendonça, M.O.; Pinto, P.H.A.; Santos, F.F.G.; Pires, D.S.C.; Vieira, D.A.G.; Lobato, M.V.C.; Silva, G.R.L.; Saldanha, R.R.; Resende, G.D.; Santiago, F.P.; et al. Análise Comparativa Entre Modelos De Inteligência Computacional Para Previsão Do Preço Futuro No Mercado De Energia Brasileiro. In Proceedings of the XXV Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Belo Horizonte, Brazil, 10–13 November 2019. [Google Scholar]
Carvalho, G.P.; Viana, A.G.; Mello, J.C. Os Contratos Derivativos E O Setor Elétrico–Uma Solução Legítima Para A Segurança Do Mercado Nacional. In Proceedings of the XXVI Seminário Nacional de Produção e Transmissão de Energia Elétrica (SNPTEE), Rio de Janeiro, Brazil, 15–18 May 2022. [Google Scholar]
Lamps–Laboratory of Applied Mathematical Programming and Statistics. Ferramentas Analíticas Para Previsão de Preços de Contratos Futuros de Energia (Curva Forward) em Mercados Hidrotérmicos com Preços Spot Horários. Relatório de Projeto de Pesquisa & Desenvolvimento ANEEL. 2023. Available online: http://www.lamps.ind.puc-rio.br/projeto/pd-aneel-eneva-curvas-forward-para-mercados-de-eletricidade/ (accessed on 6 February 2025).
IEA–International Energy Agency. Re-powering markets: Market design and regulation during the transition to low-carbon power systems. Electricity Market Series, Paris. 2016. Available online: https://ndcpartnership.org/toolbox/re-powering-markets-market-design-and-regulation-during-transition-low-carbon-power-systems (accessed on 6 February 2025).
Nametala, C.A.L. Redes Neurais Atencionais Aplicadas à Modelagem e Previsão de Preços no Mercado de Eletricidade Brasileiro. Ph.D. Thesis, Escola de Engenharia de São Carlos da Universidade de São Paulo, São Paulo, Brazil, 2023.
Viana, A.G. Leilões Como Mecanismo Alocativo Para um Novo Desenho de Mercado no Brasil. Ph.D. Thesis, Universidade de São Paulo-Escola Politécnica, São Paulo, Brazil, 2018.
Anflor, C.T.M. Otimização Evolucionária e Topológica em Problemas Governados Pela Equação de Poisson Empregando o Método dos Elementos de Contorno. Ph.D. Thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, 2007.
Boente, A.N.P.; Oliveira, F.S.G.; Rosa, J.L.A. Utilização de Ferramenta de KDD para Integração de Aprendizagem e Tecnologia em Busca da Gestão Estratégica do Conhecimento na Empresa. In Proceedings of SEGeT-Simpósio de Excelência em Gestão e Tecnologia, Rio de Janeiro, Brazil, 19–21 October 2007; Centro Universitário Estadual da Zona Oeste-UEZO: Rio de Janeiro, Brazil, 2007.
Chen, Z.; Li, C.; Sun, W. Bitcoin price prediction using machine learning: An approach to sample dimension engineering. J. Comput. Appl. Math. 2020, 365, 112395.
Evans, C.; Pappas, K.; Xhafa, F. Utilizing artificial neural networks and genetic algorithms to build an algo-trading model for intra-day foreign exchange speculation. Math. Comput. Model. 2013, 58, 1249–1266.
Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. From Data Mining to Knowledge Discovery in Databases. AI Mag. 1996, 17, 37.
Morettin, P.A.; Singer, J.M. Estatística e Ciência de Dados; Editora LTC: Rio de Janeiro, Brazil, 2022.
Sathya, R.; Rastogi, A.; Kumar, A.; Singh, S. Weather Based Future Rain Prediction Using Machine Learning with Flask Framework. In Proceedings of the 2022 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), Chennai, India, 8–10 December 2022.
PGMPY. 2025. Available online: https://pgmpy.org/ (accessed on 6 February 2025).
Mota, E.; Coimbra, D.; Peixoto, M. Cartola FC Data Analysis: Uma Ferramenta para Simulação, Análise e Visualização de Dados para o Fantasy Game Cartola-FC. In Proceedings of the SBSI’18: XIV Brazilian Symposium on Information Systems, Caxias do Sul, Brazil, 4–8 June 2018.
Hansen, K.B. The virtue of simplicity: On machine learning models in algorithmic trading. Big Data Soc. 2020, 7, 205395172092655.
Hu, Y.; Liu, K.; Zhang, X.; Su, L. Application of Evolutionary Computation for Rule Discovery in Stock Algorithmic Trading: A Literature Review. Appl. Soft Comput. 2015, 36, 534–551.
Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control; Wiley: Hoboken, NJ, USA, 2015.

Figure 1. Electricity market products and their temporal segmentation across different time horizons. The figure categorizes market mechanisms into Reserves, Energy, Capacity, and New Capacity, spanning short-term (minutes to 24 hours), medium-term (months to years), and long-term (up to 35 years) periods. Dark blue arrows indicate markets present in all structures, while orange arrows represent those operating only in specific contexts. The red arrow (System Operations Delivery) highlights ongoing system stability efforts, and the large red timeline arrow illustrates the transition from real-time balancing to long-term contractual agreements. Source: Authors, adapted from [32].

Figure 2. Systematization of spot price forecasting methodologies (Electricity Price Forecasting). Source: Authors, adapted from [33].

Figure 3. Hybrid Method for Long-Term Projection. Source: Authors.

Figure 4. Historical LPC Series Normalized Using the MinMax Technique, Decomposed into Trend + Seasonal + Residual. Source: Authors.
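
Figure 4 combines two preprocessing steps listed in Table 1: MinMax normalization and an additive decomposition into trend, seasonal, and residual components. The sketch below illustrates both on a weekly series. It is not the authors' code: it swaps in a classical moving-average decomposition, which is simpler than the STL decomposition named in Table 1 (statsmodels' `STL` class is the closer match), and the function names are hypothetical. The scaling maps values to [0, 1], as scikit-learn's `MinMaxScaler` does with its default `feature_range`.

```python
import numpy as np

def minmax_scale(x):
    """Scale to [0, 1]; also return (min, max) so the scaling can be inverted."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("constant series cannot be MinMax-scaled")
    return (x - lo) / (hi - lo), (lo, hi)

def minmax_inverse(z, bounds):
    """Map scaled values back to the original units (e.g., BRL/MWh)."""
    lo, hi = bounds
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def additive_decompose(x, period=52):
    """Classical additive decomposition: x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centred moving average over one full cycle (2 x period MA when period is even)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined (NaN) for the first and last half-cycle
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # Mean detrended value at each position of the cycle, re-centred to sum to zero
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.resize(cycle, n)  # repeat the cycle over the whole series
    residual = x - trend - seasonal  # NaN wherever the trend is undefined
    return trend, seasonal, residual
```

Because the decomposition is additive, projected components can be summed back into a normalized series and passed through `minmax_inverse` to return to BRL/MWh.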

Figure 5. Input Attributes Created in the Feature Engineering Process, Ranked by Importance Using the XGBoost Technique. Source: Authors.

Figure 6. Forecasting Components of Long-Term Energy Prices for 2024: (a) Seasonality, (b) Trend, and (c) Residuals.

Figure 7. LPC Projection for the Period January–December 2024 in BRL/MWh (a) by week and (b) by month.

Figure 8. Comparative Performance of SARIMAX and LSTM Models in Normalized Data Projection.

Figure 9. (a) Trend Component Projection with SARIMAX, Normalized by MinMax, for the 10-Year Period; (b) Reconstructed LPC Projection with Only the SARIMAX Model for the 10-Year Period in BRL/MWh; (c) Trend Component Projection with the Hybrid SARIMAX (blue) + LSTM (green) Model, Normalized by MinMax, for the 10-Year Period. Source: Authors.

Figure 10. Residual Projection for the 10-Year Period with Percentage Increases of 5%, 15%, and 30% for (a) Volatility, (b) 5-Month Moving Average, and (c) 12-Month Moving Average. Source: Authors.
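
Figure 10 applies percentage increases of 5%, 15%, and 30% to three statistics of the projected residual. The caption does not give the exact formulation, so the sketch below is only one hypothetical reading: compute a rolling volatility (sample standard deviation) and 5- and 12-period moving averages of a monthly residual series, then scale each statistic by (1 + increase). The 12-month volatility window, the multiplicative scaling, and the function name are all assumptions.

```python
import pandas as pd

def residual_scenarios(residual, increases=(0.05, 0.15, 0.30), vol_window=12):
    """Hypothetical stress scenarios on rolling statistics of a monthly residual series."""
    r = pd.Series(residual, dtype=float)
    base = {
        "volatility": r.rolling(vol_window).std(),  # assumption: 12-month rolling std
        "ma_5": r.rolling(5).mean(),                # 5-month moving average
        "ma_12": r.rolling(12).mean(),              # 12-month moving average
    }
    # Scale every statistic by (1 + increase) for each scenario
    return {
        name: {pct: stat * (1.0 + pct) for pct in increases}
        for name, stat in base.items()
    }
```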

Figure 11. Forecast of the Forward Energy Curve in the Long-Term Horizon from 2024 to 2034 with a 95% Confidence Interval (blue shaded area). Source: Authors.
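
Figure 11 shows a 95% confidence interval around the forward-curve forecast, but how the band was computed is not stated here. A common approximation, sketched below under the assumption of unbiased Gaussian errors, is the point forecast plus or minus z times the standard deviation of validation errors, with z about 1.96 for 95%. The optional square-root-of-horizon widening is a further assumption (random-walk error accumulation), and the function name is hypothetical. For a fitted SARIMAX component, statsmodels' `get_forecast(...).conf_int(alpha=0.05)` gives model-based intervals instead.

```python
import numpy as np
from statistics import NormalDist

def normal_interval(point_forecast, residuals, level=0.95, sqrt_horizon=False):
    """Symmetric prediction band assuming Gaussian forecast errors."""
    f = np.asarray(point_forecast, dtype=float)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    sigma = np.std(residuals, ddof=1)            # spread of past forecast errors
    half_width = z * sigma * np.ones_like(f)
    if sqrt_horizon:
        # Assumption: errors accumulate like a random walk, so width grows with sqrt(h)
        half_width = half_width * np.sqrt(np.arange(1, len(f) + 1))
    return f - half_width, f + half_width
```

A constant-width band is the simplest choice; over a ten-year horizon, a band that widens with the horizon is usually the more realistic option.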

Table 1. Techniques Used in the Solution Development Process. Source: Authors.

Machine Learning Techniques	Data Processing Techniques
Recurrent Neural Networks (RNNs)	StandardScaler
Long Short-Term Memory (LSTM)	MinMaxScaler
Extreme Gradient Boosting (XGBoost)	Exponential Smoothing
Prophet (Facebook)	Log Transform
Support Vector Regression (SVR)	Dummy Variable Creation
Convolutional Neural Networks (CNNs)	Differencing
Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors (SARIMAX)	Fourier Terms
Exponential Smoothing State Space Model (ETS)	Moving Average
	STL Decomposition
	Rolling Statistics
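
Among the data processing techniques in Table 1, Fourier terms encode a long seasonal cycle with a few smooth regressors, which keeps a weekly model with a 52-week season compact. The sketch below builds sine/cosine pairs sin(2πkt/P) and cos(2πkt/P) for harmonics k = 1..K and period P. The function name, the default period of 52, and the number of harmonics are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def fourier_terms(n_obs, period=52.0, order=3, start=0):
    """Return an (n_obs, 2 * order) array of sin/cos seasonal regressors."""
    t = np.arange(start, start + n_obs, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)
```

Passing `start` equal to the number of training observations yields matching regressors for the forecast horizon, so the same columns can be supplied as exogenous inputs during both fitting and forecasting. Using `period=365.25 / 7` instead of 52 accounts for the fractional number of weeks in a year.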

Table 2. Configuration Parameters of the SARIMAX Technique for the Forecast Used in the Hybrid Model. Source: Authors.

Parameter	Trend	Seasonal	Description
Time Series (start / end dates)	1 January 2012	31 December 2023	Series used in the model, filtered up to 31 December 2023, with no missing values.
p (AR—Autoregressive)	6	2	The number of lags in the autoregressive (AR) component.
d (Difference)	0	0	The number of differencing operations applied to make the series stationary.
q (MA—Moving Average)	1	1	The number of lags in the moving average (MA) component.
P (Seasonal AR)	1	1	The number of lags in the seasonal AR component.
D (Seasonal Difference)	0	1	The number of seasonal differencing operations applied to make the series stationary.
Q (Seasonal MA)	1	1	The number of lags in the seasonal MA component.
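s (Seasonality Period)	52	52	Seasonal period in observations: 52 weekly observations per year, i.e., an annual cycle in weekly data.
Simple Differencing	False	False	If True, differencing is performed before estimation (discarding the first s·D + d observations); if False, the full model, including differencing, is kept in state-space form so all observations are used.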

Table 3. Configuration Parameters of the LSTM Technique for the Forecast Used in the Hybrid Model. Source: Authors.

Layer	Parameter	Value	Description
Input Shape	Features	1	The number of input features per time step (a univariate series).
Input Shape	time_step	52	The number of time steps in the input sequence.
Bidirectional LSTM	return_sequences	False	Indicates whether to return the full sequence (False means only the output of the last time step is returned).
Bidirectional LSTM	Units	200	The number of LSTM units (neurons) in the layer.
Bidirectional LSTM	L2 Regularization	0.01	L2 regularization applied to prevent overfitting.
Dense	Units	1	The number of neurons in the dense layer.
Dense	L2 Regularization	0.01	L2 regularization applied to the dense layer to prevent overfitting.
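
Table 3 describes a single bidirectional LSTM layer fed with windows of 52 time steps, followed by a one-unit dense output, with an L2 penalty of 0.01 on both. The Keras sketch below is one way to express that configuration. Several details are assumptions not given in the table: the penalty is applied to the kernel weights only, the optimizer is Adam, the loss is mean squared error, and each window is used to predict the next value. `make_windows` and `build_bilstm` are hypothetical names.

```python
import numpy as np

TIME_STEP = 52   # Table 3: time_step (weeks per input window)
N_FEATURES = 1   # Table 3: one input feature per time step

def make_windows(series, time_step=TIME_STEP):
    """Split a 1-D series into (samples, time_step, 1) inputs and next-step targets."""
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - time_step
    if n_samples <= 0:
        raise ValueError("series must be longer than time_step")
    X = np.stack([values[i:i + time_step] for i in range(n_samples)])
    y = values[time_step:]  # target: the value right after each window
    return X[..., np.newaxis], y

def build_bilstm(time_step=TIME_STEP, n_features=N_FEATURES, l2=0.01):
    """Bidirectional LSTM(200) followed by Dense(1), both with L2 weight penalties."""
    # Imported here so make_windows works without TensorFlow installed
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        keras.Input(shape=(time_step, n_features)),
        layers.Bidirectional(
            layers.LSTM(200, return_sequences=False,
                        kernel_regularizer=regularizers.l2(l2))  # assumption: kernel only
        ),
        layers.Dense(1, kernel_regularizer=regularizers.l2(l2)),
    ])
    # Assumption: optimizer and loss are not specified in Table 3
    model.compile(optimizer="adam", loss="mse")
    return model
```

The lazy TensorFlow import is a convenience for reusing the windowing step; in a training script the import would normally sit at the top of the module.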

Table 4. Configuration Parameters of the XGBoost Technique for the Forecast Used in the Hybrid Model. Source: Authors.

Parameter	Value	Description
Objective	reg:squarederror	Defines the learning objective; reg:squarederror (squared-error loss) is used for regression tasks.
n_estimators	500	The number of boosting rounds (trees) in the model.
learning_rate	0.05	Step size shrinkage used to prevent overfitting (learning rate).
max_depth	9	The maximum depth of each tree.
subsample	0.8	The fraction of training data to use per boosting round (helps with generalization).
colsample_bytree	0.8	The fraction of features to use when constructing each tree (helps reduce overfitting).
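
Table 4's hyperparameters correspond to keyword arguments of xgboost's scikit-learn wrapper, `XGBRegressor`. The sketch below pairs them with a small, hypothetical feature-engineering step (lagged values and rolling statistics of past values) and returns the fitted model's `feature_importances_`, the kind of ranking shown in Figure 5. The lag and window choices and the function names are assumptions; the engineered attributes actually used are those listed in Figure 5.

```python
import pandas as pd

# Hyperparameters from Table 4, as XGBRegressor keyword arguments
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 500,
    "learning_rate": 0.05,
    "max_depth": 9,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}

def make_features(series, lags=(1, 2, 4, 52), windows=(4, 12)):
    """Hypothetical engineered inputs: lagged values and rolling stats of past values."""
    series = pd.Series(series, dtype=float)
    frame = pd.DataFrame({"y": series})
    for lag in lags:
        frame[f"lag_{lag}"] = series.shift(lag)
    past = series.shift(1)  # rolling statistics use only values before time t
    for w in windows:
        frame[f"roll_mean_{w}"] = past.rolling(w).mean()
        frame[f"roll_std_{w}"] = past.rolling(w).std()
    return frame.dropna()

def fit_xgb(features):
    """Fit XGBRegressor and return it with a feature-importance ranking."""
    from xgboost import XGBRegressor  # imported here so make_features works without xgboost
    X = features.drop(columns="y")
    model = XGBRegressor(**XGB_PARAMS)
    model.fit(X, features["y"])
    ranking = pd.Series(model.feature_importances_, index=X.columns)
    return model, ranking.sort_values(ascending=False)
```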

Table 5. Validation Performance Metrics of the Projection Techniques for the Residual Component.

Model	MAE	RMSE	MSE
XGBoost	0.012078	0.016672	0.000278
SVR	0.015038	0.022201	0.000493
LSTM	0.030674	0.040239	0.001619
SARIMAX	0.046980	0.060450	0.003654
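
The metrics in Tables 5 and 6 follow the standard definitions MAE = mean(|e|), MSE = mean(e²), and RMSE = √MSE, where e is the forecast error. A minimal NumPy version is sketched below; the function name is hypothetical. Because RMSE is the square root of MSE, the two columns can be cross-checked: for XGBoost, 0.016672² ≈ 0.000278, matching the reported MSE.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and MSE of a forecast against observed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(mse)),
        "MSE": mse,
    }
```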

Table 6. Performance Evaluation Metrics for the Seasonality, Trend, and Residual Components. Source: Authors.

Metric	Seasonal	Trend	Residual
MAE	0.00076	0.01755	0.012078
RMSE	0.00100	0.02126	0.016672
MSE	0.000001	0.00045	0.000278

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Monteiro, F.P.; Monteiro, S.; Rodrigues, C.; Reis, J.; Bezerra, U.; Tostes, M.E.; Almeida, F.A.F. A Hybrid Methodology Using Machine Learning Techniques and Feature Engineering Applied to Time Series for Medium- and Long-Term Energy Market Price Forecasting. Energies 2025, 18, 1387. https://doi.org/10.3390/en18061387

AMA Style

Monteiro FP, Monteiro S, Rodrigues C, Reis J, Bezerra U, Tostes ME, Almeida FAF. A Hybrid Methodology Using Machine Learning Techniques and Feature Engineering Applied to Time Series for Medium- and Long-Term Energy Market Price Forecasting. Energies. 2025; 18(6):1387. https://doi.org/10.3390/en18061387

Chicago/Turabian Style

Monteiro, Flávia Pessoa, Suzane Monteiro, Carlos Rodrigues, Josivan Reis, Ubiratan Bezerra, Maria Emília Tostes, and Frederico A. F. Almeida. 2025. "A Hybrid Methodology Using Machine Learning Techniques and Feature Engineering Applied to Time Series for Medium- and Long-Term Energy Market Price Forecasting" Energies 18, no. 6: 1387. https://doi.org/10.3390/en18061387

APA Style

Monteiro, F. P., Monteiro, S., Rodrigues, C., Reis, J., Bezerra, U., Tostes, M. E., & Almeida, F. A. F. (2025). A Hybrid Methodology Using Machine Learning Techniques and Feature Engineering Applied to Time Series for Medium- and Long-Term Energy Market Price Forecasting. Energies, 18(6), 1387. https://doi.org/10.3390/en18061387
