1. Introduction
European Parliament Directive 2024/1275 on the energy performance of buildings reiterates that 75% of existing buildings remain energy-inefficient [1]. Since 2012, the European Union (EU) has recognised the need for building retrofits through Directive 2012/27/EU on energy efficiency [2]. Moreover, this directive emphasises the impact of occupant behaviour on energy performance, advocating for greater energy awareness, particularly through cost allocators. Research further reinforces this, showing that occupant behaviour plays a key role in energy consumption [3], which underscores the complexity of energy use and the need for a holistic approach to improving building efficiency.
Under the principle of “energy efficiency first” [1], Directive 2024/1275, recasting elements from Directive 2023/1791 [4], stresses the need to establish both public and private financial mechanisms to support the execution of Energy Conservation Measures (ECMs) aimed at retrofitting buildings and developing nearly zero-energy buildings (nZEBs). In the current scenario, Energy Performance Contracts (EPCs) emerge as a powerful tool for financing building retrofits [1]. An EPC is an agreement between an Energy Service Company (ESCO) and a beneficiary, typically the building owner(s) [5]. In this arrangement, the ESCO bears the financial cost of implementing the ECMs and is therefore entitled to a share of the revenue from the energy savings subsequently generated in the retrofitted building [6]. The intention of an EPC is thus to promote private financing of renovations of energy-inefficient buildings while protecting beneficiaries and property owners from debt.
The success of an EPC therefore relies on accurately estimating the building’s energy savings. However, measuring the precise energy savings generated by ECMs is challenging because of the dynamic nature of the building’s occupancy, its operation, and shifts in climate conditions, all of which directly influence the building’s energy performance [7]. To address this, a robust Measurement and Verification (M&V) process must be carried out [8]. This M&V process compares the energy consumed by the building after the retrofit with the energy it would have consumed without the retrofit [9], thereby providing a reliable assessment of the actual energy savings achieved.
In early M&V practice, energy savings were calculated by comparing monthly utility bills [10]. While easy to implement, this method proved inaccurate in buildings with variable consumption patterns [11]. To improve accuracy, the M&V process typically involves developing a baseline Building Energy Model (BEM) before the ECMs are implemented [12]. This BEM serves as the reference for estimating the energy the building would have consumed after the retrofit had the ECMs not been implemented, and hence the resulting savings.
The International Performance Measurement and Verification Protocol (IPMVP) [13] establishes four options for M&V: Options A (Retrofit Isolation with Measurement of Key Parameters), B (Retrofit Isolation with Measurement of All Parameters), C (Whole Facility Measurement), and D (Calibrated Simulation). While the scope of Options A and B is limited to the M&V of a particular building element or subsystem, Options C and D apply to the whole facility; these latter two options are defined as follows:
Option C describes a data-driven approach, also known as a black-box model, which classifies monitoring data into input and output variables and does not consider the building’s physical characteristics [14]. Input variables may include weather conditions, indoor temperature, energy loads, building occupancy rate, or subsystem operation, among others, whereas output variables could be the building’s total electricity consumption or its total heating and cooling consumption [15]. Once this classification is performed, the collected data stream is used to train the models until their outputs match the measured outputs. According to IPMVP, an Option C model requires a minimum of twelve months of monitoring data prior to the implementation of the given ECM to generate the BEM [13].
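As an illustration of the input/output idea behind a black-box baseline, the following sketch fits a linear regression of energy consumption on heating degree-days (HDD) by ordinary least squares. This is an assumption, not a method from the sources cited above. IPMVP does not prescribe this model; it is one of the simplest data-driven baselines. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_degree_day_baseline(hdd: Sequence[float], energy_kwh: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of energy = intercept + slope * HDD (black-box baseline)."""
    if len(hdd) != len(energy_kwh) or len(hdd) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(hdd), mean(energy_kwh)
    sxx = sum((x - x_bar) ** 2 for x in hdd)
    if sxx == 0:
        raise ValueError("HDD values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hdd, energy_kwh))
    slope = sxy / sxx                   # kWh per degree-day
    intercept = y_bar - slope * x_bar   # weather-independent base load
    return intercept, slope


def predict_baseline(intercept: float, slope: float, hdd: Sequence[float]) -> List[float]:
    """Baseline consumption predicted for new weather conditions (e.g. the reporting period)."""
    return [intercept + slope * x for x in hdd]
```

The model uses no physical description of the building, only paired observations. This is why its reliability depends on the length and quality of the monitoring record.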
Option D focuses on the development of a white-box model, which requires a detailed description of the building and its systems in order to generate a physics-based model [16]. Unlike Option C, these white-box models take the building’s physical characteristics into account in their calculations. Therefore, white-box models require extensive input data, including the building’s location, orientation, geometry, construction, operation schedules, environmental information, internal loads, and heating, ventilation, and air conditioning (HVAC) equipment details [17]. As with Option C, these models require a calibration process in which their internal parameters are adjusted until their simulated results match the monitored data [18].
In recent years, the increasing availability of affordable computer processors and advances in the efficiency of training methods have made IPMVP Option C and black-box models a popular choice for developing baseline BEMs [19]. For example, M. Agenis et al. [7] developed a black-box baseline model using a 24-month dataset, converting the original data from hourly to daily resolution for modelling purposes. This approach allowed the study to find an optimal baseline model to assess energy consumption in the context of an EPC. Meanwhile, J. Granderson et al. [20] explored the accuracy of six widely used black-box modelling algorithms in quantifying building-level load shifts. Using a dataset of 120 commercial buildings monitored over 24 months, they developed a series of baseline BEMs that account for weekly variations (weekday and weekend differences) to enhance the models’ reliability.
Although black-box models such as those described above offer several advantages, they also have limitations. These models are highly sensitive to the quantity, resolution, and quality of the training data, often requiring large datasets that may be incomplete or contain errors [21]. According to P. Klanatsky et al. [22], monitoring data for model development is essential for effective M&V but can be costly owing to the complexity and variability of building systems. In fact, long-term, high-resolution data is necessary to accurately capture building behaviour [23]. However, data on building energy consumption remain limited and often consist only of monthly utility bills, which can lead to inaccuracies when consumption patterns are analysed at finer daily or hourly resolutions [24]. Despite the cost-effective implementation and widespread adoption of temperature sensors facilitated by the Internet of Things (IoT) [25], the scarcity of detailed energy data hinders the precise analysis of building performance and, thus, the training of data-driven models.
K. Poulinakis et al. [26] assessed the impact of noisy and sparse data on black-box models, concluding that these factors significantly affect interpolation precision. Z. Li et al. [27], in turn, recommend white-box models when training data is insufficient, owing to the transparency of their calculation process. The current research recognises these challenges and capitalises on the physical basis of white-box models, particularly their ability to calibrate parameters using limited historical data. Unlike their black-box counterparts, which rely heavily on correlations within extensive datasets, white-box models rely on calculations based on fundamental physical laws and engineering principles [28]. This makes white-box models especially valuable where extensive historical data is unavailable or costly to obtain. This study presents a calibration methodology that operates with such limited historical data: six months of temperature data and one and a half months of energy consumption data, both at a 15-min time-step resolution. The study also explores how BEM calibration is affected when a fundamental measured variable, energy consumption, is missing. Each scenario is evaluated by contrasting two approaches against a non-calibrated model: one using indoor temperature as the only measured variable for parameter calibration, and another using both temperature and energy as measured variables.
Calibrating a white-box model involves fine-tuning numerous parameters across various building systems, such as the envelope, HVAC, and other equipment, so that simulations reflect actual building behaviour [29]. Carefully defining boundary conditions is essential, as these influence thermal stresses and operational dynamics [30]. To streamline this process, a multi-stage calibration approach is often recommended, calibrating the building envelope first and the HVAC system afterwards. For instance, A. Cacabelos et al. [31] presented a multi-stage calibration methodology divided into six steps, the first two of which were dedicated to envelope calibration with the model’s HVAC system inactive. This methodology aligns with that of J. Pachano et al. [32], who similarly divided the calibration into two stages, calibrating the envelope during a free oscillation period before addressing the HVAC system.
However, multi-stage calibration methodologies require distinguishing between the building’s load and free oscillation periods. In continuously occupied buildings such as hospitals or hotels, where HVAC systems run continuously, obtaining sufficient free oscillation data can be difficult, which may limit the effectiveness of a multi-stage methodology. To address this issue, the current research performs a single-stage process with a global calibration approach; in other words, it adjusts all parameters simultaneously, regardless of whether they pertain to the envelope or the HVAC system. This simplified approach uses indoor temperature as the key variable defining the building’s operational regimes (free oscillation and load periods), thus indirectly separating the influence of each system and turning a multi-stage process into a single-stage one. While this may introduce some bias across the different building systems, the methodology is more time-efficient and requires fewer computational resources. The objective of the single-stage approach presented here is to establish an accurate correlation between the building’s indoor behaviour and its energy consumption, in line with the requirements of an EPC. Within the context of EPC implementation, a single-stage approach accelerates the calibration process, enabling faster deployment of energy-saving measures and quicker realisation of savings. It also simplifies model development and analysis and broadens applicability to a wider range of buildings, including those with continuous occupancy and user-driven operation.
This study also employs a simplified HVAC system model that uses the Ideal Loads Air System option in EnergyPlus to emulate the building’s complex HVAC system. The Ideal Loads Air System simulates an ideal HVAC unit that blends zone exhaust air with a specified amount of outdoor air [33] and adds or removes heat and moisture as needed to deliver supply air at the desired conditions. The proposed method adjusts this HVAC system to evaluate the building’s energy performance and thermal comfort without requiring a detailed calibration of each installed HVAC component, allowing quicker iterations and greater flexibility in refining the overall building model. However, while the Ideal Loads Air System effectively emulates an air-based HVAC system, it has limitations when emulating other types of systems, such as radiant systems. This constraint should be taken into account when interpreting the simulation results, particularly in buildings where non-air systems play a critical role in maintaining indoor environmental quality.
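The blending-and-conditioning idea can be illustrated with a sensible-heat balance. The following Python sketch is a deliberate simplification under stated assumptions and is not EnergyPlus’s implementation:

- The sketch ignores moisture, capacity limits, and set-point control.
- Air density (1.2 kg/m³) and specific heat (1006 J/(kg·K)) are assumed constant.
- The function names are hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2      # assumed constant air density (approx. at 20 °C)
AIR_CP_J_KGK = 1006.0        # specific heat of dry air, J/(kg·K)


def mixed_air_temperature(t_zone_c: float, t_outdoor_c: float, outdoor_fraction: float) -> float:
    """Blend zone exhaust air with outdoor air (mass-weighted, sensible only)."""
    if not 0.0 <= outdoor_fraction <= 1.0:
        raise ValueError("outdoor_fraction must be between 0 and 1")
    return outdoor_fraction * t_outdoor_c + (1.0 - outdoor_fraction) * t_zone_c


def ideal_sensible_power_w(flow_m3_h: float, t_zone_c: float, t_outdoor_c: float,
                           outdoor_fraction: float, t_supply_c: float) -> float:
    """Heat an ideal unit adds (> 0, heating) or removes (< 0, cooling) to reach t_supply_c."""
    m_dot = flow_m3_h / 3600.0 * AIR_DENSITY_KG_M3          # mass flow, kg/s
    t_mix = mixed_air_temperature(t_zone_c, t_outdoor_c, outdoor_fraction)
    return m_dot * AIR_CP_J_KGK * (t_supply_c - t_mix)      # W
```

Because an ideal unit has no component-level parameters, only a few quantities such as flows, supply temperatures, and limits remain to be adjusted. This is what keeps the calibration search space small.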
This study addresses the issues stated above regarding parameter calibration with limited datasets and proposes a novel single-stage approach, which has proven viable for developing models intended for EPCs. In summary, the research addresses the following core points:
Development of a baseline BEM with limited data: the calibrated BEM developed in this study achieves accurate energy predictions with limited historical data, demonstrating an opportunity to apply white-box models in EPCs or in buildings where data collection is challenging.
Effects of using multiple fundamental measured variables as control inputs in the calibration process: the research compares a BEM calibrated using only indoor temperature with a BEM whose calibration takes into account both temperature and energy data. The results point to calibration strategies that improve the accuracy and reliability of the models.
Application of a simplified single-stage calibration process: aiming to enhance efficiency by reducing the time and resources needed for model calibration, this study explores the methodology’s application, benefits, and limitations. It also examines the use of the Ideal Loads Air System as a simplified HVAC system for assessing the building’s energy performance. This single-stage approach may offer a practical framework that promotes the use of IPMVP Option D for EPCs, as well as the improvement of different energy management strategies.
The remainder of this paper is structured as follows:
Section 2 presents the building description, the monitoring plan, and the calibration methodology, detailing the approach followed in the research.
Section 3 discusses the results obtained by comparing the base and calibrated models, focusing on how the calibrated models perform against the temperature and energy control values.
Finally, Section 4 presents the conclusions drawn from this analysis, including the methodology’s practical challenges and limitations, followed by future research directions.
2. Methods
This section describes the methodology followed in this work, which consists of the following steps:
Building description, encompassing a detailed description of the building’s relevant components, including its envelope, HVAC system, and operation.
Monitoring plan, a detailed description of the monitoring system and the establishment of the necessary database for calibration.
Calibration process, performed within the EnergyPlus environment through a genetic algorithm.
2.1. Building Description
The “Amigos” Building of the Universidad de Navarra, located in Pamplona, Spain, was chosen as the test site for this study. The research focuses on the building’s south annex, named “Decanato” (shown in Figure 1), owing to its self-contained functionality and independent HVAC subsystems.
The annex building is a single-storey structure with an effective area of 506.72 m²; it has an average height of 7.00 m, which translates into a net volume of treated air of 3547.67 m³. The building envelope is composed of reinforced concrete with appropriate insulation. Table 1 provides a summary of the envelope materials, including partitions and glazing.
The building’s HVAC system is composed of two subsystems that provide comfort and ventilation for each of the occupied thermal zones. The first subsystem is dedicated exclusively to maintaining indoor thermal comfort and consists of four-pipe water-based fan-coil units. The installed fan-coil units have a nominal cooling capacity ranging from 3.40 kW to 8.44 kW and a nominal heating capacity between 3.00 kW and 6.30 kW. The second subsystem is an Air Handling Unit (AHU) with a sensible heat recovery system, capable of supplying 3465.00 m³/h of fresh outdoor air to the building. This subsystem can satisfy both the building’s thermal comfort and ventilation requirements. To meet indoor thermal requirements, the AHU includes a 29.50 kW heating coil and a 32.20 kW cooling coil.
The building’s heating production system consists of four condensing gas boilers, while its cooling is provided by three electric air-to-water refrigeration units. Given that the scope of this paper is limited to the “Decanato” building, there is no need to detail these systems; it is sufficient to state that energy meters have been installed on the piping that supplies heating and cooling to the annex building.
All relevant information concerning the building and its HVAC systems was provided by the building’s facility manager. This documentation includes “as-built” architectural blueprints, construction details, and technical specifications of the construction and materials used, as well as HVAC Piping and Instrumentation Diagrams (P&ID) and system blueprints. Based on this information, the physical model was developed in DesignBuilder, as displayed in Figure 2.
As Figure 2 shows, the annex building has 20 thermal zones (TZ), which are listed in Table 2. The model preserves the building’s original partitioning, retaining small zones such as bathrooms and service shafts to accurately represent thermal transmittance and air exchange effects. Although these zones are unheated and their indoor air temperature is not monitored (which excludes them from the calibration process), they act as buffer spaces between the climate-controlled areas and the adiabatic volume of the “Amigos” Building. This effect is particularly evident in TZ12, which is influenced by conditions in the central hall and by its connection to TZ09. The remaining 12 TZ, which fall within the scope of the calibration process, represent 90.06% of the building’s effective surface area.
Given that both HVAC subsystems in the annex are air-based, the simulated HVAC system was defined using the Ideal Loads Air System component in EnergyPlus, with initial parameter values taken from the building’s technical documentation. This simplified representation of the HVAC system saves time during modelling and simulation. It also reduces the number of parameters, and therefore the search space, of the subsequent calibration process.
Finally, the BEM was exported to the EnergyPlus environment, where boundary and load conditions were introduced [35]. For the present study, set-point temperatures and HVAC operation schemes were introduced, while thermal loads for people, lights, and other equipment were eliminated. Their influence is implicitly reflected in the measured TZ temperatures, and their energy effects are assumed to be part of the uncertainty of the proposed method. The resulting BEM is the base model used in the calibration process.
2.2. Monitoring Plan
The building’s monitoring campaign relies on data collected from the Building Management System (BMS), which include both indoor conditions and operational data from the HVAC system. This research specifically uses indoor dry bulb temperatures (°C), heating and cooling set-point temperatures (°C), and the operational status (On/Off) of the building’s HVAC equipment. There are 11 sensors of each type, located in thermal zones TZ01 to TZ11. Access to, and manipulation of, the set-point values enables the development of a predictive model and future applications for optimising energy consumption in the building.
To perform the BEM’s calibration and evaluate the building’s energy performance gap, thermal energy meters were installed on the heating and cooling distribution pipes that supply energy to the HVAC system of the annex building. Because these meters are located on the annex’s main piping branch, before it branches off to the individual HVAC terminal units and the AHU, they measure the building’s actual heating and cooling consumption.
The data from this monitoring campaign were collected at 15-min intervals (time steps) and underwent a validation process to remove errors. This validation process flags data gaps or losses. Gaps of up to four time steps (one hour) are considered acceptable, and the missing values are filled by linear interpolation [32]; periods with gaps longer than one hour are deemed invalid and removed from the calibration and evaluation periods. Additionally, sensors may produce anomalous readings due to system errors, time adjustments, or other factors, so any values outside typical measurement ranges are identified and removed from the dataset. The data processing was conducted using a Python 3.9.7 script.
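The validation rules above can be sketched in standard-library Python. The authors’ script is not reproduced here, and the sketch rests on three assumptions:

- Gaps of at most four 15-min steps are interpolated.
- Gaps at the start or end of a series cannot be interpolated.
- The plausibility bounds `lo` and `hi` are supplied per sensor.

The function name `clean_series` is hypothetical.

```python
import math
from typing import List, Optional, Sequence

MAX_GAP_STEPS = 4  # four 15-min time steps = one hour


def clean_series(values: Sequence[Optional[float]], lo: float, hi: float,
                 max_gap: int = MAX_GAP_STEPS) -> List[Optional[float]]:
    """Return a copy where short gaps are interpolated and long gaps stay None (invalid)."""
    # 1) Missing, NaN, or out-of-range readings (outside [lo, hi]) become gaps.
    out: List[Optional[float]] = [
        v if v is not None and not math.isnan(v) and lo <= v <= hi else None
        for v in values
    ]
    n, i = len(out), 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first missing step of this gap
        while i < n and out[i] is None:
            i += 1
        end = i                        # first valid step after the gap (or n)
        gap = end - start
        # 2) Fill only interior gaps of at most max_gap steps by linear interpolation;
        #    gaps at the series edges lack a neighbour on one side and are left as None.
        if gap <= max_gap and start > 0 and end < n:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                out[k] = left + (k - start + 1) / (gap + 1) * (right - left)
    return out
```

Steps still marked `None` after cleaning would then be excluded from the calibration and evaluation periods.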
The monitoring period spanned from August 2023 to February 2024. However, because the energy meters were still being installed and commissioned, energy consumption data collection only began on 22 December 2023. As Table 3 illustrates, 2884 time steps were retrieved, corresponding to 590 h of energy consumption data, which were divided into training and checking periods using a 60/40 split.
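A 60/40 split of time-series data is normally made chronologically, so that the checking period contains only data unseen during training. The text does not state the ordering used, so a chronological, non-shuffled split is assumed in the following sketch. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T], train_fraction: float = 0.6) -> Tuple[List[T], List[T]]:
    """First part of the time-ordered data for training, remainder for checking (no shuffling)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


train, check = chronological_split(range(2884))
print(len(train), len(check))  # 1730 1154
```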
To characterise the site’s climate, a weather station was installed on-site. Its sensors measure dry bulb temperature (°C), dew point (°C), relative humidity (%), horizontal radiation (W/m²), diffuse radiation (W/m²), wind speed (m/s), wind direction (degrees), and precipitation (mm). Additionally, atmospheric pressure (MPa) was retrieved from a weather station located near the building.
Table 4 provides an overview of the data retrieved for the calibration process. Data on the site’s climate were used to generate the weather file in EPW format. Information provided by the BMS was classified either as input data, namely boundary conditions (heating and cooling set-point temperatures (°C) and heating and cooling availability (1/0) schedules), or as control variables (indoor temperature (°C) and heating and cooling energy consumption (kWh)).
2.3. Calibration Process Description
The BEM described in Section 2.1 underwent a calibration process to align its behaviour with the measured data from the dataset established in Section 2.2. This process, developed by C. Fernandez Bandera [36] and validated by V. Gutierrez [37] and J. Pachano [32,38,39], follows Option D of the IPMVP.
The present work builds on these previous studies, using them as a foundation to study a new scenario for the multi-level benchmark required to achieve building calibration, that is, minimising the gap between simulated and measured indoor temperatures and energy consumption. It also analyses the case in which energy consumption data are missing, attempting to achieve proper building calibration using indoor temperature alone. This addresses a recurrent problem in existing buildings: thermal energy consumption is often difficult and costly to measure. To address this problem, the current study assessed a single-stage calibration process through three analysis cases:
Base model (non-calibrated): the reference BEM’s parameters are set to the technical specifications detailed in the building’s documents and have therefore not been calibrated. This BEM is simulated under the same conditions (weather and operational schedules) as the other approaches. Its purpose is to represent a model generated under typical “business as usual” requirements, serving as a control BEM against which the calibrated models are compared.
Model A (temperature-focused calibration): Model A calibrates the parameter values of both the envelope and the building’s HVAC system using a single control variable, indoor temperature. To do this, the calibration process uses the free oscillation and load (operational) periods of the annex building to indirectly separate the effects of the HVAC subsystem from those of the building’s envelope. The objective function prioritises accurate indoor temperatures in the multiple thermal zones under calibration and does not consider energy consumption. This approach emulates the absence of control data on the building’s energy consumption, seeking a solution that matches the building’s actual indoor climate during both operating regimes (free oscillation and load).
Model B (comprehensive calibration): this model is based on the methodology developed by J. Pachano [32], adapted here to run as a single stage. Its calibration performs a multi-level benchmark based on two main control variables: the envelope and HVAC parameter values are adjusted until both indoor temperatures and energy consumption meet the criteria set by international standards. This multi-level benchmark aims to represent accurately both the building’s thermal behaviour (dynamics) and its energy performance.
A key aspect of the present study is the empirical validation of a single-stage calibration process using real building operation data, extending previous work by J. Pachano [32] with the aim of reducing computing time and resources by simplifying the calibration process.
The process detailed in Figure 3 was developed to capture the building’s indoor climate while faithfully representing its energy consumption. Its primary aim is a calibrated BEM that can be used both to study comfort conditions inside the building and to assess its energy performance. Consequently, the resulting calibrated BEM can serve as a baseline model for EPCs and facilitate the analysis of potential ECMs such as BMS control strategies and room set-point optimisation.
In previous studies, V. Gutierrez and J. Pachano [32,37,38,39] explained the need for a multi-stage process to separate the effects of the building’s different subsystems, minimise the spread of error bias between them, and capture the behaviour of passive and active systems separately. The single-stage approach presented in this paper aims to separate these effects indirectly by relying on the difference in indoor conditions between free oscillation periods, when energy consumption is zero, and load periods, when the HVAC system is operating. This approach relies on a multi-level objective function that benchmarks indoor conditions in multiple thermal zones and the building’s energy consumption over both characteristic periods, thus generating a simplified model with the capabilities required for EPC assessment.
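The exact objective functions configured for Models A and B are not given in this section. The following hypothetical Python sketch shows one way such objectives could be built, using three assumptions:

- Time steps are split into regimes using the HVAC availability (1/0) schedule.
- Zone temperature RMSE is averaged across zones for each regime.
- For Model B, a normalised energy error (RMSE divided by mean measured energy, in %) is added.

All function names are illustrative.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def rmse(sim: Sequence[float], meas: Sequence[float]) -> float:
    if not meas or len(sim) != len(meas):
        raise ValueError("need two non-empty series of equal length")
    return sqrt(mean((s - m) ** 2 for s, m in zip(sim, meas)))


def select(series: Sequence[float], hvac_on: Sequence[int], load: bool) -> List[float]:
    """Keep the time steps of one regime: load (HVAC available) or free oscillation."""
    return [v for v, on in zip(series, hvac_on) if bool(on) == load]


def zone_temperature_objectives(sim_t: Dict[str, Sequence[float]],
                                meas_t: Dict[str, Sequence[float]],
                                hvac_on: Sequence[int]) -> Tuple[float, float]:
    """Zone-averaged temperature RMSE (°C) in free-oscillation and load periods."""
    free_err, load_err = [], []
    for zone, meas in meas_t.items():
        sim = sim_t[zone]
        free_err.append(rmse(select(sim, hvac_on, False), select(meas, hvac_on, False)))
        load_err.append(rmse(select(sim, hvac_on, True), select(meas, hvac_on, True)))
    return mean(free_err), mean(load_err)


def objectives_model_a(sim_t, meas_t, hvac_on) -> Tuple[float, float]:
    """Temperature-only calibration: no energy term."""
    return zone_temperature_objectives(sim_t, meas_t, hvac_on)


def objectives_model_b(sim_t, meas_t, hvac_on, sim_kwh, meas_kwh) -> Tuple[float, float, float]:
    """Temperature plus energy: adds a normalised energy error in percent."""
    free_err, load_err = zone_temperature_objectives(sim_t, meas_t, hvac_on)
    energy_err = rmse(sim_kwh, meas_kwh) / mean(meas_kwh) * 100.0
    return free_err, load_err, energy_err
```

Keeping the two regimes as separate objectives is what lets a single optimisation run weigh the envelope-dominated and HVAC-dominated periods independently.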
Models A and B, developed for this study, follow the calibration process in Figure 3, in which the heating and cooling set-point temperature data provided by the BMS are defined as boundary conditions before the models are trained. The BEM’s key parameter values, displayed in Table 5, are then adjusted using the non-dominated sorting genetic algorithm II (NSGA-II) [40] within the JePlus + EA 1.7.7 software [41]. NSGA-II was selected for its efficiency, elitist approach, and parameter-less sharing mechanism [42]. Its population-based search generates a diverse set of high-quality solutions, making it particularly effective for multi-objective optimisation [43]. In building calibration, NSGA-II has been shown to outperform other elitist multi-objective evolutionary algorithms, delivering greater solution diversity and better convergence to the Pareto-optimal front [44].
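The core of NSGA-II is the ranking of candidate solutions into successive Pareto fronts. In this setting, each candidate is a set of calibration parameters scored by objectives such as temperature and energy errors. The following Python sketch implements that non-dominated sorting step for minimisation problems. It is not JePlus + EA code, and it omits crowding distance, selection, crossover, and mutation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Indices of candidates grouped into Pareto fronts (front 0 = non-dominated)."""
    n = len(objs)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]  # solutions each p dominates
    domination_count = [0] * n                                # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:   # all dominators already ranked
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                         # drop the trailing empty front
```

For example, with objectives given as (temperature error, energy error), the candidates (1, 5), (2, 2), and (5, 1) form the first front. Each of them represents a different trade-off between the two errors rather than a single best solution.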
The parameter values are adjusted until the BEM’s simulation results, compared with the monitored control data, meet the criteria of the international standards shown in Table 6 [38].
The energy performance assessment benchmarks the BEMs’ results against the international standards of the “American Society of Heating, Refrigerating and Air-Conditioning Engineers” (ASHRAE) and the IPMVP [11]. For the thermal performance evaluation, the models’ indoor temperature results for the different thermal zones are evaluated using the international standard of the “Chartered Institution of Building Services Engineers” (CIBSE) [45].
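For energy, ASHRAE Guideline 14 uses two calibration indexes: the normalised mean bias error (NMBE) and the coefficient of variation of the root mean square error (CV(RMSE)). The following sketch computes both in the Guideline 14 form, with the error taken as measured minus simulated and $n - p$ in the denominator. Two assumptions apply:

- $p = 1$ is assumed, as in many calibration studies.
- The default limits are the Guideline 14 hourly values (NMBE within ±10%, CV(RMSE) at most 30%). The criteria actually applied in this study are those of Table 6.

The helper names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def nmbe(meas: Sequence[float], sim: Sequence[float], p: int = 1) -> float:
    """Normalised mean bias error in %, measured minus simulated."""
    n, y_bar = len(meas), sum(meas) / len(meas)
    return sum(m - s for m, s in zip(meas, sim)) / ((n - p) * y_bar) * 100.0


def cv_rmse(meas: Sequence[float], sim: Sequence[float], p: int = 1) -> float:
    """Coefficient of variation of the RMSE in %."""
    n, y_bar = len(meas), sum(meas) / len(meas)
    return sqrt(sum((m - s) ** 2 for m, s in zip(meas, sim)) / (n - p)) / y_bar * 100.0


def meets_criteria(meas: Sequence[float], sim: Sequence[float], p: int = 1,
                   nmbe_limit: float = 10.0, cv_limit: float = 30.0) -> bool:
    """Check both indexes against the limits (defaults: Guideline 14 hourly values)."""
    if len(meas) != len(sim) or len(meas) <= p:
        raise ValueError("series must be paired and longer than p")
    return abs(nmbe(meas, sim, p)) <= nmbe_limit and cv_rmse(meas, sim, p) <= cv_limit
```

NMBE alone can hide errors of opposite sign that cancel out. CV(RMSE) is therefore checked alongside it.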
The calibration process described in this section also requires a checking period, consisting of monitored data not used during the training period. This validation confirms that the parameter values have not been over-adjusted (overfitted) during training.
3. Results
This section presents the results of the calibration process, including a performance evaluation of all models during both the training and checking periods. By examining the calibration outcomes of each case, we aim to show the significance of using one or multiple control variables when developing an accurate model, as well as their impact on the BEM’s predictive capabilities.
The initial key parameter values and the values obtained after calibration in each case are presented in Appendix A, while the model results are presented in the following subsections, on Indoor Temperature Evaluation and Energy Performance Assessment.
3.1. Indoor Temperature Evaluation
As explained in Section 2, indoor temperature is evaluated through the statistical indexes mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) for each conditioned thermal zone.
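The following sketch computes the three indexes for one thermal zone and their unweighted average across zones. R² is computed here as the coefficient of determination and expressed in percent to match the 75.00% target; some calibration studies use the squared Pearson correlation instead, so this definition is an assumption. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def temperature_indexes(meas: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """MAE and RMSE in °C, R² in % (coefficient of determination)."""
    if len(meas) != len(sim) or len(meas) < 2:
        raise ValueError("need at least two paired observations")
    errors = [s - m for m, s in zip(meas, sim)]
    y_bar = mean(meas)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((m - y_bar) ** 2 for m in meas)
    return {
        "MAE": mean(abs(e) for e in errors),
        "RMSE": sqrt(ss_res / len(errors)),
        "R2": (1.0 - ss_res / ss_tot) * 100.0,
    }


def building_average(per_zone: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each index across the calibrated thermal zones."""
    return {k: mean(z[k] for z in per_zone.values()) for k in ("MAE", "RMSE", "R2")}
```

An unweighted average treats every zone equally. Weighting by floor area would be an alternative if larger zones should dominate the building-level figure.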
The first comparison, shown in Table 7, presents the models’ behaviour during the training period. The non-calibrated base model was also evaluated to demonstrate the impact of the different calibration approaches. Notably, Model B selected higher zone capacitance multipliers and lower zone infiltration effective leakage areas than Model A. This is reflected in Model A consistently achieving better statistical values for indoor temperature than Model B across all thermal zones. Nevertheless, the results in Table 7 show that the building’s average evaluation metrics for indoor conditions improve significantly for both Models A and B. This improvement is evident in the reduced MAE and RMSE values, some of which fall below the usual dead-band of a temperature sensor (0.50 °C), and both calibrated models now meet the established R² criterion.
During the checking period, displayed in Table 8, Model A remained stable and met the statistical criteria for all thermal zones. For TZ06, Model B improved R², but its MAE was worse than that of the base model. This deviation may stem from overfitting of this thermal zone during training or from anomalies in the monitored data, including noise or shifts in occupant behaviour. In TZ01, Model B showed an overall improvement over the base model, although it did not meet the MAE and RMSE targets during this period. Similarly, in TZ03, Model B failed to reach the R² target of 75.00%.
When the building’s average temperature indexes for this checking period are analysed, both calibrated models again show a clear improvement in uncertainty values, especially R², for which the non-calibrated model performs poorly.
Figure 4 and Figure 5 present scatter plots comparing simulated and measured temperatures for Model A (a) and Model B (b), contrasted against the non-calibrated base model, during the training and checking periods, respectively. Ideally, well-calibrated results should align along the 45° diagonal, with deviations indicating model inconsistencies. The base model’s results, highlighted in orange, provide a visual reference for the improvements achieved by the calibrated models.
Figure 4 highlights significant scatter in the base model during the training period, indicating poor initial agreement with the measured data. Overall, both calibrated models tend to overestimate temperatures when measured values are below 20 °C, as shown by the greater concentration of points above the 45° line. Conversely, for measured temperatures above 25 °C, both models underestimate temperatures, as evidenced by the larger number of points below the diagonal.
Figure 4a shows that Model A’s calibration clusters indoor temperatures more closely along the ideal diagonal than Model B does in Figure 4b.
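The visual reading of Figure 4 (more points above the diagonal below 20 °C, more points below it above 25 °C) can be made quantitative by counting points on each side of the 45° line within a band of measured temperatures. The helper below is a hypothetical illustration, not part of the described methodology; the band limits are free parameters chosen by the analyst.

```python
from typing import Optional, Sequence


def diagonal_split(measured: Sequence[float], simulated: Sequence[float],
                   low: Optional[float] = None, high: Optional[float] = None) -> dict:
    """Count points above/below the 45° line for measured values in [low, high)."""
    above = below = on_line = 0
    for m, s in zip(measured, simulated):
        # Skip points whose measured temperature lies outside the requested band.
        if low is not None and m < low:
            continue
        if high is not None and m >= high:
            continue
        if s > m:
            above += 1    # simulated > measured: overestimation
        elif s < m:
            below += 1    # simulated < measured: underestimation
        else:
            on_line += 1
    return {"above": above, "below": below, "on_line": on_line}
```

For example, `diagonal_split(meas, sim, high=20.0)` inspects measured values below 20 °C, and `diagonal_split(meas, sim, low=25.0)` inspects those at or above 25 °C.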
During the checking period, displayed in Figure 5, the base model continues to show poor agreement with measured temperatures. Notably, two measured data points fall below 14 °C, whereas all models simulate them at no less than 20 °C, with Model A (Figure 5a) providing the closest approximation.
Model A continues to outperform Model B, which appears consistent with the expected behaviour of a model guided solely by a temperature-driven objective function. Model A (Figure 5a) repeats the trend observed during training: temperatures below 22 °C are overestimated, while those above 22 °C are underestimated. In contrast, Model B (Figure 5b) shows a more pronounced underestimation of temperatures above 20 °C, along with a greater dispersion of points below this threshold. This deviation is consistent with Model B’s lower R² compared with Model A and with the difficulty of capturing the thermal dynamics of TZ01.
Finally, Figure 6 shows the average building temperature curve of each model: panel (a) covers a 10-day span of the training period, and panel (b) covers the checking period. At first glance, the base model clearly performs poorly compared with the calibrated Models A and B.
A closer inspection of the figure shows that Model A closely mimics the measured temperature during load periods and, although it deviates from the measured temperature during free oscillation periods, it appears to behave better than Model B. Model B, on the other hand, shows a larger temperature gap, which is particularly noticeable on 22 January, when its mean simulated temperature is 1 °C below the measured mean temperature.
A similar behaviour is observed during the checking period, illustrated in Figure 6b, where Model A again appears to outperform Model B. Nevertheless, Model B appears to perform better during this checking period than in the training period shown in Figure 6a. It should be noted that during this period both calibrated models deviate from the measured temperature around 11 and 18 January, coinciding with extended periods of free oscillation. During these intervals, both models predict indoor temperatures 1.5 °C higher than the measured temperature. Once the HVAC system resumes operation, both models again align closely with the measured temperature.
The overall indoor temperature results show improved performance for Models A and B compared with the non-calibrated model, confirming the effectiveness of the calibration process. This initial assessment demonstrates that both models exhibit good thermal stability during the checking period, complying with CIBSE’s criteria for calibration. However, the models’ underperformance during extended free oscillation periods might indicate a bias in the calibrated values of the envelope parameters.
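One way to check the suspected envelope bias is to split the temperature error by operating mode, since a positive mean error concentrated in free oscillation periods (as around 11 and 18 January) points to the envelope rather than to the HVAC parameters. The sketch below assumes a hypothetical boolean series `hvac_on` that marks load periods; the sign convention (simulated minus measured) is also an assumption.

```python
from typing import Sequence


def mean_bias_by_period(measured: Sequence[float], simulated: Sequence[float],
                        hvac_on: Sequence[bool]) -> dict:
    """Mean (simulated - measured) temperature, split into load and free-oscillation periods."""
    sums = {"load": 0.0, "free_oscillation": 0.0}
    counts = {"load": 0, "free_oscillation": 0}
    for m, s, on in zip(measured, simulated, hvac_on):
        key = "load" if on else "free_oscillation"
        sums[key] += s - m
        counts[key] += 1
    # NaN marks a period type with no samples in the series.
    return {k: (sums[k] / counts[k] if counts[k] else float("nan")) for k in sums}
```

A positive value for `free_oscillation` means the model stays warmer than the building when the HVAC system is off.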
3.2. Energy Performance Assessment
The models’ energy performance is analysed exclusively during load periods. To comply with ASHRAE and IPMVP standards, the simulation results, which were produced at 15-min intervals, are resampled to an hourly resolution before comparison, as displayed in Table 9.
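The resampling step can be sketched in plain Python as below, assuming a 15-min series that starts on the hour and contains whole hours only. Summing is appropriate when each value is the energy consumed during its interval; averaging suits mean power or temperature. Which aggregation was applied to each variable is not stated here, so the `how` switch is a hypothetical parameter.

```python
from typing import List, Sequence


def to_hourly(values_15min: Sequence[float], how: str = "sum") -> List[float]:
    """Aggregate consecutive 15-min values into hourly values (4 steps per hour)."""
    steps = 4
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    if len(values_15min) % steps:
        raise ValueError("series length must be a multiple of 4 (whole hours)")
    hourly = []
    for i in range(0, len(values_15min), steps):
        total = sum(values_15min[i:i + steps])
        # Interval energy is summed; power or temperature is averaged.
        hourly.append(total if how == "sum" else total / steps)
    return hourly
```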
In contrast to the indoor temperature evaluation, Model B achieves the best energy performance results during the training period. Because Model B employs a multi-level objective function that observes energy consumption as a control variable, its calibration meets ASHRAE’s criteria during both the training and checking periods.
Interestingly, Model A, whose training process is “blind” to energy as a control variable, meets ASHRAE’s calibration criteria during the training period; however, it is not stable and fails to meet the NMBE and CV(RMSE) limits during the checking period. Moreover, the results show that both models exceed the 75.00% recommendation for the R² index.
These results suggest that a calibration process that does not take energy consumption into account may not provide the stability required to produce accurate results in the long term or on previously unseen data. In contrast, Model B exhibits the best stability, with its CV(RMSE) worsening by only 3.25 percentage points. Additionally, Model B improves its NMBE and R² values during the checking period. As a result, Model B continues to meet ASHRAE criteria during the checking period, whereas Model A’s performance decreases.
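For reference, the two energy indexes can be sketched as follows. The formulas follow the usual ASHRAE Guideline 14 form with an (n − p) denominator; taking p = 1 and the measured-minus-simulated sign convention are assumptions, as is encoding only the hourly Guideline 14 limits (|NMBE| ≤ 10% and CV(RMSE) ≤ 30%) rather than the IPMVP ones.

```python
import math
from typing import Sequence


def nmbe(measured: Sequence[float], simulated: Sequence[float], p: int = 1) -> float:
    """Normalised Mean Bias Error in %, with the (measured - simulated) sign convention."""
    n = len(measured)
    mean_m = sum(measured) / n
    return 100.0 * sum(m - s for m, s in zip(measured, simulated)) / ((n - p) * mean_m)


def cv_rmse(measured: Sequence[float], simulated: Sequence[float], p: int = 1) -> float:
    """Coefficient of Variation of the Root Mean Square Error in %."""
    n = len(measured)
    mean_m = sum(measured) / n
    rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / (n - p))
    return 100.0 * rmse / mean_m


def meets_ashrae_hourly(measured: Sequence[float], simulated: Sequence[float], p: int = 1) -> bool:
    """Hourly ASHRAE Guideline 14 limits: |NMBE| <= 10% and CV(RMSE) <= 30%."""
    return abs(nmbe(measured, simulated, p)) <= 10.0 and cv_rmse(measured, simulated, p) <= 30.0
```

Because NMBE lets positive and negative errors cancel while CV(RMSE) does not, a model can show a small NMBE and still fail on CV(RMSE). This is why both limits are checked together.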
Figure 7 and Figure 8 compare measured and simulated energy consumption for the training and checking periods, respectively, at both daily and cumulative levels. In Figure 7a, Model B appears to outperform Model A, particularly during peak energy consumption periods. However, in the cumulative assessment shown in Figure 7b, Model B’s accumulated energy exceeds the measured consumption by a total of 8.49%, while Model A’s total energy consumption falls short by 5.05%. The figure also shows the energy performance of the non-calibrated base model, which significantly underestimates energy consumption, predicting 17.29% less than the measured value.
The apparent advantage of Model A does not hold during the checking period. As shown in Figure 8a, Model A’s simulated results fall below the measured energy, while Model B follows the building’s actual energy performance more closely. This is reflected in Figure 8b, where Model B’s prediction nearly matches the building’s measured energy, exceeding it by only 0.04%. In contrast, Model A’s performance deteriorates significantly during the checking period, falling short by 12.62% and approaching the base model’s values.
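The cumulative percentages quoted above compare total simulated and total measured energy over each period. A minimal sketch of that comparison, together with the running totals that a cumulative curve would plot, is given below; the function names are hypothetical and the sign convention (positive means the model exceeds the measurement) matches the wording used in the text.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_deviation(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """Percentage by which total simulated energy exceeds (+) or falls short of (-) the measured total."""
    total_m = sum(measured)
    return 100.0 * (sum(simulated) - total_m) / total_m


def running_totals(values: Sequence[float]) -> List[float]:
    """Cumulative energy series, e.g. for plotting a cumulative comparison curve."""
    return list(accumulate(values))
```

Note that a small cumulative deviation alone does not guarantee good hourly agreement, because over- and underestimates at different hours can offset each other.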
This behaviour stems from how each objective function is defined. Model A adjusts the parameter values solely to reproduce the measured indoor temperatures, whereas Model B aims to reproduce both indoor temperatures and energy consumption simultaneously and at the same resolution. This highlights the importance of correctly defining the objective function and introducing energy as a factor during calibration.
A building’s dynamics, particularly its energy balance, constitute an inherently underdetermined problem that admits multiple solutions. By disregarding energy as a control variable during calibration (i.e., Model A’s objective function), the obtained parameter values prioritise reproducing indoor temperatures even if this leads to parameter values that misrepresent the building’s energy use and, consequently, a less accurate and robust solution in terms of energy consumption. By contrast, Model B’s objective function observes both sides of the building’s energy balance, which results in a better combined outcome for both indoor temperature and energy consumption.
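The effect of the multiple-solution problem on the two objective functions can be illustrated with a toy example. The sketch below contrasts a temperature-only cost with a hypothetical weighted cost that also penalises energy error. It is a simple weighted-sum stand-in, not the multi-level objective function used for Model B; the normalisation by the measured mean and the weight `w_energy` are assumptions.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def objective_temperature_only(t_meas: Sequence[float], t_sim: Sequence[float]) -> float:
    """Temperature-only cost in the spirit of Model A: energy is never inspected."""
    return rmse(t_meas, t_sim)


def objective_temperature_and_energy(t_meas, t_sim, e_meas, e_sim, w_energy: float = 0.5) -> float:
    """Hypothetical weighted cost combining normalised temperature and energy errors."""
    t_term = rmse(t_meas, t_sim) / (sum(t_meas) / len(t_meas))
    e_term = rmse(e_meas, e_sim) / (sum(e_meas) / len(e_meas))
    return (1.0 - w_energy) * t_term + w_energy * e_term


# Two candidate parameter sets that reproduce temperatures equally well
# but imply very different energy use.
t_meas = [20.0, 21.0, 22.0]
e_meas = [10.0, 10.0, 10.0]
candidate_a = {"t_sim": [20.0, 21.0, 22.0], "e_sim": [14.0, 14.0, 14.0]}
candidate_b = {"t_sim": [20.0, 21.0, 22.0], "e_sim": [10.5, 10.0, 9.5]}
```

The temperature-only cost gives both candidates the same score of zero and therefore cannot tell them apart, whereas the combined cost clearly prefers `candidate_b`, whose energy use matches the measurements.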
The performance differences between the calibrated models can be attributed to the parameter values obtained during calibration. Table 10 presents the average values of each Ideal Loads Air System parameter, highlighting the direct differences in energy performance between the models. Notably, both models selected lower parameter values than the base model. For Maximum Heating Supply Air Temperature, both models remained relatively close to the base model, reducing the value by only 18.91% and 27.27%, respectively. In contrast, Maximum Sensible Heating Capacity decreased significantly, from 18,412.76 W in the base model to 3503.72 W in Model A (an 80.97% reduction) and 5413.91 W in Model B (a 70.60% reduction), a considerable deviation from the base model. Similarly, both models significantly reduced the Sensible Heat Recovery Effectiveness, selecting values below 0.16%. However, as previously stated, these models are simplified approaches intended for EPC use: because they rely on an Ideal Loads component rather than a detailed HVAC system, the resulting parameter values lack physical significance, and the models cannot reliably indicate whether, or where, a particular inefficiency in the HVAC system exists.
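The reductions quoted for Maximum Sensible Heating Capacity are relative changes with respect to the base model, computed as (base − calibrated)/base. The short Python check below reproduces them from the values in Table 10. It is only an arithmetic illustration, and the function name is hypothetical.

```python
def percent_reduction(base: float, calibrated: float) -> float:
    """Relative reduction of a calibrated parameter with respect to the base model, in %."""
    return 100.0 * (base - calibrated) / base


# Maximum Sensible Heating Capacity values from Table 10, in W
BASE_CAPACITY = 18412.76
print(round(percent_reduction(BASE_CAPACITY, 3503.72), 2))  # 80.97 (Model A)
print(round(percent_reduction(BASE_CAPACITY, 5413.91), 2))  # 70.6 (Model B)
```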
The results displayed in Figure 4 and Figure 5 may suggest that Model B “sacrifices” indoor temperature accuracy in favour of improved energy performance. However, the results displayed in Figure 7 and Figure 8 show that this is not the case. Including energy as a control variable in the calibration process guides the solution towards a more robust and stable model, one that comprehensively captures the building’s behaviour and can assess both thermal comfort and energy performance.
4. Conclusions
This study performs an empirical validation of a single-stage calibration methodology using limited data. It demonstrates the effects of using only indoor temperature as a control variable and evaluates the level of calibration required to achieve a reliable baseline model for Energy Performance Contract (EPC) applications.
The comparison between Model A and Model B provides valuable insights into the trade-offs between different calibration approaches. Model A prioritises temperature calibration, achieving high accuracy in indoor temperature prediction while requiring minimal energy monitoring. It is effective for thermal comfort assessment but lacks stability in energy predictions, making it unsuitable for EPCs. In contrast, Model B balances both temperature and energy consumption, meeting international calibration standards and providing reliable long-term energy performance assessments. Although it is more computationally demanding, it is well suited to EPCs and energy audits. Ultimately, Model A is best for comfort-focused applications, whereas Model B is preferable for energy-saving initiatives and comprehensive building performance evaluations. The results of this study underscore the critical role of energy as a control variable in achieving accurate calibration.
Furthermore, by considering load and free oscillation periods and using room set-point temperature as a boundary condition during calibration, the resulting Building Energy Model (BEM) captures the building’s overall behaviour. This makes it a robust tool that can also be applied to comfort evaluation, enabling a more comprehensive assessment of building performance.
The single-stage process, with its simultaneous calibration of envelope and heating, ventilation, and air conditioning (HVAC) parameters, proved to be a viable methodology. Even though the simulated indoor temperature curves deviated from measured temperatures during extended free oscillation periods, both Model A and Model B demonstrated strong statistical performance, which validates the single-stage approach described in this study. This approach reduced calibration time and computational resources while generating BEMs that accurately represent the building’s indoor climate as well as its energy performance. This gain in calibration efficiency highlights its potential for broader application in EPCs, offering more precise and reliable energy savings forecasts, especially when data are limited. Additionally, it provides a practical solution for calibrating buildings with continuously operating HVAC systems, where a multi-stage approach might be difficult to implement.
The study exploits the similarity between the Ideal Loads component in EnergyPlus and the building’s installed air-based HVAC system, allowing for a simplified yet accurate calibration process. Although the modelled Ideal Loads Air System does not fully replicate the physics of the real HVAC system, the study demonstrates that it can produce comparable results for some BEM applications (e.g., EPCs) when certain conditions in the building are met. This simplified approach minimises the complexity of describing a detailed HVAC system inside the simulation environment, enabling quicker iterations and a more efficient model refinement process.
As mentioned before, this research introduces the use of heating and cooling set-point temperatures as boundary conditions for white-box models. Setting these boundaries inside the simulation environment allows the calibrated model to operate as a predictive tool, a particularly useful trait for performing optimisations in buildings equipped with Building Management Systems (BMS). The model’s predictive capabilities enable the evaluation of energy-saving strategies based on room set-point optimisation and availability schedules. Moreover, they facilitate the assessment of occupant comfort, which is particularly beneficial in educational and commercial buildings.
Finally, this study demonstrates the feasibility of establishing a calibrated baseline BEM using a limited data set (590 h), which represents a significant advantage, especially in buildings where data collection may be challenging. The stable Coefficient of Variation of the Root Mean Square Error (CV(RMSE)) and Normalised Mean Bias Error (NMBE) values achieved by Model B suggest that reliable calibration can be accomplished despite data limitations, which is particularly useful for older buildings or those undergoing renovation.
In conclusion, by emphasising temperature and energy as essential control variables, the single-stage methodology described in this study yields a baseline model that is a highly effective tool for EPCs. Its accuracy supports realistic energy savings predictions, helping to reduce risk for both Energy Service Companies (ESCOs) and building beneficiaries. The accuracy shown by Model B allows its direct use for calculating the potential savings of Energy Conservation Measures (ECMs), contributing to the development of more comprehensive and effective building retrofit strategies.
4.1. Practical Challenges and Limitations
One of the primary limitations of this methodology is that it applies exclusively to air-based HVAC systems. Radiative systems, which have different thermal dynamics and response times, introduce inaccuracies when assessed using this approach. The Ideal Loads Air System assumes perfect heat and moisture transfer, neglecting real-world HVAC inefficiencies and dynamic system behaviour. Additionally, these models do not account for thermal inertia in radiative systems, which limits their ability to assess response times and overall system performance accurately.
Another major challenge is that the Ideal Loads Air System may underestimate or overestimate peak loads because it cannot account for system-specific constraints. As a result, shorter time steps (under one hour) are necessary to improve accuracy. While this methodology remains a valuable tool for estimating energy performance, its effectiveness depends heavily on the availability of high-resolution data, specialised monitoring infrastructure, and a thorough understanding of HVAC system characteristics.
Data availability and monitoring infrastructure also pose challenges. Accurate calibration relies on both temperature and energy monitoring, as using temperature alone results in an energy-unstable model. While temperature sensors are cost-effective and easy to deploy thanks to IoT advancements, energy meters are more challenging to install. Furthermore, set-point temperature monitoring is uncommon in buildings without a BMS, making implementation more difficult in such cases.
Another key challenge is the differentiation between free oscillation and load periods in monitored data, which is crucial for activating Ideal Loads. In buildings without a BMS or fixed HVAC schedules, where systems operate based on user demand, identifying these periods becomes significantly more complex. This challenge is even more pronounced in buildings without free oscillation periods, requiring further investigation into alternative calibration strategies.
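In buildings without a BMS schedule, one possible starting point is to label hours from metered HVAC energy. The heuristic below is a hypothetical sketch, not the procedure used in this study. An hour counts as load when its energy exceeds a threshold, and free-oscillation runs shorter than `min_free_hours` are relabelled as load so that brief pauses are not mistaken for free oscillation. Both the threshold and the minimum duration are assumptions that would need tuning for each building.

```python
from typing import List, Sequence


def label_load_periods(hvac_energy: Sequence[float], threshold: float,
                       min_free_hours: int = 1) -> List[bool]:
    """Label each hour True (load) or False (free oscillation) from metered HVAC energy."""
    labels = [e > threshold for e in hvac_energy]
    i = 0
    while i < len(labels):
        if labels[i]:
            i += 1
            continue
        # Find the end of this run of free-oscillation hours.
        j = i
        while j < len(labels) and not labels[j]:
            j += 1
        # Short pauses in operation are treated as part of the load period.
        if j - i < min_free_hours:
            for k in range(i, j):
                labels[k] = True
        i = j
    return labels
```

Such a heuristic cannot help in buildings that never enter free oscillation, which remains the open case noted above.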
Lastly, this research was conducted using a single demonstration site. While Model A and Model B performed well statistically, adapting the methodology to different buildings may require modifications. Buildings with varying occupancy patterns, HVAC configurations, and climate conditions could present additional calibration challenges, necessitating further refinement of the model.
4.2. Future Research
Future studies will continue exploring this approach, aiming to further optimise the calibration process by reducing computational demands and minimising execution time. Refinements will focus on improving algorithm efficiency, enhancing automation, and integrating real-time data processing to streamline the methodology’s implementation.
Additionally, further research will explore the practical applications of developed baseline models in assessing optimisation strategies. These include refining HVAC system operation through set-point adjustments, demand-driven scheduling, and efficient heating and cooling strategies. Studies will also investigate building-envelope-retrofitting strategies to enhance thermal performance, alongside more accurate quantification of energy savings resulting from such interventions.
Since the current methodology has been applied to a single demonstration site, future research will expand its application to various building types. This will help determine necessary adaptations based on building characteristics and occupancy patterns. By validating the approach across diverse case studies, researchers can refine the methodology to enhance its applicability and scalability for broader energy efficiency improvements.