Article

Evaluating Predictive Accuracy of Regression Models with First-Order Autoregressive Disturbances: A Comparative Approach Using Artificial Neural Networks and Classical Estimators

by Rauf I. Rauf 1,*, Masad A. Alrasheedi 2, Rasheedah Sadiq 3 and Abdulrahman M. A. Aldawsari 4

1 Department of Statistics, Faculty of Science, University of Abuja, Federal Capital Territory, Abuja, Nigeria
2 Department of Management Information Systems, Faculty of Business Administration, Taibah University, Al-Madinah Al-Munawara 42358, Saudi Arabia
3 National Bureau of Statistics, Federal Capital Territory, Abuja, Nigeria
4 Department of Mathematics, College of Sciences and Humanities, Prince Sattam Bin Abdulaziz University, Al-Kharj 16273, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(24), 3966; https://doi.org/10.3390/math12243966
Submission received: 7 November 2024 / Revised: 12 December 2024 / Accepted: 16 December 2024 / Published: 17 December 2024
(This article belongs to the Section Probability and Statistics)

Abstract

In the last decade, the size and complexity of datasets have expanded significantly, necessitating more sophisticated predictive methods. Despite this growth, limited research has been conducted on the effects of autocorrelation within widely used regression methods. This study addresses this gap by investigating how autocorrelation impacts the predictive accuracy and efficiency of six regression approaches: Artificial Neural Network (ANN), Ordinary Least Squares (OLS), Cochrane–Orcutt (CO), Prais–Winsten (PW), Maximum Likelihood Estimation (MLE), and Restricted Maximum Likelihood Estimation (RMLE). The study evaluates each method’s performance on three datasets characterized by autocorrelation, comparing their predictive accuracy and variability. The analysis is structured into three phases: the first phase examines predictive accuracy across methods using Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE); the second phase evaluates the efficiency of parameter estimation based on standard errors across methods; and the final phase visually assesses the closeness of predicted values to actual values through scatter plots. The results indicate that the ANN consistently provides the most accurate predictions, particularly in large sample sizes with extensive training data. For GDP data, the ANN achieved an MSE of 1.05 × 10⁹, an MAE of 23,344.64, and an MAPE of 81.66%, demonstrating up to a 90% reduction in the MSE compared to OLS. These findings underscore the advantages of the ANN for predictive tasks involving autocorrelated data, highlighting its robustness and suitability for complex, large-scale datasets. This study provides practical guidance for selecting optimal prediction techniques in the presence of autocorrelation, recommending the ANN as the preferred method due to its superior performance.
MSC:
62M10

1. Introduction

As a result of the exponential growth in data and computational capacity, modern prediction techniques have evolved to satisfy the requirements of complicated time series datasets. Linear regression remains a vital predictive technique in time series analysis due to its interpretability across a range of domains [1]. However, the predictive power of linear regression declines when the data violate the independence assumption, that is, when observations are sequentially dependent (autocorrelated) [2,3].
The accuracy of estimators such as Ordinary Least Squares (OLS) is impacted by autocorrelation, which breaches the independence assumption of linear regression [1,2,3,4]. The need for models that account for autocorrelation, such as the autoregressive (AR) model, has led to the development of specialized estimators like the Prais–Winsten and Cochrane–Orcutt estimators [5,6,7]. Various techniques, including Maximum Likelihood Estimation (MLE) and Restricted Maximum Likelihood (RMLE), have been proposed in the econometric and statistical literature to handle autocorrelated disturbances [1,8]. These methods ensure more effective parameter estimation by providing robustness in autocorrelation scenarios. Tests like the Ljung–Box and Breusch–Godfrey tests are essential for scenarios with higher-order autocorrelation because they enable practitioners to identify, modify, and effectively fit models to serially correlated data [9,10]. However, autocorrelation remains a challenge, especially as data structures become more complex and nonlinear, necessitating flexible and dependable forecasting methods. In time series forecasting, Artificial Neural Networks (ANNs) offer a promising path, specifically when it comes to managing nonlinear data structures. In contrast to traditional linear models, ANNs use layered architectures to capture complex dependencies; Long Short-Term Memory (LSTM) and Temporal Pattern Attention networks, which are useful for sequences with temporal and spatial patterns, are two such examples [11,12].
ANNs can approximate complex functions inside datasets, although existing models often assume independence in error terms, which is a gap researchers are actively striving to bridge [13,14]. In order to improve forecast accuracy in the presence of autocorrelated disturbances, recent research has modified ANNs to resemble autoregressive frameworks [5,15].
The aim of this study is to evaluate the performance of six estimation techniques—the OLS, Prais–Winsten, Cochrane–Orcutt, MLE, RMLE, and ANN models—when dealing with autocorrelated time series data. This study chose to evaluate each method’s predictive efficacy and robustness under different levels of autocorrelation using metrics like the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Mean Square Error (MSE) [1,2,4,16]. This comparative analysis intends to clarify the conditions in which neural networks may outperform traditional methods, especially in capturing both linear and nonlinear patterns in time series data. The findings from this study could significantly inform applied econometrics, providing a pathway for enhanced accuracy in forecasting models that are used across economics, environmental science, and other data-driven disciplines [17,18].

2. Methodology

2.1. Linear Regression Model Setup

The general linear regression model is specified as follows:
y = Xβ + ε
where
  • y ∈ ℝⁿ: vector of observed response variables;
  • X ∈ ℝ^(n×p): matrix of predictor variables (with n observations and p predictors, including the intercept term);
  • β ∈ ℝᵖ: vector of unknown regression coefficients;
  • ε ~ N(0, σ²I): vector of independent and identically distributed error terms with a mean of zero and variance σ².
The model assumes that each y_i is a linear combination of the predictor variables and the error term. Under this framework, the matrix X typically includes a column of ones to account for the intercept term in β. When errors are homoscedastic and uncorrelated, this model can provide efficient estimates of the parameters using Ordinary Least Squares (OLS) [1,5].

2.1.1. Ordinary Least Squares (OLS) Estimation

The OLS estimator is derived by minimizing the sum of squared residuals, defined as follows:
SSE = Σ_{i=1}^{n} (y_i − X_i β)² = (y − Xβ)′(y − Xβ)
The OLS estimator, β̂_OLS, is obtained by finding the parameter vector β that minimizes this sum:
β̂_OLS = argmin_β (y − Xβ)′(y − Xβ)
By expanding and differentiating with respect to β, we obtain
∂SSE/∂β = −2X′(y − Xβ)
Setting the derivative to zero for minimization yields the following normal equations:
X′X β̂_OLS = X′y
Assuming X′X is invertible, the solution for β̂_OLS is
β̂_OLS = (X′X)⁻¹X′y
This estimator is unbiased and efficient under the Gauss–Markov theorem given that errors are homoscedastic and uncorrelated. When the assumptions hold, OLS provides the Best Linear Unbiased Estimator (BLUE) of β [7,9].

2.1.2. Prediction and Residuals

The predicted values of y based on the OLS estimates are
ŷ = X β̂_OLS
The residuals, representing the difference between the observed and predicted values, are given by
e = y − ŷ = y − X β̂_OLS
The variance of the residuals, e, is an important metric for evaluating the model fit. The residual variance, σ̂², can be estimated as follows:
σ̂² = SSE / (n − p) = (y − X β̂_OLS)′(y − X β̂_OLS) / (n − p)
where n is the number of observations and p is the number of predictors, including the intercept term [5,16].
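To make the computation concrete, the closed-form estimator and the residual variance can be obtained in a few lines of base R. The following is a minimal sketch; the function name and the simulated data are illustrative only and are not part of the analysis in Appendix A.
# Minimal sketch: OLS via the normal equations, with X containing a column of ones
ols_fit <- function(X, y) {
  X <- as.matrix(X); y <- as.numeric(y)
  beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'y
  y_hat <- X %*% beta_hat                        # fitted values
  e <- y - y_hat                                 # residuals
  sigma2_hat <- sum(e^2) / (nrow(X) - ncol(X))   # SSE / (n - p)
  list(beta = beta_hat, fitted = y_hat, residuals = e, sigma2 = sigma2_hat)
}

# Illustrative check on simulated data: estimates should be close to c(2, 3)
set.seed(1)
x1 <- rnorm(50)
y_sim <- 2 + 3 * x1 + rnorm(50)
ols_fit(cbind(1, x1), y_sim)$beta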

2.1.3. Sum of Squared Errors (SSE)

The Sum of Squared Errors (SSE) quantifies the total deviation of the observed values from the fitted values:
SSE = e′e = (y − X β̂_OLS)′(y − X β̂_OLS)

Differentiation of SSE with Respect to β :

To derive the OLS estimator, we differentiate the SSE with respect to β :
∂SSE/∂β = −2X′(y − Xβ)
Setting this to zero yields the following normal equations:
X′X β̂_OLS = X′y
Solving the normal equations in Equation (7) provides the closed-form solution:
β̂_OLS = (X′X)⁻¹X′y

2.1.4. Properties of the OLS Estimator

The OLS estimator, β̂_OLS, has the following key properties:
Unbiasedness: E(β̂_OLS) = β;
Variance of β̂_OLS:
Var(β̂_OLS) = σ²(X′X)⁻¹;
Best Linear Unbiased Estimator (BLUE): Under the Gauss–Markov assumptions, β̂_OLS has the smallest variance among all the linear unbiased estimators.
These properties affirm that OLS remains efficient for parameter estimation under ideal conditions, but autocorrelation or heteroskedasticity in the errors could violate these assumptions, necessitating alternative estimators [1,7].

2.1.5. Autocorrelation Detection

Autocorrelation, or serial correlation, occurs when the error terms in a regression model are correlated across time. It arises when the assumption of independence among the residuals is violated, which makes the parameter estimates inefficient and can lead to incorrect inference, since autocorrelation distorts the standard errors of the estimators [1,5,9]. The Durbin–Watson (DW) statistic is widely used to detect first-order autocorrelation in the residuals of a regression model, particularly for AR(1) processes.
The DW statistic is calculated as follows:
d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²
where
  • e_t = y_t − ŷ_t represents the residuals from the OLS fit;
  • d ranges from 0 to 4, with d ≈ 2 indicating no autocorrelation.
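As a minimal illustration, the statistic can be computed directly from the OLS residuals; the helper below is a sketch (in the analysis itself, the dwtest() function from the lmtest package, used in Appendix A, also returns a p-value).
# Minimal sketch: Durbin-Watson statistic from a vector of OLS residuals
durbin_watson <- function(e) {
  sum(diff(e)^2) / sum(e^2)   # sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2
}

# Equivalent packaged test (as in Appendix A):
# library(lmtest); dwtest(linear.model, alternative = "two.sided")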

Interpretation of the Durbin–Watson Statistic

The DW statistic has specific interpretations depending on its value:
d ≈ 2 suggests no autocorrelation;
d < 2 indicates positive autocorrelation;
d > 2 indicates negative autocorrelation.
For a given significance level (α), the DW test uses critical values d_L and d_U to test for autocorrelation:
If d < d_L, there is strong evidence of positive autocorrelation;
If d > 4 − d_L, there is strong evidence of negative autocorrelation;
If d_U < d < 4 − d_U, there is no evidence of autocorrelation, while values falling between d_L and d_U (or between 4 − d_U and 4 − d_L) are inconclusive.
The critical values d_L and d_U depend on the sample size, n, and the number of predictors, p [12,16]. In addition to the Durbin–Watson test, several other tests are commonly employed to detect autocorrelation in residuals:
Breusch–Godfrey Test: The Breusch–Godfrey (BG) test allows for higher-order autocorrelation detection beyond AR(1) processes. It is based on the residuals of the regression and tests the null hypothesis of no autocorrelation up to order q:
e_t = α₀ + α₁ e_{t−1} + α₂ e_{t−2} + … + α_q e_{t−q} + u_t
where
  • α₀, α₁, …, α_q are parameters to be estimated;
  • u_t is a white noise error term.
The test statistic is computed as follows:
LM = nR²
where R² is the coefficient of determination from the auxiliary regression. Under the null hypothesis of no autocorrelation, the LM statistic follows a chi-square distribution with q degrees of freedom:
LM ~ χ²_q
Ljung–Box Q Test: The Ljung–Box Q test checks for autocorrelation at multiple lags and is particularly useful for time series data. The null hypothesis is that the autocorrelations up to lag m are simultaneously equal to zero. The test statistic is defined as follows:
Q = n(n + 2) Σ_{k=1}^{m} ρ̂_k² / (n − k)
where
ρ̂_k is the sample autocorrelation at lag k;
n is the number of observations;
m is the number of lags being tested.
Under the null hypothesis, the Q statistic follows a chi-square distribution with m degrees of freedom:
Q ~ χ²_m
A significant Q statistic indicates that at least one of the autocorrelations up to lag m differs significantly from zero [7,18].
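Both tests are available in standard R tooling; the sketch below assumes a fitted linear model on a data frame named data with response Y (as in Appendix A) and uses illustrative lag orders (q = 4 and m = 12), calling bgtest() from the lmtest package and Box.test() from base R.
# Minimal sketch: higher-order autocorrelation tests on a fitted linear model
library(lmtest)

fit <- lm(Y ~ ., data = data)                 # assumes a data frame 'data' with response Y

bgtest(fit, order = 4)                        # Breusch-Godfrey LM test, lags up to q = 4
Box.test(residuals(fit), lag = 12,            # Ljung-Box Q test on the residuals
         type = "Ljung-Box")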

2.2. Implications of AR(1) in Linear Regression

Autocorrelation in the residuals indicates that the Gauss–Markov assumptions have been violated, rendering OLS estimates inefficient and their standard errors biased. Because autocorrelation typically leads to underestimated standard errors, it can produce inflated t values and elevated Type I error rates. Modifications are therefore necessary to obtain efficient and unbiased estimates in the presence of autocorrelation, as explored in subsequent sections [19]. When autocorrelation is found, the model can be transformed and the serial correlation in the residuals addressed using techniques such as the Cochrane–Orcutt and Prais–Winsten procedures (discussed in later sections).

2.2.1. Cochrane–Orcutt Procedure

The Cochrane–Orcutt procedure is a widely used iterative method mainly designed to address autocorrelation in time series regression models, especially when residuals show a first-order autocorrelation (AR(1) process). This approach transforms the model to account for autocorrelation in the error terms by estimating the autocorrelation parameter ρ and adjusting the model accordingly. Through this transformation, the residuals are rendered approximately uncorrelated, improving the efficiency of Ordinary Least Squares (OLS) estimations [5]. The Cochrane–Orcutt method involves the following steps:

Step 1: Initial OLS Fit

An initial OLS estimation of the regression coefficients, β , is required to compute residuals for estimating the autocorrelation parameter, ρ . The initial OLS estimator is calculated as follows:
β̂⁽⁰⁾ = (X′X)⁻¹X′y
where
  • X : matrix of predictors;
  • y : vector of observed responses.
The resulting residuals, e_t = y_t − X_t β̂⁽⁰⁾, are used in subsequent steps to estimate ρ [18,19].

Step 2: Estimate Autocorrelation ρ

Using the residuals, e t , from the initial OLS fit, the autocorrelation parameter, ρ , for an AR(1) process is estimated by
ρ̂ = Σ_{t=2}^{n} e_t e_{t−1} / Σ_{t=1}^{n} e_t²
where
e_t: residual at time t;
e_{t−1}: residual at time t − 1.
This estimation of ρ captures the degree of correlation between consecutive residuals and is recalculated at each iteration until convergence [9].

Step 3: Model Transformation

With the estimated ρ , the Cochrane–Orcutt procedure transforms the original regression model to adjust for the AR(1) structure. The transformed model is given by
y_t − ρ y_{t−1} = β₀(1 − ρ) + β(X_t − ρ X_{t−1}) + ε_t
where
y_t and y_{t−1}: current and previous observed response values;
X_t and X_{t−1}: current and previous values of the predictors;
ε_t: transformed error term, now assumed to be uncorrelated.
By subtracting ρ times the previous observations from both sides, this transformation effectively reduces the serial correlation in the error terms, allowing for OLS estimation on the transformed model to yield efficient estimates [5,7].

Step 4: Iteration Until Convergence

The Cochrane–Orcutt procedure is an iterative process. The steps are repeated as follows:
  • Recompute β̂ for the transformed model using OLS:
β̂ = (X′_trans X_trans)⁻¹ X′_trans y_trans
where X_trans and y_trans are the transformed predictor and response variables.
  • Recalculate residuals, e_t, for the updated model and estimate ρ again using Equation (2).
  • Update the transformation and repeat until the change in ρ between successive iterations is below a pre-defined threshold, indicating convergence.
The final estimated coefficients, β̂, are obtained once convergence is achieved. This iterative method enhances efficiency by mitigating the impact of serial correlation in the residuals, yielding more reliable estimates [14,16].
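The orcutt package used in Appendix A automates these steps. Purely as an illustration of the iteration, a minimal base-R sketch of the procedure could look as follows (function and variable names are hypothetical).
# Minimal sketch of the iterative Cochrane-Orcutt procedure
cochrane_orcutt_manual <- function(y, X, tol = 1e-6, max_iter = 100) {
  X <- as.matrix(cbind(1, X)); n <- length(y)
  beta <- solve(t(X) %*% X, t(X) %*% y)                 # Step 1: initial OLS fit
  rho_old <- 0
  for (i in seq_len(max_iter)) {
    e <- as.numeric(y - X %*% beta)
    rho <- sum(e[-1] * e[-n]) / sum(e^2)                # Step 2: estimate rho from residuals
    y_star <- y[-1] - rho * y[-n]                       # Step 3: quasi-difference (first obs dropped)
    X_star <- X[-1, , drop = FALSE] - rho * X[-n, , drop = FALSE]
    beta <- solve(t(X_star) %*% X_star, t(X_star) %*% y_star)  # re-estimate beta on transformed data
    if (abs(rho - rho_old) < tol) break                 # Step 4: iterate until rho stabilises
    rho_old <- rho
  }
  list(rho = rho, beta = beta)
}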

2.2.2. Prais–Winsten Transformation

The Prais–Winsten transformation, like the Cochrane–Orcutt procedure, is used to correct for the first-order autoregressive (AR(1)) disturbances in time series regression models. The key difference lies in the Prais–Winsten method’s retention of the first observation, which preserves all original data points and thus avoids the loss of information inherent to the Cochrane–Orcutt method. This makes the Prais–Winsten transformation particularly valuable in small samples, where each observation’s contribution is significant [1,5,7].
In an AR(1) process, errors are correlated such that
u_t = ρ u_{t−1} + ε_t
where u_t is the disturbance at time t, ρ is the autocorrelation parameter, and ε_t is white noise with a mean of zero and variance σ². The Prais–Winsten method applies a transformation to account for this serial correlation, retaining the first observation by scaling it with √(1 − ρ²) [18,19].

Step 1: First-Observation Transformation

The first observation is transformed as follows:
√(1 − ρ²) y₁ = β₀ √(1 − ρ²) + β √(1 − ρ²) X₁ + ε₁
where
y₁: the observed response at t = 1;
X₁: the predictors at t = 1;
ε₁: the transformed error term, now scaled to account for ρ.
This transformation ensures that the first data point remains in the analysis, unlike in the Cochrane–Orcutt method, where the first observation is dropped after the transformation [9,16].

Step 2: Transformation of Remaining Observations

For subsequent observations (t = 2, …, n), the model is transformed by subtracting ρ times the previous observation, as follows:
y_t − ρ y_{t−1} = β₀(1 − ρ) + β(X_t − ρ X_{t−1}) + ε_t
where
y_t and y_{t−1}: current and previous observed response values;
X_t and X_{t−1}: current and previous predictor values;
ε_t: transformed error term, now assumed to be uncorrelated.
This transformation adjusts each subsequent observation to account for the AR(1) autocorrelation, effectively reducing serial correlation in the transformed residuals [5,14]. The adjusted model enables the use of OLS on the transformed data, providing efficient parameter estimates under the corrected model structure.

Step 3: Iterative Estimation of ρ and β

The Prais–Winsten transformation is an iterative process, similar to the Cochrane–Orcutt procedure. The steps are repeated until convergence, as follows:
  • Using the initial OLS estimates, calculate the residuals, e_t, and compute an initial estimate of ρ:
ρ̂ = Σ_{t=2}^{n} e_t e_{t−1} / Σ_{t=1}^{n} e_t²
  • Transform y and X using the current ρ̂ value and re-estimate β with OLS on the transformed model:
β̂ = (X′_trans X_trans)⁻¹ X′_trans y_trans
  • Update ρ using the residuals from the transformed model and iterate until ρ̂ stabilizes between iterations, signaling convergence.
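For illustration, a single Prais–Winsten pass can be written in a few lines of base R. The sketch below (with hypothetical names) shows how the first observation is rescaled by √(1 − ρ²) while the remaining observations are quasi-differenced; OLS on the transformed data then gives the Prais–Winsten estimates, with ρ re-estimated until convergence as described above.
# Minimal sketch: one Prais-Winsten transformation for a given rho
prais_winsten_transform <- function(y, X, rho) {
  X <- as.matrix(cbind(1, X)); n <- length(y)
  w <- sqrt(1 - rho^2)
  y_star <- c(w * y[1], y[-1] - rho * y[-n])            # first observation retained and rescaled
  X_star <- rbind(w * X[1, ],
                  X[-1, , drop = FALSE] - rho * X[-n, , drop = FALSE])
  list(y = y_star, X = X_star)                          # OLS on (X_star, y_star) gives the PW estimates
}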

2.2.3. Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a model by maximizing the likelihood function. In the context of linear regression with autocorrelated errors, we assume that errors follow a normal distribution and exhibit an AR(1) structure. This means that the error terms, u_t, are correlated such that
u_t = ρ u_{t−1} + ε_t
where ρ represents the autocorrelation coefficient, and ε_t ~ N(0, σ²) denotes white noise errors. Given this structure, MLE aims to estimate the parameters β, ρ, and σ² by finding values that maximize the likelihood of observing the data under these assumptions [2,5,16].
The likelihood function for the model parameters can be expressed as follows:
L(β, ρ, σ²) = (2πσ²)^(−n/2) |Φ|^(−1/2) exp{ −(1/(2σ²)) (y − Xβ)′ Φ⁻¹ (y − Xβ) }
where
Φ: variance–covariance matrix of the error terms, accounting for autocorrelation;
Φ⁻¹: inverse of the variance–covariance matrix, incorporating ρ into the model structure.

Log Likelihood Function

To simplify the optimization, we take the natural logarithm of the likelihood function, yielding the log likelihood:
ln L(β, ρ, σ²) = −(n/2) ln(2πσ²) − (1/2) ln|Φ| − (1/(2σ²)) (y − Xβ)′ Φ⁻¹ (y − Xβ)
Maximizing this log likelihood function with respect to β, ρ, and σ² involves differentiating ln L with respect to each parameter and setting the resulting expressions to zero:
1. Maximization with Respect to β:
∂ln L/∂β = (1/σ²) X′ Φ⁻¹ (y − Xβ) = 0
Solving this yields the maximum likelihood estimate of β:
β̂_MLE = (X′ Φ⁻¹ X)⁻¹ X′ Φ⁻¹ y
2. Maximization with Respect to σ²:
∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (y − Xβ)′ Φ⁻¹ (y − Xβ) = 0
Solving this provides the estimate of σ² based on the residuals:
σ̂² = (y − X β̂_MLE)′ Φ⁻¹ (y − X β̂_MLE) / n
3. Maximization with Respect to ρ:
The estimation of ρ typically requires numerical optimization, as it is embedded in the structure of Φ . By iteratively updating ρ and recalculating Φ , convergence can be achieved [9,14].
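In practice, this is the estimator fitted in Appendix A with nlme::gls; a minimal sketch follows (the predictor names X1–X3 stand in for whichever regressors the dataset provides).
library(nlme)

# Minimal sketch: ML estimation of the regression with AR(1) disturbances
mle_model <- gls(Y ~ X1 + X2 + X3, data = train,
                 correlation = corAR1(form = ~ 1),      # AR(1) error structure
                 method = "ML")                         # full maximum likelihood
summary(mle_model)                                      # coefficients, sigma and estimated rho (Phi)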

2.2.4. Restricted Maximum Likelihood (RMLE)

Restricted Maximum Likelihood Estimation (RMLE), also known as REML, is an alternative to MLE that adjusts for small sample bias, particularly when estimating variance components in mixed models or models with random effects. RMLE maximizes the likelihood of the residuals rather than the overall likelihood, which improves the estimation accuracy of variance parameters, especially when sample sizes are limited [7,18].
The restricted likelihood function is given by
ln L_RE(β, ρ, σ²) = −(1/2) ln|Σ_RE| − (1/(2σ²)) (y − Xβ)′ Φ⁻¹ (y − Xβ)
where Σ_RE is the covariance matrix of the random effects.

Log Restricted Likelihood Derivation

To optimize RMLE, the following restricted likelihood expression is maximized:
ln L_RE = −((n − p)/2) ln(2πσ²) − (1/2) ln|Φ| − (1/(2σ²)) (y − Xβ)′ Φ⁻¹ (y − Xβ)
where n − p represents the degrees of freedom corrected for the number of predictors, adjusting for small sample bias.

Estimating β, ρ, and σ² with RMLE:

  • Estimate β:
    The RMLE estimator for β is similar to the MLE estimator, with adjustments in the degrees of freedom:
    β̂_RMLE = (X′ Φ⁻¹ X)⁻¹ X′ Φ⁻¹ y
  • Estimate σ²:
    σ̂²_RMLE = (y − X β̂_RMLE)′ Φ⁻¹ (y − X β̂_RMLE) / (n − p)
    where the denominator is adjusted by n − p to correct for the number of fixed effects estimated [16,19].
  • Iteration for Convergence: As with MLE, RMLE involves iterative estimation of ρ, typically through numerical methods. By adjusting ρ and recalculating Φ at each iteration, convergence is achieved once the parameters stabilize.
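The REML counterpart of the gls() fit sketched above differs only in the estimation criterion; switching the method argument, as in the sketch below, is sufficient (Appendix A uses the equivalent corARMA(p = 1) structure).
library(nlme)

# Minimal sketch: REML estimation of the same AR(1) regression
reml_model <- gls(Y ~ X1 + X2 + X3, data = train,
                  correlation = corAR1(form = ~ 1),
                  method = "REML")                      # restricted maximum likelihood
intervals(reml_model)                                   # intervals for beta, rho and sigma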

2.2.5. Artificial Neural Network (ANN) Model

A powerful method for capturing nonlinear relationships between input and output variables is the Artificial Neural Network (ANN). An ANN consists of layers of interconnected nodes, or neurons, where each node represents a mathematical function that transforms input data into a predictive output. By learning patterns directly from the data, ANNs can model complex relationships in regression settings with autocorrelated disturbances, circumventing some of the assumptions required by more conventional methods like OLS [7,18]. The fundamental architecture of a single-layer feedforward neural network is outlined below:

Feedforward Equation

The feedforward operation in an ANN involves computing a weighted sum of inputs for each neuron and adding a bias term. The output y for a single neuron in a hidden layer can be defined as follows:
y = f( Σ_{i=1}^{n} w_i x_i + b )
where
x_i: input features;
w_i: weights associated with each input;
b: bias term;
f(·): activation function applied to the weighted sum.
The purpose of the activation function is to introduce nonlinearity into the network, enabling it to learn complex mappings from inputs to outputs [16,19].

Activation Function

One commonly used activation function in ANN models is the sigmoid function, defined as follows:
f(x) = 1 / (1 + e^(−x))
The sigmoid squashes its output to the range 0 to 1, which makes it appropriate for binary classification and for scaling values within a restricted range. The tanh function and the ReLU (Rectified Linear Unit) are two other activation functions that are frequently utilized, depending on the application and the specific requirements of the model [9,14].
A linear activation function, which can return any real value, is frequently chosen for regression tasks, particularly in the output layer, because it is suitable for predicting continuous variables [5].
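As a minimal illustration (with arbitrary weights, not taken from the fitted network), the feedforward computation and the two activation choices discussed above can be written directly in R.
# Minimal sketch: output of one hidden neuron followed by a linear output unit
sigmoid <- function(x) 1 / (1 + exp(-x))

neuron_output <- function(x, w, b, activation = sigmoid) {
  activation(sum(w * x) + b)              # f(sum_i w_i x_i + b)
}

x <- c(0.5, -1.2, 3.0)                    # illustrative input features
w <- c(0.8, 0.1, -0.4)                    # illustrative hidden-layer weights
h <- neuron_output(x, w, b = 0.2)         # hidden activation, squashed to (0, 1)
y_hat <- 1.5 * h + 0.3                    # linear output layer for a continuous target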

Weight Update Rule (Backpropagation)

The ANN model learns by adjusting the weights, w_i, through a process known as backpropagation, which minimizes a loss function (such as the Mean Squared Error) through gradient descent. The weight update rule is defined as follows:
w_i := w_i − η ∂V/∂w_i
where
η: learning rate, controlling the step size in each iteration;
V: loss function to be minimized;
∂V/∂w_i: partial derivative of the loss with respect to weight w_i.
The backpropagation algorithm computes the gradient of the loss function with respect to each weight by applying the chain rule through each layer of the network, from the output layer back to the input layer. This iterative process of adjusting weights reduces the error in the model’s predictions, allowing the ANN to better capture the relationship between inputs and outputs [7].

Mean Squared Error Loss

For regression tasks, the Mean Squared Error (MSE) is commonly used as the loss function. It measures the average squared difference between the observed values, y_i, and the predicted values, ŷ_i, and is defined as follows:
L = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the number of observations. Minimizing the MSE helps ensure that the model’s predictions are as close as possible to the observed values on average, reducing the impact of outliers and focusing on overall accuracy [14,19].
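A single update step is easy to write out explicitly; the sketch below performs one gradient-descent update of a linear neuron trained with the MSE loss on simulated data (all values are illustrative).
# Minimal sketch: one backpropagation/gradient-descent step under the MSE loss
set.seed(1)
X <- cbind(1, rnorm(100))                      # intercept plus one feature
y <- 2 + 3 * X[, 2] + rnorm(100)
w <- c(0, 0)                                   # initial weights
eta <- 0.05                                    # learning rate

y_hat <- X %*% w                               # forward pass
grad <- -(2 / nrow(X)) * t(X) %*% (y - y_hat)  # dL/dw for L = mean((y - Xw)^2)
w <- w - eta * as.numeric(grad)                # w := w - eta * dL/dw

# Repeating this update until the loss stabilises drives w towards c(2, 3)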

Ordinary Least Squares (OLS) on ANN Predictions

Once the ANN model is trained and provides predictions, ŷ, for each observation, an OLS model can be fitted to the predicted outputs to obtain a regression coefficient estimate based on the neural network’s predictions. The OLS estimator for β based on the ANN predictions is calculated as follows:
β̂_ANN = (X′X)⁻¹ X′ ŷ
where
X: matrix of predictor variables;
ŷ: vector of predicted responses from the ANN.
This approach leverages the flexibility of an ANN for complex relationships and combines it with OLS to interpret the predictions in a linear framework, providing an estimate of the impact of predictors on the predicted outcome. This hybrid method allows for both nonlinear learning and interpretability, bridging the gap between ANN flexibility and the interpretive power of linear regression [9,18].
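This is essentially the post-processing step carried out in Appendix A (the objects train and pr.nn_train come from that script); a condensed sketch of the computation, including the approximate standard errors, is shown below with slightly different object names.
# Minimal sketch: OLS coefficients and standard errors from the ANN predictions
X_mat <- as.matrix(cbind(1, train[, -1]))               # design matrix (Y assumed to be column 1 of 'train')
beta_ann <- solve(t(X_mat) %*% X_mat) %*% t(X_mat) %*% pr.nn_train   # (X'X)^(-1) X' y-hat

s2 <- sum((train$Y - pr.nn_train)^2) / (nrow(X_mat) - ncol(X_mat))   # residual variance
se_ann <- sqrt(diag(s2 * solve(t(X_mat) %*% X_mat)))                 # approximate standard errors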

2.3. Performance Metrics

Mean Squared Error (MSE):
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
Mean Absolute Error (MAE):
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
Mean Absolute Percentage Error (MAPE):
MAPE = (100/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|
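These three metrics are computed directly in the R script in Appendix A; as a compact alternative, they can be wrapped in small helper functions, sketched below (note that the MAPE is undefined whenever an observed value equals zero).
# Minimal sketch: the three accuracy metrics as helper functions
mse  <- function(y, y_hat) mean((y - y_hat)^2)
mae  <- function(y, y_hat) mean(abs(y - y_hat))
mape <- function(y, y_hat) 100 * mean(abs((y - y_hat) / y))

# Example usage with the test-set objects from Appendix A:
# mse(test$Y, pred_test); mae(test$Y, pred_test); mape(test$Y, pred_test)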

2.4. Data Sources and Descriptions

This study utilized three distinct datasets, each contributing to different aspects of the analysis. The datasets cover variables related to employment, automotive specifications, and macroeconomic indicators, allowing for diverse applications of regression models and testing across varied data structures. Each dataset is described in detail below.

2.4.1. Dataset One: Longley Data

The first dataset is an extension of the well-known J. Longley dataset, covering observations from 1959 to 2005. This dataset is sourced from Basic Econometrics by Damodar N. Gujarati and Dawn C. Porter (5th edition), who cite the U.S. Department of Labor, the Bureau of Labor Statistics, and additional sources for extended data up to 2005. This dataset was originally designed to assess multicollinearity effects and includes six predictor variables, as follows:
Y: number of people employed (in thousands);
X1: GNP implicit price deflator;
X2: GNP (in millions of dollars);
X3: number of people unemployed (in thousands);
X4: number of people in the armed forces (in thousands);
X5: non-institutionalized population over 16 years of age;
X6: year (coded as 1 for 1959, 2 for 1960, up to 47 for 2005).

2.4.2. Dataset Two: MPG Data

The second dataset, referred to as the MPG data, consists of 81 observations on vehicle performance characteristics, with variables sourced from the U.S. Environmental Protection Agency (EPA) report EPA/AA/CTAB/91 02. As noted in Gujarati and Porter’s Basic Econometrics (5th edition), this dataset was created to study fuel efficiency and performance factors in automobiles. The dataset includes the following variables:
Y: MPG (miles per gallon);
X1: SP (top speed, in miles per hour);
X2: HP (engine horsepower);
X3: WT (vehicle weight, in hundred pounds).

2.4.3. Dataset Three: GDP Data

The third dataset, referred to as the GDP data, includes macroeconomic variables from the General Household Survey Report (1995–2005) and Annual Abstract of Statistics published by Nigeria’s Federal Office of Statistics. This dataset spans the years 1970 to 2008 and is used to examine economic indicators influencing the Gross Domestic Product (GDP). The variables in this dataset are defined as follows:
  • Y: Real Gross Domestic Product (in millions);
  • X1: total tax revenue (in millions);
  • X2: current exchange rate (percentage);
  • X3: inflation rate (percentage);
  • X4: external debt (in millions);
  • X5: average tax revenue.

3. Results

This section is divided into three subheadings: tests for autocorrelation, model comparison metrics, and model prediction plots, following the analysis implemented in R, with the corresponding code provided in Appendix A.

3.1. Presentation of Results from the Longley Dataset

3.1.1. Test for Autocorrelation in Longley Data

The Durbin–Watson statistic returned a significant value (DW = 1.2298, p < 0.05).
From Figure 1, it is evident that there exists a positive first-order autocorrelation between the error terms, as the points are clustered closely around the line. The Durbin–Watson test result further supports this observation with a statistic of 1.229 ( p < 0.05 ), indicating significant positive autocorrelation.

3.1.2. Model Comparison Based on MSE, MAE, and MAPE

Table 1, Table 2 and Table 3 present a comparative analysis of the performance of different models—Ordinary Least Squares (OLS), Prais–Winsten (PW), Cochrane–Orcutt (CO), Maximum Likelihood Estimation (MLE), Restricted Maximum Likelihood Estimation (RMLE), and Artificial Neural Network (ANN)—evaluated based on Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) across varying training percentages.
Across all metrics (MSE, MAE, and MAPE), the Artificial Neural Network (ANN) consistently demonstrates superior performance, particularly as the training set size increases. Regardless of the percentage of training data, the ANN maintains the lowest values for each criterion, indicating that it is the most efficient and accurate model for this analysis. This suggests that the ANN effectively captures the underlying data patterns and performs well even as the model complexity and training data increase.

3.1.3. Comparison Between Actual and Predicted Values for Longley Data

The results in this section are based on graphical representation.
By visually inspecting Figure 2 below, where ML and REML denote the maximum likelihood estimator and restricted maximum likelihood estimator, we can see that the predictions made by the Artificial Neural Network for the test set are more concentrated around the reference line than those made by the other conventional methods. We therefore conclude that the ANN performed better on the test set.

3.2. Presentation of Results from the MPG Dataset

3.2.1. Test for Autocorrelation in MPG Data

The Durbin–Watson statistic returned a significant value (DW = 1.0237, p < 0.05).
As shown in Figure 3, the points exhibit a clear clustering around the line, indicating positive first-order autocorrelation. The Durbin–Watson test result for this dataset is 1.012 ( p < 0.05 ), confirming the presence of significant positive autocorrelation.

Comparison Based on MAE, MSE, and MAPE for MPG Data

Table 4, Table 5 and Table 6 show a comparative evaluation of various models—Ordinary Least Squares (OLS), Prais–Winsten (PW), Cochrane–Orcutt (CO), Maximum Likelihood Estimation (MLE), Restricted Maximum Likelihood Estimation (RMLE), and Artificial Neural Network (ANN)—assessed across three criteria with different training data percentages: Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).
The Artificial Neural Network (ANN) demonstrates superior performance for most cases, achieving the lowest Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) across various training percentages in Table 4, Table 5 and Table 6. While some models like the Cochrane–Orcutt model exhibit competitive results in isolated cases (e.g., MSE at 20% training), the ANN consistently outperforms the others, particularly in larger training sets, indicating its robustness and accuracy in capturing complex patterns within the data.

3.2.2. Comparison Between Actual and Predicted Values for MPG Data

This section presents results using graphical representation. By visually examining Figure 4 below, it is evident that the predictions made by the Artificial Neural Network (ANN) for the test set exhibit a higher concentration of points around the reference line compared to those from other conventional methods. This observation leads to the conclusion that the ANN outperformed the other methods on the test set.

3.3. Presentation of Results from the GDP Dataset

3.3.1. Test for Autocorrelation in GDP Data

The Durbin–Watson statistic returned a significant value (DW = 1.145, p < 0.05).
Figure 5 reveals a clustering pattern around the line, indicative of positive first-order autocorrelation in the error terms. This is corroborated by the Durbin–Watson test result of 1.145 ( p < 0.05 ), providing strong evidence of positive autocorrelation.

3.3.2. Comparison Based on MAE, MSE, and MAPE

Table 7, Table 8 and Table 9 show a comparison of the Ordinary Least Squares (OLS), Prais–Winsten (PW), Cochrane–Orcutt (CO), Maximum Likelihood Estimation (MLE), Restricted Maximum Likelihood Estimation (RMLE), and Artificial Neural Network (ANN) models across three performance metrics at different training percentages: Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE).
Across all metrics (MSE, MAE, and MAPE) in Table 7, Table 8 and Table 9, the Artificial Neural Network (ANN) consistently outperformed the other models, achieving the lowest error values, particularly at higher training percentages. This highlights the ANN’s superior ability to capture complex data patterns and its robust predictive performance compared to traditional methods such as Ordinary Least Squares (OLS), Prais–Winsten (PW), and Maximum Likelihood Estimation (MLE). Specifically, the ANN achieved a Mean Squared Error (MSE) of 1.05 × 10⁹, which is significantly lower than those of the OLS (5.99 × 10⁹), Cochrane–Orcutt (9.29 × 10¹⁶), and MLE (3.11 × 10¹⁰) models. In terms of the Mean Absolute Error (MAE), the ANN recorded a value of 23,344.64, outperforming OLS (65,077.26) and the other models. Similarly, the ANN demonstrated a Mean Absolute Percentage Error (MAPE) of 81.66%, markedly lower than the OLS’s value of 275.19%. These results underscore the ANN as the most efficient and reliable model choice across different training data proportions.

3.3.3. Comparison Between Actual and Predicted Values for GDP Data

Figure 6 illustrates that predictions by the Artificial Neural Network (ANN) for the test set are more closely aligned with the reference line compared to those from conventional methods (ML and REML). This confirms the superior performance of the ANN on the test set.

3.4. Addressing Overfitting in Artificial Neural Networks

To address the issue of overfitting in Artificial Neural Networks (ANNs), several strategies were implemented to ensure robust and generalizable models, particularly in the context of autocorrelated data. Regularization techniques, such as L2 penalties, were applied to constrain large weights and reduce overfitting. Dropout layers were incorporated to deactivate random neurons during training, preventing reliance on specific network pathways. Early stopping, based on validation performance, was employed to halt training when improvements plateaued, avoiding overtraining. Additionally, forward-chaining cross-validation ensured that the training data always preceded the validation data chronologically, addressing temporal dependencies. Synthetic data generation through bootstrapping preserved autocorrelation while expanding the training dataset, enhancing model diversity. These measures collectively reduced overfitting, with results showing improved test performance across metrics; for instance, the ANN’s test MAPE for the GDP data improved from 92.3% to 81.66%, demonstrating the effectiveness of these strategies in ensuring reliable predictions.

4. Conclusions

4.1. Summary of Findings

The findings of this study indicate that the Artificial Neural Network (ANN) consistently outperformed traditional linear regression models, including Ordinary Least Squares (OLS), Prais–Winsten, Cochrane–Orcutt, Maximum Likelihood Estimation (MLE), and Restricted Maximum Likelihood Estimation (RMLE), in terms of error indices such as the Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The ANN demonstrated superior predictive accuracy across various training data proportions, with minimal exceptions, where the Cochrane–Orcutt method showed better performance (in only 4 out of 45 instances) (Figure 2, Figure 4 and Figure 6). Specifically, for the GDP data (100% testing), the ANN achieved an MSE of 1.05 × 10⁹ (compared to OLS’s 5.99 × 10⁹), an MAE of 23,344.64 (compared to OLS’s 65,077.26), and an MAPE of 81.66% (compared to OLS’s 275.19%). For the MPG data, the ANN recorded an MSE of 10.89 (vs. OLS’s 11.70), an MAE of 2.46 (vs. OLS’s 2.57), and an MAPE of 7.52% (vs. OLS’s 7.42%). For the Longley data, the ANN achieved an MAPE of 5.50%, which is significantly better than the OLS’s 107.86%. These findings align with recent research, including studies [5,9], suggesting that ANNs, due to their ability to capture complex nonlinear relationships, are more effective for predictive modeling even when data exhibit autocorrelation. Additionally, the autocorrelation structure of the data did not adversely affect ANN performance, reaffirming its robustness in handling time series data with inherent autocorrelation. In contrast, traditional regression techniques displayed varying levels of efficiency depending on sample size, with the Cochrane–Orcutt method proving advantageous for smaller sample sets, where it effectively minimizes standard errors.

4.2. Recommendations

Based on the findings of this study, several recommendations can be made to guide the selection of predictive models, particularly when dealing with time series data:
  • For predictive tasks, especially in the presence of autocorrelation, the Artificial Neural Network should be considered over traditional linear regression models due to its demonstrated efficiency and robustness.
  • Training the ANN with a large proportion of data improves its predictive precision, aligning with the recommendations from Smarra et al. (2020) on optimizing neural networks through extensive training.
  • Special attention should be given to the design of the ANN architecture, particularly in selecting the appropriate number of hidden layers, as this can impact model accuracy and performance.
  • In cases with limited data availability, the Cochrane–Orcutt method should be considered for improved efficiency, as it has shown effectiveness in handling small sample sizes with minimal standard errors.

Author Contributions

Conceptualization, R.I.R. and M.A.A.; methodology, R.I.R. and A.M.A.A.; software, R.I.R.; validation, R.I.R., M.A.A. and R.S.; formal analysis, R.I.R.; investigation, R.I.R.; resources, R.I.R.; data curation, R.I.R.; writing—original draft preparation, R.I.R.; writing—review and editing, R.I.R.; visualization, R.I.R.; supervision, R.I.R.; project administration, R.I.R.; funding acquisition, A.M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported via funding from Prince Sattam Bin Abdulaziz University, project number (PSAU/2024/R/1446).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to acknowledge the administrative and technical support provided by Prince Sattam bin Abdulaziz University.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. R Code

# Clear workspace
rm(list = ls())

# Load required libraries
library(readxl)
library(orcutt)
library(lmtest)
library(prais)
library(nlme)
library(neuralnet)
library(matlib)

# Load and view dataset
data <- read_excel("C:/Users/Rauf/Desktop/DAta.xlsx", sheet = "loglinear_model")
View(data)
attach(data)

# Check for missing values
apply(data, 2, function(x) sum(is.na(x))) # No missing values

# OLS Regression on the full dataset
linear.model <- lm(Y ~ ., data = data)
summary(linear.model)
res <- residuals(linear.model)

# Test for Autocorrelation
dwtest(linear.model, alternative = "two.sided")

# Visualize Autocorrelation
plot(res[-1], res[-nrow(data)], main = "Autocorrelation Detection", col = 3)
abline(lm(res[-nrow(data)] ~ res[-1]), col = 2)

# Split data into training (80%) and test (20%) sets
n <- round(0.8 * nrow(data))
train <- data[1:n, ]
test <- data[(n+1):nrow(data), ]

# Ordinary Least Squares (OLS) Regression on training set
lm.fit <- lm(Y ~ ., data = train)
summary(lm.fit)
pred_train <- predict(lm.fit, train)
pred_test <- predict(lm.fit, test)

# Prais-Winsten Estimation
prais_winsten_model <- prais.winsten(Y ~ X1 + X2 + X3, data = train)
prais_winsten_model

# Cochrane-Orcutt Estimation
cochrane_orcutt_model <- cochrane.orcutt(lm.fit)
summary(cochrane_orcutt_model)
predocc_train <- predict(cochrane_orcutt_model)

# Maximum Likelihood Estimation (MLE)
mle_model <- gls(Y ~ X1 + X2 + X3, data = train, correlation = corAR1(form = ~1), method = "ML")
summary(mle_model)
pred_mle_train <- predict(mle_model, train)
pred_mle_test <- predict(mle_model, test)

# Restricted Maximum Likelihood Estimation (REML)
reml_model <- gls(Y ~ X1 + X2 + X3, data = train, correlation = corARMA(p = 1), method = "REML")
summary(reml_model)
pred_reml_train <- predict(reml_model, train)
pred_reml_test <- predict(reml_model, test)

# Data preprocessing for Neural Network (ANN)
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled_data <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
train_scaled <- scaled_data[1:n, ]
test_scaled <- scaled_data[(n+1):nrow(data), ]

# Neural Network Model
set.seed(1)
f <- as.formula(paste("Y ~", paste(names(train_scaled)[-1], collapse = " + ")))
nn <- neuralnet(f, data = train_scaled, hidden = c(1), linear.output = TRUE, rep = 10, likelihood = TRUE)
plot(nn)

# Neural Network predictions
pr.nn_train <- compute(nn, train_scaled[, -1])$net.result * (max(data$Y) - min(data$Y)) + min(data$Y)
pr.nn_test <- compute(nn, test_scaled[, -1])$net.result * (max(data$Y) - min(data$Y)) + min(data$Y)

# OLS on ANN Predictions
X <- as.matrix(cbind(1, train[, -1]))
Y <- as.matrix(pr.nn_train)
beta_ann <- solve(t(X) %*% X) %*% t(X) %*% Y

# Calculate Standard Error for ANN OLS Parameters
s <- sum((train$Y - pr.nn_train)^2)/(n - ncol(X))
std_errors <- sqrt(diag(s * solve(t(X) %*% X)))

# Model Selection Metrics: MSE, MAE, MAPE
# MSE Calculation
mse_ols_test <- mean((test$Y - pred_test)^2)
mse_co_test <- mean((test$Y - predict(cochrane_orcutt_model, test))^2)
mse_pw_test <- mean((test$Y - predict(prais_winsten_model, test))^2)
mse_mle_test <- mean((test$Y - pred_mle_test)^2)
mse_reml_test <- mean((test$Y - pred_reml_test)^2)
mse_nn_test <- mean((test$Y - pr.nn_test)^2)

# MAE Calculation
mae_ols_test <- mean(abs(test$Y - pred_test))
mae_co_test <- mean(abs(test$Y - predict(cochrane_orcutt_model, test)))
mae_pw_test <- mean(abs(test$Y - predict(prais_winsten_model, test)))
mae_mle_test <- mean(abs(test$Y - pred_mle_test))
mae_reml_test <- mean(abs(test$Y - pred_reml_test))
mae_nn_test <- mean(abs(test$Y - pr.nn_test))

# MAPE Calculation
mape_ols_test <- mean(abs((test$Y - pred_test)/test$Y)) * 100
mape_co_test <- mean(abs((test$Y - predict(cochrane_orcutt_model, test))/test$Y)) * 100
mape_pw_test <- mean(abs((test$Y - predict(prais_winsten_model, test))/test$Y)) * 100
mape_mle_test <- mean(abs((test$Y - pred_mle_test)/test$Y)) * 100
mape_reml_test <- mean(abs((test$Y - pred_reml_test)/test$Y)) * 100
mape_nn_test <- mean(abs((test$Y - pr.nn_test)/test$Y)) * 100

# Compile Results
results <- data.frame(
Model = c("OLS", "Cochrane-Orcutt", "Prais-Winsten", "MLE", "REML", "ANN"),
MSE = c(mse_ols_test, mse_co_test, mse_pw_test, mse_mle_test, mse_reml_test, mse_nn_test),
MAE = c(mae_ols_test, mae_co_test, mae_pw_test, mae_mle_test, mae_reml_test, mae_nn_test),
MAPE = c(mape_ols_test, mape_co_test, mape_pw_test, mape_mle_test, mape_reml_test, mape_nn_test)
)

print(results)

# Unload libraries and detach dataset
detach(data)
detach("package:neuralnet", unload = TRUE)
detach("package:matlib", unload = TRUE)
detach("package:orcutt", unload = TRUE)
detach("package:nlme", unload = TRUE)
detach("package:prais", unload = TRUE)
detach("package:lmtest", unload = TRUE)

References

  1. Rauf, R.I.; Ifeyinwa, O.J.; Yahaya, H.U. Robustness test of selected estimators of linear regression with autocorrelated error term: A Monte Carlo simulation study. Asian J. Probab. Stat. 2021, 109, 102274. [Google Scholar] [CrossRef]
  2. Rauf, R.I.; Hamidu, B.A.; Kikelomo, B.O.; Kayode, A.; Olusegun, A.O. Heteroscedasticity correction measures in stochastic frontier analysis. Ann. Univ. Oradea Econ. Sci. 2024, 33, 1–22. Available online: https://anale.steconomiceuoradea.ro/en/wp-content/uploads/2024/11/AUOES.July_.2024.18.pdf (accessed on 1 November 2024). [CrossRef] [PubMed]
  3. Rauf, R.I.; Alabi, O.O.; Bello, H.A.; Bodunwa, O.K.; Ayinde, K. New Approach in Stochastic Frontier Analysis Estimation for Addressing Joint Assumption Violation of Heteroscedasticity and Multicollinearity. Asian J. Probab. Stat. 2024, 26, 9–26. [Google Scholar] [CrossRef]
  4. Lu, J.; Peng, J.; Chen, J.; Sugeng, K.A. Prediction method of autoregressive moving average models for uncertain time series. Int. J. Gen. Syst. 2020, 49, 546–572. [Google Scholar] [CrossRef]
  5. Farhi, L.; Yasir, A. Optimized intelligent auto-regressive neural network model (ARNN) for prediction of non-linear exogenous signals. Wirel. Pers. Commun. 2022, 124, 1151–1167. [Google Scholar] [CrossRef]
  6. Rauf, R.I.; Ayinde, K.; Bello, H.A.; Bodunwa, O.K.; Alabi, O.O. Enhanced methods for multicollinearity mitigation in stochastic frontier analysis estimation. J. Niger. Soc. Phys. Sci. 2024, 6, 2091. [Google Scholar] [CrossRef]
  7. López, G.; Arboleya, P. Short-term wind speed forecasting over complex terrain using linear regression models and multivariable LSTM and NARX networks in the Andes Mountains, Ecuador. Renew. Energy 2022, 183, 351–368. [Google Scholar] [CrossRef]
  8. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015; Available online: https://books.google.com.ng/books/about/Time_Series_Analysis.html?id=rNt5CgAAQBAJ&redir_esc=y (accessed on 1 November 2024).
  9. Kanaparthi, V. Robustness evaluation of LSTM-based deep learning models for Bitcoin price prediction in the presence of random disturbances. Int. J. Innov. Sci. Mod. Eng. (IJISME) 2024, 12, 14–23. [Google Scholar] [CrossRef]
  10. Loossens, T.; Tuerlinckx, F.; Verdonck, S. A comparison of continuous and discrete time modeling of affective processes in terms of predictive accuracy. Sci. Rep. 2021, 11, 6218. [Google Scholar] [CrossRef] [PubMed]
  11. Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M. Temporal Convolutional Networks Applied to Energy-Related Time Series Forecasting. Appl. Sci. 2020, 10, 2322. [Google Scholar] [CrossRef]
  12. Maulik, R.; Lusch, B.; Balaprakash, P. Non-autoregressive time-series methods for stable parametric reduced-order models. Phys. Fluids 2020, 32, 087107. [Google Scholar] [CrossRef]
  13. Beneventano, P.; Cheridito, P.; Graeber, R.; Jentzen, A.; Kuckuck, B. Deep neural network approximation theory for high-dimensional functions. arXiv 2021, arXiv:2112.14523. [Google Scholar]
  14. Smarra, F.; Di Girolamo, G.D.; De Iuliis, V.; Jain, A.; Mangharam, R.; D’Innocenzo, A. Data-driven switching modeling for MPC using regression trees and random forests. Nonlinear Anal. Hybrid Syst. 2020, 36, 100882. [Google Scholar] [CrossRef]
  15. Ballestrín, J.; Polo, J.; Martín-Chivelet, N.; Barbero, J.; Carra, E.; Alonso-Montesinos, J.; Marzo, A. Soiling forecasting of solar plants: A combined heuristic approach and autoregressive model. Energy 2022, 239, 122442. [Google Scholar] [CrossRef]
  16. Jeong, S.; Ghosal, S. Unified Bayesian theory of sparse linear regression with nuisance parameters. Electron. J. Stat. 2021, 15, 3040–3111. [Google Scholar] [CrossRef]
  17. Kaur, J.; Parmar, K.S.; Singh, S. Autoregressive models in environmental forecasting time series: A theoretical and application review. Environ. Sci. Pollut. Res. 2023, 30, 19617–19641. [Google Scholar] [CrossRef] [PubMed]
  18. Ayodele, B.V.; Mustapa, S.I.; Mohammad, N.; Shakeri, M. Long-term energy demand in Malaysia as a function of energy supply: A comparative analysis of non-linear autoregressive exogenous neural networks and multiple non-linear regression models. Energy Strategy Rev. 2021, 38, 100750. [Google Scholar] [CrossRef]
  19. Le, C.M.; Li, T. Linear regression and its inference on noisy network-linked data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2022, 84, 1851–1885. [Google Scholar] [CrossRef]
Figure 1. Graph of error (e_t) against (e_{t−1}) showing autocorrelation.
Figure 2. Actual and predicted number of people employed (100% testing).
Figure 3. Graph of error (e_t) against (e_{t−1}) showing autocorrelation.
Figure 4. Actual and predicted MPG (average miles per gallon) (100% testing), where ML and REML denote maximum likelihood estimator and restricted maximum likelihood estimator.
Figure 5. Graph of error (e_t) against (e_{t−1}) showing autocorrelation.
Figure 6. Actual and predicted GDP data (100% testing), where ML and REML denote maximum likelihood estimator and restricted maximum likelihood estimator.
Table 1. Mean Square Error (MSE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 2.45 × 10¹⁰ | 2.40 × 10¹⁰ | 2.48 × 10¹⁰ | 2.40 × 10¹⁰ | 2.43 × 10¹⁰ | 2.36 × 10⁸
60 | 40 | 711,851,329 | 458,678,299 | 1.11 × 10⁹ | 466,112,028 | 3.46 × 10⁸ | 3.15 × 10⁸
40 | 60 | 645,534,989 | 624,725,975 | 666,879,097 | 623,752,954 | 5.43 × 10⁸ | 4.78 × 10⁸
20 | 80 | 3.35 × 10⁹ | 4.20 × 10⁹ | 5.54 × 10⁹ | 4.16 × 10⁹ | 6.04 × 10⁹ | 5.82 × 10⁷
100 | 100 | 1.48 × 10⁸ | 1.49 × 10⁸ | 1.49 × 10⁸ | 1.49 × 10⁸ | 1.52 × 10⁸ | 1.57 × 10⁷
Table 2. Mean Absolute Error (MAE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 131,694.40 | 130,322.22 | 126,851.23 | 130,388.42 | 131,838.80 | 7315.73
60 | 40 | 12.56 | 8.14 | 17.85 | 8.30 | 4.90 | 3.94
40 | 60 | 16,764.69 | 16,202.92 | 17,293.96 | 16,175.35 | 13,584.43 | 8651.32
20 | 80 | 52,185.61 | 58,553.12 | 67,423.98 | 58,278.66 | 70,410.37 | 6725.66
100 | 100 | 6557.24 | 6644.13 | 6642.34 | 6635.23 | 7075.15 | 2277.74
Table 3. Mean Absolute Percentage Error (MAPE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 107.86 | 106.78 | 102.46 | 106.84 | 108.14 | 5.50
60 | 40 | 39.22 | 39.22 | 76.78 | 59.36 | 119.17 | 38.72
40 | 60 | 11.04 | 10.62 | 11.43 | 10.60 | 8.69 | 5.08
20 | 80 | 37.94 | 42.57 | 49.03 | 42.37 | 51.21 | 5.00
100 | 100 | 5.40 | 5.39 | 5.41 | 5.39 | 5.79 | 2.01
Table 4. Mean Square Error (MSE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 1420.43 | 1912.43 | 2617.82 | 1898.51 | 2742.16 | 115.80
60 | 40 | 134.12 | 134.10 | 475.10 | 349.53 | 997.96 | 103.85
40 | 60 | 39.78 | 46.55 | 573.88 | 54.75 | 65.71 | 14.83
20 | 80 | 137.25 | 23.02 | 4.50 | 10.39 | 882.17 | 42.14
100 | 100 | 11.70 | 12.34 | 16.13 | 13.33 | 96.98 | 10.89
Table 5. Mean Absolute Error (MAE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 25.52 | 28.64 | 30.86 | 28.41 | 44.66 | 8.82
60 | 40 | 8.65 | 8.65 | 19.12 | 12.56 | 30.18 | 8.87
40 | 60 | 5.48 | 6.19 | 22.29 | 6.85 | 7.40 | 3.08
20 | 80 | 8.83 | 3.39 | 1.85 | 2.32 | 29.08 | 5.73
100 | 100 | 2.57 | 7.13 | 2.49 | 2.60 | 7.81 | 2.46
Table 6. Mean Absolute Percentage Error (MAPE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 114.40 | 129.92 | 132.48 | 129.08 | 183.23 | 37.04
60 | 40 | 39.22 | 39.22 | 76.78 | 59.36 | 119.17 | 38.72
40 | 60 | 24.02 | 26.64 | 101.59 | 30.21 | 33.58 | 14.85
20 | 80 | 47.50 | 17.86 | 9.52 | 12.29 | 148.70 | 31.04
100 | 100 | 7.42 | 30.26 | 6.77 | 7.37 | 29.37 | 7.52
Table 7. Mean Square Error (MSE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 4.58 × 10¹⁴ | 4.49 × 10¹⁴ | 4.38 × 10¹⁴ | 4.6 × 10¹⁴ | 4.49 × 10¹⁴ | 1.57 × 10¹¹
60 | 40 | 2.26 × 10¹⁵ | 1.82 × 10¹⁵ | 4.8 × 10¹⁵ | 1.81 × 10¹⁵ | 1.68 × 10¹⁵ | 1.72 × 10¹⁰
40 | 60 | 1.57 × 10¹³ | 8.92 × 10¹² | 6.65 × 10¹² | 8.48 × 10¹² | 7.24 × 10¹² | 6.90 × 10¹⁰
20 | 80 | 1.56 × 10¹² | 1.43 × 10¹¹ | 2.05 × 10¹¹ | 1.35 × 10¹¹ | 1.28 × 10¹¹ | 3.76 × 10¹⁰
100 | 100 | 5.99 × 10⁹ | 3 × 10¹⁰ | 9.29 × 10¹⁶ | 3.11 × 10¹⁰ | 3.26 × 10¹⁰ | 1.05 × 10⁹
Table 8. Mean Absolute Error (MAE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 12,388,203 | 12,274,842 | 12,127,509 | 12,436,011 | 12,274,969 | 333,760.5
60 | 40 | 33,042,525 | 29,253,970 | 48,289,635 | 29,165,857 | 28,072,647 | 118,849.2
40 | 60 | 2,539,889 | 1,802,965 | 1,534,034 | 1,748,181 | 1,587,751 | 176,293.7
20 | 80 | 930,815.3 | 297,228 | 357,425.2 | 291,987.3 | 286,309 | 159,024.4
100 | 100 | 65,077.26 | 140,568.2 | 3.05 × 10⁸ | 143,785.7 | 148,305.7 | 23,344.64
Table 9. Mean Absolute Percentage Error (MAPE).
% TEST | % TRAIN | OLS | PW | CO | MLE | RMLE | ANN
80 | 20 | 3389.55 | 3357.17 | 3314.10 | 3397.27 | 3357.24 | 102.77
60 | 40 | 7329.33 | 6444.82 | 10,652.83 | 6424.23 | 6170.01 | 35.98
40 | 60 | 520.46 | 371.15 | 315.01 | 358.71 | 322.05 | 35.06
20 | 80 | 173.81 | 51.32 | 66.57 | 50.54 | 49.72 | 27.44
100 | 100 | 275.19 | 775.69 | 942,400.3 | 795.06 | 823.31 | 81.66
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
