
Can Perturbations Help Reduce Investment Risks? Risk-aware Stock Recommendation via Split Variational Adversarial Training

Published: 22 March 2024

Abstract

In the stock market, a successful investment requires a good balance between profits and risks. Based on the learning to rank paradigm, stock recommendation has been widely studied in quantitative finance to recommend stocks with higher return ratios for investors. Despite the efforts to make profits, many existing recommendation approaches still have some limitations in risk control, which may lead to intolerable paper losses in practical stock investing. To effectively reduce risks, we draw inspiration from adversarial learning and propose a novel Split Variational Adversarial Training (SVAT) method for risk-aware stock recommendation. Essentially, SVAT encourages the stock model to be sensitive to adversarial perturbations of risky stock examples and enhances the model’s risk awareness by learning from perturbations. To generate representative adversarial examples as risk indicators, we devise a variational perturbation generator to model diverse risk factors. Particularly, the variational architecture enables our method to provide a rough risk quantification for investors, showing an additional advantage of interpretability. Experiments on several real-world stock market datasets demonstrate the superiority of our SVAT method. By lowering the volatility of the stock-recommendation model, SVAT effectively reduces investment risks and outperforms state-of-the-art baselines by more than 30% in terms of risk-adjusted profits. All the experimental data and source code are available at https://drive.google.com/drive/folders/14AdM7WENEvIp5x5bV3zV_i4Aev21C9g6?usp=sharing.

1 Introduction

The stock market, one of the largest financial markets in the world, has been an attractive platform allowing millions of investors to manage their assets for wealth growth. However, its highly volatile nature presents not only opportunities for profits, but also risks of losses [1]. To achieve a good balance between profits and risks, stock investors have been striving for methods that can accurately predict the future trend of the stock market [8]. Unfortunately, stock prediction is extremely challenging due to the highly stochastic and non-stationary nature of stock prices. Under such circumstances, more researchers have opted for advanced machine learning methods to study stock movements and make profitable predictions [45].
Modern stock-prediction solutions mainly fall into three categories: regression, classification, and recommendation methods [45]. Regression methods formulate stock prediction as a pure time-series forecasting problem and predict future stock prices/returns by learning from historical stock time-series data [9, 28, 35, 42, 48, 53, 60]. Classification methods, in contrast, treat stock prediction as a binary up/down classification problem and develop accurate classifiers to perform stock-movement prediction [15, 29, 57]. However, general regression and classification methods share a significant drawback: they are not directly optimized towards the target of investment (i.e., profit maximization) [16, 46], which may lead to abnormal results in which accurate prediction models earn less profit than inaccurate ones. Figure 1 shows an example of how this problem occurs. To overcome this drawback, some researchers have proposed to employ reinforcement learning methods [7, 32, 51] that improve model profits by capturing trading signals in a dynamic prediction process. Other researchers have developed recommendation methods that rank stocks by return ratio based on comparisons among multiple stocks [17]. In this case, models are trained to select the top-k stocks with maximum expected profits, ensuring consistency with the investment target. Accordingly, various stock-recommendation models have been proposed and shown a promising prospect in the stock-prediction domain [16, 17, 46, 54].
Fig. 1. An example showing that for regression and classification methods, accurate stock-prediction models LSTM (MSE \(\downarrow\)) and CNN (Acc. \(\uparrow\)) may earn less profit than inaccurate models ARIMA (MSE \(\uparrow\)) and MLP (Acc. \(\downarrow\)). The returns of the five stocks SQQQ, REGN, ACOR, ZBRA, and TREE are selected from the NASDAQ stock market on 02/23/2017.
Despite the efforts to maximize profits for investors, many existing stock-recommendation methods still have some limitations in risk control. Most of them mainly focus on developing powerful learning models to improve the investment profit while ignoring effective risk modeling. Such a deficiency may limit their effectiveness in practical stock investing and cause painful consequences. For example, Figure 2(a) presents daily returns of two stock-recommendation models backtested in the NASDAQ stock market from 10/25/2016 to 12/11/2017. Although both models attain nearly the same amount of profit (i.e., the sum of all daily returns), Model 1 is more volatile than Model 2 and suffers from a higher risk of potential losses. When employing Model 1 for stock trading, even if the final profit (\(35.7\%\)) is considerable, the huge paper loss of \(-51.7\%\) on 02/21/2017 can be intolerable to some investors and force them to stop investing halfway to prevent bankruptcy. In other words, the high volatility (risk) of Model 1 is prone to “kill the investor before the dawn.” To avoid such a disaster, it is imperative to reduce risks in addition to profit maximization when performing stock recommendation.
Fig. 2. (a) Daily returns of two stock-recommendation models. Both models achieve similar total profits (\(35.7\%\) for Model 1 and \(35.2\%\) for Model 2) but with different volatilities. (b) Illustration of the split adversarial training. \(\delta\), \(\delta ^{\prime }\) are perturbations and \(\$\) denotes the stock return. Best viewed in color.
In this article, we explore the possibility of leveraging adversarial perturbations [18] to reduce stock-recommendation risks. This motivation leads to Split Variational Adversarial Training (SVAT), a novel adversarial training (AT) framework for risk-aware stock recommendation. The first innovation of our method is the split AT design. Without external information such as financial reports, the risk of potential losses mainly comes from adverse movements of historical stock prices, which are difficult to identify from the stochastic price series. To address the challenge, we propose to capture stock risks through the model’s sensitivity to adversarial examples (AEs) [18]. As depicted in Figure 2(b), for each stock example, we can generate AEs by adding small perturbations on their input features. Unlike conventional AT methods [41] that encourage the model to be robust to all AEs, we split the AT process to make the model robust to AEs of profitable stocks but sensitive to AEs of risky stocks. In this way, the recommendation model can recognize risks by learning from different perturbations, improving the capability of risk control.
Another challenge is how to craft representative AEs as risk indicators. Stock prices can be affected by multiple risk factors such as company performance and macroeconomics, which cannot be effectively modeled by traditional gradient-based AT methods. Hence, motivated by Variational Autoencoders [24] and Adversarial Distributional Training [13], we devise a variational perturbation generator (VPG) to learn an adversarial distribution driven by multiple latent risk factors, from which diverse perturbations can be generated for comprehensive risk modeling. Furthermore, in the testing environment, we could roughly quantify the risk of each stock example by generating its AEs from VPG and computing the entropy using different adversarial outputs. The risk quantification shows an additional advantage of interpretability provided by our method.
In this work, we employ STHAN-SR [46] as the backbone recommendation model and combine it with SVAT to achieve state-of-the-art (SOTA) results. Our contributions are four-fold:
We investigate the limitation of neglecting risk modeling and highlight the necessity of reducing stock-recommendation risks.
We propose a novel Split Variational Adversarial Training method to enhance the risk sensitivity of the stock-recommendation model, with the additional benefit of providing a rough risk quantification for investors.
To the best of our knowledge, this is the first work to engage adversarial training for risk modeling in stock recommendation, showing a new possibility for reducing investment risks with adversarial perturbations.
We conduct extensive experiments on three real-world datasets and show advantages of our model against state-of-the-art baselines, demonstrating the effectiveness and practicality of the SVAT method.
The remainder of this article is organized as follows: Section 2 introduces the preliminary knowledge about stock recommendation and adversarial training, which forms the building blocks of our method. Section 3 presents our proposed SVAT method. Section 4 describes the extensive experiments we conduct. Finally, we review related work in Section 5 and draw a conclusion in Section 6.

2 Preliminary

2.1 Problem Formulation

We focus on the task of stock recommendation under the learning to rank paradigm. Given a set of N stocks \(\mathcal {S} = \lbrace s_1, s_2, \ldots , s_N\rbrace\), for each stock \(s_i \in \mathcal {S}\) on trading day t, there is an associated close price \(p_{i,t}\) and a 1-day return ratio computed as \(r_{i,t} = \frac{p_{i,t} - p_{i,t-1}}{p_{i,t-1}}\). According to the value of each \(r_{i,t}\), we can determine a ranking list of all stocks sorted by their ranking scores \(\mathbf {y}_t = \lbrace y_{1,t} \gt y_{2,t} \gt \cdots \gt y_{N,t}\rbrace\), where \(y_{i,t} \gt y_{j,t}\) if and only if \(r_{i,t} \gt r_{j,t}\) for any two stocks \(s_i, s_j \in \mathcal {S}\). Therefore, stocks with higher ranking scores indicate higher investment revenue on trading day t. Formally, the goal of stock recommendation is to predict ranking scores \(\hat{\mathbf {y}}_t\) given historical sequential data \(\mathbf {X} = [X_{t-T}, X_{t-T+1}, \ldots , X_{t-1}]\):
\begin{equation} \hat{\mathbf {y}}_t = f(\mathbf {X}; \Theta), \end{equation}
(1)
where \(X_{\tau } \in \mathbb {R}^{N \times d}\) represents the input features of all stocks on trading day \(\tau\) (\(\tau \lt t\)), d is the feature dimension, and T is the length of the lookback window. f is the ranking function with parameters \(\Theta\) to be learned. Following References [16, 46, 54], the loss function for optimizing \(\Theta\) is the combination of a pointwise regression loss and a pairwise ranking loss:
\begin{equation} \mathcal {L} = \sum _{i=1}^{N}(\hat{y}_{i,t} - y_{i,t})^2 + \alpha \sum _{i=1}^N \sum _{j=1}^N \max \lbrace 0, -(\hat{y}_{i,t} - \hat{y}_{j,t})(y_{i,t} - y_{j,t})\rbrace , \end{equation}
(2)
where \(\alpha\) is a hyperparameter to balance the two loss terms. Unless otherwise specified, we set the ground-truth ranking score \(y_{i,t} = r_{i,t}\). After obtaining the prediction \(\hat{\mathbf {y}}_t\), we can select the top-k stocks from the ranking list for trading.
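To make Equation (2) concrete, below is a minimal PyTorch sketch of the combined pointwise and pairwise loss, assuming the predicted and ground-truth scores arrive as 1-D tensors over the N stocks; the default value of \(\alpha\) is illustrative, not the tuned one.

```python
import torch

def ranking_loss(y_hat: torch.Tensor, y: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Pointwise regression + pairwise hinge ranking loss (Eq. (2)).

    y_hat, y: (N,) predicted and ground-truth ranking scores, where we
    set y to the 1-day return ratios r_{i,t}.
    """
    pointwise = ((y_hat - y) ** 2).sum()
    # All pairwise score differences at once via broadcasting: entry [i, j]
    # holds score_i - score_j, so the hinge penalizes mis-ordered pairs.
    d_hat = y_hat.unsqueeze(1) - y_hat.unsqueeze(0)   # (N, N)
    d_true = y.unsqueeze(1) - y.unsqueeze(0)          # (N, N)
    pairwise = torch.clamp(-d_hat * d_true, min=0.0).sum()
    return pointwise + alpha * pairwise
```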

2.2 Adversarial Training Definition

Traditional adversarial training aims to improve the adversarial robustness of DNN classifiers by adding perturbations on sample features. Given a dataset \(\mathcal {D} = \lbrace (\mathbf {x}_i, y_i)\rbrace _{i=1}^M\) of M training examples with \(\mathbf {x}_i \in \mathbb {R}^d\) (denoting the input features) and \(y_i\) (denoting the ground-truth label), AT can be formulated as the following minimax optimization problem [34]:
\begin{equation*} \min _{\Theta } \frac{1}{M} \sum _{i=1}^M \max _{\boldsymbol{\delta }_i \in \Delta } \mathcal {L}(f(\mathbf {x}_i + \boldsymbol{\delta }_i; \Theta), y_i), \end{equation*}
where f is the DNN model with parameters \(\Theta\), \(\mathcal {L}\) is a loss function, \(\Delta = \lbrace \boldsymbol{\delta }: ||\boldsymbol{\delta }||_p \le \epsilon \rbrace\) is a perturbation set with \(\epsilon \gt 0\), and \(||\cdot ||_p\) denotes the p-norm of a vector. To enrich the diversity of perturbations, Adversarial Distributional Training (ADT) [13] is further proposed to learn a perturbation distribution \(p(\boldsymbol{\delta }_i|\mathbf {x}_i)\) by solving
\begin{equation*} \min _{\Theta } \frac{1}{M} \sum _{i=1}^M \max _{p(\boldsymbol{\delta }_i|\mathbf {x}_i) \in \mathcal {P}} \mathbb {E}_{p(\boldsymbol{\delta }_i|\mathbf {x}_i)} [\mathcal {L}(f(\mathbf {x}_i + \boldsymbol{\delta }_i; \Theta), y_i)], \end{equation*}
where \(\mathcal {P} = \lbrace p: \text{supp}(p) \subseteq \Delta \rbrace\) is a set of distributions with support contained in \(\Delta\). ADT solves the above minimax problem by simultaneously optimizing \(p(\boldsymbol{\delta }_i|\mathbf {x}_i)\) and \(\Theta\) in a single inseparable step, which is inapplicable to our split AT design. Hence, we devise a new training algorithm that combines the fast gradient approximation [18] with variational Bayes [24] to learn \(p(\boldsymbol{\delta }_i|\mathbf {x}_i)\) and \(\Theta\) more flexibly.
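For reference, here is a minimal sketch of one conventional AT step, approximating the inner maximization with a single fast-gradient perturbation [18]; `model`, `loss_fn`, and `eps` are generic placeholders rather than components of our method.

```python
import torch

def adversarial_training_step(model, loss_fn, opt, x, y, eps=0.01):
    """One conventional AT step: the inner max over delta is approximated
    by a single L2-normalized gradient step, the outer min by an optimizer
    step on the perturbed loss.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    delta = eps * grad / (grad.norm(p=2) + 1e-12)   # ||delta||_2 <= eps

    opt.zero_grad()
    loss_fn(model(x.detach() + delta.detach()), y).backward()
    opt.step()
```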

3 Methodology

3.1 Motivation

We first explain our motivation for designing SVAT. From the perspective of investors, the main source of stock-recommendation risk is that the model may incorrectly assign higher scores to risky stocks with losses (\(r_{i,t} \lt 0\)) than to profitable stocks (\(r_{i,t} \gt 0\)), after which risky stocks are recommended to investors, leading to high volatility of investment returns. One way to mitigate this problem is to train the recommendation model to favor profitable stocks and to be more alert/sensitive to risky stocks. Adversarial training (AT) paves the way to obtaining this “split” behavior, since we can indirectly manipulate the model’s sensitivity to individual examples through their adversarial perturbations [33, 41]. While conventional AT methods encourage the model to be robust to imperceptible perturbations [18, 41], researchers have found that increasing the sensitivity to perturbations can also help the model better capture the diversity of data samples, a technique known as Inverse Adversarial Training (IAT) [62]. Accordingly, we postulate that the combination of AT and IAT, namely, the split AT shown in Figure 2(b), can help the model better discriminate between profitable and risky stocks by learning from their perturbations in different ways, thereby reducing the probability of recommending risky stocks. This is the main reason why we design two different perturbations for profitable stocks and risky stocks, respectively. In addition, the Variational Autoencoder (VAE) [24] excels at learning data distributions and can be used to model various stock factors [14]. All the considerations above converge to our SVAT method.
Although SVAT is designed to reduce stock-recommendation risks, we believe that a similar idea could also be applied to reduce ranking uncertainty in other learning-to-rank problems, such as recommender systems where positive items are preferred over negative items.

3.2 Overview

For better illustration of the risk modeling w.r.t. each stock example, we conceptually decompose the ranking model f in Equation (1) into N ranking submodules \(f_1, f_2, \ldots , f_N\) sharing the same parameters \(\Theta\), with each \(f_i\) predicting the ranking score of stock \(s_i\):
\begin{align} \begin{split}\hat{y}_{i,t} &= f_i(\tilde{\mathbf {x}}_i; \Theta), \\ \tilde{\mathbf {x}}_i &= \Psi (X_i), \end{split} \end{align}
(3)
where \(\tilde{\mathbf {x}}_i \in \mathbb {R}^D\) is the feature vector transformed from the historical sequential features \(X_i = [\mathbf {x}_{i,t-T}; \ldots ; \mathbf {x}_{i,t-1}] \in \mathbb {R}^{T \times d}\) of stock \(s_i\), and \(\Psi\) is the transformation function that could be simple row concatenation or temporal embedding with RNN architectures [15, 46]. Similar to ADT, we model the adversarial perturbations around each stock example \(\tilde{\mathbf {x}}_i\) by a conditional distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\), whose support is contained in \(\Delta = \lbrace \boldsymbol{\delta } \in \mathbb {R}^D: ||\boldsymbol{\delta }||_2 \le \epsilon \rbrace\). Next, we can sample a perturbation \(\boldsymbol{\delta }_i\) from \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) to construct an adversarial example \(\tilde{\mathbf {x}}_i + \boldsymbol{\delta }_i\) and obtain the perturbed output by
\begin{equation} \hat{y}_{i,t}^{\text{adv}} = f_i(\tilde{\mathbf {x}}_i + \boldsymbol{\delta }_i; \Theta). \end{equation}
(4)
During the training phase, the adversarial loss for stock \(s_i\) can be computed as
\begin{equation} \mathcal {L}^{\text{adv}}_i = \left(\hat{y}_{i,t}^{\text{adv}} - y_{i,t}\right)^2 + \alpha \sum _{j=1}^N \max \left\lbrace 0, - \left(\hat{y}_{i,t}^{\text{adv}} - \hat{y}_{j,t}^{\text{adv}}\right)(y_{i,t} - y_{j,t})\right\rbrace , \end{equation}
(5)
and the total adversarial loss is the sum of each \(\mathcal {L}^{\text{adv}}_i\) weighted by the corresponding stock’s return ratio \(r_{i,t}\):
\begin{equation} \mathcal {L}^{\text{adv}} = \sum _{i=1}^N r_{i,t} \mathcal {L}^{\text{adv}}_i. \end{equation}
(6)
When we train the model by minimizing \(\mathcal {L}^{\text{adv}}\), the adversarial loss of stock examples with \(r_{i,t} \gt 0\) is minimized while the adversarial loss of stock examples with \(r_{i,t} \lt 0\) is maximized. In this way, the stock-recommendation model is encouraged to be more robust to adversarial perturbations of profitable stock examples and more sensitive to adversarial perturbations of risky stock examples. This split adversarial training approach better enhances the risk awareness of the model by treating adversarial examples of profitable and risky stocks in opposite ways, which is consistent with the split behavior described in the previous section.
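A minimal sketch of Equations (5) and (6), assuming the perturbed scores, ground-truth scores, and return ratios arrive as 1-D tensors; the sign of each \(r_{i,t}\) is what flips the optimization direction per stock.

```python
import torch

def split_adversarial_loss(y_adv, y, r, alpha=0.5):
    """Return-weighted adversarial loss (Eqs. (5)-(6)).

    y_adv: (N,) scores on perturbed inputs; y: (N,) ground-truth scores;
    r: (N,) 1-day return ratios. Minimizing the weighted sum makes the
    model robust on profitable stocks (r > 0) while maximizing the
    adversarial loss of risky stocks (r < 0), i.e., keeping the model
    sensitive to their perturbations.
    """
    pointwise = (y_adv - y) ** 2                          # (N,)
    d_adv = y_adv.unsqueeze(1) - y_adv.unsqueeze(0)       # (N, N)
    d_true = y.unsqueeze(1) - y.unsqueeze(0)              # (N, N)
    pairwise = torch.clamp(-d_adv * d_true, min=0.0).sum(dim=1)
    return (r * (pointwise + alpha * pairwise)).sum()
```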
Besides, when deploying the model to the testing environment, we can quantify the risk of each stock example by sampling multiple adversarial perturbations from \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\). Specifically, for each testing example \(\tilde{\mathbf {x}}_i\), we generate M Monte Carlo perturbation samples \(\boldsymbol{\delta }_i^1, \boldsymbol{\delta }_i^2, \ldots , \boldsymbol{\delta }_i^M\) from \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) and obtain different perturbed ranking scores \(\hat{y}_{i,t}^{1}, \hat{y}_{i,t}^{2}, \ldots , \hat{y}_{i,t}^{M}\) from Equation (4). Comparing these M scores with those of other stock examples produces M rankings \(a_{i,t}^1, a_{i,t}^2, \ldots , a_{i,t}^M\), which are used to compute the ranking entropy for \(\tilde{\mathbf {x}}_i\):
\begin{equation} \mathcal {H}(\tilde{\mathbf {x}}_i) = -\sum _{l=1}^N p(a_{i,t} = l) \cdot \log p(a_{i,t} = l), \end{equation}
(7)
where \(p(a_{i,t} = l) = \frac{1}{M} \sum _{m=1}^M \mathbb {I}(a_{i,t}^m = l)\) denotes the frequency of \(\tilde{\mathbf {x}}_i\) being ranked lth, and we have \(\mathcal {H}(\tilde{\mathbf {x}}_i) \in [0, \log N]\). Investors could roughly evaluate the risk of the current stock example according to the ranking entropy, where higher entropy generally indicates higher risk.
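The ranking entropy can be estimated as in the sketch below, under the simplifying assumption that all N stocks are perturbed jointly in each of the M Monte Carlo draws; the double argsort recovers each stock's rank per draw.

```python
import torch

def ranking_entropy(scores_mc: torch.Tensor) -> torch.Tensor:
    """Monte Carlo ranking entropy (Eq. (7)).

    scores_mc: (M, N) perturbed ranking scores of all N stocks under M
    sampled perturbations. Returns an (N,) tensor of entropies; higher
    entropy roughly indicates higher risk.
    """
    M, N = scores_mc.shape
    # Double argsort: rank (0 = top) of each stock in each perturbed draw.
    ranks = scores_mc.argsort(dim=1, descending=True).argsort(dim=1)
    entropy = torch.zeros(N)
    for i in range(N):
        freq = torch.bincount(ranks[:, i], minlength=N).float() / M  # p(a_i = l)
        p = freq[freq > 0]                     # drop empty bins: 0 log 0 := 0
        entropy[i] = -(p * p.log()).sum()
    return entropy
```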
Figure 3(a) presents the workflow of our SVAT method. The core of SVAT lies in the learning of the perturbation distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\), which is detailed in the next section.
Fig. 3. (a) Workflow of the proposed SVAT framework. (b) Architecture of the variational perturbation generator. The perturbation \(\boldsymbol{\delta }_i\) generated from \(\mathbf {z}_i^{\text{post}}\) is engaged in adversarial training while \(\boldsymbol{\delta }_i\) generated from \(\mathbf {z}_i^{\text{prior}}\) is utilized to compute the ranking entropy in the testing environment.

3.3 Variational Perturbation Generator

The key role of the perturbation distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) is to characterize potential risk factors of stocks and generate representative perturbation samples to help reduce investment risks. Since stock returns are typically affected by a variety of risk factors (e.g., macroeconomics, financial news), we decide to model these factors by some learnable latent variables that implicitly drive \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\):
\begin{equation} p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) = \int p(\boldsymbol{\delta }_i, \mathbf {z}_i|\tilde{\mathbf {x}}_i)\, d\mathbf {z}_i = \int p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i) p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\, d\mathbf {z}_i, \end{equation}
(8)
where \(\mathbf {z}_i \in \mathbb {R}^H\) is a learnable random vector containing H risk-relevant latent variables. Although the integral of the marginal likelihood in Equation (8) is intractable, it can be approximated by VAE [24], a reliable framework for neural approximation. In this spirit, we devise a Variational Perturbation Generator (VPG) to approximate \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) with latent factors. As shown in Figure 3(b), VPG contains three components: perturbation extractor, posterior encoder, and generative decoder.

3.3.1 Perturbation Extractor.

We first employ the fast gradient approximation method [15, 18] as the perturbation extractor to extract a primitive perturbation sample from the gradient of the input features \(\tilde{\mathbf {x}}_i\):
\begin{equation} \boldsymbol{\delta }_i^{\text{post}} = \epsilon \cdot \frac{\nabla _{\tilde{\mathbf {x}}_i} \mathcal {L}}{||\nabla _{\tilde{\mathbf {x}}_i} \mathcal {L}||_2}, \end{equation}
(9)
where \(\mathcal {L}\) is the prediction loss in Equation (2) and \(\epsilon \gt 0\) is a hyperparameter. \(\boldsymbol{\delta }_i^{\text{post}}\) provides the posterior information about the real data to guide the approximation learning of VPG.
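A small sketch of the extractor, normalizing the gradient per stock example over the feature dimension (a batched reading of Equation (9)); `retain_graph=True` is an assumption here, keeping the graph alive so the full loss can still be back-propagated afterwards.

```python
import torch

def extract_posterior_perturbation(x_tilde, loss, eps=0.01):
    """Posterior perturbation delta^post = eps * g / ||g||_2 (Eq. (9)),
    normalized per stock.

    x_tilde: (N, D) transformed features with requires_grad=True;
    loss: scalar prediction loss from Eq. (2).
    """
    grad, = torch.autograd.grad(loss, x_tilde, retain_graph=True)
    return eps * grad / grad.norm(dim=-1, keepdim=True).clamp_min(1e-12)
```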

3.3.2 Posterior Encoder.

The posterior encoder incorporates the primitive perturbation \(\boldsymbol{\delta }_i^{\text{post}}\) and the input features \(\tilde{\mathbf {x}}_i\) to learn a posterior distribution \(q^{\text{post}}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\), from which posterior risk-relevant latent variables \(\mathbf {z}^{\text{post}}_i\) can be obtained:
\begin{align} \begin{split}\mathbf {h}_i^{\text{post}} &= F^{\text{post}}\left(\left[\boldsymbol{\delta }_i^{\text{post}}, \tilde{\mathbf {x}}_i\right]; \Xi ^{\text{post}}\right), \\ \boldsymbol{\mu }^{\text{post}}_i &= W^{\text{post}}_{\mu } \mathbf {h}_i^{\text{post}} + \mathbf {b}^{\text{post}}_{\mu }, \\ \boldsymbol{\sigma }^{\text{post}}_i &= s^{+}\left(W^{\text{post}}_{\sigma } \mathbf {h}_i^{\text{post}} + \mathbf {b}^{\text{post}}_{\sigma }\right), \\ \mathbf {z}^{\text{post}}_i &\sim q^{\text{post}}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) \triangleq \mathcal {N}\left(\boldsymbol{\mu }^{\text{post}}_i, \text{diag}\left(\left(\boldsymbol{\sigma }^{\text{post}}_i\right)^2\right)\right), \end{split} \end{align}
(10)
where \(F^{\text{post}}\) can be any non-linear neural network such as a multi-layer perceptron (MLP), \(\Xi ^{\text{post}}, W^{\text{post}}_{\mu }, W^{\text{post}}_{\sigma }, \mathbf {b}^{\text{post}}_{\mu }, \mathbf {b}^{\text{post}}_{\sigma }\) are learnable parameters, \(s^{+}(x) = \log (1+e^x)\) denotes the \(\rm {softplus}\) activation, and \(\mathbf {z}^{\text{post}}_i\) is a latent vector sampled from the posterior Gaussian distribution \(q^{\text{post}}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\).
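A minimal PyTorch module for Equation (10) using the reparameterization trick; the hidden width of 128 and \(\tanh\) activation follow the training setup in Section 4.1.4, and `D`, `H` denote the feature and latent dimensions.

```python
import torch
import torch.nn as nn

class PosteriorEncoder(nn.Module):
    """Encode [delta^post, x~] into a diagonal Gaussian over the latent
    risk factors z (Eq. (10)) and sample z via reparameterization.
    """
    def __init__(self, D: int, H: int, hidden: int = 128):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * D, hidden), nn.Tanh(),
                               nn.Linear(hidden, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, H)
        self.rho = nn.Linear(hidden, H)

    def forward(self, delta_post, x_tilde):
        h = self.f(torch.cat([delta_post, x_tilde], dim=-1))
        mu = self.mu(h)
        sigma = nn.functional.softplus(self.rho(h))   # s^+(x) = log(1 + e^x)
        z = mu + sigma * torch.randn_like(sigma)      # reparameterization trick
        return z, mu, sigma
```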

3.3.3 Generative Decoder.

Finally, combining the information of the latent variables \(\mathbf {z}_i\) and the input features \(\tilde{\mathbf {x}}_i\), we train a generative decoder network \(F^{\text{gen}}\) to approximate the desired perturbation distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) and generate the risk-indicated perturbation \(\boldsymbol{\delta }_i\):
\begin{equation} \mathbf {g}_i = F^{\text{gen}}([\mathbf {z}_i, \tilde{\mathbf {x}}_i]; \Xi ^{\text{gen}}),\ \ \boldsymbol{\delta }_i = \epsilon \cdot \frac{\mathbf {g}_i}{||\mathbf {g}_i||_2}, \end{equation}
(11)
after which \(\boldsymbol{\delta }_i\) is engaged in Equation (4) to produce an adversarial example. During the training phase, we simply input \(\mathbf {z}_i = \mathbf {z}^{\text{post}}_i\) to \(F^{\text{gen}}\), which is, however, unrealizable in the testing environment, since we cannot obtain the loss and gradients to extract \(\boldsymbol{\delta }_i^{\text{post}}\) of testing examples without knowing their ground-truth labels. Accordingly, we further design another network \(F^{\text{prior}}\) to learn a prior distribution \(p^{\text{prior}}(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\):
\begin{align} \begin{split}\mathbf {h}_i^{\text{prior}} &= F^{\text{prior}}(\tilde{\mathbf {x}}_i; \Xi ^{\text{prior}}), \\ \boldsymbol{\mu }^{\text{prior}}_i &= W^{\text{prior}}_{\mu } \mathbf {h}_i^{\text{prior}} + \mathbf {b}^{\text{prior}}_{\mu }, \\ \boldsymbol{\sigma }^{\text{prior}}_i &= s^{+}\left(W^{\text{prior}}_{\sigma } \mathbf {h}_i^{\text{prior}} + \mathbf {b}^{\text{prior}}_{\sigma }\right), \\ \mathbf {z}^{\text{prior}}_i &\sim p^{\text{prior}}(\mathbf {z}_i|\tilde{\mathbf {x}}_i) \triangleq \mathcal {N}\left(\boldsymbol{\mu }^{\text{prior}}_i, \text{diag}\left(\left(\boldsymbol{\sigma }^{\text{prior}}_i\right)^2\right)\right), \end{split} \end{align}
(12)
and enforce the prior distribution to approximate the posterior distribution by minimizing
\begin{equation} \mathcal {L}^{\text{KL}}_i = D_{\text{KL}}[q^{\text{post}}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p^{\text{prior}}(\mathbf {z}_i|\tilde{\mathbf {x}}_i)], \end{equation}
(13)
where \(D_{\text{KL}}\) is the Kullback-Leibler divergence between two distributions. In this way, we can sample multiple \(\mathbf {z}^{\text{prior}}_i\)s from \(p^{\text{prior}}(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\) and generate representative perturbations for testing examples without computing their gradients, which facilitates the risk quantification in Equation (7) and improves the risk interpretability of the stock-recommendation model.
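Two helpers sketch this stage: the closed-form KL divergence between the diagonal-Gaussian posterior and prior for Equation (13), and the decoder rescaling of Equation (11); `F_gen` stands for any MLP over the concatenation \([\mathbf {z}_i, \tilde{\mathbf {x}}_i]\).

```python
import torch

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between diagonal Gaussians, giving L^KL_i
    in Eq. (13) with q the posterior and p the prior. All inputs: (N, H).
    """
    var_p = sigma_p ** 2
    return 0.5 * ((sigma_q ** 2 + (mu_q - mu_p) ** 2) / var_p
                  - 1.0 + 2.0 * (sigma_p.log() - sigma_q.log())).sum(dim=-1)

def decode_perturbation(F_gen, z, x_tilde, eps=0.01):
    """Generative decoder (Eq. (11)): map [z, x~] to g and rescale it
    onto the eps-ball, per stock example.
    """
    g = F_gen(torch.cat([z, x_tilde], dim=-1))
    return eps * g / g.norm(dim=-1, keepdim=True).clamp_min(1e-12)
```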

3.3.4 Explanation for VPG.

As discussed in Section 3.1, we aim to enhance the risk awareness of the stock-recommendation model by encouraging the model to learn perturbations of profitable and risky stocks in different ways. Accordingly, the main goal of VPG is to provide an effective mechanism to generate perturbations for different stocks. As shown in Figure 3(b), VPG follows an encoder-decoder architecture and generates perturbations from a latent space. Since a variety of complex risk factors affect stock prices, it is feasible to encode these risk factors into a latent space. Based on the framework of VAE [24], we assume that the perturbation \(\boldsymbol{\delta }_i\) of each stock can be generated from a latent random variable \(\mathbf {z}_i\) in the risk-factor latent space. We first use an encoder to learn the posterior distribution of \(\mathbf {z}_i\) given \(\boldsymbol{\delta }_i\) and then employ a decoder to learn a perturbation distribution close to the posterior distribution during the training phase. Finally, in the testing phase, we can sample representative perturbations of all stocks from the perturbation distribution efficiently, without the overhead of computing the gradient of each stock example.
In summary, the VPG can characterize various risk factors in a latent space and generate representative perturbation samples of different stocks, which is critical to enhance the risk awareness of the stock-recommendation model.

3.3.5 Theoretical Justification.

We present the theoretical justification of VPG based on the theoretical framework of VAE [24]. Given N datapoints \(\tilde{\mathbf {x}}_1, \tilde{\mathbf {x}}_2, \ldots , \tilde{\mathbf {x}}_N\) from the training dataset, the goal of VPG is to maximize the sum of the marginal likelihoods \(\sum _{i=1}^N \log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\), where \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) is driven by some risk-relevant latent variables \(\mathbf {z}_i\):
\begin{align} \begin{split}p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) &= \int p(\boldsymbol{\delta }_i, \mathbf {z}_i|\tilde{\mathbf {x}}_i)\,d\mathbf {z}_i \\ &= \int p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i) p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\,d\mathbf {z}_i \\ &= \mathbb {E}_{\mathbf {z}_i \sim p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)}[p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)]. \end{split} \end{align}
(14)
Without loss of generality, we assume that \(p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\) is a Gaussian distribution conditioned on \(\tilde{\mathbf {x}}_i\) and \(p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)\) is a Gaussian distribution conditioned on both \(\tilde{\mathbf {x}}_i\) and \(\mathbf {z}_i\):
\begin{align} \begin{split}p(\mathbf {z}_i|\tilde{\mathbf {x}}_i) &\triangleq \mathcal {N}(\mu _{\mathbf {z}}(\tilde{\mathbf {x}}_i), \Sigma _{\mathbf {z}}(\tilde{\mathbf {x}}_i)), \\ p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i) &\triangleq \mathcal {N}(\mu _{\boldsymbol{\delta }}(\mathbf {z}_i, \tilde{\mathbf {x}}_i), \Sigma _{\boldsymbol{\delta }}(\mathbf {z}_i, \tilde{\mathbf {x}}_i)),\end{split} \end{align}
(15)
where \(\mu _{\mathbf {z}}(\cdot), \mu _{\boldsymbol{\delta }}(\cdot , \cdot)\) are learnable functions that output the mean of the Gaussian and \(\Sigma _{\mathbf {z}}(\cdot), \Sigma _{\boldsymbol{\delta }}(\cdot , \cdot)\) are learnable functions that output the covariance of the Gaussian, all of which can be approximated by neural networks. In this case, we can sample a large number of \(\mathbf {z}_i\) from \(p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\) and approximate the expectation in Equation (14) by average:
\begin{align} \begin{split} p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) = \mathbb {E}_{\mathbf {z}_i \sim p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)}&[p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)] \approx \frac{1}{K} \sum _{k=1}^K \mathcal {N} \left(\mu _{\boldsymbol{\delta }}\left(\mathbf {z}_i^k, \tilde{\mathbf {x}}_i\right), \Sigma _{\boldsymbol{\delta }}\left(\mathbf {z}_i^k, \tilde{\mathbf {x}}_i\right)\right), \\ \text{where}& \ \ \mathbf {z}_i^k \sim \mathcal {N}(\mu _{\mathbf {z}}(\tilde{\mathbf {x}}_i), \Sigma _{\mathbf {z}}(\tilde{\mathbf {x}}_i)). \end{split} \end{align}
(16)
However, this approach suffers from the curse of dimensionality, since the number of samples K grows exponentially with the dimension of \(\mathbf {z}_i\). Besides, for any given observations \(\boldsymbol{\delta }_i\) and \(\tilde{\mathbf {x}}_i\), most \(\mathbf {z}_i^k\) will contribute very little to the likelihood.
To solve the problem above, VAE [24] proposes to sample the latent variables \(\mathbf {z}_i\) from the posterior distribution \(p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) and only pick a small number of \(\mathbf {z}_i\) values that contribute a significant amount to the likelihood. Specifically, VAE approximates the ground-truth posterior distribution \(p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) by a learnable model \(q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) with parameters \(\boldsymbol{\phi }\). Considering the Kullback-Leibler divergence between \(q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) and \(p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\), we obtain:
\begin{align} \begin{split} &\quad D_{\text{KL}}(q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)) \\ &\triangleq \int q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) \log \left[\frac{q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)}{p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)}\right] d \mathbf {z}_i \\ &= \mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}[\log q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) - \log p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)] \\ &= \mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}\left[\log q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) - \log \frac{p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i) p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)}{p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)}\right]\ \ (\text{Bayes rule}) \\ &= \mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}[\log q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) - \log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i) - \log p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)] + \log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\\ &= \log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) -\mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)] +\mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}[\log q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) - \log p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)] \\ &= \log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) -\mathbb {E}_{\mathbf {z}_i \sim q_{\boldsymbol{\phi }}}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)] + D_{\text{KL}}[q_{\boldsymbol{\phi }}(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)]. \end{split} \end{align}
(17)
Rearranging Equation (17), we finally obtain the evidence lower bound (ELBO) proposed in Reference [24]:
\begin{align} \begin{split} &\quad \log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i) - D_{\text{KL}}[q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)] \\ &= \mathbb {E}_{\mathbf {z}_i \sim q_{\phi }}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)] - D_{\text{KL}}[q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)]. \end{split} \end{align}
(18)
Therefore, we can maximize the marginal likelihood \(\log p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) and minimize the Kullback-Leibler divergence \(D_{\text{KL}}[q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)]\) (i.e., the LHS of Equation (18)) by equivalently maximizing \(\mathbb {E}_{\mathbf {z}_i \sim q_{\phi }}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)]\) and minimizing \(D_{\text{KL}}[q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i) || p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)]\) (i.e., the RHS of Equation (18)), which is exactly what VPG does. As shown in Figure 3(b), VPG utilizes the Posterior Encoder and the Prior Network to model the approximated posterior distribution \(q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) and the prior distribution \(p(\mathbf {z}_i|\tilde{\mathbf {x}}_i)\), respectively, and we minimize their Kullback-Leibler divergence by minimizing \(\mathcal {L}_i^{\text{KL}}\) in Equation (13). Moreover, since all perturbations \(\boldsymbol{\delta }_i\) generated from \(p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)\) are expected to be representative risk indicators that help minimize \(\mathcal {L}^{\text{adv}}\) in Equation (6), we can maximize the likelihood \(\mathbb {E}_{\mathbf {z}_i \sim q_{\phi }}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)]\) by equivalently minimizing \(\mathcal {L}^{\text{adv}}\). Finally, we conclude that the proposed SVAT loss \(\mathcal {L}^{\text{adv}} + \mathcal {L}^{\text{KL}}\) is consistent with the theoretical framework of VAE.
Note that during the training phase, we only sample one \(\mathbf {z}_i\) from the approximated posterior distribution \(q_{\phi }(\mathbf {z}_i|\boldsymbol{\delta }_i, \tilde{\mathbf {x}}_i)\) in each epoch and approximate the expectation \(\mathbb {E}_{\mathbf {z}_i \sim q_{\phi }}[\log p(\boldsymbol{\delta }_i|\mathbf {z}_i, \tilde{\mathbf {x}}_i)]\) by training the VPG for multiple epochs, thus avoiding the large-scale sampling of Equation (16) and the curse of dimensionality.

3.4 Model Training

We summarize the training process of SVAT as Algorithm 1. The stock-recommendation model f and all components of the VPG are trained end-to-end by minimizing the combined loss function of Equations (2), (6), and (13):
\begin{equation*} \mathcal {L}^{\text{com}} = \mathcal {L} + \lambda (\mathcal {L}^{\text{adv}} + \mathcal {L}^{\text{KL}}), \end{equation*}
where \(\mathcal {L}^{\text{KL}} = \sum _{i=1}^N \mathcal {L}^{\text{KL}}_i\) and \(\lambda\) is a hyperparameter to control the contribution of the SVAT loss. We utilize the Adam [23] algorithm for optimization.
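Putting the pieces together, one end-to-end SVAT update could look like the following sketch, reusing the helper functions sketched in earlier sections; `vpg` is assumed to be a callable mapping \((\boldsymbol{\delta }_i^{\text{post}}, \tilde{\mathbf {x}}_i)\) to a generated perturbation and per-stock KL terms, and the hyperparameter values are illustrative.

```python
import torch

def svat_step(model, vpg, opt, x_tilde, y, r, alpha=0.5, lam=0.5, eps=0.01):
    """One SVAT update minimizing L^com = L + lambda * (L^adv + L^KL).

    model: maps (N, D) features to (N,) ranking scores; vpg: returns
    (delta, kl) given (delta_post, x_tilde); opt: an optimizer over the
    parameters of both model and vpg.
    """
    x_tilde = x_tilde.clone().detach().requires_grad_(True)
    clean = ranking_loss(model(x_tilde), y, alpha)                     # Eq. (2)

    delta_post = extract_posterior_perturbation(x_tilde, clean, eps)  # Eq. (9)
    delta, kl = vpg(delta_post, x_tilde)                               # VPG forward
    adv = split_adversarial_loss(model(x_tilde + delta), y, r, alpha)  # Eq. (6)

    opt.zero_grad()
    (clean + lam * (adv + kl.sum())).backward()
    opt.step()
```

Here `opt` would be constructed as, e.g., `torch.optim.Adam(list(model.parameters()) + list(vpg.parameters()))`, so the recommendation model and the VPG are updated jointly.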

4 Experiments

The core of this section is to evaluate whether our method can effectively reduce stock investment risks for investors. Accordingly, we conduct extensive experiments with the aim of answering the following research questions:
RQ1: How is the utility of our proposed SVAT method in a general economic environment? Can SVAT outperform state-of-the-art stock-recommendation models in terms of risk-adjusted profits under normal circumstances?
RQ2: How is the utility of our SVAT method in an extreme economic environment such as the financial crisis? Can SVAT protect investors from risks better than other state-of-the-art stock-recommendation models under extreme circumstances?
RQ3: Does SVAT capture different signals or recommend stocks different from other baseline methods? To what extent are all methods correlated with each other?
RQ4: How is the effectiveness of the split adversarial training design and the variational perturbation generator component of our proposed SVAT method?
RQ5: How does our proposed SVAT method perform under different backtesting strategies, different adversarial hyperparameter settings, and different sampling methods?
RQ6: How does the adversarial perturbation of SVAT help reduce the risk of stock recommendation? What insights can investors learn from SVAT?
We next conduct different experiments to answer the RQs above, comprehensively demonstrating the effectiveness, practicality, and robustness of our approach.

4.1 Experimental Setting

4.1.1 Datasets.

As shown in Table 1, our experiments are based on six real-world datasets from US and China stock markets, including three normal datasets in a general economic environment (RQ1) and three crisis datasets during the financial crisis period (RQ2):
| Dataset | NASDAQ | NYSE | CASE | NASDAQ_08 | NYSE_08 | CASE_08 |
| --- | --- | --- | --- | --- | --- | --- |
| Train (Tr) Period | 01/2013–12/2015 | 01/2013–12/2015 | 03/2016–04/2019 | 01/2002–12/2006 | 01/2002–12/2006 | 01/2002–12/2006 |
| Valid (Va) Period | 01/2016–12/2016 | 01/2016–12/2016 | 04/2019–04/2020 | 01/2007–10/2007 | 01/2007–10/2007 | 01/2007–10/2007 |
| Test (Te) Period | 01/2017–12/2017 | 01/2017–12/2017 | 04/2020–03/2022 | 11/2007–12/2008 | 11/2007–12/2008 | 11/2007–12/2008 |
| #Days (Tr:Va:Te) | 756:252:237 | 756:252:237 | 756:252:456 | 1259:211:295 | 1259:211:295 | 1205:201:289 |
| #Stocks | 1,026 | 1,737 | 4,465 | 656 | 1,115 | 1,520 |
Table 1. Dataset Statistics Detailing Chronological Date Splits of the Six Stock Datasets
Normal datasets:
NASDAQ [16]: This dataset contains the price data of 1,026 equity stocks in the NASDAQ Global and Capital market from 01/02/2013 to 12/08/2017.
NYSE [16]: This dataset consists of the price data of 1,737 equity stocks in the New York Stock Exchange market from 01/02/2013 to 12/08/2017.
CASE: This dataset collects the price data of 4,465 equity stocks from the China A-share Stock Exchange market from 03/01/2016 to 03/04/2022.
Crisis datasets:
NASDAQ_08: This dataset contains the price data of 656 equity stocks in the NASDAQ Global and Capital market from 01/02/2002 to 12/31/2008.
NYSE_08: This dataset consists of the price data of 1,115 equity stocks in the New York Stock Exchange market from 01/02/2002 to 12/31/2008.
CASE_08: This dataset collects the price data of 1,520 equity stocks from the China A-share Stock Exchange market from 01/04/2002 to 12/31/2008.
All the stock datasets above are collected at a daily frequency, with each data point consisting of five features (i.e., the opening price, highest price, lowest price, closing price, and trading volume of the stock for the day). Among these datasets, NASDAQ and NYSE have been widely used in most previous work [16, 46, 54], and we use them here to ensure fair comparison with other state-of-the-art models. We collect the data of CASE and CASE_08 from RiceQuant, and the data of NASDAQ_08 and NYSE_08 from Yahoo Finance. In particular, the NASDAQ_08, NYSE_08, and CASE_08 datasets cover the entire 2007–2008 global financial crisis period, which is crucial for testing the anti-risk ability of stock-prediction models.
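For concreteness, the 1-day return ratios defined in Section 2.1 can be derived from the daily closing prices as in the sketch below; the (days × stocks) frame layout is a hypothetical loader convention, not the datasets' native format.

```python
import pandas as pd

def one_day_returns(close: pd.DataFrame) -> pd.DataFrame:
    """Compute r_{i,t} = (p_{i,t} - p_{i,t-1}) / p_{i,t-1} from a
    (days x stocks) frame of closing prices; the first row, which has
    no previous day, is dropped.
    """
    return close.pct_change().iloc[1:]
```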

4.1.2 Baselines.

Since we focus on reducing investment risks of stock recommendation, we compare our method with the stock market composite index and seven stock-recommendation baseline methods as follows:
Buy&Hold: This is the simplest trading strategy where we buy the composite index of all stocks and hold. The results of this buy-and-hold index represent the average benchmark performance of the stock market.
ARIMA [2]: This method is the traditional Autoregressive Integrated Moving Average model for time-series prediction. We use it to directly predict the return ratio of each stock and recommend stocks with the highest predicted return ratios.
LSTM [5]: This method is the vanilla LSTM model that operates on the sequential stock price data and obtains a sequential embedding for stock recommendation. We finally combine it with a fully connected layer to predict the ranking score of each stock.
GCN [25]: GCN is a representative graph-based learning method. We use the vanilla GCN architecture to model the stock relation graph and combine it with a fully connected layer to predict the ranking score of each stock.
RSR-E [16]: This method develops a temporal GCN model using price movement similarity to weight the relation between different stocks and improves the performance of stock relation learning.
RSR-I [16]: This method employs an implicit neural network to adaptively learn the stock relation and improves the adaptability of the temporal GCN model.
ANN-SVM [26]: This method incorporates a non-linear Artificial Neural Network (ANN) and a Support Vector Machine (SVM) to perform stock recommendation.
STHAN-SR [46]: This method leverages a hypergraph attention network to learn the stock relation and achieves substantial improvements on stock recommendation.

4.1.3 Evaluation Metrics.

Following previous work [16, 17, 46, 54], we adopt a daily buy-hold-sell trading strategy and evaluate all stock-recommendation methods with the following three metrics:
(1)
Investment Return Ratio (IRR):
\begin{equation*} \text{IRR} = \sum _t \text{IRR}^t = \sum _t \sum _{i \in \mathcal {S}^{t-1}} r_{i,t}, \end{equation*}
where \(\mathcal {S}^{t-1} \subseteq \mathcal {S}\) denotes the set of top-k stocks selected on trading day \(t-1\) and \(r_{i,t}\) is the one-day return ratio of stock \(s_i\) on trading day t. IRR only evaluates the model’s profit without risk consideration.
(2)
Sharpe Ratio (SR):
\begin{equation} \text{SR} = \frac{\mathbb {E}[\text{IRR}^t - R_f]}{\text{STD}[\text{IRR}^t - R_f]}, \end{equation}
(19)
where \(R_f\) is a risk-free return and STD denotes the standard deviation. SR is a risk-adjusted return metric considering both the profit and the volatility of the model.
(3)
Maximum Daily Drawdown (MDD):
\begin{equation*} \text{MDD} = 100\% \times \left|\min \left\lbrace \min _t \text{IRR}^t, 0\right\rbrace \right|. \end{equation*}
MDD measures the maximum daily loss of the model in backtesting (e.g., the MDD of Model 1 in Figure 2(a) is \(51.7\%\)), which evaluates to what extent the model can protect investors from the risk of loss.
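The three metrics can be computed from the series of daily portfolio returns as sketched below; the daily risk-free return `r_f` defaults to a placeholder 0.0 rather than the rate used in the experiments, and MDD follows the maximum-daily-loss definition above, not the peak-to-trough drawdown.

```python
import numpy as np

def backtest_metrics(daily_irr, r_f=0.0):
    """IRR, SR, and MDD from a 1-D series of daily returns IRR^t."""
    daily_irr = np.asarray(daily_irr, dtype=float)
    irr = daily_irr.sum()                         # cumulative profit
    excess = daily_irr - r_f
    sr = excess.mean() / excess.std()             # Eq. (19)
    mdd = 100.0 * abs(min(daily_irr.min(), 0.0))  # maximum daily loss
    return irr, sr, mdd
```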

4.1.4 Training Setup.

We implement the models with PyTorch, except ARIMA, for which we use the Python statsmodels package implementation. For fair comparison, we use grid search to select optimal hyperparameters regarding SR for each model. For all methods, we tune the length of sequential input T within \(\lbrace 4, 8, 16, 20\rbrace\) and the learning rate \(\eta\) within \(\lbrace 1e-4, 5e-3\rbrace\). For the LSTM and the GCN model, we tune the number of hidden units within \(\lbrace 16, 32, 64, 128\rbrace\). For the RSR-E, RSR-I, ANN-SVM, and STHAN-SR models, we conduct the same hyperparameter tuning as reported in their original papers [16, 26, 46]. As for our SVAT method, we employ STHAN-SR [46] as the backbone recommendation model and tune the adversarial constraint \(\epsilon\) within \(\text{range}[0.001, 0.1]\) and the loss weighting factors \(\alpha , \lambda\) within \(\text{range}[0.1, 1]\). We employ two-layer MLPs with 128 hidden neurons and \(\tanh\) activation to construct \(F^{\text{post}}, F^{\text{gen}}, F^{\text{prior}}\), respectively. We set \(k=5\) and thus select the top-five stocks each day for evaluation. Finally, we train all models on a Tesla V100 GPU for \(E=500\) epochs.

4.1.5 Discussion: Predictability of Stock Daily Returns.

The core task of our SVAT method and other stock-recommendation baselines is to predict the daily return of each stock and perform stock ranking, which relies on the basic premise that stock daily returns are predictable. Indeed, several references [3, 20, 39, 40] provide reliable evidence for the predictability of stock daily returns and thus support the rationality and feasibility of current stock-recommendation research. However, most of the references above used stock market data from 1978–2002 to produce their evidence. Since information transparency and the speed of information spread in 1978–2002 may differ from those in the current Internet era, it is necessary to further verify the predictability of stock daily returns in the Internet era. Unfortunately, we have not yet found any references using recent data to provide relevant evidence. Hence, we remain cautious about the predictability of stock daily returns in the Internet era and will explore this topic further in future work.

4.2 RQ1: Performance Comparison of Normal Economic Environment

Table 2 summarizes the experimental results of the buy-and-hold index and all stock-recommendation methods in terms of profitability and risk on the three normal datasets (NASDAQ, NYSE, and CASE). We can obtain the following observations:
| Model | NASDAQ IRR\(\uparrow\) | NASDAQ SR\(\uparrow\) | NASDAQ MDD\(\downarrow\) | NYSE IRR\(\uparrow\) | NYSE SR\(\uparrow\) | NYSE MDD\(\downarrow\) | CASE IRR\(\uparrow\) | CASE SR\(\uparrow\) | CASE MDD\(\downarrow\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Buy&Hold | 0.24 | 2.43 | 2.57% | 0.14 | 1.96 | 1.58% | 0.22 | 0.71 | 4.50% |
| ARIMA | 0.10 | 0.55 | 8.21% | 0.10 | 0.33 | 7.15% | 0.23 | 0.30 | 8.53% |
| LSTM | 0.22 | 0.95 | 7.37% | 0.12 | 0.79 | 5.72% | 0.35 | 0.53 | 7.75% |
| GCN | 0.13 | 0.46 | 7.91% | 0.16 | 0.72 | 6.20% | 0.43 | 1.03 | 7.33% |
| RSR-E | 0.26 | 1.12 | 7.35% | 0.20 | 0.88 | 5.86% | 0.56 | 0.96 | 5.95% |
| RSR-I | 0.39 | 1.34 | 6.75% | 0.21 | 0.95 | 6.34% | 0.58 | 1.04 | 7.95% |
| ANN-SVM | 0.32 | 1.28 | 5.73% | 0.33 | 1.14 | 8.59% | 0.26 | 0.43 | 5.97% |
| STHAN-SR | 0.44 | 1.42 | 6.37% | 0.33 | 1.12 | 6.09% | 0.71 | 1.09 | 7.80% |
| SVAT (Ours) | **0.59** | **3.10** | **3.29%** | **0.38** | **2.61** | **3.13%** | **0.79** | **1.50** | **5.88%** |
Table 2. Backtesting Results of the Three Normal Datasets
\(\uparrow\) means the larger, the better, while \(\downarrow\) means the smaller, the better. Among the results of all stock-recommendation methods, the best results are in boldface.
Comparing the results of Buy&Hold with those of other stock-recommendation methods, we observe that Buy&Hold achieves the smallest MDD and competitive SR on the three normal datasets, mainly because it reduces the volatility of returns by averaging stock returns across the whole market. However, the IRR of Buy&Hold is more than \(50\%\) lower than that of state-of-the-art stock-recommendation models such as RSR, ANN-SVM, STHAN-SR, and our SVAT method. Therefore, the buy-and-hold index cannot attain a good balance between profits and risks in the stock market, and it is necessary to develop more advanced stock-recommendation methods.
Among all stock-recommendation methods, our SVAT method outperforms the other baselines on all datasets, showing the superiority of the split variational adversarial training design for stock recommendation. Specifically, SVAT improves the SR by an average of \(94.96\%\) and reduces the MDD by an average of \(29.68\%\) compared to the second-best results of other methods. Such an improvement answers the RQ1 that SVAT does outperform state-of-the-art stock-recommendation models in terms of risk-adjusted profits under normal circumstances.
As for profitability, SVAT improves the cumulative investment return ratio (IRR) by an average of \(20.17\%\) on the three normal datasets. This shows that increasing the risk-sensitivity of the model also helps improve the investment profit, probably by encouraging the model to avoid selecting stock examples with high risks.
Compared to the original backbone model STHAN-SR, SVAT greatly improves IRR, SR, MDD by an average of \(20.17\%, 96.32\%, 40.52\%\), respectively, which demonstrates that our method does help the stock-recommendation model to effectively reduce investment risks and control potential losses under a safer and more tolerable risk level.
Without explicit risk modeling, the state-of-the-art models RSR-I, ANN-SVM, and STHAN-SR even attain worse MDDs than simple methods LSTM and GCN on the NYSE and/or CASE datasets, which indicates that recent advanced stock-recommendation models may not have achieved improvement in risk control. Hence, it is necessary and promising to design a method like SVAT to enhance the risk awareness of stock recommendation.
Figure 4 presents the curves of daily returns and the cumulative IRR of all models backtested on the three normal datasets. Clearly, the standard deviation (STD) of SVAT’s daily returns is smaller and the IRR curve of SVAT is less volatile than recent advanced stock-recommendation models RSR-E, RSR-I, ANN-SVM, and STHAN-SR. Although the STD of SVAT’s daily returns is slightly larger than simple methods ARIMA/GCN on the NASDAQ/CASE dataset, SVAT obtains much larger IRR than ARIMA/GCN. Again, although the STD of Buy&Hold’s daily returns is the smallest, the IRR curve of Buy&Hold is inferior to other state-of-the-art stock-recommendation models. All of these observations vividly show that our method effectively reduces the volatility of the investment returns and achieves a good balance between profits and risks.
Fig. 4. The curves of daily returns and cumulative investment return ratios of all models backtested on the three normal datasets. Best viewed in color.

4.3 RQ2: Performance Comparison of Financial Crisis Period

Table 3 shows the experimental results of the buy-and-hold index and all stock-recommendation methods in terms of profitability and risk on the three crisis datasets (NASDAQ_08, NYSE_08, CASE_08), from which we observe that:
| Model | NASDAQ_08 IRR (\(\times 10^{-3}\))\(\uparrow\) | NASDAQ_08 SR (\(\times 10^{-2}\))\(\uparrow\) | NASDAQ_08 MDD\(\downarrow\) | NYSE_08 IRR (\(\times 10^{-3}\))\(\uparrow\) | NYSE_08 SR (\(\times 10^{-2}\))\(\uparrow\) | NYSE_08 MDD\(\downarrow\) | CASE_08 IRR (\(\times 10^{-3}\))\(\uparrow\) | CASE_08 SR (\(\times 10^{-2}\))\(\uparrow\) | CASE_08 MDD\(\downarrow\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Buy&Hold | -493.53 | -113.76 | 9.14% | -530.75 | -118.83 | 9.73% | -913.38 | -177.13 | 7.73% |
| ARIMA | -129.24 | -24.33 | 11.10% | -81.27 | -16.68 | 14.51% | -232.89 | -35.75 | 10.52% |
| LSTM | -142.63 | -32.17 | 10.62% | -61.48 | -8.26 | 17.27% | -126.65 | -20.66 | 10.12% |
| GCN | -139.23 | -25.54 | 14.36% | -48.59 | -11.96 | 16.28% | -159.01 | -25.76 | 10.47% |
| RSR-E | -71.21 | -15.74 | 13.72% | -15.90 | -3.19 | 22.14% | -21.56 | -5.80 | 10.94% |
| RSR-I | -11.13 | -4.50 | 14.23% | -10.91 | -3.43 | 14.74% | -21.50 | -5.55 | 10.87% |
| ANN-SVM | -54.12 | -7.30 | 17.05% | -38.34 | -5.36 | 19.43% | -30.03 | -6.33 | 10.28% |
| STHAN-SR | -8.60 | **-2.78** | 16.53% | -7.35 | -2.54 | 23.66% | -13.10 | -3.96 | 10.73% |
| SVAT (Ours) | **-0.11** | -3.14 | **9.62%** | **-0.0916** | **-2.23** | **13.23%** | **-2.26** | **-2.91** | **9.39%** |
Table 3. Backtesting Results of the Three Crisis Datasets
\(\uparrow\) means the larger, the better, while \(\downarrow\) means the smaller, the better. Among the results of all stock-recommendation methods, the best results are in boldface.
Due to the severity of the global financial crisis, all methods suffer investment losses (i.e., IRR \(\lt 0\)) on the three crisis datasets. In particular, although Buy&Hold achieves the smallest MDD across the three crisis datasets, the IRR and SR of Buy&Hold are much worse than those of other stock-recommendation methods. This observation further demonstrates that a good stock-recommendation model is critical to protecting investors from severe losses during the global financial crisis.
Remarkably, among all stock-recommendation methods, our method SVAT achieves the best IRRs and MDDs across all crisis datasets, showcasing that SVAT successfully protects investors from more losses than the other baseline methods. This answers the RQ2 that SVAT can protect investors from risks better than other state-of-the-art stock-recommendation models under the extreme circumstance of the global financial crisis.
As for volatility evaluation, SVAT attains the best SR on NYSE_08 and CASE_08 datasets but performs worse than the STHAN-SR model on NASDAQ_08 dataset. Nevertheless, Figure 5(a) shows that the volatility (i.e., the STD of the model profit) of SVAT is lower than STHAN-SR on NASDAQ_08 dataset. This is mainly because, according to Equation (19), the SR metric will decrease as the STD of the model profit decreases when IRR \(\lt 0 \lt R_f\). Hence, the SR metric might be slightly distorted in the environment of global financial crisis, but Figure 5(a) demonstrates that our method can still effectively reduce the volatility of the stock-recommendation model under extreme circumstances.
Similarly, although the state-of-the-art models RSR-E, RSR-I, ANN-SVM, and STHAN-SR obtain better IRR and SR than the simple methods ARIMA, LSTM, and GCN on the three crisis datasets, they perform poorly on the MDD metric. In contrast, SVAT outperforms all stock-recommendation methods on the MDD metric, showing improved anti-risk ability during the global financial crisis period.
Fig. 5. The curves of daily returns and cumulative investment return ratios of all models backtested on the three crisis datasets. Best viewed in color.
Again, we present the curves of daily returns and cumulative IRR of all models backtested on the three crisis datasets in Figure 5. As expected, the curves of the global financial crisis period are more volatile than those of the normal period shown in Figure 4. However, our method still exhibits better stability than the other baseline stock-recommendation methods and thus more effectively reduces the risk of losses for investors in the extreme environment of a financial crisis. Similarly, although the STD of Buy&Hold’s daily returns is the smallest, the IRR curve of Buy&Hold would be intolerable to most investors.

4.4 RQ3: Correlations between All Methods

From Figures 4(b) and 5(b), we observe that the cumulative investment returns of different methods behave differently under a normal economic environment while exhibiting high correlations with each other during the financial crisis period. Such differences inspire us to further investigate the extent to which all methods are correlated in different economic environments. Hence, in this section, we evaluate the correlation between different methods with the following two metrics:
(1) Pearson Correlation Coefficient (PCC):
\begin{equation*} \text{PCC}(m_1, m_2) = \frac{1}{T} \sum _{t} \text{PCC}^t(m_1, m_2) = \frac{1}{T} \sum _{t} \frac{\mathbb {E}_{\mathcal {S}}[(\hat{Y}^{m_1}_t - \mathbb {E}_{\mathcal {S}}[\hat{Y}^{m_1}_t]) (\hat{Y}^{m_2}_t - \mathbb {E}_{\mathcal {S}}[\hat{Y}^{m_2}_t])]}{\text{STD}_{\mathcal {S}}[\hat{Y}^{m_1}_t] \cdot \text{STD}_{\mathcal {S}}[\hat{Y}^{m_2}_t]}, \end{equation*}
where \(m_1, m_2 \in \lbrace \text{ARIMA, LSTM, GCN, RSR-E, RSR-I, ANN-SVM, STHAN-SR, SVAT}\rbrace\) denote any two of all the models, T is the length of trading days, \(\hat{Y}^m_t\) is the ranking score predicted by model m on trading day t, and \(\mathbb {E}_{\mathcal {S}}[\cdot ], \text{STD}_{\mathcal {S}}[\cdot ]\) denote the expectation and the standard deviation w.r.t. the set of all stocks \(\mathcal {S}\), respectively. \(\text{PCC} \in [-1, 1]\) measures the linear correlation between any two of all the methods where a PCC value close to 0 indicates a weak linear relationship between the two methods and vice versa.
(2) Top-k Stocks Difference (\(\text{TSD}^k\)):
\begin{equation*} \text{TSD}^k(m_1, m_2) = \frac{1}{T} \sum _t \frac{|\mathcal {S}^{m_1}_t - \mathcal {S}^{m_2}_t|}{k}, \end{equation*}
where \(\mathcal {S}^{m}_t \subset \mathcal {S}\) denotes the set of top-k stocks selected by model m on trading day t, so \(|\mathcal {S}^{m_1}_t - \mathcal {S}^{m_2}_t|\) is the number of stocks that differ between the top-k selections of models \(m_1\) and \(m_2\). \(\text{TSD}^k \in [0, 1]\) measures the extent to which two methods recommend different stocks, with a \(\text{TSD}^k\) value close to 1 indicating a weak stock-selection relationship between the two methods and vice versa.
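For concreteness, the following is a minimal NumPy sketch of both metrics. It assumes each model's ranking scores are stored as a \(T \times N\) array; the array layout and function names are our own illustration rather than part of any released code.

```python
import numpy as np

def pcc(scores_a, scores_b):
    """Average per-day Pearson correlation between two models' ranking scores.

    scores_a, scores_b: arrays of shape (T, N) holding the ranking scores
    predicted for N stocks on each of T trading days.
    """
    daily = [np.corrcoef(ya, yb)[0, 1] for ya, yb in zip(scores_a, scores_b)]
    return float(np.mean(daily))

def tsd_k(scores_a, scores_b, k=5):
    """Average fraction of disagreeing stocks among the daily top-k picks."""
    diffs = []
    for ya, yb in zip(scores_a, scores_b):
        top_a = set(np.argsort(-ya)[:k])  # top-k stocks of model m1 on day t
        top_b = set(np.argsort(-yb)[:k])  # top-k stocks of model m2 on day t
        diffs.append(len(top_a - top_b) / k)
    return float(np.mean(diffs))

# Toy usage with random scores for T=250 days and N=100 stocks.
rng = np.random.default_rng(0)
m1, m2 = rng.normal(size=(250, 100)), rng.normal(size=(250, 100))
print(pcc(m1, m2), tsd_k(m1, m2, k=5))
```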
Figures 6 and 7 show the PCC and \(\text{TSD}^5\) between any two of all the methods, from which we can obtain the following observations about model correlation:
Fig. 6. The Pearson Correlation Coefficient (PCC) between any two of all the methods on all datasets.
Fig. 7. The Top-5 Stocks Difference (\(\text{TSD}^5\)) between any two of all the methods on all datasets.
According to the PCC values in Figure 6, the four models GCN, RSR-E, RSR-I, and STHAN-SR are more correlated with each other than with the other models, mainly because all of them are graph-based learning methods. Although our SVAT method employs STHAN-SR as the backbone model, it exhibits somewhat different correlations from STHAN-SR and the other graph-based models on the three normal datasets. Moreover, Figure 7 shows that the \(\text{TSD}^5\) values between any two methods are close to 0.9 or above, except between the two similar models RSR-E and RSR-I. These high \(\text{TSD}^5\) values demonstrate that most methods have weak stock-selection relationships and tend to recommend stocks independently of one another.
On the three normal datasets in particular, our SVAT method exhibits weak negative correlations with the other methods on the NASDAQ and CASE datasets, while maintaining weak positive correlations on the NYSE dataset. Recalling that the NASDAQ and CASE stock markets are more volatile than the NYSE market [46], we infer that the proposed split variational adversarial training framework is better at capturing distinct trading signals in more volatile markets.
Paradoxically, both the cumulative investment returns in Figure 5(b) and the PCC values in Figure 6(b) show that all methods are highly correlated on the three crisis datasets, yet the high \(\text{TSD}^5\) values in Figure 7(b) reveal that most methods have weak stock-selection relationships. We speculate that, under the extreme conditions of the global financial crisis, most stocks suffered similar drawdowns. Figure 8 presents the cumulative return curves of six randomly selected stocks from each of the three crisis datasets during the global financial crisis; all of these stocks experienced similar drawdowns, and their cumulative investment returns are highly correlated. Therefore, although different methods recommend different stocks to investors, they produce highly correlated cumulative investment returns during the global financial crisis.
Fig. 8. Cumulative investment return ratios (IRR) of six randomly selected stocks in each crisis dataset. Different stocks experienced similar drawdowns during the global financial crisis.

4.5 RQ4: Effectiveness of Model Design

4.5.1 Effects of SVAT on Other Baselines.

We have demonstrated that incorporating SVAT into the backbone model STHAN-SR achieves the best results against other state-of-the-art baselines. In this section, we further evaluate the generality of the SVAT method by inspecting whether SVAT can also improve the performance of other baselines. Hence, we combine the SVAT method with the LSTM, GCN, RSR-E, RSR-I, and ANN-SVM models for stock recommendation, respectively, and compare their results with those of the original models.
Table 4 shows the comparison between the other baselines and their SVAT-variants on the normal and the crisis datasets. Across the 90 comparison cases in total, the SVAT-variants achieve the better result in 85. This further demonstrates that SVAT is a general learning framework that can be incorporated into various stock-recommendation models to enhance their risk awareness.
Normal datasets:

| Model | NASDAQ IRR\(\uparrow\) | NASDAQ SR\(\uparrow\) | NASDAQ MDD\(\downarrow\) | NYSE IRR\(\uparrow\) | NYSE SR\(\uparrow\) | NYSE MDD\(\downarrow\) | CASE IRR\(\uparrow\) | CASE SR\(\uparrow\) | CASE MDD\(\downarrow\) | Better Results |
|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.22 | 0.95 | 7.37% | 0.12 | 0.79 | 5.72% | 0.35 | 0.53 | 7.75% | 0 |
| SVAT+LSTM | \(\mathbf{0.37}\) | \(\mathbf{1.67}\) | \(\mathbf{3.64\%}\) | \(\mathbf{0.25}\) | \(\mathbf{0.92}\) | \(\mathbf{5.05\%}\) | \(\mathbf{0.37}\) | \(\mathbf{0.95}\) | \(\mathbf{6.92\%}\) | 9 |
| GCN | 0.13 | 0.46 | 7.91% | 0.16 | 0.72 | 6.20% | 0.43 | 1.03 | 7.33% | 0 |
| SVAT+GCN | \(\mathbf{0.24}\) | \(\mathbf{1.26}\) | \(\mathbf{4.19\%}\) | \(\mathbf{0.24}\) | \(\mathbf{0.83}\) | \(\mathbf{5.99\%}\) | \(\mathbf{0.53}\) | \(\mathbf{1.30}\) | \(\mathbf{6.41\%}\) | 9 |
| RSR-E | \(\mathbf{0.26}\) | 1.12 | 7.35% | 0.20 | 0.88 | 5.86% | \(\mathbf{0.56}\) | 0.96 | 5.95% | 2 |
| SVAT+RSR-E | 0.21 | \(\mathbf{1.16}\) | \(\mathbf{3.55\%}\) | \(\mathbf{0.25}\) | \(\mathbf{1.03}\) | \(\mathbf{4.97\%}\) | 0.55 | \(\mathbf{1.26}\) | \(\mathbf{4.85\%}\) | 7 |
| RSR-I | \(\mathbf{0.39}\) | 1.34 | 6.75% | 0.21 | 0.95 | 6.34% | 0.58 | 1.04 | 7.95% | 1 |
| SVAT+RSR-I | 0.36 | \(\mathbf{1.79}\) | \(\mathbf{5.72\%}\) | \(\mathbf{0.30}\) | \(\mathbf{1.11}\) | \(\mathbf{5.13\%}\) | \(\mathbf{0.60}\) | \(\mathbf{1.36}\) | \(\mathbf{6.72\%}\) | 8 |
| ANN-SVM | 0.32 | 1.28 | 5.73% | 0.33 | 1.14 | 8.59% | 0.26 | 0.43 | 5.97% | 0 |
| SVAT+ANN-SVM | \(\mathbf{0.48}\) | \(\mathbf{1.84}\) | \(\mathbf{4.51\%}\) | \(\mathbf{0.41}\) | \(\mathbf{1.17}\) | \(\mathbf{8.40\%}\) | \(\mathbf{0.42}\) | \(\mathbf{1.06}\) | \(\mathbf{4.74\%}\) | 9 |

Crisis datasets (IRR in units of \(10^{-3}\), SR in units of \(10^{-2}\)):

| Model | NASDAQ_08 IRR\(\uparrow\) | NASDAQ_08 SR\(\uparrow\) | NASDAQ_08 MDD\(\downarrow\) | NYSE_08 IRR\(\uparrow\) | NYSE_08 SR\(\uparrow\) | NYSE_08 MDD\(\downarrow\) | CASE_08 IRR\(\uparrow\) | CASE_08 SR\(\uparrow\) | CASE_08 MDD\(\downarrow\) | Better Results |
|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | \(-142.63\) | \(-32.17\) | 10.62% | \(-61.48\) | \(-8.26\) | 17.27% | \(-126.65\) | \(-20.66\) | 10.12% | 0 |
| SVAT+LSTM | \(\mathbf{-79.31}\) | \(\mathbf{-14.84}\) | \(\mathbf{9.85\%}\) | \(\mathbf{-53.00}\) | \(\mathbf{-7.93}\) | \(\mathbf{14.91\%}\) | \(\mathbf{-64.41}\) | \(\mathbf{-11.07}\) | \(\mathbf{9.86\%}\) | 9 |
| GCN | \(-139.23\) | \(-25.54\) | 14.36% | \(-48.59\) | \(-11.96\) | 16.28% | \(-159.01\) | \(-25.76\) | 10.47% | 0 |
| SVAT+GCN | \(\mathbf{-68.44}\) | \(\mathbf{-8.23}\) | \(\mathbf{13.70\%}\) | \(\mathbf{-34.21}\) | \(\mathbf{-4.44}\) | \(\mathbf{14.84\%}\) | \(\mathbf{-71.21}\) | \(\mathbf{-12.86}\) | \(\mathbf{10.34\%}\) | 9 |
| RSR-E | \(-71.21\) | \(-15.74\) | 13.72% | \(-15.90\) | \(\mathbf{-3.19}\) | 22.14% | \(-21.56\) | \(-5.80\) | 10.94% | 1 |
| SVAT+RSR-E | \(\mathbf{-59.00}\) | \(\mathbf{-12.37}\) | \(\mathbf{9.26\%}\) | \(\mathbf{-10.99}\) | \(-3.60\) | \(\mathbf{14.22\%}\) | \(\mathbf{-4.83}\) | \(\mathbf{-2.64}\) | \(\mathbf{9.98\%}\) | 8 |
| RSR-I | \(-11.13\) | \(-4.50\) | 14.23% | \(-10.91\) | \(-3.43\) | 14.74% | \(-21.50\) | \(-5.55\) | 10.87% | 0 |
| SVAT+RSR-I | \(\mathbf{-3.20}\) | \(\mathbf{-3.63}\) | \(\mathbf{8.60\%}\) | \(\mathbf{-3.03}\) | \(\mathbf{-2.64}\) | \(\mathbf{14.00\%}\) | \(\mathbf{-4.17}\) | \(\mathbf{-2.61}\) | \(\mathbf{10.01\%}\) | 9 |
| ANN-SVM | \(-54.12\) | \(\mathbf{-7.30}\) | 17.05% | \(-38.34\) | \(-5.36\) | 19.43% | \(-30.03\) | \(-6.33\) | 10.28% | 1 |
| SVAT+ANN-SVM | \(\mathbf{-35.13}\) | \(-8.05\) | \(\mathbf{12.27\%}\) | \(\mathbf{-26.12}\) | \(\mathbf{-4.38}\) | \(\mathbf{17.96\%}\) | \(\mathbf{-17.41}\) | \(\mathbf{-4.89}\) | \(\mathbf{9.98\%}\) | 8 |

Table 4. Comparison between Other Baselines and Their SVAT-variants on the Normal and the Crisis Datasets
Better results in each baseline/SVAT-variant pair are in boldface.

4.5.2 Ablation Study of Key Components.

Our method consists of two key components: the split adversarial training mechanism (Equation (6)) and the variational perturbation generator (VPG), as shown in Figure 3. In this experiment, we evaluate whether both components are necessary to reduce stock-recommendation risks. To show the effectiveness of each component, we compare SVAT with two variants (a sketch contrasting the two adversarial losses follows the list):
SVAT w/o S: We change the adversarial loss in Equation (6) to \(\mathcal {L}^{adv} = \sum _{i=1}^N \mathcal {L}^{adv}_i\), i.e., without weighting by stocks' return ratios, which reduces SVAT to conventional adversarial training without the split effect.
SVAT w/o V: We remove the VPG from SVAT and employ only Equation (9) to generate the perturbation for each stock example.
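To make the first variant concrete, the sketch below contrasts the two ways of aggregating per-stock adversarial losses. The particular choice of weights is only a plausible reading of the description above; the exact weighting is defined by Equation (6) of the paper.

```python
import torch

def split_adversarial_loss(adv_losses, return_ratios):
    """Return-weighted ('split') aggregation of per-stock adversarial losses.

    adv_losses:    per-stock adversarial losses L_i^adv, shape (N,).
    return_ratios: ground-truth return ratios r_i, shape (N,).

    Using r_i itself as the weight is a hypothetical instantiation: it makes
    risky examples (r_i < 0) contribute with the opposite sign, so minimizing
    the total loss keeps the model robust on profitable stocks while keeping
    it sensitive to perturbations of risky ones. Equation (6) defines the
    actual weights used by SVAT.
    """
    return (return_ratios * adv_losses).sum()

def plain_adversarial_loss(adv_losses):
    """The SVAT w/o S variant: an unweighted sum, i.e., conventional AT."""
    return adv_losses.sum()
```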
Table 5 presents the comparison results on the normal and the crisis datasets. We observe that SVAT attains higher IRR and SR than both variants on most datasets. However, the MDD of SVAT is somewhat inferior to that of its variants on the NASDAQ, NYSE, and NYSE_08 datasets. We postulate that either the split effect or the VPG alone mainly reduces the risk of maximum daily loss without significantly improving profits, whereas combining them into the complete SVAT framework achieves a better balance between profits and risks. In summary, both the split adversarial training mechanism and the variational perturbation generator are necessary to construct the complete SVAT architecture for better stock recommendation.
Normal datasets:

| Model | NASDAQ IRR\(\uparrow\) | NASDAQ SR\(\uparrow\) | NASDAQ MDD\(\downarrow\) | NYSE IRR\(\uparrow\) | NYSE SR\(\uparrow\) | NYSE MDD\(\downarrow\) | CASE IRR\(\uparrow\) | CASE SR\(\uparrow\) | CASE MDD\(\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| SVAT w/o S | 0.47 | 2.84 | 3.68% | 0.12 | 1.26 | \(\mathbf{1.77\%}\) | 0.69 | 1.26 | 6.69% |
| SVAT w/o V | 0.42 | 2.20 | \(\mathbf{3.19\%}\) | 0.21 | 1.75 | 1.88% | 0.68 | 1.02 | 6.56% |
| SVAT | \(\mathbf{0.59}\) | \(\mathbf{3.10}\) | 3.29% | \(\mathbf{0.38}\) | \(\mathbf{2.61}\) | 3.13% | \(\mathbf{0.79}\) | \(\mathbf{1.50}\) | \(\mathbf{5.88\%}\) |

Crisis datasets (IRR in units of \(10^{-3}\), SR in units of \(10^{-2}\)):

| Model | NASDAQ_08 IRR\(\uparrow\) | NASDAQ_08 SR\(\uparrow\) | NASDAQ_08 MDD\(\downarrow\) | NYSE_08 IRR\(\uparrow\) | NYSE_08 SR\(\uparrow\) | NYSE_08 MDD\(\downarrow\) | CASE_08 IRR\(\uparrow\) | CASE_08 SR\(\uparrow\) | CASE_08 MDD\(\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| SVAT w/o S | \(-2.46\) | \(-2.41\) | 11.76% | \(-89.13\) | \(-15.35\) | 16.68% | \(-14.29\) | \(-4.42\) | 9.97% |
| SVAT w/o V | \(-0.74\) | \(\mathbf{-2.39}\) | 15.76% | \(-14.04\) | \(-5.66\) | \(\mathbf{8.35\%}\) | \(-15.62\) | \(-4.43\) | 9.97% |
| SVAT | \(\mathbf{-0.11}\) | \(-3.14\) | \(\mathbf{9.62\%}\) | \(\mathbf{-0.0916}\) | \(\mathbf{-2.23}\) | 13.23% | \(\mathbf{-2.26}\) | \(\mathbf{-2.91}\) | \(\mathbf{9.39\%}\) |

Table 5. Comparison between SVAT and Other Variants on the Normal and the Crisis Datasets
The best results are in boldface.

4.6 RQ5: Parameter Sensitivity Analysis

4.6.1 Performance under Different Backtesting Strategies.

In the scenario of stock recommendation, the core strategy is to select the top-k stocks for investment, where different values of k derive diverse investment strategies. In previous experiments, we followed prior work [16, 46, 54] and set \(k=5\) for consistent comparison with the baseline methods. In practice, however, investors are free to choose different values of k to develop more profitable investment strategies. It is therefore necessary to investigate the performance of our proposed method against the baselines under different backtesting strategies. In this section, we backtest all methods with \(k \in \lbrace 1,2,3,4,5,6,7,8,9,10\rbrace\), corresponding to strategies of buying the stocks with the top-1, top-\(2, \ldots ,\) top-10 highest expected returns, respectively. Figure 9 presents the Sharpe ratios (SRs) of each model backtested on the normal and the crisis datasets under the different strategies, from which we make the following observations:
Fig. 9. Comparison of model sensitivity to the strategy parameter k.
Clearly, our method SVAT achieves the best results under different backtesting strategies on almost all datasets, demonstrating its strong adaptability to a wide variety of strategies.
While most baseline methods attain stable performance across different strategies on the normal datasets, they exhibit high volatility across strategies on the crisis datasets. This reveals that, under the extreme circumstances of the global financial crisis, existing stock-recommendation baselines are vulnerable and their performance depends heavily on the chosen investment strategy. In contrast, SVAT consistently shows strong robustness across strategies even on the crisis datasets, demonstrating the advantage of incorporating our split variational adversarial training framework for risk-aware stock recommendation.
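For reference, here is a minimal sketch of the top-k backtesting loop used throughout this section. It assumes daily equal-weight rebalancing, a zero risk-free rate, and additive (non-compounded) cumulative returns; these are simplifications for illustration rather than the exact evaluation protocol of the paper.

```python
import numpy as np

def backtest_topk(scores, returns, k=5):
    """Backtest a daily top-k equal-weight strategy (illustrative sketch).

    scores:  (T, N) predicted ranking scores for N stocks over T days.
    returns: (T, N) realized one-day return ratios of the same stocks.
    Returns the cumulative IRR, a simplified Sharpe ratio (risk-free rate
    taken as 0), and the maximum drawdown of the daily-return series.
    """
    daily = np.array([
        returns[t, np.argsort(-scores[t])[:k]].mean()  # buy top-k stocks on day t
        for t in range(len(scores))
    ])
    irr = daily.sum()                              # additive cumulative return
    sr = daily.mean() / (daily.std() + 1e-12)      # simplified Sharpe ratio
    wealth = np.cumsum(daily)                      # cumulative return curve
    mdd = np.max(np.maximum.accumulate(wealth) - wealth)  # maximum drawdown
    return irr, sr, mdd

# Sweep the strategy parameter k as in Figure 9 (toy data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(250, 100))
rets = rng.normal(0.0, 0.01, size=(250, 100))
for k in range(1, 11):
    print(k, backtest_topk(scores, rets, k))
```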

4.6.2 Impact of the Adversarial Hyperparameter.

For adversarial learning methods, the adversarial hyperparameter \(\epsilon\) in Equation (9) plays an important role in model performance [10]. In this section, we investigate how the performance of SVAT varies with \(\epsilon \in \lbrace 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1\rbrace\). Figure 10 presents the SR of SVAT backtested on all datasets with different values of \(\epsilon\). The performance of SVAT drops significantly when \(\epsilon \gt 0.1\), since a large \(\epsilon\) produces excessive perturbations that are harmful to model training. In contrast, SVAT shows good robustness and stability when \(\epsilon\) is kept within the reasonable range \([0.001, 0.05]\).
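As a rough illustration of how \(\epsilon\) enters the training process, the sketch below rescales a loss-gradient direction to have \(l_2\) norm exactly \(\epsilon\), which is one standard way to realize the \(l_2\)-norm constraint mentioned in Footnote 1; the precise form of Equation (9) may differ.

```python
import torch

def l2_perturbation(model, x, y, loss_fn, eps=0.01):
    """Gradient-direction perturbation under an l2-norm budget (sketch).

    A larger eps yields a larger perturbation, which is consistent with the
    observation that overly large values (eps > 0.1 in Figure 10) can harm
    training.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Rescale the gradient so the perturbation has l2 norm exactly eps.
    delta = eps * grad / (grad.norm(p=2) + 1e-12)
    return delta.detach()
```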
Fig. 10. Sensitivity of our method SVAT with regard to the adversarial hyperparameter \(\epsilon\).

4.7 RQ6: Risk Quantification by Ranking Entropy

In this section, we investigate how the adversarial perturbations of SVAT work to reduce stock-recommendation risks and provide some insights into risk quantification for investors. As discussed in Section 3.2, by sampling multiple adversarial perturbations \(\boldsymbol{\delta }_i^1, \boldsymbol{\delta }_i^2, \ldots , \boldsymbol{\delta }_i^M\) from the perturbation distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\) learned by the variational perturbation generator, one can compute the ranking entropy \(\mathcal {H}(\tilde{\mathbf {x}}_i)\) for each stock \(s_i\) using Equation (7). We postulate that the relationship between ranking entropies and stock returns can indirectly reveal how the risk reduction works. Accordingly, we sample \(M=50\) perturbations for each stock example in the testing datasets and examine the relationship between the ranking entropies and the investment returns. Figure 11 shows an example of the relation between \(\mathcal {H}(\tilde{\mathbf {x}}_i)\) and the return ratio \(r_{i,t}\), where we randomly select 50 stocks from the NASDAQ, NYSE, and CASE datasets for illustration. Overall, the ranking entropy and the return ratio exhibit an inverse relation:
Fig. 11. Relation between ranking entropy and return ratio.
For the NASDAQ and NYSE datasets, most stocks with \(\mathcal {H}(\tilde{\mathbf {x}}_i) \lt 0.25\) earn profits (\(r_{i,t} \gt 0\)) and most stocks with \(\mathcal {H}(\tilde{\mathbf {x}}_i) \gt 0.75\) indicate risks (\(r_{i,t} \lt 0\)).
For the CASE dataset, most stocks with \(\mathcal {H}(\tilde{\mathbf {x}}_i) \lt 1.0\) earn profits (\(r_{i,t} \gt 0\)) and most stocks with \(\mathcal {H}(\tilde{\mathbf {x}}_i) \gt 1.5\) indicate risks (\(r_{i,t} \lt 0\)).
Since all perturbations are sampled from the distribution \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\), the ranking entropy \(\mathcal {H}(\tilde{\mathbf {x}}_i)\) reflects the uncertainty of \(p(\boldsymbol{\delta }_i|\tilde{\mathbf {x}}_i)\), with higher entropy indicating larger uncertainty and thus higher risk for stock \(s_i\). Figure 11 therefore suggests that, by increasing the ranking uncertainties of risky stocks with negative returns, the perturbations produced by SVAT likely help the model better recognize risky stocks and avoid recommending them, which reduces the possibility of losses and lowers the risk of stock recommendation. The ranking entropy also provides an effective tool for investors to roughly assess the risk of each stock before making investment decisions, demonstrating the additional advantage of risk interpretability brought by SVAT.
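For illustration, the sketch below estimates a stock's ranking entropy as the Shannon entropy of its empirical rank distribution under \(M\) sampled perturbations. Treat it as an illustrative stand-in for Equation (7), whose exact estimator is defined earlier in the paper.

```python
import numpy as np

def ranking_entropy(perturbed_scores, stock_index):
    """Entropy of a stock's rank positions across M perturbed predictions.

    perturbed_scores: (M, N) ranking scores produced by the model under M
    perturbations sampled from p(delta_i | x_i).
    Higher entropy means the stock's rank is less stable under perturbation,
    which this section interprets as higher risk.
    """
    M, N = perturbed_scores.shape
    # Rank position of the target stock under each sampled perturbation (0 = best).
    ranks = np.array([
        int(np.where(np.argsort(-s) == stock_index)[0][0])
        for s in perturbed_scores
    ])
    probs = np.bincount(ranks, minlength=N) / M  # empirical rank distribution
    probs = probs[probs > 0]
    return float(-(probs * np.log(probs)).sum())
```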

5 Related Work

Our work is directly related to the recent work on stock prediction, risk management, adversarial training, and variational autoencoder.

5.1 Stock Prediction

Researchers have been studying the stock market for decades, developing various stock-prediction methods in pursuit of excess investment profits [8]. As mentioned in Section 1, recent work on stock prediction falls into three categories: stock price regression, stock trend classification, and stock recommendation [45]. Stock price regression formulates stock prediction as a pure time-series forecasting problem and predicts future stock prices/returns by learning from historical stock time series. Traditional econometrics [19] employs simple linear models such as VAR and ARIMA [53] for regression-based prediction, while contemporary methods leverage advanced learning architectures such as SVM [48], tensor aggregation [28], RNNs [9, 42], GANs [56, 60], and Transformers [35] to better capture non-linear relationships in stock time series. Stock trend classification, in contrast, treats stock prediction as a binary up/down classification problem and develops efficient classifiers for stock-movement prediction; advanced classification models include StockNet [57], Adv-ALSTM [15], STLAT [29], and so on. However, both stock price regression and stock trend classification share a significant drawback: they are not directly optimized towards the target of investment (i.e., profit maximization), which limits their practicality [46]. To overcome this drawback, reinforcement learning methods [7, 32, 51] have been proposed to improve model profits by capturing trading signals in a dynamic prediction setting. Stock recommendation, in turn, ranks stocks by their return ratios based on comparisons among multiple stocks [17]; here, models are trained to select the top-k stocks with maximum expected profits, ensuring consistency with the investment target. Recent stock-recommendation models combine RNN and GNN architectures to learn the relationships between multiple stocks [16, 17, 46, 54].
Although various stock-recommendation methods have achieved promising results, the literature on reducing stock-recommendation risks with adversarial training remains scarce. In this work, we tackle the risk concerns of stock recommendation and demonstrate a new way to reduce investment risks with adversarial perturbations.

5.2 Risk Management

Risk management is essential to protect investors in a volatile stock market. Over the decades, numerous studies have analyzed and mitigated the potential risks associated with stock investments [37, 50]. One prominent approach is modern portfolio theory (MPT) [36], which emphasizes the importance of diversification in reducing overall portfolio risk. Researchers have also proposed financial models such as the Capital Asset Pricing Model (CAPM) [49] and the Black-Scholes model [6] to better understand and quantify risk factors. Recently, advanced machine learning techniques have proven valuable for stock risk management [31], and some researchers have incorporated stock volatilities into reinforcement learning models to achieve better risk-adjusted profits [30, 58]. However, most existing risk-management methods are designed for specific scenarios or models and cannot be directly applied to stock-recommendation models. In this article, we propose a risk-management approach tailored for stock recommendation and demonstrate the feasibility of adversarial learning in risk management.

5.3 Adversarial Training

Despite their excellent learning ability, deep neural networks (DNNs) have been found to be vulnerable to adversarial perturbations of data, i.e., adversarial examples [18, 52]. Therefore, many adversarial training (AT) methods have been proposed to enhance the adversarial robustness of DNN classifiers [41, 61], such as FGSM [18], PGD [34], and ADT [13]. For stock classification, Adv-ALSTM [15] is perhaps the first work to apply adversarial training to stock-movement prediction. All of the above conventional AT methods aim to improve model generalization by training the model to produce the same output on both original and perturbed examples. In contrast, researchers in dialogue generation have proposed Inverse Adversarial Training (IAT), which encourages the model to be sensitive to perturbations and thereby generate more diverse responses [62]. In this article, we combine the advantages of AT and IAT to better enhance the risk awareness of the stock-recommendation model.

5.4 Variational Autoencoder

As one of the mainstream deep generative models, the VAE [24] is good at learning the probability distribution of high-dimensional data through low-dimensional latent representations and generating high-quality data [4, 22]. During the past few years, VAE has received widespread attention in various research fields and achieved promising results in image and audio generation [21, 27, 43], speech processing [12, 44, 47], text processing [38], and biomedical informatics [55, 59]. In financial applications, Reference [11] employs an LSTM-VAE framework to perform multi-step-ahead prediction of the stock closing price, and Reference [57] develops a VAE-based model combining social media text and price signals for stock-movement prediction. Recently, a more advanced model FactorVAE [14] has been proposed to predict cross-sectional stock returns by regarding stock factors as the latent random variables in VAE. Inspired by previous work, we also employ the VAE architecture to generate representative risk indicators for our stock-recommendation model.

6 Conclusion

In this article, we propose SVAT, a novel adversarial learning framework for risk-aware stock recommendation. At the first level, we design a split adversarial training method to enhance the model's sensitivity to adversarial perturbations of risky stock examples. At the second level, we devise a variational perturbation generator to model diverse risk factors and generate representative adversarial examples as risk indicators. Moreover, the variational architecture enables our method to provide a rough risk quantification for investors, offering an additional advantage of interpretability. Experiments on three real-world datasets demonstrate that our method effectively reduces the volatility of the recommendation model and achieves the best risk-adjusted profit against seven baselines. In addition, we verify the effectiveness of each component of the SVAT algorithm through ablation and qualitative experiments.
In future research, we aim to explore risk-aware adversarial learning for long-term stock prediction and to incorporate additional data sources, such as financial news, for better risk modeling.

Footnotes

1
We empirically found that the \(l_2\)-norm constraint is better for the stock-recommendation problem.

References

[1]
Klaus Adam, Albert Marcet, and Juan Pablo Nicolini. 2016. Stock market volatility and learning. J. Finance 71, 1 (2016), 33–82. DOI:
[2]
Adebiyi Ariyo Ariyo, Aderemi Oluyinka Adewumi, and Charles K. Ayo. 2014. Stock price prediction using the ARIMA model. In AMSS 16th International Conference on Computer Modelling and Simulation (UKSim’14), David Al-Dabass, Alessandra Orsoni, Richard J. Cant, Jasmy Yunus, Zuwairie Ibrahim, and Ismail Saad (Eds.). IEEE, 106–112. DOI:
[3]
Ravinder Kumar Arora, Pramod Kumar Jain, and Himadri Das. 2012. Are stock returns predictable? Evidence from select emerging markets. Adv. Finan. Plan. Forecast. 5 (Dec. 2012), 97–120. DOI:
[4]
Andrea Asperti, Davide Evangelista, and Elena Loli Piccolomini. 2021. A survey on variational autoencoders from a GreenAI perspective. CoRR abs/2103.01071 (2021).
[5]
Wei Bao, Jun Yue, and Yulei Rao. 2017. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS One 12, 7 (2017), 1–24.
[6]
Fischer Black and Myron S. Scholes. 1973. The pricing of options and corporate liabilities. J. Polit. Econ. 81, 3 (May–June 1973), 637–654. DOI:
[7]
Salvatore Carta, Anselmo Ferreira, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Antonio Sanna. 2021. Multi-DQN: An ensemble of deep q-learning agents for stock market forecasting. Expert Syst. Appl. 164 (2021), 113820. DOI:
[8]
Rodolfo C. Cavalcante, Rodrigo C. Brasileiro, Victor L. F. Souza, Jarley Palmeira Nóbrega, and Adriano L. I. Oliveira. 2016. Computational intelligence and financial markets: A survey and future directions. Expert Syst. Appl. 55 (2016), 194–211. DOI:
[9]
Jiezhu Cheng, Kaizhu Huang, and Zibin Zheng. 2020. Towards better forecasting by fusing near and distant future visions. In 34th AAAI Conference on Artificial Intelligence (AAAI’20), 32nd Innovative Applications of Artificial Intelligence Conference (IAAI’20), 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’20). AAAI Press, 3593–3600. DOI:
[10]
Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit S. Dhillon, and Cho-Jui Hsieh. 2022. CAT: Customized adversarial training for improved robustness. In 31st International Joint Conference on Artificial Intelligence (IJCAI’22), Luc De Raedt (Ed.). ijcai.org, 673–679. DOI:
[11]
Ahana Roy Choudhury, Soheila Abrishami, Michael Turek, and Piyush Kumar. 2020. Enhancing profit from stock transactions using neural networks. AI Commun. 33, 2 (2020), 75–92. DOI:
[12]
Hao Duc Do, Tran Thai Son, and Duc Thanh Chau. 2020. Speech source separation using variational autoencoder and bandpass filter. IEEE Access 8 (2020), 156219–156231. DOI:
[13]
Yinpeng Dong, Zhijie Deng, Tianyu Pang, Jun Zhu, and Hang Su. 2020. Adversarial distributional training for robust deep learning. In Annual Conference on Neural Information Processing Systems (NeurIPS’20), Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). Retrieved from https://proceedings.neurips.cc/paper/2020/hash/5de8a36008b04a6167761fa19b61aa6c-Abstract.html
[14]
Yitong Duan, Lei Wang, Qizhong Zhang, and Jian Li. 2022. FactorVAE: A probabilistic dynamic factor model based on variational autoencoder for predicting cross-sectional stock returns. In 36th AAAI Conference on Artificial Intelligence (AAAI’22), 34th Conference on Innovative Applications of Artificial Intelligence (IAAI’22), 12th Symposium on Educational Advances in Artificial Intelligence (EAAI’22). AAAI Press, 4468–4476. DOI:
[15]
Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, and Tat-Seng Chua. 2019. Enhancing stock movement prediction with adversarial training. In 28th International Joint Conference on Artificial Intelligence (IJCAI’19), Sarit Kraus (Ed.). ijcai.org, 5843–5849. DOI:
[16]
Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. 2019. Temporal relational ranking for stock prediction. ACM Trans. Inf. Syst. 37, 2 (2019), 27:1–27:30. DOI:
[17]
Jianliang Gao, Xiaoting Ying, Cong Xu, Jianxin Wang, Shichao Zhang, and Zhao Li. 2022. Graph-based stock recommendation by time-aware relational attention network. ACM Trans. Knowl. Discov. Data 16, 1 (2022), 4:1–4:21. DOI:
[18]
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR’15), Yoshua Bengio and Yann LeCun (Eds.). Retrieved from http://arxiv.org/abs/1412.6572
[19]
William H. Greene. 2003. Econometric Analysis. (5th ed.). Prentice Hall.
[20]
Hui Guo. 2006. On the out-of-sample predictability of stock market returns. J. Bus. 79, 2 (2006), 645–670. Retrieved from http://www.jstor.org/stable/10.1086/499134
[21]
Yingzhen Hou, Junhai Zhai, and Jiankai Chen. 2021. Coupled adversarial variational autoencoder. Sig. Process. Image Commun. 98 (2021), 116396. DOI:
[22]
Vasanth Kalingeri. 2022. Latent variable modelling using variational autoencoders: A survey. CoRR abs/2206.09891 (2022).
[23]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR’15), Yoshua Bengio and Yann LeCun (Eds.). Retrieved from http://arxiv.org/abs/1412.6980
[24]
Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In 2nd International Conference on Learning Representations (ICLR’14), Yoshua Bengio and Yann LeCun (Eds.). Retrieved from http://arxiv.org/abs/1312.6114
[25]
Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations (ICLR’17). OpenReview.net. Retrieved from https://openreview.net/forum?id=SJU4ayYgl
[26]
Akshit Kurani, Pavan Doshi, Aarya Vakharia, and Manan Shah. 2023. A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann. Data Sci. 10, 1 (Feb. 2023), 183–208. DOI:
[27]
Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, and Linchao Bao. 2021. Audio2Gestures: Generating diverse gestures from speech audio with conditional variational autoencoders. In IEEE/CVF International Conference on Computer Vision (ICCV’21). IEEE, 11273–11282. DOI:
[28]
Qing Li, Yuanzhu Chen, LiLing Jiang, Ping Li, and Hsinchun Chen. 2016. A tensor-based information framework for predicting the stock market. ACM Trans. Inf. Syst. 34, 2 (2016), 11:1–11:30. DOI:
[29]
Yang Li, Hong-Ning Dai, and Zibin Zheng. 2022. Selective transfer learning with adversarial training for stock movement prediction. Connect. Sci. 34, 1 (2022), 492–510. DOI:
[30]
Peipei Liu, Yunfeng Zhang, Fangxun Bao, Xunxiang Yao, and Caiming Zhang. 2023. Multi-type data fusion framework based on deep reinforcement learning for algorithmic trading. Appl. Intell. 53, 2 (2023), 1683–1706. DOI:
[31]
Tao Liu and Zhongyang Yu. 2022. The analysis of financial market risk based on machine learning and particle swarm optimization algorithm. EURASIP J. Wirel. Commun. Netw. 2022, 1 (2022), 31. DOI:
[32]
Yang Liu, Qi Liu, Hongke Zhao, Zhen Pan, and Chuanren Liu. 2020. Adaptive quantitative trading: An imitative deep reinforcement learning approach. In 34th AAAI Conference on Artificial Intelligence (AAAI’20), 32nd Innovative Applications of Artificial Intelligence Conference (IAAI’20), 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’20). AAAI Press, 2128–2135. DOI:
[33]
Chunchuan Lyu, Kaizhu Huang, and Hai-Ning Liang. 2015. A unified gradient regularization family for adversarial examples. In IEEE International Conference on Data Mining (ICDM’15), Charu C. Aggarwal, Zhi-Hua Zhou, Alexander Tuzhilin, Hui Xiong, and Xindong Wu (Eds.). IEEE Computer Society, 301–309. DOI:
[34]
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations (ICLR’18). OpenReview.net. Retrieved from https://openreview.net/forum?id=rJzIBfZAb
[35]
Nadeem Malibari, Iyad Katib, and Rashid Mehmood. 2021. Predicting stock closing prices in emerging markets with transformer neural networks: The Saudi stock exchange case. Int. J. Adv. Comput. Sci. Appl. 12, 12 (2021). DOI:
[36]
Harry Markowitz. 1952. Portfolio selection. J. Finance 7, 1 (1952), 77–91. DOI:
[37]
Akib Mashrur, Wei Luo, Nayyar Abbas Zaidi, and Antonio Robles-Kelly. 2020. Machine learning for financial risk management: A survey. IEEE Access 8 (2020), 203203–203223. DOI:
[38]
Yishu Miao, Lei Yu, and Phil Blunsom. 2016. Neural variational inference for text processing. In 33rd International Conference on Machine Learning (ICML’16) (JMLR Workshop and Conference Proceedings, Vol. 48), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.). JMLR.org, 1727–1736. Retrieved from http://proceedings.mlr.press/v48/miao16.html
[39]
Pawel Milobedzki. 2004. Predictability of stock markets with disequilibrium trading. A commentary paper. Eur. J. Finance 10, 5 (2004), 345–352. DOI:
[40]
Dilip K. Patro and Yangru Wu. 2004. Predictability of short-horizon returns in international equity markets. J. Empir. Finance 11, 4 (2004), 553–584. DOI:
[41]
Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, and Xu-Yao Zhang. 2022. A survey of robust adversarial training in pattern recognition: Fundamental, theory, and methodologies. Pattern Recognit. 131 (2022), 108889. DOI:
[42]
Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and Garrison W. Cottrell. 2017. A dual-stage attention-based recurrent neural network for time series prediction. In 26th International Joint Conference on Artificial Intelligence (IJCAI’17), Carles Sierra (Ed.). ijcai.org, 2627–2633. DOI:
[43]
Danilo Jimenez Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, and Daan Wierstra. 2016. One-shot generalization in deep generative models. In 33rd International Conference on Machine Learning (ICML’16) (JMLR Workshop and Conference Proceedings, Vol. 48), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.). JMLR.org, 1521–1529. Retrieved from http://proceedings.mlr.press/v48/rezende16.html
[44]
Antony W. Rix, John G. Beerends, Michael P. Hollier, and Andries P. Hekstra. 2001. Perceptual evaluation of speech quality (PESQ)—A new method for speech quality assessment of telephone networks and codecs. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’01). IEEE, 749–752. DOI:
[45]
Nusrat Rouf, Majid Bashir Malik, Tasleem Arif, Sparsh Sharma, Saurabh Singh, Satyabrata Aich, and Hee-Cheol Kim. 2021. Stock market prediction using machine learning techniques: A decade survey on methodologies, recent developments, and future directions. Electronics 10, 21 (2021). DOI:
[46]
Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, Tyler Derr, and Rajiv Ratn Shah. 2021. Stock selection via spatiotemporal hypergraph attention network: A learning to rank approach. In 35th AAAI Conference on Artificial Intelligence (AAAI’21), 33rd Conference on Innovative Applications of Artificial Intelligence (IAAI’21), 11th Symposium on Educational Advances in Artificial Intelligence (EAAI’21). AAAI Press, 497–504. DOI:
[47]
Hadeer M. Sayed, Hesham E. ElDeeb, and Shereen A. Taie. 2023. Bimodal variational autoencoder for audiovisual speech recognition. Mach. Learn. 112, 4 (2023), 1201–1226. DOI:
[48]
Robert P. Schumaker and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Trans. Inf. Syst. 27, 2 (2009), 12:1–12:19. DOI:
[49]
William F. Sharpe. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 3 (Sept. 1964), 425–442. DOI:
[50]
Charles Smithson and Betty J. Simkins. 2005. Does risk management add value? A survey of the evidence. J. Appl. Corp. Finance 17, 3 (2005), 8–17. DOI:
[51]
Shuo Sun, Rundong Wang, and Bo An. 2023. Reinforcement learning for quantitative trading. ACM Trans. Intell. Syst. Technol. 14, 3 (2023), 44:1–44:29. DOI:
[52]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations (ICLR’14), Yoshua Bengio and Yann LeCun (Eds.). Retrieved from http://arxiv.org/abs/1312.6199
[53]
Ruey S. Tsay. 2010. Analysis of Financial Time Series (3rd ed.). John Wiley & Sons, Ltd.
[54]
Heyuan Wang, Tengjiao Wang, Shun Li, Jiayi Zheng, Shijie Guan, and Wei Chen. 2022. Adaptive long-short pattern transformer for stock investment selection. In 31st International Joint Conference on Artificial Intelligence (IJCAI’22), Luc De Raedt (Ed.). ijcai.org, 3970–3977. DOI:
[55]
Ruoqi Wei and Ausif Mahmood. 2021. Recent advances in variational autoencoders with representation learning for biomedical informatics: A survey. IEEE Access 9 (2021), 4939–4956. DOI:
[56]
Fenfang Xie, Shenghui Li, Liang Chen, Yangjun Xu, and Zibin Zheng. 2019. Generative adversarial network based service recommendation in heterogeneous information networks. In IEEE International Conference on Web Services (ICWS’19), Elisa Bertino, Carl K. Chang, Peter Chen, Ernesto Damiani, Michael Goul, and Katsunori Oyama (Eds.). IEEE, 265–272. DOI:
[57]
Yumo Xu and Shay B. Cohen. 2018. Stock movement prediction from tweets and historical prices. In 56th Annual Meeting of the Association for Computational Linguistics (ACL’18), Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics, 1970–1979. DOI:
[58]
Bing Yang, Ting Liang, Jian Xiong, and Chong Zhong. 2023. Deep reinforcement learning based on transformer and U-Net framework for stock trading. Knowl.-based Syst. 262 (2023), 110211. DOI:
[59]
Junchi Yu, Tingyang Xu, Yu Rong, Junzhou Huang, and Ran He. 2022. Structure-aware conditional variational auto-encoder for constrained molecule optimization. Pattern Recognit. 126 (2022), 108581. DOI:
[60]
Kang Zhang, Guoqiang Zhong, Junyu Dong, Shengke Wang, and Yong Wang. 2018. Stock market prediction based on generative adversarial network. In International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI’18) (Procedia Computer Science, Vol. 147), Rongfang Bie, Yunchuan Sun, and Jiguo Yu (Eds.). Elsevier, 400–406. DOI:
[61]
Shufei Zhang, Kaizhu Huang, Rui Zhang, and Amir Hussain. 2019. Generalized adversarial training in Riemannian space. In IEEE International Conference on Data Mining (ICDM’19), Jianyong Wang, Kyuseok Shim, and Xindong Wu (Eds.). IEEE, 826–835. DOI:
[62]
Wangchunshu Zhou, Qifei Li, and Chenle Li. 2021. Learning from perturbations: Diverse and informative dialogue generation with inverse adversarial training. In 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP’21), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, 694–703. DOI:
