1 Introduction

In order to meet the high capacity and performance requirements of 5th generation (5G) mobile systems and beyond, new spectrum needs to be acquired at higher frequencies [1, 2]. Massive multiple-input multiple-output (MIMO) has been recognized as a promising approach to efficiently exploit the vast spectral resources available at millimeter waves, unlocking the substantial potential of millimeter wave (mm-wave) communications [3, 4]. Massive MIMO technology enables highly directive beamforming by utilizing a large number of transmit (and/or receive) antennas [3]. High beamforming gain is needed to support reasonable cell sizes by compensating for the severe free-space path loss at mm-wave frequencies [5]. In addition, massive MIMO can provide spectrally efficient communications with high data rates through (multi-user) spatial multiplexing [1, 3]. Implementing massive MIMO using conventional digital beamforming is complex and hardware-demanding since one radio frequency (RF) chain per antenna is needed [6]. This is a costly and power-consuming requirement due to the large number of RF components needed, such as wideband digital-to-analog converters (DACs) [4, 7]. In terms of hardware complexity, analog beamforming is a more feasible approach, supporting single-stream transmission and requiring only a single RF chain regardless of the number of antennas [5]. However, analog beamforming cannot exploit the full potential of massive MIMO since spectrally efficient multi-stream transmission is not supported [5]. In this respect, hybrid analog-digital beamforming (HBF) is considered a promising solution for implementing massive MIMO, providing a compromise between hardware complexity and spectral efficiency [4, 6]. The hybrid architecture splits the whole beamforming process into digital and analog parts, enabling multi-stream transmission with a reduced number of RF chains [4, 6].

Most of the conventional state-of-the-art hybrid methods are based on fully connected RF architecture [8,9,10,11,12,13,14,15,16,17,18,19,20,21], in which each RF chain is connected to all antennas as depicted in Fig. 1. However, due to very challenging and lossy RF signal division and combining processes, fully connected methods are of higher complexity and consume more power. A more practical solution is partially connected RF architecture, in which each RF chain is connected only to one subarray of antennas [22, 23]. Partially connected RF architecture itself can be divided into two categories, i.e., full array-based and subarray-based processing designs, as first introduced in our earlier conference papers [24,25,26]. In full array-based processing, all data streams are conveyed to all subarrays. Thus, each stream is transmitted via its corresponding beam, which is generated by all subarrays. This implies that full beamforming gain is potentially available. However, the directions of different beams are interdependent due to the partial connectivity. This restricts the beam generation and leads to suboptimal beam directions. In subarray-based processing, each subarray transmits only a single data stream, i.e., each data stream is conveyed to only one RF chain. This leads to a more efficient and flexible beam design process. However, the beamforming gain is limited by the number of antennas per subarray. While subarray-based partial connectivity is common in the literature, full array-based designs are rare [27].

Fig. 1: (a) Fully connected and (b) partially connected HBF architectures

The majority of the partially connected HBF works in the literature use only phase shifting in the analog beamforming process. However, many highly integrated phased array transceiver solutions have amplitude control for each element, for example, to calibrate beams, control sidelobe levels, tune transmitted power, and perform automatic gain control in the receiver [28]. Hence, employing analog amplitude control in addition to phase shifting is a feasible assumption [24,25,26, 29, 30]. Varying the antenna-specific amplitudes may complicate, for example, the linearisation of the transmitters, but the literature has shown that this can also be done efficiently [31]. Even though this will somewhat increase the hardware complexity, the design of hybrid beamforming weights becomes more flexible and efficient, leading to improved performance. As an example, fully connected HBF with only analog phase shifting can obtain the same performance as fully digital beamforming if the number of RF chains is twice the number of data streams [6]. In comparison, HBF with both analog phase shifting and amplitude control can match the performance of fully digital beamforming when the number of RF chains equals the number of streams [4].

In recent years, there has been significant attention directed toward HBF in the literature. As discussed in [4], research in the HBF domain can be categorized based on different factors such as the system model, the HBF configurations, the problem formulations, the proposed approaches, the level of channel knowledge, and the available bandwidth. This paper focuses on partially connected full array and subarray HBF with analog amplitude/phase control for the rate maximization in frequency flat single-user MIMO (SU-MIMO) and multiuser multiple-input single-output (MU-MISO) systems. The proposed weighted minimum mean squared error (WMMSE) and zero-forcing (ZF) HBF algorithms exploit instantaneous channel state information (CSI). Due to idealistic assumptions, the performance results presented in this paper serve as upper bounds for more practical HBF designs. In the following, the main prior works are briefly introduced. The focus is on the (weighted) minimum mean squared error (MMSE) and ZF-based approaches. While the WMMSE criterion maximizes spectral efficiency, bit error rate (BER) is minimized through the MMSE. In general, the (W)MMSE problems are typically solved by exploiting an iterative alternating optimization-based framework. Most of the works assumed only analog phase shifting. Partially connected HBF studies with analog amplitude/phase control are reviewed last. Each study has its own merits and limitations. A typical limitation of hybrid MMSE and ZF-based algorithms is the assumption of perfect CSI. It is worth noting that all the prior works differ from ours due to different HBF configurations, problem formulations, and/or study targets.

In the literature, a handful of hybrid MMSE and ZF algorithms have been proposed in [14, 15, 32,33,34,35] and [36,37,38], respectively. These works assumed analog phase control only. The study in [14] leveraged the sparsity of multiuser mm-wave channel to solve the hybrid MMSE precoder and combiners. An orthogonal matching pursuit algorithm was used at each iteration of the alternating optimization method. The paper [15] solved a hybrid multiuser MMSE problem by alternately optimizing between the hybrid precoder and combiners. While closed-form expressions were achieved in the digital part, generalized eigen-decomposition was used in the analog domain. In [32], the authors developed an alternating manifold optimization-based hybrid MMSE algorithm for mm-wave MIMO systems, with an extension to the WMMSE design. The target of [35] was to maximize the spectral efficiency through the hybrid WMMSE design in the wideband mm-wave massive MIMO settings, assuming partially connected HBF architecture. The optimal digital precoder and combiner were obtained via closed-form solutions while the analog ones were solved through element iteration and manifold optimization. The work [34] aimed to minimize BER in multiuser MIMO systems by solving the hybrid MMSE precoder and combiners in an iterative manner. The authors in [33] worked on hybrid MMSE beamforming, with a truncated singular value decomposition (SVD)-based analog beamformer and a Lagrange’s method-based digital precoder, in a MU-MISO system. The paper [36] proposed a hybrid ZF precoding design to maximize the sum rate in a multiuser massive MIMO system. The optimal digital precoder was derived through the water-filling power allocation and the Karush-Kuhn-Tucker (KKT) condition. The analog precoder was then solved via the conjugate gradient method. The work [37] studied a new class of hybrid regularized ZF beamforming approaches for mm-wave MIMO communications. In [38], the authors studied a hybrid ZF precoder maximizing the weighted sum rate of a MU-MISO orthogonal frequency division multiplexing (OFDM) system.

Only a few works in the HBF literature have studied analog amplitude/phase control for fully [39] and partially connected [30, 40] architectures. The work [39] developed a hybrid mm-wave MIMO prototype system and evaluated its performance in indoor and outdoor environments. In [40], the authors proposed a passive architecture for the analog beamforming domain with amplitude and phase control in a partially connected HBF framework, assuming a SU-MIMO setting. In addition, three different subarray antenna configurations were studied, including localized, interleaved, and low-redundancy ones. In [30], the authors focused on developing feedback mechanisms for an initial codebook-based beam alignment phase in order to enable a ZF-based transmission scheme at the BS side in MU-MIMO systems. Compared to a beamsteering method, improved performance comes at the cost of increased feedback overhead. Due to the scarcity of algorithms and performance evaluations in the literature, partially connected HBF with both analog amplitude and phase control needs further study. Performance comparisons between full array and subarray-based HBF architectures are of particular interest.

In this paper, we consider rate maximization problems in SU-MIMO and MU-MISO systems for partially connected full array-based and subarray-based HBF architectures with amplitude and phase controlled analog domain. We propose alternating optimization-based WMMSE and heuristic ZF HBF algorithms. We are particularly interested in the performance difference between the full array and subarray-based algorithms. To the best of our knowledge, there are no similar works in the literature. Most of the prior studies on partially connected HBF assume only phase shifting in the analog part, leading to different problem formulations. The closest works with amplitude/phase control [30, 40], reviewed earlier, considered only subarray-based partial connectivity, proposing methods that differ significantly from ours. The full array-based study [27] assumed analog phase shifting only. The WMMSE approaches are well-known in the digital beamforming literature [41, 42]. Inspired by them, we propose a similar WMMSE method for our unique HBF problems. The basic principle of the WMMSE method is to reformulate the original non-convex rate maximization problem as a non-convex weighted MSE minimization problem (i.e., the WMMSE problem) which can be solved via an iterative alternating optimization method. In alternating optimization, the WMMSE problem becomes convex for one variable by fixing the others. Each variable is alternately optimized while the other variables are fixed. This iterative process is repeated until a desired level of convergence is achieved. Due to the non-convexity of the original problem, the solution cannot be guaranteed to be globally optimal. Our proposed hybrid ZF algorithms are of lower complexity, aiming to eliminate the inter-stream interference. Numerical simulations are conducted to evaluate the performance of the proposed HBF algorithms, especially the difference between the full array and subarray-based ones. Two different channel models are used, i.e., a simple geometric uniform linear array (ULA) model and a more realistic statistical New York University Simulator (NYUSIM) model. Since we assume frequency flat channels and the availability of instantaneous CSI, the results serve as upper bounds for more practical HBF algorithms.

For clarity, the main contributions of this article are summarized in the following bullet points.

  • We propose alternating optimization-based WMMSE and heuristic ZF algorithms for partially connected full array and subarray-based HBF architectures with analog domain amplitude/phase control in the SU-MIMO and MU-MISO systems.

  • In the SU-MIMO system, we consider a rate maximization problem with the total transmission power constraint. We propose full array and subarray-based WMMSE algorithms and a lower complexity subarray-based iterative transmit-receive ZF algorithm.

  • In the MU-MISO system, we consider a sum rate maximization problem with the total transmission power constraint. We propose full array and subarray-based WMMSE algorithms and a lower complexity subarray-based ZF algorithm.

  • The performance of the proposed HBF algorithms is evaluated using a traditional geometric ULA channel model and a more practical statistical NYUSIM channel model. The performance difference between the full array and subarray-based algorithms is of particular interest.

The following notations are used throughout this paper. Uppercase boldface and lowercase boldface characters denote matrices and vectors, respectively. \(\textrm{diag}(.)\) denotes a diagonal matrix of its arguments. The superscripts \((.)^{H}\), \((.)^{T}\), and \((.)^{*}\), indicate Hermitian, transpose, and complex conjugate, respectively. The matrix \(\textbf{I}_M\) denotes an \(M \times M\) identity matrix. |.| and \(\textrm{tr}(.)\) are used to represent determinant and trace of a matrix, respectively.

The remainder of the paper is organized as follows. In Sect. 2, the system models of the SU-MIMO and MU-MISO scenarios are described. Section 3 presents the formulation of the problems in both scenarios. Sections 4 and 5 present the proposed algorithms for the SU-MIMO and MU-MISO scenarios, respectively. The simulation results are presented and discussed in Sect. 6. Finally, Sect. 7 concludes the paper.

2 System Model

This section introduces SU-MIMO and MU-MISO system models assuming partially connected RF architecture at the transmitter side. These models are depicted in Fig. 2. First, the main system details are described, then signal models are presented for both systems separately.

Fig. 2: Overall SU-MIMO and MU-MISO system models

Both systems operate in the downlink, and the SU-MIMO and MU-MISO systems share identical configurations at the base station (BS) side. The number of receive antennas/users is assumed to be considerably smaller than the number of transmit antennas. Thus, HBF is considered only at the BS and the receiver is assumed to be fully digital. The BS is equipped with \(N_t\) transmit antennas and a hybrid architecture with \(N_a\) RF chains. In order to find solutions that can serve as upper bounds for partially connected HBF systems with a similar configuration, it is assumed that channel state information (CSI) is available at the transmitter as well as the receiver. In the single-user case, the user has \(N_r\) receive antennas and the number of data streams is \(N_s\), which is assumed to be equal to the number of RF chains. In the multi-user case, \(N_u\) single-antenna users are served by the BS and the number of data streams \(N_s\) is equal to the number of users and the number of RF chains, i.e., \(N_u = N_s = N_a\). Moreover, a partially connected architecture is used in which each RF chain is connected to only one subarray of the antennas. The transmit antenna array is partitioned into \(N_a\) subarrays, each with \(n = N_t/N_a\) antennas. In the overall hybrid architecture at the BS, amplitude control is employed in addition to phase shifting in the analog domain.

Fig. 3: (a) Full array-based and (b) subarray-based processing strategies for partially connected HBF

The HBF architecture consists of a digital precoder \(\textbf{D}\) and an analog beamformer \(\textbf{A}\). Based on the digital precoder structure, two different designs, i.e., full array-based and subarray-based processing methods can be considered as depicted in Fig. 3. In the case of full array-based processing, all data streams are connected to all RF chains and the digital precoder \(\textbf{D} \in \mathbb {C}^{N_a \times N_s}\) is given by

$$\begin{aligned} \mathbf{D}&=\left( \begin{array}{cccc} d_{11} &{} d_{12} &{} \ldots &{} d_{1N_s}\\ d_{21} &{} d_{22} &{} \ldots &{} d_{2N_s}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ d_{N_a1} &{} d_{N_a2} &{} \ldots &{} d_{N_aN_s} \end{array} \right) =\left( \begin{array}{cccc} \mathbf{d}_{1}\\ \mathbf{d}_{2}\\ \vdots \\ \mathbf{d}_{N_a} \end{array} \right) \\ {}&\hspace{1mm} = \left( \hspace{3mm} \begin{array}{cccc} {{\bar{\mathbf{d}}}}_{1} &{} \hspace{4mm} {{\bar{\mathbf{d}}}}_{2} &{} \hspace{1mm} \ldots &{} \hspace{2mm} {{\bar{\mathbf{d}}}}_{N_s}\\ \end{array} \hspace{2mm} \right) \end{aligned}$$
(1)

where \(\textbf{d}_{i} = (\,d_{i1}\, d_{i2} \,\ldots \,d_{i N_s}\,)\) is the ith row vector of the digital precoder, corresponding to the ith subarray, and \({{\bar{\textbf{d}}}}_{j} = (\,d_{1j}\, d_{2j} \,\ldots \,d_{N_a j}\,)^T\) is the jth column vector, corresponding to the jth stream. In the case of subarray-based processing, where each data stream is connected to only one RF chain (i.e., \(N_a=N_s\)), the digital precoder becomes a diagonal matrix \(\textbf{D} = \textrm{diag}\,(\,d_{11}\, d_{22}\, \ldots \, d_{N_s N_s}\,)\). It is worth mentioning that the subarray-based configuration (with analog amplitude/phase control) allows for more flexible beamforming design compared to the full array-based one since the subarray beams do not depend on each other. For example, each subarray can be used to eliminate the inter-stream interference via ZF beamforming. However, the drawback of the subarray architecture is that the beamforming gain is limited by the number of antennas in each subarray. Thus, there is a trade-off between flexibility and beamforming gain when choosing between the subarray and full array-based configurations.

To simplify further, the digital weights can be directly incorporated into the analog beamformer amplitudes and phases of the corresponding subarray. Thus, the digital precoder can be normalized to be an identity matrix, which only routes data streams to the RF chains. The analog beamformer \(\textbf{A} \in \mathbb {C}^{N_t \times N_a}\) can be expressed as

$$\begin{aligned} \begin{aligned} \textbf{A} =\left( \begin{array}{cccc} \textbf{a}_1 &{} \textbf{0} &{} \ldots &{} \textbf{0}\\ \textbf{0} &{} \textbf{a}_2 &{} \ldots &{} \textbf{0}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \textbf{0} &{} \textbf{0} &{} \ldots &{} \textbf{a}_{N_a} \end{array} \right) \end{aligned} \end{aligned}$$
(2)

where \(\textbf{a}_i \in {{\mathbb {C}}}^{n \times 1}\) is the analog RF beamformer of the ith subarray, and \(\textbf{0} \in {{\mathbb {C}}}^{n \times 1}\) is a zero vector.
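The structure of (1) and (2) can be illustrated with a short NumPy sketch. The array sizes, the random weights, and the helper name `build_analog_beamformer` are assumptions made here for illustration only; they are not taken from the paper. The sketch builds the block-diagonal \(\textbf{A}\), contrasts a dense (full array-based) with a diagonal (subarray-based) \(\textbf{D}\), and shows how diagonal digital weights can be absorbed into the subarray beamformers.

```python
import numpy as np

def build_analog_beamformer(subarray_weights):
    """Stack per-subarray weight vectors a_i into the block-diagonal A of (2)."""
    n_a = len(subarray_weights)
    n = len(subarray_weights[0])
    A = np.zeros((n_a * n, n_a), dtype=complex)
    for i, a_i in enumerate(subarray_weights):
        A[i * n:(i + 1) * n, i] = a_i  # column i is nonzero only on subarray i
    return A

rng = np.random.default_rng(0)
N_t, N_a = 8, 2            # illustrative sizes (assumed)
n = N_t // N_a             # antennas per subarray

# Amplitude and phase control: each a_i is an unconstrained complex vector.
a = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(N_a)]
A = build_analog_beamformer(a)

# Full array-based processing: dense D, every stream reaches every subarray.
D_full = rng.standard_normal((N_a, N_a)) + 1j * rng.standard_normal((N_a, N_a))
# Subarray-based processing: diagonal D, one stream per RF chain.
D_sub = np.diag(np.diag(D_full))

# With a diagonal D, the digital weights can be folded into the analog part,
# leaving an identity digital precoder.
A_absorbed = build_analog_beamformer([a[i] * D_sub[i, i] for i in range(N_a)])
print(np.allclose(A @ D_sub, A_absorbed @ np.eye(N_a)))
```

In the full array-based case, column i of `A @ D_full` stacks the scaled subarray vectors \(\textbf{a}_j d_{ji}\), so all streams share the same subarray beam shapes up to a complex scalar per subarray.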

2.1 Signal Model for SU-MIMO

In the SU-MIMO scenario, the estimated signal vector at the user is given by

$$\begin{aligned} \begin{aligned} \hat{\textbf{s}}&= \textbf{M}^H \textbf{H} \textbf{A} \textbf{D} \textbf{s} + \textbf{M}^H \textbf{z} \end{aligned} \end{aligned}$$
(3)

where \(\textbf{H} \in {{\mathbb {C}}}^{N_r \times N_t}\) denotes the channel matrix, \(\textbf{s} = (s_1, s_2, \ldots , s_{N_s})^T \in {\mathbb C}^{N_s \times 1}\) is the vector of data symbols with \({\mathbb E}[\textbf{ss}^H]=\textbf{I}_{N_s}\), \(\textbf{z} \thicksim \mathcal{C}\mathcal{N}(\textbf{0},N_0\textbf{I}_{N_r})\) stands for additive white Gaussian noise, and \(\textbf{M} = (\,\textbf{m}_1 \; \textbf{m}_2 \; \ldots \; \textbf{m}_{N_s}\,) \in {{\mathbb {C}}}^{N_r \times N_s}\) denotes the digital receive beamformer. The vector \(\textbf{m}_i \in {{\mathbb {C}}}^{N_r \times 1}\) is the receive beamformer of the ith spatial data stream.

The rate of stream i can be written as

$$\begin{aligned} R_i = \log _2\left( 1 + {|\textbf{m}_i^H \textbf{H} \textbf{v}_i|^2 \over N_0 + \sum \limits _{\begin{array}{c} l=1, l \ne i \end{array}}^{N_s} |\textbf{m}_i^H \textbf{H} \textbf{v}_l|^2} \right) \end{aligned}$$
(4)

where \(\textbf{v}_{i} = (\textbf{a}_1 d_{1i},~ \textbf{a}_2 d_{2i},~ \ldots ,~ \textbf{a}_{N_a} d_{N_ai})^T \in {{\mathbb {C}}}^{N_t \times 1}\) is the overall hybrid precoder of stream i.
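As a rough illustration of (4), the following NumPy sketch evaluates the per-stream rates for given matrices. The function name and the toy inputs are hypothetical. One assumption is made explicit: the noise term is scaled by \(\Vert \textbf{m}_i\Vert ^2\), which coincides with (4) when the receive beamformers have unit norm.

```python
import numpy as np

def su_mimo_stream_rates(H, V, M, N0):
    """Per-stream rates R_i in the spirit of (4).

    V = A @ D holds the hybrid precoders v_i as columns and M holds the
    receive beamformers m_i as columns. Assumption: the noise power is
    scaled by ||m_i||^2, which equals N0 for unit-norm m_i as in (4).
    """
    G = M.conj().T @ H @ V                 # G[i, l] = m_i^H H v_l
    power = np.abs(G) ** 2
    signal = np.diag(power)                # |m_i^H H v_i|^2
    interference = power.sum(axis=1) - signal
    noise = N0 * np.sum(np.abs(M) ** 2, axis=0)
    return np.log2(1.0 + signal / (noise + interference))

# Toy example: two parallel, interference-free streams.
H = np.eye(2, dtype=complex)
V = np.diag([np.sqrt(3.0), 1.0]).astype(complex)
M = np.eye(2, dtype=complex)
rates = su_mimo_stream_rates(H, V, M, N0=1.0)
print(rates)
```

In this toy case the SINRs are 3 and 1, so the rates are 2 and 1 bit/s/Hz up to floating-point rounding.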

2.2 Signal Model for MU-MISO

In the MU-MISO scenario, the received signal of the kth user can be modeled as

$$\begin{aligned} \begin{aligned} y_k&= \textbf{h}_{k}^{H} \textbf{A} {{\bar{\textbf{d}}}}_{k} s_{k} + \textbf{h}_{k}^{H} \sum \limits _{\begin{array}{c} i=1\\ i \ne k \end{array}}^{N_u} \textbf{A} {{\bar{\textbf{d}}}}_{i} s_{i} + z_k\\ \end{aligned} \end{aligned}$$
(5)

where \(\textbf{h}_{k} \in {{\mathbb {C}}}^{N_t \times 1}\) is the channel vector from the transmitter to the kth user, \({{\bar{\textbf{d}}}}_{k} \in {{\mathbb {C}}}^{N_a \times 1}\) is the digital precoder corresponding to the kth user, \(s_{k}\) is the data symbol of the user k, and \(z_k \thicksim \mathcal{C}\mathcal{N}(0,N_0)\) is the additive white Gaussian noise of the kth user.

The rate expression for user k can be written as

$$\begin{aligned} R_k = \log _2\left( 1 + {|m_k \textbf{h}_{k}^{H} \textbf{A} {\bar{\textbf{d}}}_{k}|^2 \over N_0 + \sum \limits _{\begin{array}{c} l=1, l \ne k \end{array}}^{N_u} |m_k \textbf{h}_{k}^{H} \textbf{A} {{\bar{\textbf{d}}}}_{l}|^2}\right) \end{aligned}$$
(6)

where \(m_k\) is the digital receiver of user k which only scales and shifts the phase.
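The user rates in (6) can be sketched in the same way. In the snippet below, which uses hypothetical names and toy values, the scalar receiver \(m_k\) is left out: it multiplies the desired signal and the interference alike, and once the received noise is scaled by \(|m_k|^2\) as well (an assumption here, consistent with (6) for \(|m_k| = 1\)) it cancels from the SINR.

```python
import numpy as np

def mu_miso_user_rates(H, A, D, N0):
    """User rates R_k in the spirit of (6).

    Row k of H is h_k^H and column k of D is the digital precoder of user k.
    The scalar receiver m_k is omitted because it cancels in the SINR
    (assumption: the noise is scaled by |m_k|^2 after reception).
    """
    G = H @ A @ D                          # G[k, l] = h_k^H A d_bar_l
    power = np.abs(G) ** 2
    signal = np.diag(power)
    interference = power.sum(axis=1) - signal
    return np.log2(1.0 + signal / (N0 + interference))

# Toy example: two users, two subarrays of two antennas each, and channels
# that only excite the subarray serving the respective user.
A = np.zeros((4, 2), dtype=complex)
A[:2, 0] = 1.0 / np.sqrt(2.0)
A[2:, 1] = 1.0 / np.sqrt(2.0)
D = np.eye(2, dtype=complex)
H = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=complex)
rates = mu_miso_user_rates(H, A, D, N0=1.0)
sum_rate = rates.sum()
print(rates, sum_rate)
```

Each user sees a signal power of 2 and no interference in this example, giving \(\log _2 3\) bit/s/Hz per user.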

3 Problem Formulation

Our objective is to maximize the data rate of the system while satisfying the total transmission power constraint at the BS. In the following, the rate maximization problem is formulated for SU-MIMO and MU-MISO systems. The optimization problems are equivalently reformulated as weighted MSE minimization that can be solved using iterative alternating optimization efficiently.

3.1 Rate Maximization for SU-MIMO

The optimization objective in the considered SU-MIMO system is to maximize the rate of the user while satisfying the maximum transmission power constraint. This rate maximization problem is expressed as

$$\begin{aligned} \begin{aligned}&\text{maximize}_{\textbf{A}, \textbf{D}, \textbf{M}} \sum _{i=1}^{N_s} R_i \hspace{5mm} \mathrm{s. t.} \hspace{.3cm} \textrm{tr}\,(\textbf{A} \textbf{D} \textbf{D}^{H} \textbf{A}^{H}) \le P \end{aligned} \end{aligned}$$
(7)

where P is the maximum transmission power at the BS. Solving (7) requires digital precoder \(\textbf{D}\), analog beamformer \(\textbf{A}\), and receive beamformer \(\textbf{M}\) to be optimized jointly.

However, this joint optimization problem is non-convex and cannot be optimally solved. To solve (7) suboptimally, we apply the WMMSE approach, which is well-known in the digital beamforming literature [41, 42]. In the WMMSE approach, the original problem (7) is first reformulated as a non-convex weighted MSE minimization problem, also known as the WMMSE problem. The resulting WMMSE problem (11) can then be solved via an iterative alternating optimization method. In alternating optimization, the WMMSE problem becomes convex for one variable by fixing the others (among \(\textbf{M}\), \(\textbf{D}\), and \(\textbf{A}\)). Each variable is alternately optimized while the other variables are fixed. This iterative process is repeated until a desired level of convergence is achieved. Due to the non-convexity of the original problem, the solution cannot be guaranteed to be globally optimal. In order to derive (11), we first formulate the error matrix \(\textbf{E}\) and the MMSE receive beamformer \(\textbf{M}\), as described in the following. Note that (14) is equivalent to (11). Both are needed later on in Sect. 4.1.

The error matrix at the output of the receive beamformer is given by

$$\begin{aligned} \begin{aligned} \textbf{E}&={{\mathbb {E}}}\left[ \left( \textbf{s} - \textbf{M}^H \textbf{y} \right) \left( \textbf{s} - \textbf{M}^H \textbf{y} \right) ^H \right] \\ {}&= \textbf{I} - \textbf{M}^H \textbf{H} \textbf{A} \textbf{D} - \textbf{D}^H \textbf{A}^H \textbf{H}^H \textbf{M} + N_0 \textbf{M}^H \textbf{M} + \textbf{M}^H \textbf{H} \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{H}^H \textbf{M}. \end{aligned} \end{aligned}$$
(8)

The well known rate optimal digital MMSE receive beamformer can be derived as

$$\begin{aligned} \begin{aligned} \textbf{M}&= (\textbf{H} \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{H}^H + N_0 \textbf{I}_{N_r})^{-1} \textbf{H} \textbf{A} \textbf{D}. \end{aligned} \end{aligned}$$
(9)

The error matrix after applying the receive beamformer is given by

$$\begin{aligned} \begin{aligned} \textbf{E}&= \left(\textbf{I} + \frac{1}{N_0}{} \textbf{H} \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{H}^H \right)^{-1} \end{aligned} \end{aligned}$$
(10)

It can be shown that \(R = \log _2 |\textbf{E}^{-1}|\). By successive approximation of the objective function, the rate maximization problem for fully digital beamforming can be iteratively solved via WMMSE optimization [41, 42]. Similarly, for fixed approximation coefficients (weights), our corresponding HBF optimization problem can be written as

$$\begin{aligned} \begin{aligned}&\text{minimize}_{\textbf{A}, \textbf{D}, \textbf{M}} \textrm{tr}\, (\textbf{W} \textbf{E}) \hspace{5mm} \mathrm{s. t.} \hspace{.3cm} \textrm{tr}\,(\textbf{A} \textbf{D} \textbf{D}^{H} \textbf{A}^{H}) \le P \end{aligned} \end{aligned}$$
(11)

where \(\textbf{W} = \textrm{diag}\,(\,w_1\, w_2\, \ldots \, w_{N_s}\,)\) is the weight matrix,

$$\begin{aligned} w_i = e_i^{-1} \end{aligned}$$
(12)

is the weight of stream i, and

$$\begin{aligned} \begin{aligned} e_i&= \left( 1 + \textbf{v}_{i}^H \textbf{H}^H \left( \textbf{H} \sum \limits _{\begin{array}{c} l=1 \\ l\ne i \end{array}}^{N_s} \textbf{v}_l \textbf{v}_{l}^H \textbf{H}^H + N_0 \textbf{I} \right) ^{-1} \textbf{H} \textbf{v}_i \right) ^{-1} \end{aligned} \end{aligned}$$
(13)

is the error term corresponding to stream i. Since \(\sum _{i=1}^{N_s} R_i = \sum _{i=1}^{N_s} \log _2 |e_i^{-1}|\), we can write the WMMSE problem as

$$\begin{aligned} \begin{aligned}&\text{minimize}_{\{\textbf{a}_i\}, \{\textbf{d}_i\}, \{\textbf{m}_i\}} \sum _{i=1}^{N_s} w_i e_i \hspace{5mm} \mathrm{s. t.} \hspace{.3cm} \sum _{i=1}^{N_s} \textrm{tr}\,(\textbf{a}_i \textbf{d}_i \textbf{d}_i^{H} \textbf{a}_i^{H}) \le P. \end{aligned} \end{aligned}$$
(14)

In Sect. 4, we propose HBF algorithms to suboptimally solve the non-convex WMMSE problem using alternating optimization over the receive beamformer, digital precoder, and analog beamformer.
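The relations between (8), (9), (10), (12), and (13) can be checked numerically. The sketch below is only an illustration under assumed dimensions and random data; the matrix `V` stands in for the hybrid precoder \(\textbf{A}\textbf{D}\) and `F` for the effective channel \(\textbf{H}\textbf{A}\textbf{D}\). It evaluates the error matrix (8) at the MMSE receiver (9), compares its diagonal with the per-stream errors (13), and compares \(\log _2 |\textbf{E}^{-1}|\) with the determinant of the \(N_r \times N_r\) expression in (10).

```python
import numpy as np

rng = np.random.default_rng(1)
N_r, N_t, N_s, N0 = 4, 8, 2, 0.5       # illustrative sizes (assumed)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(N_r, N_t)
V = crandn(N_t, N_s)                   # stands in for the hybrid precoder A @ D
F = H @ V                              # effective channel H A D

# MMSE receive beamformer, eq. (9)
M = np.linalg.solve(F @ F.conj().T + N0 * np.eye(N_r), F)

# Error matrix, eq. (8), evaluated at the MMSE receiver
I_s = np.eye(N_s)
E = (I_s - M.conj().T @ F - F.conj().T @ M
     + N0 * M.conj().T @ M + M.conj().T @ F @ F.conj().T @ M)

def stream_error(i):
    """Per-stream error e_i, eq. (13), with H v_l taken from the columns of F."""
    others = np.delete(F, i, axis=1)
    C = others @ others.conj().T + N0 * np.eye(N_r)
    f_i = F[:, i]
    return 1.0 / (1.0 + np.real(f_i.conj() @ np.linalg.solve(C, f_i)))

e = np.array([stream_error(i) for i in range(N_s)])
W = np.diag(1.0 / e)                   # weights, eq. (12)

rate_from_E = -np.log2(np.linalg.det(E).real)
rate_from_10 = np.log2(np.linalg.det(np.eye(N_r) + F @ F.conj().T / N0).real)

print(np.allclose(np.diag(E).real, e))         # diagonal of E matches (13)
print(np.isclose(rate_from_E, rate_from_10))   # same log-determinant
print(np.trace(W @ E).real)                    # objective of (11) at these weights
```

With the weights of (12), the objective \(\textrm{tr}(\textbf{W}\textbf{E})\) evaluates to \(N_s\) at the point where the weights were computed, which is a convenient sanity check in an implementation.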

3.2 Sum Rate Maximization for MU-MISO

The MU-MISO optimization problem aims at maximizing the sum rate of the users with a maximum transmission power constraint. The sum rate maximization problem can be written as

$$\begin{aligned} \begin{aligned}&\text{maximize}_{\textbf{A}, \{{\bar{\textbf{d}}}_k\}} \sum _{k=1}^{N_u} R_k \hspace{5mm} \mathrm{s. t.} \hspace{.3cm} \sum _{k=1}^{N_u} \textrm{tr}\,(\textbf{A} {{\bar{\textbf{d}}}}_k {\bar{\textbf{d}}}_k^{H} \textbf{A}^{H}) \le P. \end{aligned} \end{aligned}$$
(15)

Joint optimization of the digital precoders \(\{{{\bar{\textbf{d}}}}_k\}\) and the analog beamformer \(\textbf{A}\) is non-convex and cannot be optimally solved in its current form. In the following, we reformulate (15) as a WMMSE problem. Similar to SU-MIMO, this problem is still non-convex but it can be solved by using iterative alternating optimization. However, the solution is not guaranteed to be globally optimal.

The error term at the kth user is given by

$$\begin{aligned} \begin{aligned} e_k&={{\mathbb {E}}}\left[ \left( s_k - m_k y_k \right) \left( s_k - m_k y_k \right) ^H \right] \\ {}&= 1 - m_k \textbf{h}_k^H \textbf{A} {{\bar{\textbf{d}}}}_k - {{\bar{\textbf{d}}}}_k^H \textbf{A}^H \textbf{h}_k m_k^{*} + m_k \textbf{h}_k^H \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{h}_k m_k^{*} + m_k m_k^{*} N_0 \end{aligned} \end{aligned}$$
(16)

The rate optimal MMSE receiver of user k can be derived as

$$\begin{aligned} \begin{aligned} m_k&= {{\bar{\textbf{d}}}}_k^H \textbf{A}^H \textbf{h}_k (\textbf{h}_k^H \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{h}_k + N_0)^{-1}. \end{aligned} \end{aligned}$$
(17)

After applying the MMSE receiver to (16), the error term at the kth user is given by

$$\begin{aligned} \begin{aligned} e_k&= 1 - {{\bar{\textbf{d}}}}_k^H \textbf{A}^H \textbf{h}_k (\textbf{h}_k^H \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{h}_k + N_0)^{-1} \textbf{h}_k^H \textbf{A} {{\bar{\textbf{d}}}}_k \end{aligned} \end{aligned}$$
(18)

Similar to the SU-MIMO case, \(R_k = \log _2 |e_k^{-1}|\). For fixed approximation coefficients (weights), our corresponding HBF optimization problem can be written as

$$\begin{aligned} \begin{aligned}&\text{minimize}_{\textbf{A}, \{{\bar{\textbf{d}}}_k\}} \sum _{k=1}^{N_u} w_k e_k \hspace{5mm} \mathrm{s. t.} \hspace{.3cm} \sum _{k=1}^{N_u} \textrm{tr}\,(\textbf{A} {{\bar{\textbf{d}}}}_k {{\bar{\textbf{d}}}}_k^{H} \textbf{A}^{H}) \le P \end{aligned} \end{aligned}$$
(19)

where

$$\begin{aligned} w_k = e_k^{-1} \end{aligned}$$
(20)

is the weight of user k. In Sect. 5, we propose HBF algorithms to suboptimally solve the non-convex WMMSE problem using alternating optimization over the digital precoder and the analog beamformer.
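The scalar relations (16) to (18) and (20) lend themselves to a quick numerical check. The following sketch uses assumed dimensions and random data, with `V` standing in for \(\textbf{A}\textbf{D}\). It confirms that inserting the MMSE receiver (17) into (16) gives (18), and that the resulting weight \(w_k = e_k^{-1}\) equals one plus the SINR of user k.

```python
import numpy as np

rng = np.random.default_rng(2)
N_t, N_u, N0 = 8, 2, 0.5               # illustrative sizes (assumed)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(N_u, N_t)                   # row k is h_k^H
V = crandn(N_t, N_u)                   # stands in for A @ D, column k serves user k
G = H @ V                              # G[k, l] = h_k^H A d_bar_l

g = np.diag(G)                         # h_k^H A d_bar_k
total = (np.abs(G) ** 2).sum(axis=1)   # h_k^H A D D^H A^H h_k

m = g.conj() / (total + N0)            # MMSE receiver, eq. (17)

# Error term (16) evaluated at the MMSE receiver
e_direct = 1 - m * g - (m * g).conj() + np.abs(m) ** 2 * total + np.abs(m) ** 2 * N0
# Closed form (18)
e_mmse = 1 - np.abs(g) ** 2 / (total + N0)

w = 1.0 / e_mmse                       # weights, eq. (20)
sinr = np.abs(g) ** 2 / (N0 + total - np.abs(g) ** 2)

print(np.allclose(e_direct.real, e_mmse))
print(np.allclose(w, 1.0 + sinr))
```

The second check is what makes the weighted MSE objective track the sum rate: \(\log _2 w_k\) is exactly the rate of user k at the current iterate.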

4 HBF Algorithms for SU-MIMO

In this section, three HBF algorithms are proposed for SU-MIMO systems with partially connected RF architecture. One algorithm is developed for the full array-based hybrid design and two for the subarray-based processing strategy. The derivations of these algorithms are presented in the following subsections.

4.1 Full Array-Based Hybrid WMMSE

The aim of this algorithm is to solve the rate maximization problem via an equivalent reformulation as weighted MSE minimization. By applying an iterative alternating optimization method, the WMMSE problem can be solved so that the objective value converges to a desired accuracy. This algorithm consists of three main steps. First, (11) is solved with respect to the receive beamformer \(\textbf{M}\) while the digital and analog beamformers are fixed. Then, the analog beamformer \(\textbf{A}\) and the receive beamformer \(\textbf{M}\) are kept fixed and (11) is solved for the digital precoder \(\textbf{D}\). The last step is to optimize the analog beamformer \(\textbf{A}\) while keeping the other two variables fixed. Each of these optimization steps is convex since the objective and constraint become convex when fixing the other variables. Further details on the convexity of each step in the alternating optimization method can be found in [41, 42]. In the following, the hybrid WMMSE algorithm is described in detail.
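A minimal NumPy sketch of the first two steps is given here, using the closed-form updates derived as (9) and (24) in the remainder of this subsection. It is an illustration under assumptions rather than the authors' implementation: the function names, the dimensions, the random data, the identity placeholder for \(\textbf{W}\), and the bracketing strategy for the bisection are all choices made for this example, and the analog beamformer step is omitted.

```python
import numpy as np

def mmse_receiver(H, A, D, N0):
    """Receive beamformer M of (9) for fixed A and D."""
    F = H @ A @ D
    return np.linalg.solve(F @ F.conj().T + N0 * np.eye(H.shape[0]), F)

def digital_precoder(H, A, M, W, P, tol=1e-9, max_iter=200):
    """Digital precoder D of (24); alpha >= 0 is found by bisection so that
    tr(A D D^H A^H) <= P. Assumes the matrix inverted in (24) is nonsingular
    at alpha = 0 (illustrative assumption)."""
    B = A.conj().T @ H.conj().T @ M @ W
    G = B @ M.conj().T @ H @ A              # A^H H^H M W M^H H A
    AhA = A.conj().T @ A

    def solve(alpha):
        D = np.linalg.solve(G + alpha * AhA, B)
        return D, np.linalg.norm(A @ D) ** 2  # tr(A D D^H A^H) = ||A D||_F^2

    D, power = solve(0.0)
    if power <= P:                          # constraint inactive
        return D, 0.0
    lo, hi = 0.0, 1.0
    while solve(hi)[1] > P:                 # grow bracket until power <= P
        hi *= 2.0
    for _ in range(max_iter):               # transmit power decreases with alpha
        alpha = 0.5 * (lo + hi)
        if solve(alpha)[1] > P:
            lo = alpha
        else:
            hi = alpha
        if hi - lo < tol * max(1.0, hi):
            break
    D, _ = solve(hi)                        # hi always satisfies the constraint
    return D, hi

rng = np.random.default_rng(3)

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N_r, N_t, N_a, N0, P = 4, 8, 2, 0.5, 1.0    # illustrative values (assumed)
n = N_t // N_a
H = crandn(N_r, N_t)
A = np.zeros((N_t, N_a), dtype=complex)
for i in range(N_a):
    A[i * n:(i + 1) * n, i] = crandn(n)     # block-diagonal structure of (2)
D = crandn(N_a, N_a)                        # arbitrary starting point
W = np.eye(N_a)                             # placeholder weights for the demo

M = mmse_receiver(H, A, D, N0)
D_new, alpha = digital_precoder(H, A, M, W, P)
print(alpha, np.linalg.norm(A @ D_new) ** 2)
```

Returning the upper end of the bisection bracket keeps the result feasible at every tolerance, at the cost of a transmit power marginally below \(P\) when the constraint is active.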

Problem (11) is convex with respect to the receive beamformer \(\textbf{M}\). The Lagrangian expression of (11) is given by

$$\begin{aligned} \mathcal {L} = \textrm{tr}\, (\textbf{W} \textbf{E}) + \alpha (\textrm{tr}\,(\textbf{A} \textbf{D} \textbf{D}^{H} \textbf{A}^{H}) - P) \end{aligned}$$
(21)

where \(\alpha\) is the Lagrange multiplier. Using the error equation in (8) and placing it into (21) results in:

$$\begin{aligned} \begin{aligned} \mathcal {L}&= \textrm{tr}\, (\textbf{W}) - \textrm{tr}\, (\textbf{W} \textbf{M}^H \textbf{H} \textbf{A} \textbf{D}) - \textrm{tr}\, (\textbf{W} \textbf{D}^H \textbf{A}^H \textbf{H}^H \textbf{M}) + N_0 \textrm{tr}\, (\textbf{W} \textbf{M}^H \textbf{M})\\ {}&\hspace{5mm} + \textrm{tr}\, (\textbf{W} \textbf{M}^H \textbf{H} \textbf{A} \textbf{D} \textbf{D}^H \textbf{A}^H \textbf{H}^H \textbf{M}) + \alpha (\textrm{tr}\,(\textbf{A} \textbf{D} \textbf{D}^{H} \textbf{A}^{H}) - P) \end{aligned} \end{aligned}$$
(22)

The first order optimality condition with fixed digital and analog beamformers yields the MMSE receive beamformer \(\textbf{M}\) in (9). The next step is to solve (11) for \(\textbf{D}\). The gradient of the Lagrangian with respect to \(\textbf{D}\), while the other variables are fixed, can be written as:

$$\begin{aligned} {\nabla }_\textbf{D} \mathcal{L}= - 2 \textbf{A}^H \textbf{H}^H \textbf{M} \textbf{W} + 2 \textbf{A}^H \textbf{H}^H \textbf{MWM}^H \textbf{H} \textbf{A} \textbf{D} + 2 \alpha \textbf{A}^{H} \textbf{A} \textbf{D} \end{aligned}$$
(23)

Setting this gradient to zero, the first order optimality condition yields

$$\begin{aligned} \textbf{D} = \left( \textbf{A}^H \textbf{H}^H \textbf{M} \textbf{W} \textbf{M}^H \textbf{H} \textbf{A} + \alpha \textbf{A}^H \textbf{A}\right) ^{-1} \textbf{A}^H \textbf{H}^H \textbf{M} \textbf{W} \end{aligned}$$
(24)

where \(\alpha \ge 0\) is chosen such that the transmit power constraint is satisfied. If \(\alpha = 0\) satisfies the transmit power constraint, the solution is ready. Otherwise \(\alpha > 0\) can be found using one dimensional search techniques such as the bisection method. In the next step, we fix the digital precoder and the receive beamformer and solve (14) with respect to the analog beamformers for different subarrays. Rewriting the received signal as

$$\begin{aligned} \begin{aligned} \textbf{y}&= \sum _{j=1}^{N_a} \textbf{H}_j \textbf{a}_j \textbf{d}_{j} \textbf{s} + \textbf{z} \end{aligned} \end{aligned}$$
(25)

yields the error term corresponding to stream i as

$$\begin{aligned} \begin{aligned} e_i&={{\mathbb {E}}}\left[ \left( s_i - \textbf{m}_i^H \textbf{y} \right) \left( s_i - \textbf{m}_i^H \textbf{y} \right) ^H \right] \\ {}&= 1 - \textbf{m}_i^H \sum _{j=1}^{N_a} \textbf{H}_j \textbf{a}_j d_{ji} - \sum _{j=1}^{N_a} d_{ji}^{*} \textbf{a}_j^H \textbf{H}_j^H \textbf{m}_i\\ {}&\hspace{7mm} + \textbf{m}_i^H \sum _{j=1}^{N_a} \textbf{H}_j \textbf{a}_j \textbf{d}_j \sum _{l=1}^{N_a} \textbf{d}_{l}^H \textbf{a}_l^H \textbf{H}_l^H \textbf{m}_i + N_0 \textbf{m}_i^H \textbf{m}_i. \end{aligned} \end{aligned}$$
(26)

where sub-channel \(\textbf{H}_j \in {{\mathbb {C}}}^{N_r \times n}\) is the matrix of complex channel gains between transmit antennas of the jth subarray and \(N_r\) receive antennas. The channel matrix \(\textbf{H}\) can be written as \(\textbf{H} = (\,\textbf{H}_1 \; \textbf{H}_2 \; \ldots \; \textbf{H}_{N_a}\,)\). Consequently, the Lagrangian corresponding to (14) can be expressed as

$$\begin{aligned} \begin{aligned} \mathcal {L}&= \sum _{i=1}^{N_s} w_i e_i + \alpha (\sum _{i=1}^{N_s} \textrm{tr}\,(\textbf{a}_i \textbf{d}_i \textbf{d}_i^{H} \textbf{a}_i^{H}) - P). \end{aligned} \end{aligned}$$
(27)

Similar to the steps used to derive the digital precoder \(\textbf{D}\), we substitute the error term corresponding to stream i in (26) into (27). Then the first order optimality condition with respect to \(\textbf{a}_i\) yields

$$\begin{aligned} \textbf{a}_i = (\textbf{H}_i^H \textbf{M} \textbf{W} \textbf{M}^H \textbf{H}_i + \alpha \textbf{I}_n)^{-1} \textbf{H}_i^H \bigg ( \sum _{j=1}^{N_s} \textbf{m}_j w_j d_{ij}^{*} - \textbf{M} \textbf{W} \textbf{M}^H \sum \limits _{\begin{array}{c} l=1 , l\ne i \end{array}}^{N_s} \textbf{H}_l \textbf{a}_l \textbf{d}_l \textbf{d}_i^H \bigg ) (\textbf{d}_i \textbf{d}_i^H)^{-1} \end{aligned}$$
(28)

in which \(\alpha \ge 0\) is chosen via the bisection method such that the transmit power constraint is satisfied. This procedure of alternating optimization continues until a desired level of convergence is achieved. The proposed algorithm converges in terms of objective value since solving a convex problem at each step improves the objective value and the resulting MSE is lower bounded [42]. The optimality of the solution cannot be guaranteed due to the non-convexity of the original problem. Hence, the solution has to be treated as suboptimal unless otherwise proven. Algorithm 1 summarizes the proposed full array-based hybrid WMMSE approach.
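As an illustration of the digital precoder step, the closed form in (24) together with the one-dimensional search for \(\alpha\) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name `digital_precoder`, the bracket-doubling strategy, and the iteration count are assumptions, and the code assumes that the matrix in (24) is invertible at \(\alpha = 0\) (for instance when the numbers of RF chains and streams are equal) and that \(\textbf{A}^H \textbf{A}\) is positive definite, so that the transmit power decreases monotonically in \(\alpha\).

```python
import numpy as np

def digital_precoder(H, A, M, W, P, max_iter=200):
    """Evaluate (24), choosing alpha >= 0 by bisection so that tr(A D D^H A^H) <= P.

    Hypothetical helper: names and the bracketing strategy are illustrative only.
    """
    G = A.conj().T @ H.conj().T @ M        # A^H H^H M
    Q = G @ W @ G.conj().T                 # A^H H^H M W M^H H A
    R = A.conj().T @ A                     # A^H A
    B = G @ W                              # right-hand side A^H H^H M W

    def solve(alpha):
        D = np.linalg.solve(Q + alpha * R, B)
        return D, np.linalg.norm(A @ D, "fro") ** 2   # precoder and its transmit power

    D, power = solve(0.0)
    if power <= P:                         # constraint inactive: alpha = 0 is the solution
        return D, 0.0

    lo, hi = 0.0, 1.0
    while solve(hi)[1] > P:                # grow the bracket until the power fits
        hi *= 2.0
    for _ in range(max_iter):              # bisection on the monotone power curve
        mid = 0.5 * (lo + hi)
        if solve(mid)[1] > P:
            lo = mid
        else:
            hi = mid
    D, _ = solve(hi)                       # hi always satisfies the power constraint
    return D, hi
```

Returning the upper end of the bracket is a deliberate choice: it guarantees a feasible precoder even when the bisection stops before the two ends coincide.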

Algorithm 1 Full Array-Based Hybrid WMMSE Algorithm

4.2 Subarray-Based Hybrid WMMSE

The subarray-based WMMSE method is a simplified version of Algorithm 1, since the digital beamformer is fixed as an identity matrix. The weighted MSE minimization problem in (11) can now be solved by alternating between the optimization of the receive beamformer \(\textbf{M}\) and the analog beamformer \(\textbf{A}\). In the first step, the problem is solved with respect to the receive beamformer \(\textbf{M}\). The first order optimality condition of (11) with respect to \(\textbf{M}\) yields the same MMSE receive beamformer as in (9). Then, the problem is solved for the analog beamformer \(\textbf{A}\). Using the stream specific MSE expressions, the corresponding Lagrangian is given by

$$\begin{aligned} \begin{aligned} \mathcal {L} =&\sum _{i=1}^{N_s} w_i e_i + \alpha (\sum _{i=1}^{N_s} \textrm{tr}\,(\textbf{a}_i \textbf{a}_i^{H}) - P). \end{aligned} \end{aligned}$$
(29)

The first order optimality condition of \(\mathcal {L}\) with respect to each \(\textbf{a}_j\) yields

$$\begin{aligned} \textbf{a}_j = (\textbf{H}_j^H \textbf{M} \textbf{W} \textbf{M}^H \textbf{H}_j + \alpha \textbf{I}_n)^{-1} \textbf{H}_j^H \textbf{m}_j w_j \end{aligned}$$
(30)

where \(\alpha \ge 0\) is chosen via the bisection method such that the transmit power constraint is satisfied. This alternating optimization procedure is repeated until a desired level of convergence is obtained. The developed subarray-based hybrid WMMSE approach is summarized in Algorithm 2. The convergence and suboptimality properties of Algorithm 1 also apply to Algorithm 2.
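The alternating procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reproduction of Algorithm 2: the MMSE receiver and the weight update \(w_i = 1/e_i\) are taken in their standard WMMSE form (equation (9) and the weight update are defined outside this subsection), the number of subarrays equals the number of streams, a fixed iteration count replaces the convergence test, and the all-ones initialization and the small floor `ALPHA_MIN` on \(\alpha\) are choices made here for simplicity and numerical stability. All function names are hypothetical.

```python
import numpy as np

ALPHA_MIN = 1e-9  # assumed small floor on alpha; keeps the n x n solves well conditioned

def mmse_receiver(Hs, a, N0):
    """Standard MMSE receiver and per-stream MSE for the effective channel H A (assumed form)."""
    HA = np.column_stack([Hj @ aj for Hj, aj in zip(Hs, a)])      # N_r x N_s
    M = np.linalg.solve(HA @ HA.conj().T + N0 * np.eye(HA.shape[0]), HA)
    e = 1.0 - np.real(np.sum(M.conj() * HA, axis=0))             # e_i = 1 - m_i^H H_i a_i
    return M, e

def analog_update(Hs, M, w, alpha):
    """Evaluate (30) for every subarray at a given alpha."""
    MWM = (M * w) @ M.conj().T                                    # M W M^H with W = diag(w)
    n = Hs[0].shape[1]
    return [np.linalg.solve(Hj.conj().T @ MWM @ Hj + alpha * np.eye(n),
                            Hj.conj().T @ M[:, j] * w[j])
            for j, Hj in enumerate(Hs)]

def total_power(a):
    return sum(np.linalg.norm(aj) ** 2 for aj in a)

def choose_alpha(Hs, M, w, P, iters=60):
    """Bisection on alpha so that the total transmit power does not exceed P."""
    power = lambda al: total_power(analog_update(Hs, M, w, al))
    lo = hi = ALPHA_MIN
    if power(lo) <= P:
        return lo
    while power(hi) > P:                                          # grow the bracket
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power(mid) > P:
            lo = mid
        else:
            hi = mid
    return hi                                                     # feasible end of the bracket

def subarray_wmmse(Hs, P, N0, iters=30):
    """Hs[j] is the N_r x n sub-channel of subarray j; returns beamformers, receiver, MSEs."""
    n = Hs[0].shape[1]
    a = [np.full(n, np.sqrt(P / (len(Hs) * n)), dtype=complex) for _ in Hs]
    for _ in range(iters):
        M, e = mmse_receiver(Hs, a, N0)                           # step 1: receive beamformer
        w = 1.0 / e                                               # step 2: MSE weights
        a = analog_update(Hs, M, w, choose_alpha(Hs, M, w, P))    # step 3: analog beamformers
    M, e = mmse_receiver(Hs, a, N0)
    return a, M, e
```

Since the same \(\alpha\) is shared by all subarrays, the power constraint is enforced jointly over the whole array rather than per subarray, which matches the single sum power term in (29).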

4.3 Subarray-Based Transmit-Receive ZF

Transmit-receive ZF aims at nulling the interference between subarrays by using ZF beamforming at both the transmitter and the receiver. Since the interference is zero, beamforming and power allocation can be separated. Hence, the well-known water-filling method is employed to allocate the per-stream powers. A detailed description of the transmit-receive ZF algorithm is given next.

Algorithm 2 Subarray-Based Hybrid WMMSE Algorithm

In this algorithm, the overall combiner \(\textbf{M}\) is first initialized with the \(N_s\) dominant left singular vectors of the channel \(\textbf{H}\). Then, the ZF precoder is calculated separately for each subarray, considering the effective channel \(\textbf{M}^H \textbf{H}_i\), as

$$\begin{aligned} \textbf{V}_i = \textbf{H}_i^{H} \textbf{M} (\textbf{M}^{H} \textbf{H}_i \textbf{H}_i^{H} \textbf{M})^{-1} \end{aligned}$$
(31)

where \(\textbf{V}_i \in {{\mathbb {C}}}^{n \times N_s}\) is the overall precoder corresponding to the ith subarray, and \(\textbf{H}_i \in {{\mathbb {C}}}^{N_r \times n}\) is the channel matrix between the ith subarray and the user. Since subarray-based processing is used, the ith column of \(\textbf{V}_i\) is picked and normalized as the precoder of the ith subarray. The normalized per-subarray precoders are stacked into \({{{\bar{\textbf{V}}}}} \in {{\mathbb {C}}}^{N \times N_s}\), where the entries of the ith column that belong to subarrays other than i are set to zero. The next step is to calculate the ZF combiner with the effective channel \(\textbf{H}\bar{\textbf{V}}\) as

$$\begin{aligned} \textbf{M}^{H} = (\bar{\textbf{V}}^{H} \textbf{H}^{ H} \textbf{H}\bar{\textbf{V}})^{-1} \bar{\textbf{V}}^{H} \textbf{H}^{H} \end{aligned}$$
(32)

and to normalize the columns of \(\textbf{M}\) one by one. The transmit-receive ZF process is repeated until a desired level of convergence is achieved in terms of rate.
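The iteration between (31) and (32) can be sketched as below. The sketch assumes one subarray per stream, \(n \ge N_s\) antennas per subarray and \(N_r \ge N_s\) receive antennas so that the inverses exist, and it uses a fixed number of iterations in place of the rate-based stopping rule. The function name `transmit_receive_zf` is hypothetical.

```python
import numpy as np

def transmit_receive_zf(Hs, Ns, iters=20):
    """Alternate between the per-subarray ZF precoder (31) and the ZF combiner (32).

    Hs[i] is the N_r x n sub-channel of subarray i (illustrative interface).
    """
    H = np.hstack(Hs)                                   # N_r x N, N = N_s * n
    N = H.shape[1]
    n = Hs[0].shape[1]
    U, _, _ = np.linalg.svd(H)
    M = U[:, :Ns]                                       # initial combiner: left singular vectors
    for _ in range(iters):
        V_bar = np.zeros((N, Ns), dtype=complex)
        for i, Hi in enumerate(Hs):
            G = M.conj().T @ Hi                         # effective channel M^H H_i (N_s x n)
            Vi = G.conj().T @ np.linalg.inv(G @ G.conj().T)   # (31)
            v = Vi[:, i]                                # keep only the ith column
            V_bar[i * n:(i + 1) * n, i] = v / np.linalg.norm(v)
        HV = H @ V_bar                                  # effective channel H V_bar
        MH = np.linalg.solve(HV.conj().T @ HV, HV.conj().T)   # (32), this is M^H
        M = MH.conj().T
        M = M / np.linalg.norm(M, axis=0, keepdims=True)      # normalize each column of M
    return V_bar, M
```

After the combiner update, \(\textbf{M}^H \textbf{H} \bar{\textbf{V}}\) is diagonal, which is the property that allows the power allocation to be decoupled from the beamformer design.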

To maximize the rate, the water-filling (WF) algorithm is employed to allocate power \(P_i\) to the ith stream as

$$\begin{aligned} P_i = \left(\mu - \frac{N_0}{\epsilon _i}\right)^{+} \end{aligned}$$
(33)

where \(\mu\) is the water level constant, and the channel gain \(\epsilon _i\) for the ith stream can be calculated as

$$\begin{aligned} \epsilon _i = |\textbf{w}_{i}^{H} \textbf{H} {{{\bar{\textbf{v}}}}}_{i}|^2 \end{aligned}$$
(34)

where \({{{\bar{\textbf{v}}}}}_{i}\) and \(\textbf{w}_{i}\) are the ith column of the normalized overall precoder and the digital combiner, respectively. The total transmit power is limited by \(\sum _{i=1}^{N_s} P_i = P\). The overall hybrid precoding matrix is formed as \({\textbf{V}} = \sqrt{\textbf{P}} {{{\bar{{\textbf{V}}}}}}\) where \(\textbf{P} = \textrm{diag}(P_{1}, P_{2}, \ldots , P_{N_s})\). The overall transmit-receive ZF method is summarized in Algorithm 3.
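The water level \(\mu\) in (33) can be found in closed form by testing how many streams remain active. The following sketch uses this standard sorting-based procedure; the function name `water_filling` and its interface are assumptions made for illustration.

```python
import numpy as np

def water_filling(eps, P, N0):
    """Allocate P_i = (mu - N0/eps_i)^+ as in (33) such that sum_i P_i = P."""
    inv = N0 / np.asarray(eps, dtype=float)     # inverse gain-to-noise ratios N0/eps_i
    inv_sorted = np.sort(inv)                   # strongest streams first
    for m in range(len(inv), 0, -1):            # try m active streams, largest m first
        mu = (P + inv_sorted[:m].sum()) / m     # water level if exactly m streams are active
        if mu > inv_sorted[m - 1]:              # weakest active stream still gets power
            break
    return np.maximum(mu - inv, 0.0), mu

# Example: three streams with gains 4, 1 and 0.1, total power 1, unit noise power.
powers, mu = water_filling([4.0, 1.0, 0.1], P=1.0, N0=1.0)
# powers -> [0.875, 0.125, 0.0], water level mu -> 1.125
```

In the example, the weakest stream lies above the water level and receives no power, while the remaining power is split so that \(P_i + N_0/\epsilon_i\) is equal for the two active streams.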

Algorithm 3 Subarray-Based Transmit-Receive ZF Algorithm

5 HBF Algorithms for MU-MISO

In this section, three partially connected HBF algorithms are developed for MU-MISO systems. The algorithms are designed for both full array- and subarray-based processing strategies. In the following, each algorithm is introduced in detail.

5.1 Full Array-Based Hybrid WMMSE

This algorithm solves the original sum rate maximization problem by applying alternating optimization to the equivalent weighted MSE minimization problem so that the objective value converges. First, (19) is solved with respect to the digital precoder \(\{{{\bar{\textbf{d}}}}_k\}\) while the analog beamformer is fixed. Then, the analog beamformer \(\textbf{A}\) is optimized while keeping the digital precoder fixed. In the following, the MU-MISO hybrid WMMSE algorithm is described in detail.

Problem (19) is convex with respect to the digital precoder \(\{{{\bar{\textbf{d}}}}_k\}\). The Lagrangian expression of (19) is given by

$$\begin{aligned} \begin{aligned} \mathcal {L}&= \sum _{k=1}^{N_u} w_k - \sum _{k=1}^{N_u} w_k m_k \textbf{h}_k^H \textbf{A} {{\bar{\textbf{d}}}}_k - \sum _{k=1}^{N_u} w_k {{\bar{\textbf{d}}}}_k^H \textbf{A}^H \textbf{h}_k m_k^{*}\\ {}&\hspace{7mm} + \sum _{k=1}^{N_u} w_k m_k \textbf{h}_k^H \textbf{A} \sum _{j=1}^{N_s} {{\bar{\textbf{d}}}}_j {{\bar{\textbf{d}}}}_j^H \textbf{A}^H \textbf{h}_k m_k^{*}\\ {}&\hspace{7mm} + \sum _{k=1}^{N_u} w_k m_k m_k^{*} N_0 + \alpha \left( \sum _{j=1}^{N_s} \textrm{tr}\,\left( \textbf{A} {{\bar{\textbf{d}}}}_j {{\bar{\textbf{d}}}}_j^{H} \textbf{A}^{H}\right) - P\right) \end{aligned} \end{aligned}$$
(35)

The resulting expression from the first order optimality condition is

$$\begin{aligned} {{\bar{\textbf{d}}}}_k = \left( \textbf{A}^H \sum _{j=1}^{K} \textbf{h}_j m_j^{*} w_j m_j \textbf{h}_j^H \textbf{A} + \alpha \textbf{A}^H \textbf{A}\right) ^{-1} \textbf{A}^H \textbf{h}_k m_k^{*} w_k \end{aligned}$$
(36)

where \(\alpha \ge 0\) is chosen such that the transmit power constraint is satisfied and can be found using the bisection method. In the next step, we fix the digital precoder and solve (19) with respect to the analog beamformers for different subarrays. The received signal of the kth user can be rewritten as

$$\begin{aligned} \begin{aligned} y_k&= \sum \limits _{\begin{array}{c} j=1 \end{array}}^{N_s} \textbf{h}_{jk}^{H} \textbf{a}_j d_{jk} s_{k} + \sum \limits _{\begin{array}{c} j=1 \end{array}}^{N_s} \textbf{h}_{jk}^{H} \textbf{a}_j \sum \limits _{\begin{array}{c} i=1 \\ i \ne k \end{array}}^{N_u} d_{ji} s_{i} + z_k \end{aligned} \end{aligned}$$
(37)

and the error term corresponding to the kth user as

$$\begin{aligned} \begin{aligned} e_k&={{\mathbb {E}}}\left[ \left( s_k - m_k y_k \right) \left( s_k - m_k y_k \right) ^H \right] \\ {}&= 1 - m_k \sum _{j=1}^{N_s} \textbf{h}_{jk}^H \textbf{a}_j d_{jk} - \sum _{j=1}^{N_s} d_{jk}^{*} \textbf{a}_j^H \textbf{h}_{jk} m_k^{*}\\ {}&\hspace{7mm} + m_k \sum _{j=1}^{N_a} \textbf{h}_{jk}^H \textbf{a}_j \textbf{d}_j \sum _{l=1}^{N_s} \textbf{d}_l^{H} \textbf{a}_l^H \textbf{h}_{lk} m_k^{*} + N_0 m_k m_k^{*} \end{aligned} \end{aligned}$$
(38)

where \(\textbf{h}_{ik}\) is the channel vector between the ith subarray and user k. Consequently, the Lagrangian expression corresponding to (19) can be expressed as

$$\begin{aligned} \begin{aligned} \mathcal {L}&= \sum _{k=1}^{N_u} w_k - \sum _{k=1}^{N_u} w_k m_k \sum _{j=1}^{N_s} \textbf{h}_{jk}^H \textbf{a}_j d_{jk}\\ {}&\hspace{7mm} - \sum _{k=1}^{N_u} w_k \sum _{j=1}^{N_s} d_{jk}^{*} \textbf{a}_j^H \textbf{h}_{jk} m_k^{*} + \sum _{k=1}^{N_u} w_k N_0 m_k m_k^{*}\\ {}&\hspace{7mm} + \sum _{k=1}^{N_u} w_k m_k \sum _{j=1}^{N_a} \textbf{h}_{jk}^H \textbf{a}_j \textbf{d}_j \sum _{l=1}^{N_s} \textbf{d}_l^{H} \textbf{a}_l^H \textbf{h}_{lk} m_k^{*}\\ {}&\hspace{7mm} + \alpha (\sum _{j=1}^{N_a} \textrm{tr}\,(\textbf{a}_j \textbf{d}_j \textbf{d}_j^{H} \textbf{a}_j^{H}) - P). \end{aligned} \end{aligned}$$
(39)

The first order optimality condition with respect to \(\textbf{a}_i\) yields

$$\begin{aligned} \begin{aligned} \textbf{a}_i&= (\sum _{k=1}^{N_u} \textbf{h}_{ik} m_k^{*} w_k m_k \textbf{h}_{ik}^H + \alpha \textbf{I}_n)^{-1} \sum _{k=1}^{N_u} \textbf{h}_{ik} m_k^{*} w_k \bigg ( d_{ik}^{*} - m_k \sum \limits _{\begin{array}{c} j=1, j\ne i \end{array}}^{N_s} \textbf{h}_{jk}^H \textbf{a}_j \textbf{d}_j \textbf{d}_i^H \bigg ) (\textbf{d}_i \textbf{d}_i^H)^{-1} \end{aligned} \end{aligned}$$
(40)

in which \(\alpha \ge 0\) is chosen via the bisection method such that the transmit power constraint is satisfied. This alternating optimization procedure is continued until achieving a desired level of convergence. The proposed algorithm converges in terms of objective value since solving a convex problem at each step improves the objective value and the resulting MSE is lower bounded [42]. Due to the non-convexity of the original problem, optimality of the solution cannot be guaranteed. Hence, the solution has to be considered as suboptimal unless otherwise proven. Algorithm 4 summarizes the proposed full array-based hybrid WMMSE approach.

Algorithm 4 Full Array-Based Hybrid WMMSE Algorithm

5.2 Subarray-Based Hybrid WMMSE

The subarray-based WMMSE scheme is a reduced version of Algorithm 4, since the digital beamformer is fixed as an identity matrix. In this case, the multi-user WMMSE problem in (19) can be solved for the analog beamformers \(\{\textbf{a}_j\}\). The received signal of the kth user can be rewritten as

$$\begin{aligned} \begin{aligned} y_k&= \textbf{h}_{kk}^{H} \textbf{a}_k s_{k} + \sum \limits _{\begin{array}{c} i=1, i \ne k \end{array}}^{N_s} \textbf{h}_{ik}^{H} \textbf{a}_i s_{i} + z_k \end{aligned} \end{aligned}$$
(41)

and the error term corresponding to the kth user as

$$\begin{aligned} \begin{aligned} e_k&={{\mathbb {E}}}\left[ \left( s_k - m_k y_k \right) \left( s_k - m_k y_k \right) ^H \right] \\ {}&= 1 - m_k \textbf{h}_{kk}^H \textbf{a}_k - \textbf{a}_k^H \textbf{h}_{kk} m_k^{*}\\ {}&\hspace{7mm} + \sum \limits _{\begin{array}{c} i=1 \end{array}}^{N_s} m_k \textbf{h}_{ik}^{H} \textbf{a}_i \textbf{a}_i^{H} \textbf{h}_{ik} m_k^{*} + N_0 m_k m_k^{*} \end{aligned} \end{aligned}$$
(42)

where \(\textbf{h}_{ik}\) is the channel vector between the ith subarray and user k. Using the stream specific MSE expressions, the corresponding Lagrangian expression is given by

$$\begin{aligned} \begin{aligned} \mathcal {L} =&\sum _{k=1}^{N_s} w_k - \sum _{k=1}^{N_s} m_k w_k \textbf{h}_{kk}^H \textbf{a}_k - \sum _{k=1}^{N_s} w_k \textbf{a}_k^H \textbf{h}_{kk} m_k^{*}\\ {}&\hspace{7mm} + \sum _{k=1}^{N_s} w_k m_k \sum \limits _{\begin{array}{c} i=1 \end{array}}^{N_s} \textbf{h}_{ik}^{H} \textbf{a}_i \textbf{a}_i^{H} \textbf{h}_{ik} m_k^{*}\\ {}&\hspace{7mm} + N_0 \sum _{k=1}^{N_s} w_k m_k m_k^{*} + \alpha (\sum _{i=1}^{N_s} \textrm{tr}\,(\textbf{a}_i \textbf{a}_i^{H}) - P). \end{aligned} \end{aligned}$$
(43)

Solving this for the receiver \(m_k\) of the kth user results in

$$\begin{aligned} m_k = \textbf{a}_k^H \textbf{h}_{kk} (\sum _{i=1}^{N_a} \textbf{h}_{ik}^H \textbf{a}_i \textbf{a}_i^H \textbf{h}_{ik} + N_0)^{-1}. \end{aligned}$$
(44)

After applying the MMSE receiver, the error term at the kth user is given by

$$\begin{aligned} \begin{aligned} e_k&= 1 - \textbf{a}_k^H \textbf{h}_{kk} (\sum _{i=1}^{N_s} \textbf{h}_{ik}^H \textbf{a}_i \textbf{a}_i^H \textbf{h}_{ik} + N_0)^{-1} \textbf{h}_{kk}^H \textbf{a}_k \end{aligned} \end{aligned}$$
(45)
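The step from (42) to (45) can be checked numerically: substituting the receiver (44) into the general error expression (42) should reproduce the compact form (45). The sketch below does this for random channels and beamformers, assuming one subarray per user; the array layout `h[i, k]` for \(\textbf{h}_{ik}\) and the function name `mmse_receivers` are illustrative choices.

```python
import numpy as np

def mmse_receivers(h, a, N0):
    """Per-user MMSE receivers (44) and the MSE evaluated through (42) and (45).

    h[i, k] is the length-n channel between subarray i and user k,
    a[i] is the analog beamformer of subarray i (hypothetical layout).
    """
    g = np.einsum("ikn,in->ik", h.conj(), a)        # g[i, k] = h_ik^H a_i
    T = (np.abs(g) ** 2).sum(axis=0) + N0           # sum_i |h_ik^H a_i|^2 + N0 for each user k
    gkk = np.diag(g)                                # desired terms h_kk^H a_k
    m = gkk.conj() / T                              # (44)
    e_42 = 1 - 2 * np.real(m * gkk) + np.abs(m) ** 2 * T   # (42) with m_k inserted
    e_45 = 1 - np.abs(gkk) ** 2 / T                 # (45)
    return m, e_42, e_45

rng = np.random.default_rng(0)
K, n = 4, 8
h = (rng.standard_normal((K, K, n)) + 1j * rng.standard_normal((K, K, n))) / np.sqrt(2)
a = (rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))) / np.sqrt(2 * K * n)
m, e_42, e_45 = mmse_receivers(h, a, N0=0.1)
print(np.allclose(e_42, e_45))                      # the two expressions agree
```

The agreement follows because the two cross terms in (42) each contribute \(-|\textbf{h}_{kk}^H \textbf{a}_k|^2/T_k\) while the quadratic terms contribute \(+|\textbf{h}_{kk}^H \textbf{a}_k|^2/T_k\), where \(T_k\) denotes the bracketed sum in (44).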

The first order optimality condition of \(\mathcal {L}\) with respect to each \(\textbf{a}_i\) yields

$$\begin{aligned} \textbf{a}_i = \left(\sum _{k=1}^{K} \textbf{h}_{ik} m_k^{*} w_k m_k \textbf{h}_{ik}^H + \alpha \textbf{I}_n\right)^{-1} \textbf{h}_{ii} m_i^{*} w_i \end{aligned}$$
(46)

where \(\alpha \ge 0\) is chosen via the bisection method such that the transmit power constraint is satisfied. This alternating optimization procedure is repeated until a desired level of convergence is obtained. The developed subarray-based hybrid WMMSE approach is summarized in Algorithm 5. The convergence and suboptimality properties of Algorithm 4 also apply to Algorithm 5.

Algorithm 5 Subarray-Based Hybrid WMMSE Algorithm

5.3 Subarray-Based ZF

In the subarray-based ZF method, each subarray applies a ZF beamformer toward its corresponding user while nulling the interference to the other users. To cancel the interference successfully, the number of antennas in each subarray must be greater than or equal to the number of users. Stacking the channel vectors, the normalized subarray-based ZF beamformer can be formed as

$$\begin{aligned} {{{\bar{\textbf{V}}}}} =\left( \begin{array}{cccc} \frac{[{\textbf{V}}_1]_1}{\left\| [\textbf{V}_1]_1\right\| } &{} \textbf{0} &{} \ldots &{} \textbf{0}\\ \textbf{0} &{} \frac{[\textbf{V}_2]_2}{\left\| [{\textbf{V}}_2]_2\right\| } &{} \ldots &{} \textbf{0}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \textbf{0} &{} \textbf{0} &{} \ldots &{} \frac{[{\textbf{V}}_K]_K}{\left\| [\textbf{V}_K]_K\right\| } \end{array} \right) \end{aligned}$$
(47)

where \({\textbf{V}}_k = {\textbf{H}}_k^H ({\textbf{H}}_k {\textbf{H}}_k^H)^{-1}\). The matrix \({\textbf{H}}_k \in {{\mathbb {C}}}^{K \times n}\) is the channel between the kth subarray and all K users, and \([.]_k\) denotes the kth column of its matrix argument. Then the overall beamformer is formed as \({\textbf{V}} = \sqrt{\textbf{P}} {{{\bar{{\textbf{V}}}}}}\), in which WF power allocation is used to form \(\textbf{P}\). The complexity of this algorithm is considerably lower than that of the previous WMMSE-based hybrid algorithms.
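The construction in (47) can be sketched as follows. The sketch assumes that each \(\textbf{H}_k\) is arranged with one row per user (so that \(\textbf{H}_k \textbf{H}_k^H\) is a \(K \times K\) matrix), that \(n \ge K\), and that there is one subarray per user; the function name `subarray_zf` is hypothetical.

```python
import numpy as np

def subarray_zf(Hs):
    """Build the normalized block-diagonal ZF beamformer of (47).

    Hs[k] is a K x n matrix whose jth row is the channel from subarray k to user j.
    """
    K = len(Hs)
    n = Hs[0].shape[1]
    V_bar = np.zeros((K * n, K), dtype=complex)
    for k, Hk in enumerate(Hs):
        Vk = Hk.conj().T @ np.linalg.inv(Hk @ Hk.conj().T)   # right pseudo-inverse of H_k
        v = Vk[:, k]                                         # column aimed at user k
        V_bar[k * n:(k + 1) * n, k] = v / np.linalg.norm(v)  # kth diagonal block of (47)
    return V_bar

rng = np.random.default_rng(0)
K, n = 4, 8
Hs = [(rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))) / np.sqrt(2)
      for _ in range(K)]
V_bar = subarray_zf(Hs)
G = np.hstack(Hs) @ V_bar            # G[j, k]: signal of stream k observed at user j
print(np.round(np.abs(G), 6))        # off-diagonal entries vanish
```

Because \(\textbf{H}_k \textbf{V}_k = \textbf{I}\), the kth column of \(\textbf{V}_k\) reaches user k only, so the effective channel `G` is diagonal and the WF power allocation can be applied per stream.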

6 Simulation Results

In this section, the performance of the proposed HBF algorithms is evaluated via a set of numerical examples in SU-MIMO and MU-MISO scenarios. The propagation channel between the BS and each user is constructed using two different channel models. The first one is a geometric channel model with a uniform linear array (ULA) antenna configuration, also known as the Saleh-Valenzuela model [43]. The second one is a geometry-based stochastic channel model, called the New York University Simulator (NYUSIM) [44, 45], which is based on channel measurements at mm-wave frequencies. In the SU-MIMO scenario, the HBF algorithms are compared to the optimal fully digital and analog beamforming solutions, i.e., multi- and single-stream SVD-based approaches, respectively. In the MU-MISO case, the proposed HBF schemes are compared to fully digital WMMSE, fully digital ZF, and the fully connected hybrid ZF algorithm in [36]. The ZF algorithm in [36] was chosen since it outperforms other fully connected hybrid ZF algorithms (with analog phase control only) in the literature, as shown in [36]. In the simulations, we use a relatively large number of transmit antennas to reflect a massive MIMO type of setting, and the communication channels are considered to be frequency flat. System performance is measured in terms of achievable rate (i.e., spectral efficiency). The rate performance is averaged over 100 channel realizations.

6.1 Performance Evaluation in Geometric ULA Channel

This subsection provides a performance study of the HBF methods in a geometric channel model with L propagation paths and ULA antenna configuration. A detailed description of the channel model is given first. The convergence behavior of the hybrid WMMSE algorithms is studied next. Then, all the proposed HBF methods are compared to digital and analog beamforming in SU-MIMO and MU-MISO scenarios with various parameter settings.

The MIMO channel matrix corresponding to the geometric ULA is expressed as

$$\begin{aligned} \textbf{H} = \sqrt{\frac{N_t N_r}{L}} \sum _{l=1}^{L} \alpha _l \textbf{g}_r (\phi _r^l) \textbf{g}_t (\phi _t^l)^H \end{aligned}$$
(48)

where L is the number of paths between the BS and the user, \(\alpha _l \sim \mathcal{C}\mathcal{N}(0,1)\) is the path gain of the lth path, and \(\phi _r^l \in [0, 2\pi )\) and \(\phi _t^l \in [0, 2\pi )\) are the angles of arrival and departure of the lth path, respectively. Moreover, \(\textbf{g}_r (\phi _r^l)\) and \(\textbf{g}_t (\phi _t^l)\) are the receive and transmit antenna array response vectors, respectively. These array response vectors are functions of the angles of arrival and departure. The ULA channel is considered to have \(L=5\) paths with uniformly distributed angles of departure and arrival, and an antenna spacing equal to half a wavelength. Moreover, the channel is assumed to be frequency flat with normalized path loss. The array response vector of an \(N_t\)-element ULA can be written as:

$$\begin{aligned} \textbf{g} (\phi ) = \frac{1}{\sqrt{N_t}} \left( 1,\, e^{jkd\sin (\phi )},\, \ldots ,\, e^{j(N_t -1)kd\sin (\phi )} \right) ^T \end{aligned}$$
(49)

where \(k=2\pi /\lambda\) is the wavenumber, \(\lambda\) is the carrier wavelength, and d is the antenna inter-element spacing. In this paper, the carrier frequency is \(f=28\, \textrm{GHz}\) to reflect mm-wave communications.
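One realization of the channel in (48) with the array response (49) can be generated as sketched below. The sketch assumes the usual wavenumber \(k = 2\pi /\lambda\), so that half-wavelength spacing gives \(kd = \pi\); the function names and the use of NumPy's random generator are illustrative choices.

```python
import numpy as np

def ula_response(N, phi, kd=np.pi):
    """Array response (49) of an N-element ULA; kd = pi corresponds to d = lambda / 2."""
    return np.exp(1j * kd * np.arange(N) * np.sin(phi)) / np.sqrt(N)

def geometric_channel(Nt, Nr, L=5, rng=None):
    """Draw one N_r x N_t realization of the geometric ULA channel (48)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)  # CN(0, 1) gains
    phi_r = rng.uniform(0.0, 2 * np.pi, L)      # angles of arrival
    phi_t = rng.uniform(0.0, 2 * np.pi, L)      # angles of departure
    H = np.zeros((Nr, Nt), dtype=complex)
    for l in range(L):
        g_r = ula_response(Nr, phi_r[l])
        g_t = ula_response(Nt, phi_t[l])
        H += alpha[l] * np.outer(g_r, g_t.conj())   # alpha_l g_r g_t^H
    return np.sqrt(Nt * Nr / L) * H

H = geometric_channel(Nt=64, Nr=4, L=5, rng=np.random.default_rng(0))
```

Since the channel is a sum of L rank-one terms, its rank cannot exceed \(\min (L, N_r, N_t)\), which is the sparsity property that makes beamforming with few RF chains attractive at mm-wave frequencies.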

Fig. 4 Convergence behavior of WMMSE algorithms at 5 dB SNR

Figure 4 shows the convergence behavior of the hybrid WMMSE algorithms in the ULA channel model. It can be seen that the algorithms converge relatively fast during the first tens of iterations in both the SU and MU cases. The subarray-based algorithms approximately converge after only tens of iterations. However, the convergence of the full array-based algorithms becomes slow as they approach the convergence value.

6.1.1 SU-MIMO

Figure 5 illustrates the average rate versus SNR in the geometric ULA channel model. The numbers of transmit antennas, receive antennas, and data streams are 64, 4, and 4, respectively. As can be seen, the full array-based hybrid WMMSE algorithm achieves the best rate performance among the hybrid algorithms over the entire SNR range. At lower SNRs, its performance is close to the optimal fully digital solution, while a moderate gap appears between them at higher SNRs. This gap is mainly due to the partially connected hybrid architecture, which has fewer RF chains and consequently lower array gain. The rate of the subarray-based hybrid WMMSE algorithm is slightly lower than that of its full array-based counterpart. The transmit-receive ZF algorithm performs slightly below the subarray-based hybrid scheme over the entire SNR range, and below the full array-based hybrid scheme by a moderate gap, which is mainly due to the subarray-based design. The fully analog (single-stream) scheme has the lowest performance over the SNR range, except at very low SNRs, where the transmit-receive ZF and subarray-based hybrid algorithms drop below it.

Fig. 5 Rate vs. SNR in SU-MIMO system with \(N_t = 64\), \(N_r = 4\), \(N_s = 4\)

Figure 6 presents the average rate versus the number of transmit antennas. The numbers of receive antennas and data streams are both 4, and the performance is evaluated at SNR = 5 dB. The rate of the full array-based hybrid WMMSE design is superior to that of all other hybrid algorithms, but inferior to that of the fully digital solution, over the whole range of transmit antenna numbers. The second best hybrid algorithm is the subarray-based hybrid WMMSE scheme and the last is the transmit-receive ZF design, each with a small gap to the next better curve.

Fig. 6 Rate vs. number of transmit antennas in SU-MIMO system with \(N_r = 4\), \(N_s = 4\) at \(\textrm{SNR}=5\) dB

6.1.2 MU-MISO

Figure 7 shows the average sum rate versus SNR for the MU-MISO algorithms in the geometric ULA channel model. The numbers of transmit antennas and users are 64 and 4, respectively. The full array-based hybrid WMMSE algorithm performs best among the hybrid algorithms over the entire SNR range. The gap between the full array-based hybrid scheme and the fully digital WMMSE solution is due to the partially connected design. The fully digital ZF solution performs worse than fully digital WMMSE, which reflects the fundamental difference between the two schemes: the WMMSE design can exploit the interference to achieve a better rate than ZF. Our full array-based WMMSE solution outperforms the fully digital ZF solution by at least 2 dB (up to 4 dB at some SNRs). The subarray-based hybrid WMMSE and fully digital ZF algorithms perform approximately the same over the entire SNR range. The performance of the fully connected ZF algorithm in [36] is lower than that of our subarray-based WMMSE algorithm by 2–3 dB. The subarray-based ZF algorithm is inferior to all other algorithms, which is due to both the partial connectivity and the nature of ZF.

Fig. 7 Sum rate vs. SNR in MU-MISO system with \(N_t = 64\), \(N_u = 4\)

Fig. 8 Sum rate vs. number of transmit antennas in MU-MISO system with \(N_u = 4\), \(\textrm{SNR}=5\) dB

Figure 8 illustrates the average sum rate versus the number of transmit antennas in the MU-MISO system. The number of users is 4 and the performance is evaluated at 5 dB SNR. It can be seen that the full array-based hybrid WMMSE design performs better than all other hybrid algorithms, but worse than the fully digital WMMSE solution, over the whole range of transmit antenna numbers. For larger numbers of transmit antennas, the sum rate of all the algorithms grows more slowly and saturates. The subarray-based hybrid WMMSE and fully digital ZF designs have almost identical performance over the whole range of transmit antenna numbers. The subarray-based ZF algorithm is inferior to all other hybrid algorithms by a considerable gap.

Fig. 9 Sum rate vs. number of users in MU-MISO system with \(N_t = 120\), \(\textrm{SNR}=5\) dB

In Fig. 9, the average sum rate is presented against the number of users. The number of transmit antennas is set to 120, and the evaluation is at SNR = 5 dB. Similar to the results in [46], the average sum rate of fully digital WMMSE rises linearly as the number of users increases. For all numbers of users, the full array-based hybrid WMMSE algorithm achieves the best performance among all hybrid schemes. The slope of the full array-based hybrid WMMSE curve decreases relative to that of the fully digital WMMSE scheme. The subarray-based hybrid WMMSE and fully digital ZF schemes have identical performance for lower numbers of users. However, fully digital ZF performs slightly better than subarray-based hybrid WMMSE as the number of users increases. The performance of the subarray-based ZF scheme decreases substantially as the number of users increases. This decrease can be attributed to the inherent limitations of the subarray-based architecture and the ZF algorithm, particularly when the number of antennas per subarray limits the interference cancellation capability. The performance of the fully connected ZF algorithm in [36] is lower than that of both the full array- and subarray-based WMMSE algorithms when the number of users is small (i.e., \(N_u \le 6\)). However, as the number of users increases, its performance approaches that of the full array-based WMMSE solution and exceeds it when \(N_u > 10\).

Fig. 10 Energy efficiency vs. number of users in MU-MISO system with \(N_t = 120\), \(\textrm{SNR}=5\) dB

Figure 10 compares the energy efficiency against the number of users \(N_u\). We use the energy efficiency definition in [47], \(\eta = R/(P_T + P_H)\), where \(\eta\) is the energy efficiency, R is the achievable sum rate, \(P_T\) is the transmission power, which is set to \(P_T= 5\) W, and \(P_H\) is the power consumed by the hardware architecture. For the fully connected hybrid architecture shown in Fig. 1a, we have \(P_H = N_t P_A + N_a N_t P_{PS} + N_a P_{SP} + N_t P_{CO} + N_a P_{RF}\), where \(P_A\), \(P_{PS}\), \(P_{SP}\), \(P_{CO}\), and \(P_{RF}\) are the powers consumed by an amplifier, a phase shifter, a power splitter, a power combiner, and an RF chain, respectively. For the partially connected hybrid architecture shown in Fig. 1b, we have \(P_H = N_t P_A + N_a N_t P_{SW} + N_t P_{SP} + N_t P_{CO} + N_a P_{RF}\), where \(P_{SW}\) is the power consumed by a switch. Finally, for fully digital precoding, we have \(P_H = N_t P_A + N_t P_{RF}\). It is important to note that the power consumption of RF modules typically varies significantly, depending on the specific implementation type and performance criteria [47]. In this paper, we have chosen conservative (high) values, so the results depicted in Fig. 10 serve as practical lower bounds for the energy efficiencies. The values are adopted from [47] as \(P_A = 20\) mW, \(P_{SP} = P_{CO} = 10\) mW, \(P_{RF} = 250\) mW, \(P_{PS} = 30\) mW, and \(P_{SW} = 5\) mW. Figure 10 shows that the proposed full array-based WMMSE algorithm achieves the best energy efficiency and has a steady performance for all numbers of users. Both the full array- and subarray-based WMMSE algorithms can achieve much higher energy efficiency than the fully digital WMMSE and ZF solutions when the number of users is not large (e.g., \(N_u \le 8\)). However, when \(N_u > 8\), the performance of the subarray-based WMMSE algorithm drops below that of the fully digital WMMSE approach. This may be due to the fact that, as \(N_u\) increases, the required numbers of phase shifters and power amplifiers increase rapidly. The algorithm in [36] has an energy efficiency very similar to that of the fully digital ZF solution, and both are much lower than those of our proposed full array- and subarray-based WMMSE algorithms. The performance of the subarray-based ZF algorithm is higher than that of the fully digital ZF solution and the fully connected ZF algorithm in [36] when \(N_u < 6\). However, it drops below them as the number of users increases.
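The hardware power expressions above are simple to evaluate. The following sketch computes \(P_H\) and \(\eta\) for the three architectures using the component values listed in the text; the function names and the example choice of \(N_a = 4\) RF chains are assumptions made for illustration, and the sum rate R must be supplied from a simulation.

```python
# Component powers in watts, as listed in the text (adopted there from [47]).
P_A, P_PS, P_SP, P_CO, P_RF, P_SW = 0.020, 0.030, 0.010, 0.010, 0.250, 0.005

def hardware_power(arch, Nt, Na):
    """Hardware power P_H in watts for Nt antennas and Na RF chains."""
    if arch == "fully_connected":
        return Nt * P_A + Na * Nt * P_PS + Na * P_SP + Nt * P_CO + Na * P_RF
    if arch == "partially_connected":
        return Nt * P_A + Na * Nt * P_SW + Nt * P_SP + Nt * P_CO + Na * P_RF
    if arch == "fully_digital":
        return Nt * P_A + Nt * P_RF          # one RF chain per antenna
    raise ValueError(f"unknown architecture: {arch}")

def energy_efficiency(R, P_H, P_T=5.0):
    """Energy efficiency eta = R / (P_T + P_H)."""
    return R / (P_T + P_H)

for arch in ("fully_connected", "partially_connected", "fully_digital"):
    print(arch, round(hardware_power(arch, Nt=120, Na=4), 2))
```

With \(N_t = 120\) and the assumed \(N_a = 4\), the expressions give 19.04 W, 8.2 W, and 32.4 W for the fully connected, partially connected, and fully digital architectures, respectively, which illustrates why the denominator of \(\eta\) favors the partially connected design.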

6.2 Performance Evaluation in NYUSIM Channel Model

This section evaluates the performance of the HBF schemes in the NYUSIM channel model using an urban macrocell (UMa) environment with line-of-sight (LOS) and non-line-of-sight (NLOS) propagation conditions. Various settings of parameters are used for SU-MIMO and MU-MISO scenarios. Before the performance analysis, the NYUSIM channel model with its main parameters is introduced.

NYUSIM is a channel simulator developed by New York University, based on measurements and analysis of data obtained from various outdoor environments at frequencies from 28 to 73 GHz [44, 45]. The simulator provides an accurate rendering of actual channel impulse responses in both time and space. NYUSIM also provides realistic signal levels that can be utilized in physical layer and link layer simulations. NYUSIM is applicable to carrier frequencies from 500 MHz to 100 GHz, and RF bandwidths from 0 (continuous wave) to 800 MHz.

The parameters used to evaluate the performance in the NYUSIM environment are as follows. The center frequency is 28 GHz and the total transmit power is 10 W. The UMa environment is considered in both outdoor LOS and NLOS scenarios. The transmit and receive antenna arrays are a uniform rectangular array (URA) and a uniform linear array (ULA), respectively. The path loss model is based on the close-in free space reference distance (CI) with a 1 m anchor point. It also includes an additional attenuation term to account for various atmospheric attenuation factors [45]. For the UMa LOS scenario, the path loss exponent is 2.0, whereas it is 3.3 for the UMa NLOS scenario. The shadowing is characterized by a log-normal distribution with a standard deviation parameter that models the fluctuations in signal strength. The standard deviation of shadow fading is 5.7 for UMa LOS and 7.8 for UMa NLOS. For further details, please refer to [45].

6.2.1 SU-MIMO

Fig. 11 Rate vs. link distance in SU-MIMO system with \(N_t = 64\), \(N_r = 4\), \(N_s = 4\)

Figures 11a and 11b show the average rate performance against link distance in the NYUSIM UMa environment for the LOS and NLOS scenarios, respectively. The numbers of transmit antennas, receive antennas, and data streams are set to 64, 16, and 4, respectively, and the evaluation is at a 28 GHz carrier frequency. The average rate decreases as the link distance increases, which is due to the path loss. In both figures, the full-array-based hybrid WMMSE algorithm achieves the best performance among all hybrid designs. The subarray-based hybrid scheme is close to its full-array counterpart for small cell sizes, whereas its rate drops slightly below it at larger radii. The rate of the fully analog scheme decreases only slightly as the link distance increases. The rate in the LOS scenario is considerably higher than in the NLOS case because of the more severe attenuation that NLOS propagation suffers at mm-wave frequencies. Moreover, the rate decreases much more slowly with distance in the LOS case than in the NLOS scenario.
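The difference in slope between the two scenarios can be illustrated with a short worked calculation. The sketch below considers only the distance-dependent term of the CI model with the path loss exponents \(n\) quoted for the UMa scenarios, and ignores shadowing and atmospheric attenuation. The symbols \(\Delta \mathrm{PL}\), \(d_1\), and \(d_2\) are introduced here for illustration. Increasing the link distance from \(d_1\) to \(d_2\) changes the mean path loss by

\[
\Delta \mathrm{PL} = 10\, n \log_{10}\!\left(\frac{d_2}{d_1}\right) \ \text{dB},
\]

so doubling the distance (\(d_2 = 2 d_1\)) gives

\[
\Delta \mathrm{PL}_{\mathrm{LOS}} = 10 \cdot 2.0 \cdot \log_{10} 2 \approx 6.0 \ \text{dB},
\qquad
\Delta \mathrm{PL}_{\mathrm{NLOS}} = 10 \cdot 3.3 \cdot \log_{10} 2 \approx 9.9 \ \text{dB}.
\]

Under these simplifying assumptions, each doubling of the distance costs roughly 4 dB more SNR in the NLOS case than in the LOS case, which is consistent with the steeper rate decrease observed for NLOS.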

6.2.2 MU-MISO

Fig. 12 Sum rate vs. link distance in MU-MISO system with \(N_t = 64\), \(N_u = 4\)

Figures 12a and 12b present the average sum rate performance versus link distance in the NYUSIM UMa environment for the LOS and NLOS scenarios, respectively. The numbers of transmit antennas and users are set to 64 and 4, respectively, and the evaluation is at a 28 GHz carrier frequency. The average sum rate decreases as the link distance increases, owing to the growing attenuation. In both figures, the full-array-based hybrid WMMSE algorithm achieves the best sum rate among all hybrid schemes. The fully digital ZF solution performs slightly better than the full-array hybrid algorithm at small cell sizes in the LOS case and slightly worse at all cell sizes in the NLOS scenario. The subarray-based ZF algorithm is inferior to all other designs. The sum rate in the LOS scenario is considerably higher than in the NLOS case because of the more severe attenuation that NLOS propagation suffers at mm-wave frequencies. The sum rate also decreases considerably faster with distance in the NLOS scenario than in the LOS case.

7 Conclusion

In this paper, HBF with partially connected RF architecture was studied for mm-wave massive MIMO systems. We proposed several optimization-based HBF algorithms for SU-MIMO and MU-MISO settings. Both full array- and subarray-based processing approaches of partially connected HBF were considered. We first formulated rate maximization problems for single- and multi-user systems as weighted minimum mean square error (WMMSE) problems and then derived hybrid beamformers as solutions using alternating optimization. Moreover, we proposed subarray-based zero-forcing algorithms with lower complexity. A simple geometric ULA channel model and a practical NYUSIM channel model were used to compare the rate performance of the hybrid algorithms. Simulation results showed that partially connected HBF can provide a good balance between hardware complexity and performance compared to fully digital and analog beamforming. The subarray-based hybrid WMMSE algorithm achieves performance comparable to that of the full array-based WMMSE in a SU-MIMO scenario, while being inferior in a MU-MISO setting. The hybrid WMMSE results can potentially serve as performance upper bounds for lower complexity algorithms.

Future research on HBF systems should explore channel estimation algorithms tailored to frequency-selective and doubly-selective channels, especially in sparse millimeter-wave and sub-terahertz environments. These channels exhibit frequency-dependent fading and time-varying characteristics, which call for adaptive HBF strategies. Addressing these challenges effectively requires innovative approaches to channel estimation and tracking. By exploiting the sparse nature of these channels, researchers can develop efficient algorithms that reduce estimation overhead, optimize HBF strategies, and enhance system performance.