WO2024083319A1

WO2024083319A1 - Beam selection

Info

Publication number: WO2024083319A1
Application number: PCT/EP2022/079023
Authority: WO
Inventors: Saeed Reza KHOSRAVIRAD; Tachporn SANGUANPUAK; Jakub SAPIS
Original assignee: Nokia Solutions And Networks Oy
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2024-04-25

Abstract

An apparatus, comprising: at least one processor; and at least one memory storing instructions that when executed by the at least one processor cause the apparatus at least to perform: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for the subset of beams; receiving feedback from the user equipment related to the reference signals for the subset of beams received by the user equipment; and selecting, based on the feedback, a beam from the beams available to the base station determined to be suitable for communication with the user equipment.

Description

BEAM SELECTION

TECHNOLOGICAL FIELD

Various example embodiments relate to an apparatus and method for beam selection.

BACKGROUND

In a wireless telecommunications network, it is possible to select one or more beams for communication between a base station and a user equipment. Although various techniques exist for beam selection, they each have their own shortcomings. Accordingly, it is desired to provide an improved technique for beam selection.

BRIEF SUMMARY

The scope of protection sought for various example embodiments of the invention is set out by the independent claims. The example embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.

According to various, but not necessarily all, example embodiments of the invention there is provided an apparatus, comprising: at least one processor; and at least one memory storing instructions that when executed by the at least one processor cause the apparatus at least to perform: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for the subset of beams; receiving feedback from the user equipment related to the reference signals for the subset of beams received by the user equipment; and selecting, based on the feedback, a beam from the beams available to the base station determined to be suitable for communication with the user equipment.

The creating may comprise creating, as the subset of beams, a specified maximum number of beams from a total number of the beams available to the base station.

The specified maximum number of beams may be less than the total number of the beams available to the base station.

The subset of beams may include a beam used or recently used by the user equipment.

The subset of beams may include beams proportionally selected from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams.

The subset of beams may include beams biased to be selected from a panel grid of beams and/or Reflective Intelligent Surface grid based on location feedback from the user equipment.

The subset of beams may include beams randomly selected from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams.

The subset of beams may include a beam least recently used from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams.

The subset of beams may include every nth beam from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams.

The subset of beams may include beams from a grid of beams having a wider coverage area than the beam.

The subset of beams may be determined using a Deep Reinforcement Learning model.

The subset of beams may be determined using a neural network.

The subset of beams may be determined using a Long Short-Term Memory Recurrent Neural Network.

The subset of beams may be determined using an actor-critic methodology.

The actor-critic methodology may be based on a proximal policy optimisation.

The model may use Reference Signal Received Power of user equipment, Reference Signal Received Power of scheduled user equipment, an indication of a current or latest beam being used for communication with user equipment; location of user equipment, a quality indication of beams previously used for communication with user equipment and/or a channel estimate indicating a likelihood of a presence of line-of-sight with user equipment.

The model may include beams having greater than a predicted Reference Signal Received Power threshold in the subset of beams.

The feedback may include received power from user equipment, an indication of at least one preferred beam from user equipment and/or user equipment location.

The feedback may include received power from user equipment for all beams, for a best beam and/or for beams exceeding a received power threshold.

The selecting may comprise selecting the beam from all the beams available to the base station.

The selecting may comprise selecting the beam from other than the subset of beams.

The selecting may comprise selecting the beam from the beams available to the base station determined to have a predicted transmission characteristic.

The selecting may comprise selecting the beam from the beams available to the base station determined to have a highest predicted received power by the user equipment.

The selecting may comprise selecting the beam from the beams available to the base station determined to be predicted to have higher than a threshold received power.

The beam may be determined using a Deep Reinforcement Learning model.

The beam may be determined using a neural network.

The beam may be determined using a Long Short-Term Memory Recurrent Neural Network.

The model may use the indication of at least one preferred beam from user equipment, user equipment location and/or the quality indication of beams previously used for communication with user equipment and/or a channel estimate indicating a likelihood of a presence of line-of-sight with user equipment.

The model may interpolate a coverage space provided by the subset of beams to determine the beam.

The model may utilise reinforcement learning having a reward based on a relationship between a predicted and actual Reference Signal Received Power for the beam.

The likelihood of the presence of line-of-sight with user equipment may be determined from a channel response and relative locations of the user equipment and the base station.

The likelihood of the presence of line-of-sight with user equipment may be determined based on a ray tracing simulation.

The likelihood of the presence of line-of-sight with user equipment may be determined using a neural network.

The at least one memory may store instructions that when executed by the at least one processor cause the apparatus at least to perform: using the beam for communication with the user equipment.

The apparatus may comprise a base station.

According to various, but not necessarily all, example embodiments of the invention there is provided a method, comprising: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for the subset of beams; receiving feedback from the user equipment related to the reference signals for the subset of beams received by the user equipment; and selecting, based on the feedback, a beam from the beams available to the base station determined to be suitable for communication with the user equipment.

The creating may comprise creating, as the subset of beams, a specified maximum number of beams from a total number of the beams available to the base station.

The specified maximum number of beams may be less than the total number of the beams available to the base station.

The subset of beams may include a beam used or recently used by the user equipment.

The subset of beams may include beams proportionally selected from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams.

The subset of beams may be determined using a neural network.

The subset of beams may be determined using an actor-critic methodology.

The actor-critic methodology may be based on a proximal policy optimisation.

The model may use Reference Signal Received Power of user equipment, Reference Signal Received Power of scheduled user equipment, an indication of a current or latest beam being used for communication with user equipment, location of user equipment, a quality indication of beams previously used for communication with user equipment and/or a channel estimate indicating a likelihood of a presence of line-of-sight with user equipment.

The selecting may comprise selecting the beam from the beams available to the base station determined to have a highest predicted received power by the user equipment.

The selecting may comprise selecting the beam from the beams available to the base station determined to be predicted to have higher than a threshold received power.

The beam may be determined using a Deep Reinforcement Learning model.

The beam may be determined using a neural network.

The beam may be determined using a Long Short-Term Memory Recurrent Neural Network.

The method may comprise using the beam for communication with the user equipment.

A non-transitory computer readable medium comprising program instructions stored thereon for performing at least the method and its optional features set out above.

Further particular and preferred aspects are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with features of the independent claims as appropriate, and in combinations other than those explicitly set out in the claims. Where an apparatus feature is described as being operable to provide a function, it will be appreciated that this includes an apparatus feature which provides that function or which is adapted or configured to provide that function.

BRIEF DESCRIPTION

Some example embodiments will now be described with reference to the accompanying drawings in which:

FIG. 1 illustrates a synchronization signal block (SSB) transmission phase by a gNB for each of the beams in a grid of beams (GoB);

FIG. 2 illustrates a flowchart detailing the main steps performed by the gNB according to an example embodiment;

FIG. 3 illustrates beam sampling and ML-based interpolation processes according to an example embodiment;

FIG. 4 illustrates where the sampling process could use beams that are not from the Ntot beamset according to an example embodiment;

FIG. 5 illustrates index assignment between the sampled set and the total beamset according to an example embodiment;

FIG. 6 illustrates Long Short-Term Memory Recurrent Neural Network (LSTM RNN) for down-selecting a subset of the total available beams (interleaving a sampled beam set) according to an example embodiment;

FIG. 7 illustrates Deep reinforcement learning-based LSTM RNN for down-selecting a subset of the total available beams (interleaving a sampled beam set) according to an example embodiment;

FIG. 8 illustrates offline training for predicting likelihood of LoS/NLoS UEs; and

FIG. 9 illustrates training (left) and inference (right) flowcharts for the proposed ML entity in FIG. 7.

DETAILED DESCRIPTION

Before discussing the example embodiments in any more detail, first an overview will be provided. Some example embodiments provide an approach to selecting a beam for use for communication between a base station and user equipment. From all the beams that can be configured for communication between the base station and user equipment, a reduced number, subset or subgroup is selected. Typically, that reduced number of beams does not exceed that number which may be used in a measurement process using measurement or reference signals transmitted from the base station to the user equipment using the reduced number of beams. Feedback is received from the user equipment in relation to the signals received by the user equipment using the reduced number of beams. That feedback is used to select a beam that is suitable for communication with the user equipment. The beam is selected from all the beams that can be configured for communication between the base station and user equipment and not just from the reduced number of beams used when transmitting the measurement or reference signals. Typically, unlike in existing approaches, the beam that is selected differs from any of the reduced number of beams used when transmitting the measurement or reference signals. Interpolation of the beam space of the reduced number of beams, typically using a machine-learning model, enables a beam that is most suitable for communication with the user equipment to be selected. In other words, the beam that is selected is typically other than a beam from the reduced number, subset or subgroup of beams. Hence, the beam is typically not one of the reduced number, subset or subgroup of beams. The beam that is selected is typically that beam suited to communication between the base station and user equipment. The suitability can be established in a variety of ways. 
For example, that beam with the highest predicted Reference Signal Received Power (RSRP), any beam predicted to have higher than a threshold RSRP or any other predicted transmission characteristic can be used to determine a suitable beam.

Some example embodiments relate to beam management in wireless access, especially in New Radio Frequency Range 2 (NR FR2 (mmWave)). Typically, a mmWave phased array antenna is equipped with a grid of beams (GoB), which is a set of predefined beam patterns. A beam management procedure is used in 5G NR in order to acquire and maintain a set of beams at the base station (gNB) GoB and/or user equipment (UE) GoB, which can be used for downlink (DL) and uplink (UL) transmission/reception. This is an important technology for mmWave, deployed in multi-Transmission/Reception Point (TRP) operation, as well as single panel access points.

In the DL, synchronization signal blocks (SSBs) are transmitted by the gNB at regular intervals (e.g., at periodicities of 5/10/20/40/80/160 ms). Multiple SSBs are carried successively in an SSB burst. Each SSB carries a Primary Synchronization Signal (PSS), a Secondary Synchronization Signal (SSS) and a Physical Broadcast Channel (PBCH) with a Demodulation Reference Signal (DMRS), and therefore spans 4 Orthogonal Frequency Division Multiplexing (OFDM) symbols. The maximum number of SSBs is frequency dependent; for mmWave frequencies (e.g., 24-39 GHz) the number is 64. After the SSB burst is transmitted, the UE measures the Reference Signal Received Power (RSRP) from those beams, identifies the best beam for its link with the gNB, and reports that beam index to the gNB.

The beam management process described above is specified by 3GPP for users in idle mode as well as in connected mode and upon a beam failure. The overhead corresponding to the SSB burst increases with the number of beams in the GoB. As a result, 3GPP has specified a maximum size of 64 for the SSB burst.

Typically, gNBs may be equipped with a large number of beams in their GoB, e.g., 32 beams multiplied by the number of phased array panels. For example, a gNB with 3 TRPs could have more than 90 beams in its GoB. The number can be much larger than that, especially if narrow beams are favourable. For example, some gNBs could potentially have beams as narrow as 7 degrees in width, which could increase the GoB size to over 100 beams. Moreover, for covering the blind spots in a gNB coverage area, reflective intelligent surfaces (RISs) with beam-based operation are deemed a low-complexity and feasible solution. Integrating one or more RISs with a GoB into a gNB could further increase the aggregated size of the beam set for the gNB. By adding one or multiple RISs to a gNB coverage area, where each RIS has a certain number of beams in its GoB, the total number of SSBs required for the gNB also increases. For example, a gNB with 3 panels and 3 RISs (each operating, e.g., 30 beams) can have 180 beams to cover in the SSB transmission block. For the RIS technology in particular, it is desirable to have many more narrow beams in the GoB, since those panels are passive and only reflect the existing radio waves in the air. Therefore, using wide beams with RIS could dramatically decrease their gain.
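By way of illustration only, the arithmetic of the preceding example can be written out as a short sketch. The helper name `total_beams` and the assumption that every panel and every RIS operates a grid of the same size are hypothetical and are made here only to reproduce the 180-beam figure.

```python
# Illustrative arithmetic only; the helper and its parameters are hypothetical.
SSB_BURST_LIMIT = 64  # maximum number of SSBs per burst at mmWave, as stated above


def total_beams(num_panels: int, num_ris: int, beams_per_grid: int) -> int:
    """Aggregate GoB size, assuming each panel and each RIS has an equal-size grid."""
    return (num_panels + num_ris) * beams_per_grid


n_tot = total_beams(num_panels=3, num_ris=3, beams_per_grid=30)
excess = n_tot - SSB_BURST_LIMIT  # beams that cannot be swept in a single burst
print(n_tot, excess)
```

Under these assumptions, the aggregate GoB exceeds the burst limit by more than a hundred beams, which is the gap that the down-selection described below is intended to close.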

However, operating with a large number of beams increases overhead and is not currently allowed by the specifications (for example, more than 64 beams are not allowed). To alleviate this issue, the set of beams needs to be reduced. Some potential solutions exist for this, but each raises issues:

• Widen the beams and have a smaller number of them. This reduces beam gain and overall spectral efficiency.

• Use a hierarchical beam selection, i.e., first transmit an SSB burst for a set of wide beams, choose the best wide beam for a user, and then transmit an SSB burst for narrow beams inside the wide beam to choose a narrow beam for the user. This increases overhead and complexity of beam tracking.

• Down-select to a subset of the total available beams. This reduces the coverage area of the gNB and may leave coverage holes.

Moreover, depending on the traffic volume in the network and the location of the served users at each time, the usable set of beams at each time could be very different, which means an SSB burst for all the beams may not even be necessary.

Some example embodiments provide a sampling and interleaving solution to this problem. In particular, the approach is to sample the beam set, operate the SSB burst on the sampled beam set, and train a Machine Learning (ML) model to infer the best beam for each user out of the larger set of beams given only the sampled observation. Two methods are used: (i) long short-term memory recurrent neural network (LSTM RNN) predictive training and (ii) deep reinforcement learning for choosing the proper subset of the beams for the gNB and then allocating the beam to each UE from the selected subset of beams. Moreover, offline training is used for predicting the likelihood of Non-Line-of-Sight/Line-of-Sight (NLoS/LoS) users (UEs), and the obtained likelihood of NLoS/LoS users is then fed as one of the inputs for both the LSTM RNN and the deep reinforcement learning implementations.

Although some techniques already exist, none of those: considers offline training for predicting the likelihood of LoS/NLoS users and uses the likelihood of LoS/NLoS information as one of the inputs of the LSTM RNN method and of online training (deep reinforcement learning); uses LSTM-RNN for interleaving a sampled beam set; uses LSTM-RNN with PPO-based distributed deep reinforcement learning for interleaving a sampled beam set; or implements a neural network for down-selecting the set of beams.

General Operation

In overview, the main steps taken in some example embodiments are as illustrated in FIGs. 2 and 3. At step S10, in each SSB transmission period, the gNB selects Nr out of the Ntot beams (this is referred to as a ‘sampling process’) and at step S20 the SSB indexes for the selected Nr beams are stored in memory. Nr is chosen to satisfy the specifications limit and guarantee a good sampling of the coverage area. In an embodiment, the gNB can consider the future scheduling plans in selecting the Nr beams, e.g., selecting beams that are being used or were most recently used by the users in the scheduling queue. The sampling process could be random or based on any sampling algorithm (e.g., every other beam, every third beam, a preconfigured sampling pattern, etc.). At step S30, instead of SSB transmission for the Ntot beams, the gNB transmits SSB for the Nr selected beams.

At step S40, the gNB gathers the following: beam selection feedback from users (including beam RSRP), information about the position of the users, and beam indices. The RSRP could be per beam-UE, or best beam for each UE (in our case, we assume that the gNB allocates one beam for each UE). At step S50, each RSRP is allocated using the original beam indexing for the beams.

At step S60, the gNB uses a trained ML model (e.g., an LSTM RNN) whose input includes:

• The reference signal received power (RSRP) of each UE;

• The beam selection feedback (e.g., including RSRP) from user;

• The beam index identifier: logical vector of size Ntot (0/1 depending on beam selection in the Nr beam set or not - in the previous sequence);

• The location information of the user;

• The sequence of previous Channel Quality Indicator (CQI) for the UE (previous sequence of beams used for the user with operation quality value corresponding to each beam in the sequence, where the operation quality value is a metric of how good the selected beams have been for the user);

• In one embodiment, the estimated channel of the user (e.g., from UL channel estimation) and the UE location information are used to detect the presence of a dominant path and predict the likelihood of LoS/NLoS. The likelihood value is also fed to the trained ML.
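By way of illustration only, the per-beam inputs listed above (the reported RSRP and the 0/1 beam index identifier of size Ntot) could be assembled as in the following sketch. The function name, the 0-based GoB indexing and the placeholder value used for beams that were not part of the burst are assumptions made for this example and are not taken from the embodiments.

```python
from typing import Dict, List, Tuple

# Assumed placeholder for beams that were not transmitted in the burst.
UNMEASURED_RSRP_DBM = -140.0


def build_beam_inputs(n_tot: int, feedback: Dict[int, float]) -> Tuple[List[int], List[float]]:
    """Return (mask, rsrp), both of length n_tot.

    feedback maps a GoB beam index (0-based here, an assumption) to the RSRP
    in dBm reported by the UE for that sampled beam.
    """
    mask = [0] * n_tot                      # 1 if the beam was in the Nr set, else 0
    rsrp = [UNMEASURED_RSRP_DBM] * n_tot    # RSRP placed at the original beam index
    for beam, value in feedback.items():
        if not 0 <= beam < n_tot:
            raise ValueError(f"beam index {beam} outside the GoB of size {n_tot}")
        mask[beam] = 1
        rsrp[beam] = value
    return mask, rsrp


mask, rsrp = build_beam_inputs(8, {1: -92.5, 4: -88.0, 6: -101.0})
print(mask)
print(rsrp)
```

The remaining inputs (location, CQI history and LoS/NLoS likelihood) would be appended to these vectors in whatever encoding the chosen model expects.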

At step S70, the output of the trained ML includes:

• A likelihood value for all the Ntot beams for the user, representing how useful the beam is going to be for the user (or, alternatively the RSRP predicted for the beams);

• The best beam for a user is then selected based on the output of the ML above; the beam index from Ntot beams with highest likelihood value is selected for the user.
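By way of illustration only, the following sketch shows how a beam outside the sampled subset can end up being selected. A simple polynomial (Lagrange) interpolation over beam pointing angles is used here purely as a stand-in for the trained ML; the beam angles, the RSRP values and the function name are hypothetical.

```python
from typing import Dict, List


def interpolate_rsrp(beam_angles_deg: List[float], measured: Dict[int, float]) -> List[float]:
    """Predict an RSRP for every beam from the sampled beams.

    Stand-in for the trained model: Lagrange interpolation through the
    measured (angle, RSRP) points. Assumes the sampled beams have distinct angles.
    """
    predicted = []
    for angle in beam_angles_deg:
        value = 0.0
        for j, rsrp_j in measured.items():
            basis = 1.0
            for k in measured:
                if k != j:
                    basis *= (angle - beam_angles_deg[k]) / (beam_angles_deg[j] - beam_angles_deg[k])
            value += basis * rsrp_j
        predicted.append(value)
    return predicted


angles = [0.0, 10.0, 20.0, 30.0, 40.0]          # Ntot = 5 beams (hypothetical)
measured = {0: -90.0, 2: -80.0, 4: -82.0}        # Nr = 3 sampled beams, RSRP in dBm
predicted = interpolate_rsrp(angles, measured)
best_beam = max(range(len(predicted)), key=predicted.__getitem__)
print(predicted, best_beam)
```

In this toy case the highest predicted RSRP falls on a beam that was never transmitted in the burst. Polynomial interpolation is only suitable for a handful of sample points; the embodiments rely on a trained model for this step.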

This approach of sampling and interpolation of the beam space provides analog beamforming with many benefits:

• Thanks to the limited number of beams in the sampled subset, the operation will be compliant with beam management specifications and the maximum number of allowed SSBs in a burst.

• The operation doesn’t impose any limitation on the total number of beams Ntot. Technically, Ntot can be arbitrarily large as long as Ntot can be sampled with sufficient precision using 64 (or other specified number) beams. This relaxes the limitation on the number of beams. For example, some beam sets usually have around 32 beams for 90-120 degrees of azimuth and around 30 degrees of elevation coverage range. The beams are wider than what the phased array can generate, usually so that the coverage area is covered fully by those 32 beams. With the limitation on the total number being relaxed, it is instead possible to generate a large number of very narrow beams, which will increase the beamforming gain and efficiency of the gNB. Essentially, the gNB can have an arbitrary number of beams, a mix of narrow and wide, where those beams could have overlapping coverage too, but at each instance, this approach chooses the best out of all the beams for each user.

• It is important to decouple the beam set of the SSB operation from the data transmission beam set. Data transmission benefits from having a large number of very narrow beams, while SSB transmission requires a small number of beams to curb the overhead level. This approach enables such decoupling. It further enables the network to sample the beam set differently at each time, e.g., depending on the location of the users connected to the network.

• With envisioned technologies such as multi-TRP, multiple remote radio heads (RRHs) and RIS in the near-term to mid-term future, adding one new RRH or RIS to the gNB in essence adds more beams to the gNB’s GoB. This approach enables the beam-based operation to extend by adding more RRHs and RISs to the gNB, without increasing the SSB burst overhead.

The steps of the approach set out above will now be considered in more detail.

Sampling of the GoB

The sampling process at step S10 considers the following points:

• Maximum allowed burst size in SSB burst.

• Proper sampling of the area (e.g., the Nr beams should be proportionately selected from each panel GoB and each RIS GoB) - optional.

• Future scheduling plans (e.g., the gNB can increase the number of selected beams from one of the panels, knowing that the same set of beams will be used for communication with users in a certain area, based on their location information) - optional.

• Following a pre-configured sampling algorithm or randomly selecting the Nr beams from the Ntot aggregate GoB.
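By way of illustration only, the considerations listed above could be combined as in the following sketch, which draws a proportional random quota from each grid and always keeps a set of beams in current use. The function name, the grid labels and the particular rounding and top-up rules are assumptions made for this example.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Sequence


def sample_beams(
    grids: Dict[str, Sequence[int]],
    n_r: int,
    must_include: FrozenSet[int] = frozenset(),
    seed: Optional[int] = None,
) -> List[int]:
    """Pick up to n_r beam indexes, proportionally from each panel/RIS grid."""
    rng = random.Random(seed)
    total = sum(len(beams) for beams in grids.values())
    chosen = set(must_include)          # e.g., beams used by UEs awaiting scheduling
    if len(chosen) > n_r:
        raise ValueError("more mandatory beams than the burst can carry")
    budget = n_r - len(chosen)
    for beams in grids.values():
        quota = budget * len(beams) // total        # proportional share, rounded down
        candidates = [b for b in beams if b not in chosen]
        chosen.update(rng.sample(candidates, min(quota, len(candidates))))
    # Fill any slots left over by the rounding with random remaining beams.
    remaining = [b for beams in grids.values() for b in beams if b not in chosen]
    rng.shuffle(remaining)
    chosen.update(remaining[: n_r - len(chosen)])
    return sorted(chosen)


# Hypothetical layout: 3 panels and 3 RISs, 30 beams each, indexed 0..179.
grids = {f"grid{g}": range(g * 30, (g + 1) * 30) for g in range(6)}
sampled = sample_beams(grids, n_r=64, must_include=frozenset({5, 100}), seed=0)
print(len(sampled))
```

Calling the function with a different seed in each SSB period gives a different subset each time while keeping the per-grid proportions.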

It is helpful to dynamically or semi-statically change the choice of Nr beams over time. Changing the selected subset of Nr beams has the following benefits:

• It can improve the performance of the ML entity, especially in the case of online learning such as reinforcement learning.

• It reduces the chance of creating SSB coverage holes that persist for a long period of time. In particular, if Ntot ≫ Nr, the sampling may leave coverage holes for SSB. However, by varying the set of Nr beams, the gNB takes advantage of the fact that variations in the environment happen at a slower rate than SSB burst periods; therefore, the coverage holes in one burst will be covered by the following SSB burst.
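By way of illustration only, one simple way of varying the subset over time is an "every nth beam" pattern whose starting offset advances with each burst. The function name and the values of Ntot and the stride below are hypothetical.

```python
from typing import List


def strided_subset(n_tot: int, stride: int, burst_index: int) -> List[int]:
    """Every stride-th beam, with the starting offset rotated per burst."""
    offset = burst_index % stride
    return list(range(offset, n_tot, stride))


n_tot, stride = 180, 3            # hypothetical GoB size and sampling stride
covered = set()
for burst in range(stride):
    subset = strided_subset(n_tot, stride, burst)
    covered.update(subset)        # beams transmitted in at least one burst so far
print(len(subset), len(covered))
```

With these values each burst carries 60 beams, which is within the 64-SSB limit, and every beam of the GoB is transmitted once within three consecutive bursts.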

In one example embodiment, some or all of the Nr beams may be chosen from among beams outside of the Ntot beam set (the Nr selected beams still need to cover points in the same coverage area). For example, the Nr sampling beams are wide beams while the Ntot beams are all narrow beams. This is illustrated in FIG. 4.

In one example embodiment, the sampling process can be implemented using deep reinforcement learning (DRL). This example embodiment can be performed as follows:

• The gNB can use online training (deep reinforcement learning) to down-select a set of Nr beams dynamically. In the deep reinforcement learning method of an example embodiment, an LSTM RNN is implemented as the neural network inside the DRL so that the network has memory for the DRL implementation.

• An actor-critic method, which can be implemented based on proximal policy optimization (PPO), can be used for the DRL implementation, rather than just typical deep Q-learning/double deep Q-learning as in existing approaches. The LSTM RNN is implemented to help the system have memory for the DRL computation.

For DRL interleaving beam sampling, the inputs include:

• Reference signal received power (RSRP) of each UE: the gNB can use RSRP information of all UEs, alternatively it can limit itself to UEs scheduled in upcoming time slots.

• Beam index identifier: logical vector of size Ntot (0/1 depending on beam selection in the Nr beam set or not - in the previous sequence).

• Location information of the UEs.

• Sequence of previous CQI for the UE: previous sequence of beams used for the user with operation quality value corresponding to each beam in the sequence, where the operation quality value is a metric of how good the selected beams have been for the user (examples are set out in more detail below).

• In one example embodiment, the estimated channel of the user (e.g., from Uplink (UL) channel estimation) is used to detect the presence of a dominant path and predict the likelihood of LoS/NLoS. The likelihood value is also fed to the trained ML. Such information is useful for the ML to give higher likelihood to TRP panel beams for LoS UEs and higher likelihood to RIS and repeater beams for NLoS UEs.

The output includes:

• a likelihood value vector of size Ntot, representing how useful the beam is in the sampling process (or, alternatively, the predicted RSRP for the beams).
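By way of illustration only, a likelihood vector of size Ntot could be turned into an Nr-beam subset by keeping the highest-scoring beams. The embodiments do not state how the vector is converted into a subset, so the top-Nr rule and the names used below are assumptions.

```python
from typing import List


def top_n_beams(scores: List[float], n_r: int) -> List[int]:
    """Indexes of the n_r beams with the highest sampling likelihood (assumed rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_r])


scores = [0.10, 0.80, 0.05, 0.60, 0.30, 0.90]   # hypothetical output, Ntot = 6
subset = top_n_beams(scores, n_r=3)
print(subset)
```

Other rules, such as sampling beams at random with probability proportional to their score, would equally fit the description.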

SSB burst operation

Instead of SSB transmission for the Ntot beams, at steps S20 and S30, the gNB transmits SSB for the Nr selected beams. Note that there are two sets of indexes for beams:

1. SSB index which is from 1 to Nr

2. Total gNB GoB beam index which is from 1 to Ntot (with the possibility of varying Ntot over time, e.g., when new beams are added or removed to the GoB).

During the SSB burst operation, the selected Nr beams are going to be used for the burst transmission. The true indexes of those beams (i.e., from the index set #2) and the mapping of those to the SSB indexes (i.e., index set #1) are stored for the time index of the burst transmission. An example embodiment of how such mapping between beam indexes could be stored over different time instances is shown in FIG. 5. The stored mapping is then used by the gNB to find the correct beam index from the total GoB, when the feedback from the users has arrived.
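By way of illustration only, the stored mapping between the two index sets could be held per burst as in the following sketch. The class name `SsbIndexMap`, its methods and the example beam indexes are hypothetical; SSB indexes are taken to run from 1 to Nr as described above.

```python
from typing import Dict, List


class SsbIndexMap:
    """Per-burst mapping from SSB index (1..Nr) to total GoB beam index."""

    def __init__(self) -> None:
        self._by_burst: Dict[int, List[int]] = {}

    def store(self, burst_time: int, gob_indexes: List[int]) -> None:
        # Position p in the list holds the GoB index transmitted as SSB index p + 1.
        self._by_burst[burst_time] = list(gob_indexes)

    def to_gob_index(self, burst_time: int, ssb_index: int) -> int:
        beams = self._by_burst[burst_time]
        if not 1 <= ssb_index <= len(beams):
            raise IndexError(f"SSB index {ssb_index} not used in burst {burst_time}")
        return beams[ssb_index - 1]


index_map = SsbIndexMap()
index_map.store(burst_time=0, gob_indexes=[2, 17, 45, 101])
index_map.store(burst_time=1, gob_indexes=[3, 18, 46, 102])

# A UE reports SSB index 3 as its best beam for the burst sent at time 1.
reported = index_map.to_gob_index(burst_time=1, ssb_index=3)
print(reported)
```

Keying the mapping by burst time matters because the same SSB index refers to a different GoB beam whenever the sampled subset changes between bursts.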

The gNB then collects, at step S40, the beam selection feedback from users and, at step S50, reverses the mapping as described above. Other feedback, including RSRP and the position of the users, is collected too, if available. In general, the RSRP feedback from the UE could be per beam, or the UE may send only the RSRP of its best beam. Although both may be used, the former could provide a better interpolation outcome.

Best beam selection using predictive ML

After collecting the feedback from the UE based on the SSB burst over the sampled beam set, the gNB uses, at step S60, a trained ML model (e.g., a Long Short-Term Memory (LSTM) recurrent neural network as illustrated in FIG. 6) to derive, at step S70, the best beam index out of the total GoB for the UE. The input to the inference process includes:

• the collected beam selection feedback from user

• the location information of the user

• previous sequence of beams used for the user with operation quality value corresponding to each beam in the sequence (operation quality value is a metric of how good the selected beams have been for the user)

The output includes:

• a likelihood value vector of size Ntot for the user, representing how useful each beam is going to be for the user equipment (or the RSRP of the beam).

Using the likelihood values computed for the Ntot beams, the gNB picks the beam with the highest likelihood value and uses that beam for communications with the UE. Alternatively, if the ML is trained to predict RSRP for each beam, the best beam is chosen out of the Ntot beams, based on which one is predicted to have the highest RSRP for the user equipment. The inference process is typically done for each UE that is going through beam selection or beam tracking.

Training of the RNN is typically done offline based on collected data from the operator of the same network. Alternatively, the coverage area can be modelled as a digital twin, where ray-tracing and simulated network operation can be used to generate training data in large datasets.

In an example embodiment, the NN is trained in this model to properly ‘interpolate’ the beam space for each user equipment based on the limited sampled input (Nr sample points). Therefore, an offline training module as illustrated in FIG. 6 is trained for a specific coverage area and its dynamics. To use the NN for a different coverage area, it needs to be trained again. However, the NN training isn’t necessarily dependent on the GoB. In fact, one example embodiment of this NN can operate independently from the GoB. At the input, the beam codebooks are fed in instead of the beam indexes. The beam codebook could be fed in in the form of a weight matrix (phase shifter codebook), or the gain pattern of the beam over the desired coverage area. In such a case, at the output, instead of likelihood values for beams, the NN can be trained to feed out the optimal beam codebook (or gain pattern) for each user equipment.

Best beam selection using reinforcement learning

The ML entity described above could also be designed as a reinforcement learning (RL) entity. In such a case, the training and inference happen online. A pretrained model can be used; however, the deep RL network can also be trained during deployment for the specific network coverage area in which it is deployed.

The observation space is then as defined in FIG. 7. The action space may include a likelihood value for the Ntot beams in the GoB or a predicted RSRP value for each beam in the GoB. The reward is designed to minimize convergence time and improve prediction accuracy. Therefore, the reward function is based on the selected beam being above the expected RSRP threshold. The actual RSRP of the beam (measured during the communications process between the UE and the gNB) can be compared against the predicted RSRP, and the delta is used in a cost function as follows: Reward: function(predicted RSRP for a beam - actual RSRP of the beam).
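By way of illustration only, one possible shape for such a reward is sketched below. The embodiments state only that the reward is a function of the difference between the predicted and the actual RSRP and relates to the selected beam exceeding a threshold; the linear penalty, the fixed bonus, the weight and the function name are assumptions made for this example.

```python
def rsrp_reward(
    predicted_dbm: float,
    actual_dbm: float,
    threshold_dbm: float,
    error_weight: float = 0.1,   # assumed scaling of the prediction error
) -> float:
    """Assumed reward: bonus if the beam meets the threshold, minus a prediction-error cost."""
    error = abs(predicted_dbm - actual_dbm)          # delta between predicted and actual RSRP
    bonus = 1.0 if actual_dbm >= threshold_dbm else 0.0
    return bonus - error_weight * error


good = rsrp_reward(predicted_dbm=-85.0, actual_dbm=-88.0, threshold_dbm=-95.0)
poor = rsrp_reward(predicted_dbm=-85.0, actual_dbm=-100.0, threshold_dbm=-95.0)
print(good, poor)
```

A selection whose measured RSRP is both above the threshold and close to the prediction receives the higher reward, which pushes the agent towards accurate predictions as well as usable beams.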

Predicting likelihood of LoS/NLoS

In one example embodiment, the gNB creates a likelihood value for the channel to the UE having a dominant LoS element. The estimated channel in the time domain and the frequency domain (channel impulse response and channel frequency response) can be used for this purpose. Moreover, where beamforming is used, the impact of the beamforming is first deconvolved from the estimated channel. The gNB then uses a simple neural network (e.g., 2 hidden layers), fed with the channel estimated from the uplink pilot transmission, to create a likelihood value for LoS/NLoS. Alternatively, the output can be set to predict the likelihood of having a dominant path (equivalent to having strong LoS). Such a network can be readily trained using ray-tracing based simulations. The environments of the ray-tracing simulations do not need to be similar to or the same as the deployment environment, because the likelihood of LoS is directly related to the ratio of received power in the initial taps of the sampled channel. It is helpful to train such a network using training data from a multitude of environments to increase the robustness of the training to variations in the environment.
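The relationship between LoS and the power in the initial taps can be illustrated with a minimal sketch. The function name early_tap_power_ratio, the choice of two initial taps and the example tap values are assumptions for illustration only; in the described embodiment a neural network, not a fixed ratio, produces the likelihood.

```python
def early_tap_power_ratio(cir, n_early=2):
    """Fraction of the total received power carried by the first taps.

    cir     : sequence of complex channel impulse response taps
    n_early : number of initial taps treated as 'early' (assumed value)
    """
    powers = [abs(h) ** 2 for h in cir]
    total = sum(powers)
    if total == 0:
        # No received power: no evidence of a dominant early path.
        return 0.0
    return sum(powers[:n_early]) / total


# Illustrative (made-up) channels: one dominated by the first tap,
# one whose energy arrives mostly in later taps.
cir_los = [1.0 + 0j, 0.1j, 0.1 + 0j, 0.05 + 0j]
cir_nlos = [0.1 + 0j, 0.2 + 0j, 0.8 + 0j, 0.6 + 0j]
ratio_los = early_tap_power_ratio(cir_los)
ratio_nlos = early_tap_power_ratio(cir_nlos)
```

A ratio close to one suggests a dominant early path, whereas a small ratio suggests that most energy arrives via later, reflected paths.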

FIG. 8 illustrates the steps of training and of inference of the ML entity shown in FIG. 7. Regarding training, at step S80, the isotropic channel between the UE and the gNB is simulated or measured.

At step S90, the channel impulse response and/or the channel frequency response is extracted.

At step S100, the neural network is trained using the channel impulse response and/or the channel frequency response as an input, with the ground truth as an output.

Regarding inference, at step S110, the channel impulse response and/or the channel frequency response is estimated.

At step S120, the impact of beamforming is deconvolved from the estimated channel.

At step S130, the resulting isotropic channel impulse response and/or channel frequency response is fed to the neural network, as is the relative distance or the path loss, if available.

At step S140, the likelihood value representing the chance of the channel having a dominant line-of-sight element is extracted.
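The inference steps S110 to S140 can be sketched, under strong simplifying assumptions, as follows. The beamforming is assumed to act as a single flat complex gain, so that deconvolution reduces to a division; the network weights are random placeholders standing in for a trained model; and all function names, the hidden layer size and the path loss scaling are hypothetical.

```python
import math
import random


def remove_beam_gain(cir, beam_gain):
    # S120 (assumption): flat complex beamforming gain, removed by division.
    return [h / beam_gain for h in cir]


def dense(x, weights, biases):
    # Fully connected layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_params(n_in, n_hidden=8, seed=0):
    # Placeholder (untrained) weights for two hidden layers and one output.
    rng = random.Random(seed)

    def layer(n_out, n_inp):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_inp)]
             for _ in range(n_out)]
        return w, [0.0] * n_out

    return [layer(n_hidden, n_in), layer(n_hidden, n_hidden),
            layer(1, n_hidden)]


def los_likelihood(cir, beam_gain, params, path_loss_db=None):
    # S110 is assumed done: `cir` is the estimated impulse response.
    iso = remove_beam_gain(cir, beam_gain)                 # S120
    features = [abs(h) for h in iso]                       # S130: tap magnitudes
    # Optional side input; the scaling by 100 dB is an arbitrary assumption.
    features.append(0.0 if path_loss_db is None else path_loss_db / 100.0)
    h1 = relu(dense(features, *params[0]))
    h2 = relu(dense(h1, *params[1]))
    (logit,) = dense(h2, *params[2])
    return sigmoid(logit)                                  # S140


# Usage with a made-up four-tap estimate and one path loss input.
params = random_params(n_in=5)
p_los = los_likelihood([2.0 + 0j, 0.2j, 0.2 + 0j, 0.1 + 0j],
                       beam_gain=2.0 + 0j, params=params, path_loss_db=95.0)
```

Because the weights are untrained, the returned value is meaningful only as a demonstration of the data flow; a real deployment would load weights obtained from the training steps S80 to S100.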

A person of skill in the art would readily recognize that steps of various above-described methods can be performed by programmed computers. Herein, some embodiments are also intended to cover program storage devices, e.g., digital data storage media, which are machine or computer readable and encode machine-executable or computer-executable programs of instructions, wherein said instructions perform some or all of the steps of said above-described methods. The program storage devices may be, e.g., digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media. The embodiments are also intended to cover computers programmed to perform said steps of the above-described methods. The term “non-transitory”, as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs ROM). As used in this application, the term “circuitry” may refer to one or more or all of the following:

(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and

(b) combinations of hardware circuits and software, such as (as applicable):

(i) a combination of analog and/or digital hardware circuit(s) with software/firmware and

(ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and

(c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.

This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.

Although example embodiments of the present invention have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the invention as claimed.

Features described in the preceding description may be used in combinations other than the combinations explicitly described.

Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.

Whilst endeavouring in the foregoing specification to draw attention to those features of the invention believed to be of particular importance it should be understood that the Applicant claims protection in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not particular emphasis has been placed thereon.

Claims

1. An apparatus, comprising: at least one processor; and at least one memory storing instructions that when executed by the at least one processor cause the apparatus at least to perform: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for said subset of beams; receiving feedback from said user equipment related to said reference signals for said subset of beams received by said user equipment; and selecting, based on said feedback, a beam from said beams available to said base station determined to be suitable for communication with said user equipment.

2. The apparatus of claim 1, wherein said subset of beams includes: a beam used or recently used by said user equipment; beams proportionally selected from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams; beams biased to be selected from a panel grid of beams and/or Reflective Intelligent Surface grid based on location feedback from said user equipment; a beam least recently used from each panel grid of beams and/or each Reflective Intelligent Surface grid of beams; and/or beams from a grid of beams having a wider coverage area than said beam.

3. The apparatus of claim 1 or 2, wherein said subset of beams is determined using a Deep Reinforcement Learning model, a neural network or a Long Short-Term Memory Recurrent Neural Network.

4. The apparatus of any preceding claim, wherein said subset of beams is determined using an actor-critic methodology.

5. The apparatus of claim 4, wherein said actor-critic methodology is based on a proximal policy optimisation.

6. The apparatus of any one of claims 3 to 5, wherein said model uses Reference Signal Received Power of user equipment, Reference Signal Received Power of scheduled user equipment, an indication of a current or latest beam being used for communication with user equipment; location of user equipment, a quality indication of beams previously used for communication with user equipment and/or a channel estimate indicating a likelihood of a presence of line-of-sight with user equipment.

7. The apparatus of any one of claims 3 to 6, wherein said model includes beams having greater than a predicted Reference Signal Received Power threshold in said subset of beams.

8. The apparatus of any preceding claim, wherein said feedback includes: received power from user equipment, an indication of at least one preferred beam from user equipment and/or user equipment location.

9. The apparatus of any preceding claim, wherein said feedback includes received power from user equipment for all beams, for a best beam and/or for beams exceeding a received power threshold.

10. The apparatus of any preceding claim, wherein said selecting comprises selecting: said beam from all said beams available to said base station; said beam from other than said subset of beams; said beam from said beams available to said base station determined to have a predicted transmission characteristic; said beam from said beams available to said base station determined to have a highest predicted received power by said user equipment; and/or said beam from said beams available to said base station determined to be predicted to have higher than a threshold received power.

11. The apparatus of any preceding claim, wherein said beam is determined using a Deep Reinforcement Learning model, a neural network or a Long Short-Term Memory Recurrent Neural Network.

12. The apparatus of claim 11, wherein said model uses said indication of at least one preferred beam from user equipment, user equipment location and/or said quality indication of beams previously used for communication with user equipment and/or a channel estimate indicating a likelihood of a presence of line-of-sight with user equipment.

13. The apparatus of claim 11 or 12, wherein said model interpolates a coverage space provided by said subset of beams to determine said beam.

14. The apparatus of any one of claims 11 to 13, wherein said model utilises reinforcement learning having a reward based on a relationship between a predicted and actual Reference Signal Received Power for said beam.

15. The apparatus of any one of claims 11 to 14, wherein said likelihood of said presence of line-of-sight with user equipment is determined from a channel response and relative locations of said user equipment and said base station.

16. The apparatus of any one of claims 11 to 15, wherein said likelihood of said presence of line-of-sight with user equipment is determined using a neural network.

17. The apparatus of any preceding claim, wherein said at least one memory stores instructions that when executed by the at least one processor cause the apparatus at least to perform: using said beam for communication with said user equipment.

18. A method, comprising: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for said subset of beams; receiving feedback from said user equipment related to said reference signals for said subset of beams received by said user equipment; and selecting, based on said feedback, a beam from said beams available to said base station determined to be suitable for communication with said user equipment.

19. A non-transitory computer readable medium comprising program instructions stored thereon for performing at least: creating a subset of beams from beams available to a base station communicating with a user equipment; transmitting reference signals for said subset of beams; receiving feedback from said user equipment related to said reference signals for said subset of beams received by said user equipment; and selecting, based on said feedback, a beam from said beams available to said base station determined to be suitable for communication with said user equipment.